Collaborative Research (CLEANER): Cyberinfrastructure Needs for a Model Environmental Field Facility in Baltimore, Maryland as Part of an Engineering Analysis Network

PIs: C. Welty (UMBC), Michael P. McGuire (UMBC), Michael Piasecki (Drexel University)

Funding Source: National Science Foundation

Project Objectives and Scope
The objectives of this work were to: (1) assess the current state of the Baltimore Ecosystem Study (BES) as a model Environmental Field Facility (EFF); (2) identify other initiatives or similar networks for comparison; and (3) formulate key characteristics that should be addressed for the successful development of an EFF as part of an Engineering Analysis Network (EAN) and the Collaborative Large-Scale Engineering Analysis Network for Environmental Research (CLEANER, http://cleaner.ncsa.uiuc.edu/home/).

Process Used
We assessed the current state of the BES sensor network and cyberinfrastructure by examination of the BES web site and data archives, and by conducting interviews with the LTER network office, BES PIs, data managers, and field technicians. We obtained information from several manufacturers on nitrate sensor systems including telemetry for the purpose of pricing and design of a prototype sensor network. We reviewed a number of documents obtained from the LTER network office, among them the Ecological Metadata Language (EML) specifications, as well as metadata tools. We also contacted individuals at the LTER network office to learn about the network aspects of the sites and how the system is intended to function.

In the process of filling in our knowledge gaps on systems such as sensor hardware, visualization tools, data management options, and other elements of cyberinfrastructure, we discovered many new partners who were eager to collaborate on various components of realizing the vision of an “end-to-end” system for the BES LTER application. This led us to write numerous grant proposals with a variety of partners, which also helped refine our thinking on the design of an end-to-end system for an environmental field facility.

Findings and Results
The BES is currently collecting field data to characterize geochemistry, biodiversity, climate and meteorology, demography, and soil and stream conditions in the Gwynns Falls watershed using a mix of manual, semi-automated, and automated methods. As an example of manual data collection, long-term stream chemistry data are obtained via weekly site visits where field conditions are recorded and a water quality sample is taken by hand-dipping bottles into the stream. The stream samples are analyzed at IES and UMBC laboratories for a variety of constituents. Measured constituent values and field conditions are then manually entered into a Microsoft SQL Server database housed at IES. As an example of semi-automated data collection, soil temperature data are collected via data-logging sensors and the data are downloaded once every six weeks by a technician taking a laptop computer into the field. These data are then stored in Microsoft Excel format. As an example of an automated method, sensors on a meteorological station record hourly data and transmit them via cellular modem to a PC at the BES field office, where they are stored in Microsoft Excel files and copied to a server to provide data redundancy.

The BES uses a distributed data management system. Each PI in the BES is responsible for managing his/her own data. This results in a system where spatial and temporal data are stored in a variety of file formats at a number of locations. The Open Research System (ORS) serves as a data clearinghouse, and PIs are encouraged to submit their data into the searchable repository. With the exception of four of the eleven USGS stream gauges, none of the collected data is available for viewing in real time. The BES web site also features access to metadata, a thematic categorization of other data holdings available for download, a display of data access policies (both network and site specific), a collection and summary of research publications, and summaries about objectives, goals, people, and partners. The site achieves minimum requirements for information and data access.

Our analysis of BES data and metadata handling suggests that the site could make the following improvements to act either as a node in a network of nodes (within the LTER network) or as a node that can interoperate with external networks: (1) move toward a central metadata standard across all sites; (2) move toward a network-wide geodatabase design (right now it is site-specific); (3) move toward solely network-wide data access policies (currently each site controls its own data holdings, i.e., there are site-specific access policies in addition to network policies); (4) move toward comparable and coherent data sets across sites; (5) improve documentation of data (metadata) and of how to submit and search, and publish a metadata profile; and (6) move toward a programmatic search that allows one to query the system for data holdings by means other than visual inspection of the portal.
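
As a rough illustration of what the programmatic search in item (6) might look like, the following minimal Python sketch filters an in-memory metadata catalog by theme and year. The record fields, the catalog entries, and the function name search_holdings are hypothetical and are not drawn from the BES or LTER systems; a real implementation would query a metadata store that conforms to a shared standard such as EML.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical, simplified metadata record for one data holding."""
    title: str
    theme: str
    start_year: int
    end_year: int


# Illustrative entries only; these are not actual BES holdings.
CATALOG = [
    DatasetRecord("Weekly stream chemistry", "stream chemistry", 1998, 2006),
    DatasetRecord("Soil temperature loggers", "soil", 2000, 2006),
    DatasetRecord("Meteorological station hourly data", "meteorology", 2001, 2006),
]


def search_holdings(catalog: List[DatasetRecord],
                    theme: Optional[str] = None,
                    year: Optional[int] = None) -> List[DatasetRecord]:
    """Return records matching every criterion that was supplied."""
    results = []
    for rec in catalog:
        if theme is not None and rec.theme != theme:
            continue
        if year is not None and not (rec.start_year <= year <= rec.end_year):
            continue
        results.append(rec)
    return results
```

A call such as search_holdings(CATALOG, theme="soil", year=2003) would return the single soil record, without anyone having to browse a portal by eye.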

The analysis supported by this CLEANER planning grant has led to a design of a prototype end-to-end system for data collection, storage, analysis, and visualization for the BES environmental field facility. An example design of a nitrate sensor network including telemetry would cost on the order of $200K plus ongoing technician salary for maintenance; many other constituents could also be automatically measured, with sensor prices being dependent on constituent-specific measurement technology. The envisioned data management system will house measurement data from external data sources, archived BES data sources, and the sensor network in an object-relational database system. The measurement data will be spatially referenced and connected to a Geodatabase (ESRI ArcSDE) with spatial layers such as digital elevation models, high-resolution imagery, landscape delineations, soils, geology, and hydrography. The combination of measurement and spatial data will form the operational database for the system. The design includes a data warehouse that will be built from the operational database. The star schema design of the data warehouse will include measurement data fact tables and a number of spatial and temporal dimension tables. Stored queries and materialized views will be used to create data cubes from the dimension tables. An OLAP server will then allow slicing, dicing, roll-up, and drill-down operations to be performed on the data cubes. Knowledge discovery applications will allow for multi-resolution exploratory data analysis and data mining. A number of applications will be based directly on data from the operational database. Input files for water quality models such as HSPF, SWMM, and SWAT will be based on results of queries on the operational database. Statistical analysis applications will also connect directly to the operational database. 
The design also includes data access applications such as real-time sensor network visualization, data discovery, and access to raw spatial and measurement data, as well as applications for data management such as QA/QC, data validation, and backup/recovery. The applications will be available to web clients and workstation clients in a spatiotemporal data analysis and visualization laboratory located at the BES site. Data and metadata will also be published to the CLEANER and LTER networks.
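
The star schema and roll-up operation described above can be sketched in a few lines. The following Python example uses the built-in sqlite3 module as a stand-in for the object-relational database and OLAP server; all table names, column names, and sample values are hypothetical and chosen only for illustration.

```python
import sqlite3


def build_warehouse() -> sqlite3.Connection:
    """Create a tiny in-memory star schema: one fact table, two dimensions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_site (
            site_id INTEGER PRIMARY KEY, site_name TEXT, subwatershed TEXT);
        CREATE TABLE dim_time (
            time_id INTEGER PRIMARY KEY, day TEXT, month TEXT);
        CREATE TABLE fact_measurement (
            site_id INTEGER REFERENCES dim_site(site_id),
            time_id INTEGER REFERENCES dim_time(time_id),
            constituent TEXT,
            value REAL);
    """)
    # Hypothetical sample rows, not real BES measurements.
    conn.executemany("INSERT INTO dim_site VALUES (?, ?, ?)", [
        (1, "site_a", "upper"), (2, "site_b", "upper"), (3, "site_c", "lower")])
    conn.executemany("INSERT INTO dim_time VALUES (?, ?, ?)", [
        (1, "2006-05-01", "2006-05"),
        (2, "2006-05-02", "2006-05"),
        (3, "2006-06-01", "2006-06")])
    conn.executemany("INSERT INTO fact_measurement VALUES (?, ?, ?, ?)", [
        (1, 1, "nitrate", 1.0), (1, 2, "nitrate", 3.0),
        (2, 1, "nitrate", 2.0), (3, 3, "nitrate", 4.0)])
    return conn


def roll_up(conn: sqlite3.Connection, constituent: str):
    """Roll up from site/day to subwatershed/month, averaging the values."""
    return conn.execute("""
        SELECT s.subwatershed, t.month, AVG(f.value)
        FROM fact_measurement AS f
        JOIN dim_site AS s ON s.site_id = f.site_id
        JOIN dim_time AS t ON t.time_id = f.time_id
        WHERE f.constituent = ?
        GROUP BY s.subwatershed, t.month
        ORDER BY s.subwatershed, t.month
    """, (constituent,)).fetchall()
```

Here the fact table holds individual measurements, while the dimension tables carry the spatial and temporal hierarchies (site to subwatershed, day to month). Grouping on a coarser level of a dimension is the roll-up; adding a WHERE clause on a dimension column would correspond to a slice.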

Recommendations Relevant to the CLEANER Planning Process
A version of the end-to-end design described above should be considered for each CLEANER EFF and implemented across the network. It is of paramount importance that the CLEANER network (1) adopt a centrally controlled standard for data publication (format and metadata) and enforce it; (2) ensure that a node is part of a network by exposing data holdings such that they appear as one, i.e., adopt a single design for an observational database, data model, description model, etc.; (3) adopt consistent policies with regard to QA/QC, access control, and use of software (OTS and freeware); (4) expose technologies such that programmatic access is available to a wide audience (e.g., via web services); (5) prepare an extensive set of documentation that addresses expectations, technologies used, user guides, and examples; and (6) maintain a means to allow continued advancement, upgrade, and development of the information system.

Publications, presentations, and other outreach activities
Publications
Chen, Z., A. Gangopadhyay, G. Karabatis, M. McGuire, and C. Welty. Semantic Integration and Knowledge Discovery for Environmental Research, Special Issue of Journal of Database Management on "Defining, Eliciting and Using Data Semantics for Emerging Domains", accepted for publication, November 18, 2005.

Zhiyuan Chen, A. Gangopadhyay, S. Holden, G. Karabatis, M. McGuire. “Semantic Integration of Government Data for Water Quality Management.” Government Information Quarterly Symposium on Interorganizational Information Integration. (in review).

Presentations
Welty, C., M. McGuire, and M. Piasecki. “Assessment of the Baltimore Ecosystem Study as a Prototype Environmental Field Facility for an Engineering Analysis Network”, presented at the Annual Science Meeting of the Baltimore Ecosystem Study, October 21-22, 2004.

Welty, C., M. Piasecki and M. McGuire. "Cyberinfrastructure Needs for a Model EFF in Baltimore as Part of an EAN." Presented at National Science Foundation, March 6, 2006.

McGuire, M. and A. Gangopadhyay. “Modeling, Visualizing, and Mining Hydrological Spatial Hierarchies for Water Quality Management.” Presented at ASPRS 2006 Annual Conference, Reno, Nevada, May 1-5, 2006.

McGuire, M., C. Welty, A. Gangopadhyay, G. Karabatis, and Z. Chen. Designing an End-to-End System for Data Storage, Analysis, and Visualization for an Urban Environmental Observatory. Presented at the 2006 Joint Assembly of the American Geophysical Union, May 23-26, Eos Trans. AGU, 87(36), Jt. Assem. Suppl., Abstract H43C-23.

Welty, C. “An End-to-End Vision for Sensor Networks and Cyberinfrastructure in the Baltimore Ecosystem Study”, presented at the quarterly meeting of the Baltimore Ecosystem Study, June 27, 2006.

Welty, C. “Design of an Environmental Observatory for Urban Water Management”, presented at “Cities of the Future: Creating Blue Water in Green Cities”, The Johnson Foundation Wingspread Conference Center, July 12-14, 2006.

Other Outreach
The Baltimore Ecosystem Study holds quarterly science meetings for which a thematic topic is chosen as a focal point to generate discussion. For the summer quarterly meeting held on June 27, 2006, we co-convened with Katalin Szlavecz (Johns Hopkins U.) a meeting focused on sensors and cyberinfrastructure.

C. Welty is a member of the CLEANER Science Committee and has written contributions related to urban EFFs for this committee’s science plan. M. Piasecki is a member of the CLEANER CI Advisory Committee and has contributed substantially to the committee's CI Report. Relevant findings of the BES study as well as his knowledge about the CUAHSI HIS CI developments have been incorporated into this report.

Outgrowth Research Proposals Submitted for Funding
The planning grant has led to development of a roadmap for building an end-to-end system for collection, analysis, and visualization of environmental data using the Baltimore Ecosystem Study as the domain application. Implementing the system requires partnership with individuals from many disciplines. Toward this end, we have participated in developing the following research proposals during the course of this project:

NSF IGERT: Water in the Urban Environment (UMBC, IES, USFS, Howard U.) (funded)

NSF/Engineering Research Center (ERC) on Mid-Infrared Technologies for Health and the Environment (MIRTHE) (Princeton, JHU, TAMU, Rice, UMBC, CUNY) (funded)

USGS/NBII and National Park Service, Center for Urban Ecology: An Integrated Spatiotemporal Data Warehouse for Knowledge Discovery in Environmental Data (UMBC) (funded)

NASA: A Water Cycle Solutions Network (George Mason U, NASA Goddard, UC Irvine, U Arizona, UMBC, MIT, UNH, Hydromet DSS) (funded)

NSF/Hydrologic Science/Env Eng: Quantifying Urban Groundwater in Environmental Field Facilities: A Missing Link in Understanding How the Built Environment Affects the Hydrologic Cycle (UMBC, Princeton, UNC, UVA, Temple, USFS, USGS) (funded)

NSF/SEI+II (BIO): Data Integration, Navigation, and Pattern Discovery in Environmental Research (UMBC, IES) (not funded)

NSF/CEO:P Collaborative Research: Demonstration of CyberIntegrator Technology for Real-Time Community Modeling of Complex Environmental Systems (UIUC, U Iowa, UMBC, IES) (not funded)

NSF/CEO:P A Prototype System for Multi-Disciplinary Shared Cyberinfrastructure – Chesapeake Bay Environmental Observatory (CBEO) (SDSC, Drexel, CRC, JHU, UMCES, Howard U.) (funded)

NSF/Hydrology/Env Eng: Demonstration and Development of a Test-Bed Digital Observatory for the Susquehanna River Basin and Chesapeake Bay (Drexel, Penn State, JHU) (funded)

NSF/GEO/IF: CUAHSI Hydrologic Information System (UT Austin, SDSC, Duke, Drexel, Utah State) (funded)

Acknowledgements
This material is based upon work supported by the National Science Foundation under Grant No. 0414206. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.