Biological Databases and Informatics Grant DBI-0317483
Datasets covering large spatial and temporal ranges have been key to the development of knowledge in many areas of ecology. For example, the International Biological Programme of the 1960s allowed ecologists and evolutionary biologists to test, for the first time, questions of convergent evolution of community structure at an intercontinental scale (McIntosh 1985). Many years of data are often needed to understand the effects of environmental degradation (Hawkins et al. 2002) and the interactions of organisms with their abiotic environments (Haddad et al. 2002).
Repeat sampling of communities at sites over many years is widely recognized as an important source of information for analyzing a variety of ecological phenomena (e.g., Fitter and Fitter 2002, Peñuelas and Filella 2002, Walther et al. 2002). Scientists use these data to: 1) learn about temporal and spatial variability of community composition; 2) track long-term trends in community composition and phenology, and investigate their causes; and 3) determine which sampling protocols are most efficient at capturing local species diversity. Managers also use this type of information and analysis in developing resource management plans; for example, records of fish populations over many years have been used to judge the effectiveness of marine reserves (Russ and Alcala 1996).
There is a critical need for data management techniques and tools associated with long-term biological projects and datasets. The need for new approaches for handling increasingly data-rich fields of study, and for integrating disparate studies when looking for patterns associated with species diversity and climate change, has been recognized by a broad spectrum of scientists (e.g., ESA Long Term Ecological Research [1], National Biological Information Infrastructure [2], the World Data Center [3]). Although there has been demand for (and analysis of) important long-term datasets in consistent formats (Barry et al. 1995, Donovan and Flather 2002, Parmesan and Yohe 2003, Root et al. 2003), internet datasets are generally too spotty and variable in format and content to be widely useful for analysis, integrated research, and land management guidance. Nevertheless, large biological datasets, such as those documenting the spread of invasive species [4] or the Breeding Bird Survey [5], are increasingly being posted on the web to make the data widely available for these purposes. The practice of web publication of data creates a need for standardized data formats (metadata standards) so that researchers can correctly identify how to use any given dataset. With proper metadata specifications, integrating large amounts of data from internet sources for synthetic analysis can become a reality.