Cold Spring Harbor, NY — A rose by any other name would smell as sweet—but it might confound scientists interested in understanding the chemical components of its fragrance or discovering where its ancestors grew in the wild.
That’s because in biology, an organism’s scientific (taxonomic) name is the key to finding information about it. This data—on the genetic, ecological, and agricultural particulars of every known plant—is held in repositories scattered all over the globe, at places as diverse as university labs, museums, and private-sector corporations. Some of the information is hidden within spreadsheets stored on the computers of individual plant scientists.
There is, in other words, lots of room for confusion—resulting from multiple listings (under different names) of the same species.
Enter a web-based resource: the Taxonomic Name Resolution Service, or TNRS (tnrs.iplantcollaborative.org). Today, the third and most complete version of TNRS to date went live on the Web, the work of computer scientists, botanists, and biologists participating in a National Science Foundation (NSF)-funded project called the iPlant Collaborative, in conjunction with the Missouri Botanical Garden and the Botanical Information and Ecology Network (BIEN).
It turns out that up to 30% of the names in major biological databases are incorrect in some way, according to TNRS scientists. Error rates that high greatly reduce confidence in the accuracy of science and limit the ability of the public and business to discover and utilize information about plants.
iPlant, a virtual collaborative co-led by Doreen Ware, Ph.D., of the U.S. Department of Agriculture’s Agricultural Research Service and an Adjunct Associate Professor at Cold Spring Harbor Laboratory (CSHL), has made great progress in solving the problem. The latest version of TNRS resolves plant taxonomic names, often submitted as lists containing thousands of names, by passing them through a process of exact matching, parsing (which breaks names into their component parts), and “fuzzy matching” (which searches for near matches).
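For readers who want a concrete picture of the matching process described above, here is a minimal Python sketch. It is not the TNRS code: the reference list, the function names, and the similarity cutoff are assumptions made for illustration, the parser is a naive whitespace split, and Python's standard `difflib` module stands in for the TaxaMatch algorithm that TNRS adapts.

```python
import difflib

# Hypothetical reference list; a real service would query a taxonomic database.
REFERENCE_NAMES = ["Solanum lycopersicum", "Solanum tuberosum", "Rosa canina"]


def parse_name(raw):
    """Naively split a submitted string into genus, epithet, and author text."""
    parts = raw.strip().split()
    genus = parts[0].capitalize() if parts else ""
    epithet = parts[1].lower() if len(parts) > 1 else ""
    author = " ".join(parts[2:])
    return genus, epithet, author


def resolve(raw, reference=REFERENCE_NAMES, cutoff=0.8):
    """Return (matched_name, method), or (None, "unmatched")."""
    # Step 1: exact match on the string as submitted.
    if raw in reference:
        return raw, "exact"
    # Step 2: parse, drop the author text, normalise case, and retry.
    genus, epithet, _author = parse_name(raw)
    canonical = f"{genus} {epithet}".strip()
    if canonical in reference:
        return canonical, "parsed"
    # Step 3: fuzzy match to catch misspellings (difflib is a stand-in here).
    close = difflib.get_close_matches(canonical, reference, n=1, cutoff=cutoff)
    if close:
        return close[0], "fuzzy"
    return None, "unmatched"
```

In this sketch, a submission such as "solanum lycopersicum L." is caught at the parsing step, while a misspelling such as "Solanum lycopersicon" falls through to the fuzzy step.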
Key work on the TNRS was performed by Zhenyuan Lu and Sheldon McKay of CSHL, as well as by Brian Enquist and Brad Boyle from BIEN and the University of Arizona and Bill Piel from Yale’s Peabody Museum. “TNRS is a critical tool to help plant scientists integrate data from diverse sources in virtually every field of plant research,” says Lu.
Beginning with Linnaeus
In 1753, Carl Linnaeus, a Swedish botanist and zoologist, published Species Plantarum, which introduced a Latin-based naming system to the world and laid foundations for how subsequent scientists made sense of the immense diversity of life on earth. The Linnaean system is still in operation today, the basis for communication among ecologists studying tropical diversity, crop scientists searching for means of optimizing yields, and so-called systematists who strive to chart the Tree of Life.
Yet for many types of research the first crucial step is to resolve any differences among the taxonomic names of the plants being studied. Consider the tomato, a plant with a troubled taxonomic past. Originally named Solanum lycopersicum by Linnaeus, the tomato was soon transferred to the genus Lycopersicon, and long referred to as both Lycopersicon lycopersicum and Lycopersicon esculentum. Recent DNA research has shown that the tomato indeed belongs in Solanum, meaning that Linnaeus’ original name must be restored. Anyone conducting research on the tomato must search for all three names to access the complete data and previous research associated with this economically important species.
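The tomato example can be sketched in a few lines of Python. The synonym table below contains only the three names mentioned above; the record labels and function names are invented placeholders, and the sketch simply illustrates the idea of folding data filed under older names into the accepted one.

```python
from collections import defaultdict

# Synonym table built from the tomato example in the text.
SYNONYMS = {
    "Lycopersicon lycopersicum": "Solanum lycopersicum",
    "Lycopersicon esculentum": "Solanum lycopersicum",
}


def accepted_name(name):
    """Map a synonym to its accepted name; other names pass through unchanged."""
    return SYNONYMS.get(name, name)


def merge_records(records):
    """Group (name, record) pairs under the accepted name of each species."""
    merged = defaultdict(list)
    for name, record in records:
        merged[accepted_name(name)].append(record)
    return dict(merged)


# Placeholder records filed under three different names for the same species.
records = [
    ("Solanum lycopersicum", "record A"),
    ("Lycopersicon esculentum", "record B"),
    ("Lycopersicon lycopersicum", "record C"),
]
tomato = merge_records(records)["Solanum lycopersicum"]
```

Once the names are resolved, a single lookup under the accepted name returns all three records.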
The most important feature of TNRS version 3.0 is the ability to hierarchically resolve names against multiple taxonomic sources. Four taxonomic sources are now available: Tropicos®, The National Center for Biotechnology Information’s (NCBI) Taxonomy Database, The United States Department of Agriculture’s (USDA) Plants Database, and The Global Compositae Checklist.
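One plausible reading of hierarchical resolution is that sources are ranked and each name is matched against the highest-ranked source that contains it. The Python sketch below illustrates that reading only. The source labels come from the list above, but their contents are invented placeholder names, and the function name and ranking argument are assumptions rather than the TNRS interface.

```python
# Source labels come from the text; the names inside them are invented placeholders.
SOURCES = {
    "Tropicos": {"Aus bus", "Aus cus"},
    "USDA Plants": {"Aus cus", "Dus eus"},
    "NCBI Taxonomy": {"Dus eus", "Fus gus"},
}


def resolve_hierarchically(name, ranking, sources=SOURCES):
    """Return the first source in `ranking` that lists `name`, or None."""
    for source in ranking:
        if name in sources.get(source, set()):
            return source
    return None


default_ranking = ["Tropicos", "USDA Plants", "NCBI Taxonomy"]
```

Under this scheme, changing the ranking changes which source is treated as authoritative for a name that appears in more than one of them.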
With the addition of the new taxonomic name sources, the TNRS has expanded the geographic range of plant species names it can resolve far beyond the Americas. The plant species available for comparison will continue to grow as the botany community contributes additional sources of names.
Members of the botany community are invited to contact iPlant about contributing their taxonomic sources to the TNRS. The TNRS source code has been released under an open-source license, and developers are encouraged to expand it to resolve taxonomic names of other groups of organisms.
Web destinations relevant to this release:
TNRS v. 3.0: http://tnrs.iplantcollaborative.org
Botanical Information and Ecology Network: http://bien.nceas.ucsb.edu/bien/
Missouri Botanical Garden: http://www.missouribotanicalgarden.org/
About the iPlant Collaborative
iPlant is a virtual organization led from the University of Arizona (UA), the University of Texas at Austin, and Cold Spring Harbor Laboratory (CSHL) in New York, with participants from institutions around the country and the world. iPlant is funded by a grant from the National Science Foundation, and the UA part of the organization operates as part of the BIO5 Institute at the UA.
iPlant collaborated with Brian Enquist and Brad Boyle from BIEN and the University of Arizona, Zhenyuan Lu and Sheldon McKay from Cold Spring Harbor Laboratory, and Bill Piel from Yale’s Peabody Museum to address the name resolution problem. The Missouri Botanical Garden provided vital access to the contents of the Tropicos® database during early development. The TNRS builds on the work of Dmitry Mozzherin of the Marine Biological Laboratory at Woods Hole, MA, whose name parser from the Global Names project was modified to parse submitted names into constituent parts for the matching process, and Tony Rees of the Commonwealth Scientific and Industrial Research Organisation (CSIRO) in Australia, whose TaxaMatch algorithm was adapted to perform fuzzy matching of erroneous names. iPlant and the Global Names initiative, also funded by NSF, are collaborating to tackle the remaining challenges in taxonomic name standardization.
Founded in 1890, Cold Spring Harbor Laboratory has shaped contemporary biomedical research and education with programs in cancer, neuroscience, plant biology and quantitative biology. Home to eight Nobel Prize winners, the private, not-for-profit Laboratory employs 1,100 people including 600 scientists, students and technicians. The Meetings & Courses Program annually hosts more than 12,000 scientists. The Laboratory’s education arm also includes an academic publishing house, a graduate school and the DNA Learning Center with programs for middle and high school students and teachers. For more information, visit www.cshl.edu