
CSHL is part of international team that sequences the ‘chocolate’ genome

Pods containing cocoa beans, from which chocolate is made. The genome sequence of the cacao tree is providing information that may make this treat even tastier. Photo by Medicaster, courtesy Wikimedia Commons.

T. cacao, source of world’s finest chocolate, reveals some of its genetic secrets

Cold Spring Harbor, NY — An international team that includes scientists at Cold Spring Harbor Laboratory (CSHL) has succeeded in producing a draft genome of the cacao tree variety whose beans yield what most experts consider the world’s finest chocolate.

CSHL Professor W. Richard McCombie, a pioneer in genome sequencing, took part in the cacao genome sequencing effort, the results of which have been published in the journal Nature Genetics. McCombie helped assemble the very first plant genome, that of the mustard plant Arabidopsis thaliana, as well as the genomes of rice and maize. CSHL has benefited from a National Science Foundation grant made specifically to facilitate work on complex projects of this type, in particular by funding the purchase of state-of-the-art sequencing instrumentation, McCombie noted.

“These new instruments are allowing us to totally change the way we approach complex genomes,” he said. “The more technology advances and we learn better to use it, the faster you will see advances like this.”

The team selected a variety of Theobroma cacao called Belizean Criollo, which was domesticated an estimated 3,000 years ago by the Maya people. Today, many commercial growers prefer hybrid cacao trees that produce chocolate of lower quality but are more resistant to disease. Now that scientists have a draft genome of this Criollo variety, they will be able to identify genetic elements specific to this prized variety and incorporate them into other varieties to improve their quality in various ways.

For instance, the team has identified genes that account for various aspects of the texture and flavor of beans produced by T. cacao: those influencing the production of flavonoids, which are natural antioxidants, and terpenoids, which act as hormones, pigments and aromas. Altering the genes for these chemicals might yield chocolate with better flavor and aroma, and perhaps even healthier chocolate. The team also found two types of disease resistance genes, which may prove useful in breeding more disease-resistant trees.

Altogether, the scientists identified 28,798 protein-coding genes in T. cacao. They assigned 23,529 of these, or about 82 percent, to one of the 10 chromosomes in the Criollo cacao tree. Scientifically, perhaps the most interesting fact to emerge from the sequencing work, which new technology allowed the team to complete in a very short time, is that only about one-fifth of the cacao genome is composed of transposons, or mobile genetic elements. Called “jumping genes” when they were first observed by CSHL Nobel laureate Barbara McClintock in the 1940s, transposons are bits of genetic sequence that detach from one site in the genome and move, unpredictably, to another. The paucity of transposons in cacao relative to many other plants suggests the plant has evolved more slowly.

The cacao sequencing project was co-led by Claire Lanaud of CIRAD (the French Agricultural Research Centre for International Development) and Mark Guiltinan of Pennsylvania State University, and included scientists from 17 other institutions in addition to Cold Spring Harbor Laboratory.

Written by: Peter Tarr, Senior Science Writer | 516-367-8455


“The genome of Theobroma cacao” was published online in Nature Genetics on December 26. The paper can be read at:


About Cold Spring Harbor Laboratory

Founded in 1890, Cold Spring Harbor Laboratory has shaped contemporary biomedical research and education with programs in cancer, neuroscience, plant biology and quantitative biology. Home to eight Nobel Prize winners, the private, not-for-profit Laboratory employs 1,100 people including 600 scientists, students and technicians. The Meetings & Courses Program annually hosts more than 12,000 scientists. The Laboratory’s education arm also includes an academic publishing house, a graduate school and the DNA Learning Center with programs for middle and high school students and teachers. For more information, visit