Junk. Barren. Non-functioning. Dark matter. That’s how scientists had described the 98% of the human genome that lies between our 21,000 genes, ever since our DNA was first sequenced about a decade ago. The disappointment in those descriptors was intentional and palpable.
It had been believed that the human genome, the blueprint for the talking, empire-building, socially evolved species that we are, would be stuffed with sophisticated genes, coding for critical proteins of unparalleled complexity. But when all was said and done, and the Human Genome Project finally determined the entire sequence of our DNA in 2001, researchers found that our mere 21,000 genes made up a paltry 2% of the genome’s 3 billion base pairs. The rest, geneticists acknowledged with unconcealed embarrassment, was an apparent biological wasteland.
But it turns out they were wrong. In an impressive series of more than 30 papers published in several journals, including Nature, Genome Research, Genome Biology, Science and Cell, scientists now report that these vast stretches of seeming “junk” DNA are actually the seat of crucial gene-controlling activity — changes that contribute to hundreds of common diseases. The new data come from the Encyclopedia of DNA Elements project, or ENCODE, a $123 million endeavor begun by the National Human Genome Research Institute (NHGRI) in 2003, which includes 442 scientists in 32 labs around the world.
ENCODE has revealed that some 80% of the human genome is biochemically active. “What is remarkable is how much of [the genome] is doing at least something. It has changed my perception of the genome,” says Ewan Birney, ENCODE’s lead analysis coordinator from the European Bioinformatics Institute.
Rather than being inert, the portions of DNA that do not code for proteins contain about 4 million so-called gene switches, stretches of DNA where proteins called transcription factors bind to control when our genes turn on and off and how much protein they make, not only affecting all the cells and organs in our body, but doing so at different points in our lifetime. Somewhere amidst that 80% of DNA, for example, lie the instructions that coax an uncommitted cell in a growing embryo to form a brain neuron, or direct a cell in the pancreas to churn out insulin after a meal, or guide a skin cell to bud off and replace a predecessor that has sloughed off.
“What we learned from ENCODE is how complicated the human genome is, and the incredible choreography that is going on with the immense number of switches that are choreographing how genes are used,” Eric Green, director of NHGRI, told reporters during a teleconference discussing the findings. “We are starting to answer fundamental questions like what are the working parts of the human genome, the parts list of the human genome and what those parts do.”
If the Human Genome Project established the letters of the human genome, ENCODE is providing the narrative of the genetic novel by fashioning strings of DNA into meaningful molecular words that together tell the story not just of how we become who we are, but how we get sick as well.
Ever since the human genome was mapped, scientists have been mining it for clues to the genetic triggers and ultimately the treatments for a variety of diseases, including heart disease, diabetes, schizophrenia and autism. But hundreds of so-called genome-wide association studies (GWAS), which compare the DNA of healthy individuals with that of people who have specific diseases, revealed that the relevant changes in DNA were occurring not in the genes themselves, but in the non-coding genetic black holes. Until now, researchers didn’t fully understand what these non-coding regions did; if variations in these areas were not part of a known gene, they couldn’t tell what impact, if any, the genetic change had.
ENCODE, which provides a map of those genetic switches, will now allow scientists to determine what exactly those variants do; it’s likely that their function in regulating and controlling key genes can now be traced and studied, and hopefully manipulated to treat whatever disease they contribute to. “We need to revisit the interpretation of those studies,” Dr. John Stamatoyannopoulos, associate professor of medicine and genome sciences at the University of Washington, said during the teleconference. “In many cases those studies concluded that 10 or 15 variants might be important for a particular disease. ENCODE data points to the fact that this is probably a significant underestimate, that there may be dozens, even hundreds of variants landing in switches so there is a tremendous amount of information still hidden within those studies that needs to be reanalyzed in the context of the new data.”
Eager to put their newfound scientific knowledge to work, scientists have already begun some of those studies. At the University of Washington, Stamatoyannopoulos and his colleagues found that gene changes identified by GWAS as involved in 17 different types of cancer seem to affect nearly two dozen transcription factors, the proteins that help control how DNA is transcribed into the RNA that is then translated into functional proteins. This common molecular thread may lead to new treatments that control the function of these transcription factors in not just one but all 17 cancers, including ovarian, colon and breast cancers. “This indicates that many cancers may have a shared underlying genetic predisposition,” he told reporters. “So we can make connections between diseases and genome control circuitry to understand relationships where previously there was no evidence of any connection between the diseases.”
ENCODE may shed significant light on our most common chronic diseases, including diabetes, heart disease and hypertension, which result from a complex recipe of dysfunction, not just in single genes, but in a variety of hormones, enzymes and other metabolic factors. Changes in the way some genes are turned on or off may explain the bulk of these conditions, and ultimately make them more treatable. “By and large, we believe rare diseases may be caused by mutations in the protein [or gene-]coding region,” says Green, while the “more common, complicated diseases may be traced to genetic changes in the switches.”
In another example of ENCODE’s power, Birney says the genetic encyclopedia has also identified a new family of regulators that affect Crohn’s disease, an autoimmune disorder that causes the body’s immune cells to attack intestinal cells. The finding could lead to novel, potentially more effective therapies. “I’ve had more clinical researchers come to my door in the past two years than in the previous 10,” Birney said. “It’s going to be really good fun producing lots of insights into disease over the next couple of years.”
Not only does ENCODE open doors to new therapies, it also furthers our basic understanding of human development. At the heart of many genetic researchers’ investigations is the desire to understand how each cell in our body, from those that make up our hair to those that reside in our toenails, can contain our entire genome yet still manage to look and function in such widely divergent ways. ENCODE’s scientists knew that certain regulatory mechanisms dictated when and where certain genes were expressed and in what amount in order to give rise to the diversity of cells and tissues that make up the human body, but even they were surprised by just how intricate the choreography turned out to be. “Most people are surprised that there is more DNA encoding regulatory control elements, or switch elements for genes, than for the genes themselves,” Michael Snyder, director of the center for genomics and personalized medicine at Stanford University and a member of the ENCODE team, told Healthland.
In keeping with the open-access model established by the Human Genome Project, ENCODE’s data is available in its entirety to researchers for free on the consortium’s website. The database will undoubtedly fuel a renewed interest in genome-based approaches to both diagnosing and treating disease. Despite initial excitement in the field, in the years since the genome was mapped, gene-guided treatments and gene-therapy approaches to treating disease have proven difficult to bring to the clinic; part of the challenge, geneticists now say, may have been related to the fact that they didn’t fully understand how to control the genes that were affected by disease.
“I am pretty sure this is the science for this century,” Birney said. “We are going to work out how we make humans, starting from the simple instruction manual.” And perhaps we’ll figure out how to make humans healthier as well.
Alice Park is a writer at TIME.