Did you know that scientists have determined the complete DNA sequence of the human genome? They did so through an ambitious effort called the Human Genome Project (HGP). And did you know that, just like your fingerprint, you have a DNA fingerprint that is unique to you? Let’s find out more about both.
Human Genome Project
The Human Genome Project (HGP) was a mega project launched in 1990. Advances in genetic engineering techniques made it possible, and its aims reveal its magnitude and requirements.
The human genome (i.e. the complete set of DNA in a cell, including all of its genes) has approximately 3 × 10^9 base pairs. If sequencing cost US $3 per base pair, the entire project would cost approximately US $9 billion! Moreover, suppose the sequencing data were stored in books. If each page holds 1000 letters and each book has 1000 pages, we would need about 3000 such books to store the genetic information from a single cell!
This large amount of data would also need computational devices with high speed to store, retrieve and analyze the data. Therefore, HGP aided the rapid development of another field in biology – Bioinformatics.
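The back-of-the-envelope figures above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the US $3 per base pair is the illustrative cost used in the text, and the function names are chosen here for the example.

```python
# Figures quoted in the text; the per-base cost is illustrative, not a real price.
GENOME_BASE_PAIRS = 3 * 10**9
COST_PER_BASE_USD = 3
LETTERS_PER_PAGE = 1000
PAGES_PER_BOOK = 1000


def sequencing_cost(base_pairs, cost_per_base):
    """Total cost if every base pair costs the same amount to sequence."""
    return base_pairs * cost_per_base


def books_needed(base_pairs, letters_per_page, pages_per_book):
    """Number of books required when one letter stands for one base pair."""
    letters_per_book = letters_per_page * pages_per_book
    # Ceiling division: a partly filled book still counts as a book.
    return -(-base_pairs // letters_per_book)


total_cost = sequencing_cost(GENOME_BASE_PAIRS, COST_PER_BASE_USD)
books = books_needed(GENOME_BASE_PAIRS, LETTERS_PER_PAGE, PAGES_PER_BOOK)

print(f"Estimated cost: US ${total_cost / 1e9:.0f} billion")
print(f"Books needed: {books}")
```

With exactly 3 × 10^9 letters the count comes to 3000 books; a slightly larger estimate of the genome size pushes the number a little higher.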
Goals Of HGP
- Identify all the genes in the human genome (approximately 20,000-25,000 genes).
- Provide a complete and accurate sequence of the 3 billion base pairs that make up the human genome.
- Store all the sequencing data in databases.
- Develop new tools to obtain and analyze data and make the information widely available.
- Transfer the related technologies to other sectors, such as industry.
- Address the ethical, legal and social implications of the project on society.
HGP was a 13-year project, coordinated by the National Institutes of Health (NIH) and the U.S. Department of Energy. It also involved contributions from other countries such as Japan, Germany, China and France. The main benefit of the project is that it can lead to revolutionary new ways to diagnose, treat and prevent human diseases.
Besides the human genome, information about the genomes of non-human organisms can also be very helpful. We can understand their natural capabilities and apply them towards solving problems in human healthcare, agriculture, energy production etc. Therefore, scientists have also sequenced the genomes of many non-human organisms such as bacteria, yeast, fruit flies and plants.
Methodologies Of HGP
HGP involved two major approaches:
- Expressed Sequence Tags (ESTs) – This approach focussed on identifying all genes expressed as RNA.
- Sequence Annotation – This blind approach involved sequencing the whole genome (coding and non-coding) and later assigning functions to the different regions.
DNA sequencing involves the following steps:
- First, total DNA is isolated from a cell and converted into random, small-size fragments since it is difficult to sequence long pieces of DNA. These fragments are then cloned into a suitable host (bacteria or yeast) using special vectors such as Bacterial Artificial Chromosomes (BAC) or Yeast Artificial Chromosomes (YAC). This amplifies each DNA fragment so that it can be sequenced easily.
- Next, the fragments are sequenced using automated DNA sequencers. These sequencers work on the principle of Frederick Sanger’s method.
- Special computer-based programs are used to arrange and align the DNA sequences based on overlapping regions present in them.
- Subsequently, the sequences are annotated and assigned to each chromosome.
This is a time-consuming process. Therefore, the sequence of chromosome 1 (the last chromosome to be sequenced) was completed only in May 2006.
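The alignment step, in which fragments are ordered by their overlapping regions, can be illustrated with a toy example. The sketch below is a simple greedy merge written for this article, not the software actually used by HGP; the function names, the `min_overlap` parameter and the short fragments are all assumptions made for illustration.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(fragments, min_overlap=3):
    """Repeatedly merge the pair of fragments with the longest overlap."""
    reads = list(fragments)
    while len(reads) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(reads):
            for j, right in enumerate(reads):
                if i != j:
                    size = overlap_length(left, right, min_overlap)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # no remaining overlaps: leave the pieces unmerged
        merged = reads[best_i] + reads[best_j][best_size:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Three overlapping fragments of a made-up 15-base sequence, in random order.
fragments = ["TACGTTAGC", "ATGGCGTA", "GCGTACGT"]
contigs = greedy_assemble(fragments)
print(contigs)
```

Real genomes contain long repeated stretches, so a greedy merge like this can join the wrong pieces; actual assembly programs use far more careful strategies.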
Salient Features of Human Genome
- There are 3164.7 million nucleotide bases in the human genome.
- An average gene has 3000 bases. However, sizes vary greatly, with the largest human gene being ‘dystrophin’ that has 2.4 million bases.
- The original estimate of the number of genes was 80,000 to 140,000. However, HGP gave an estimate of about 30,000 genes.
- About 99.9% of nucleotide bases are the same in all people.
- For over 50% of the discovered genes, the functions are unknown.
- Less than 2% of the genome codes for proteins.
- Repeated sequences form a large part of the human genome.
- Stretches of DNA sequences that are repeated many times (sometimes 100 to 1000 times) are repetitive sequences. Although they don’t code for proteins, they shed light on chromosome structure, evolution, and dynamics.
- Chromosome 1 has the most genes (2968), and chromosome Y has the fewest (231).
- HGP has identified 1.4 million locations with single-base DNA differences in humans (single nucleotide polymorphisms, or SNPs). This information will revolutionize the identification of disease-associated sequences and the tracing of human history.
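A few of the figures in this list can be turned into rough numbers. The sketch below only rearranges values quoted above; the variable names are chosen here, and the results are approximations rather than measured quantities.

```python
# Values quoted in the list above.
GENOME_BASES = 3_164_700_000   # 3164.7 million nucleotide bases
KNOWN_SNP_SITES = 1_400_000    # locations with single-base differences

# 99.9% of bases are shared, so about 1 base in 1000 differs between people.
differing_bases = GENOME_BASES // 1000

# Less than 2% of the genome codes for proteins: an upper bound in bases.
coding_bases_limit = GENOME_BASES * 2 // 100

# Average spacing between the identified single-base difference sites.
average_snp_spacing = GENOME_BASES / KNOWN_SNP_SITES

print(f"Bases that differ between two people: about {differing_bases:,}")
print(f"Protein-coding bases: fewer than {coding_bases_limit:,}")
print(f"One known SNP site every {average_snp_spacing:,.0f} bases on average")
```

So even a 0.1% difference amounts to roughly three million bases, which is why comparing a small set of highly variable regions is enough to tell people apart.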
Applications of HGP and Future Challenges
The need to derive meaningful knowledge from genomic sequences and better understand biological systems will drive future research. This enormous task will require the coordinated effort of scientists from various fields.
A major impact of HGP is that it provides a radically new approach to biological research. Earlier, researchers studied one or a few genes at a time. Now, with new technologies and whole genome sequences, they can study all the genes in a genome at once, for example, all the transcripts in a tissue or organ. They can also study how thousands of genes work together in networks to make a system function.
DNA Fingerprinting
As we know, 99.9% of nucleotide bases are the same in all humans. However, the remaining differences in DNA sequence make each person unique. This is their DNA fingerprint. How do we determine these differences? Comparing the whole DNA sequences of two individuals would take far too long. DNA fingerprinting is a quicker way to compare the DNA of two individuals.
This technique involves identifying differences in the repetitive DNA regions. Density gradient centrifugation separates the repetitive DNA from the bulk DNA: the bulk DNA forms a major peak, while the repetitive DNA forms small peaks called satellite DNA.
Satellite DNA is classified into micro-satellites and mini-satellites based on factors such as base composition (A:T rich or G:C rich), number of repetitive units and length of segment. These sequences don’t code for any protein but are abundant in the human genome. They also show a high degree of polymorphism, i.e. differences in DNA sequence between individuals, and therefore form the basis of DNA fingerprinting.
DNA from every tissue of an individual, such as hair follicle, saliva, skin or bone, shows the same degree of polymorphism. Thus, these polymorphisms are a very important identification tool in forensic applications. Moreover, since polymorphisms are passed on from parents to children, this fingerprinting technique is also the basis of paternity testing.
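The inheritance logic behind paternity testing can be sketched in code. The example assumes that each person carries two versions of every polymorphic marker, one inherited from each parent, and that a version is recorded as a repeat count. The marker names, the numbers and the function name are hypothetical; real testing examines many markers and weighs the result statistically.

```python
def consistent_with_parentage(child, mother, alleged_father):
    """Check that, at every marker, the child could have received
    one version from the mother and the other from the alleged father."""
    for marker, (first, second) in child.items():
        from_mother = set(mother[marker])
        from_father = set(alleged_father[marker])
        fits = (first in from_mother and second in from_father) or (
            second in from_mother and first in from_father
        )
        if not fits:
            return False
    return True


# Hypothetical profiles: marker name -> the two repeat counts a person carries.
mother = {"marker_1": (5, 9), "marker_2": (12, 14)}
father = {"marker_1": (7, 9), "marker_2": (10, 12)}
unrelated_man = {"marker_1": (6, 8), "marker_2": (11, 13)}
child = {"marker_1": (5, 7), "marker_2": (12, 10)}

print(consistent_with_parentage(child, mother, father))
print(consistent_with_parentage(child, mother, unrelated_man))
```

A match at every marker is consistent with parentage but does not prove it, whereas a clear mismatch argues against it.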
Let’s understand exactly what polymorphisms are.
Polymorphisms are variations at the genetic level that arise due to mutations. In an individual, new mutations can arise either in somatic cells or germ cells i.e. cells that generate sperm and ovum. If the germ cell mutation doesn’t affect the individual’s ability to reproduce, then it is passed on to the next generation and thus, spreads in the population.
DNA polymorphism is an inheritable mutation observed at a high frequency in a population. Such variations are more likely to be found in non-coding DNA, since mutations there do not impact an individual’s reproductive ability. They therefore pass from generation to generation and are one of the bases of variation in human evolution. Polymorphisms can be changes in a single nucleotide or large-scale changes.
Alec Jeffreys initially developed the technique of DNA fingerprinting using, as a probe, a satellite DNA that shows a very high degree of polymorphism. This DNA is called a Variable Number of Tandem Repeats (VNTR) and belongs to the class of mini-satellites. Here, a small DNA sequence is arranged in tandem in many copies. The copy number varies between individuals, so the number of repeats shows a high degree of polymorphism.
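To make the idea of a variable copy number concrete, the sketch below counts how many times a short unit is repeated back to back in a sequence. The repeat unit `GACTA`, the flanking bases and the function name are made up for illustration; real mini-satellite units and their flanks differ.

```python
import re


def max_tandem_repeats(sequence, unit):
    """Return the largest number of back-to-back copies of `unit` in `sequence`."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [len(match.group(0)) // len(unit) for match in pattern.finditer(sequence)]
    return max(runs, default=0)


REPEAT_UNIT = "GACTA"  # hypothetical repeat unit

# The same region in two people: identical flanks, different copy numbers.
person_a = "TTCG" + REPEAT_UNIT * 4 + "CCAT"
person_b = "TTCG" + REPEAT_UNIT * 7 + "CCAT"

print(max_tandem_repeats(person_a, REPEAT_UNIT))
print(max_tandem_repeats(person_b, REPEAT_UNIT))
```

Because the flanking DNA is the same while the number of copies differs, the region has a different length in each person, and that length difference is what fingerprinting detects.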
The technique of DNA fingerprinting involves Southern blot hybridization using radiolabelled VNTR as a probe. The steps are:
- Sample collection.
- DNA isolation.
- DNA digestion using restriction endonucleases.
- Separation of DNA fragments using electrophoresis.
- Blotting (transferring) of separated DNA fragments on to synthetic membranes like nylon or nitrocellulose.
- Hybridization with the labelled VNTR probe.
- Detection of the hybridized DNA fragments using autoradiography.
The size of a VNTR ranges from 0.1 to 20 kilobases. Therefore, the autoradiogram shows bands of many different sizes. These bands give a characteristic pattern which differs between individuals, except for monozygotic twins. Further, the polymerase chain reaction (PCR) increases the sensitivity of fingerprinting, so that DNA from a single cell is enough to perform it.
Apart from forensic science and paternity testing, this technique is also useful in determining population and genetic diversity. Therefore, many different probes are currently used to generate DNA fingerprints.
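The digestion, separation and hybridization steps can be imitated on short strings to show why different repeat counts give different band patterns. This is a heavily simplified sketch: the toy sequences, the repeat unit `GACTA` and the function names are assumptions, and hybridization is reduced to a plain substring match on one strand. The recognition site GAATTC, cut after the first G, is that of the restriction enzyme EcoRI.

```python
def digest(dna, site, cut_offset):
    """Cut `dna` at every occurrence of `site`, `cut_offset` bases into the site."""
    fragments = []
    start = 0
    position = dna.find(site)
    while position != -1:
        cut = position + cut_offset
        fragments.append(dna[start:cut])
        start = cut
        position = dna.find(site, position + 1)
    fragments.append(dna[start:])
    return [fragment for fragment in fragments if fragment]


def band_pattern(chromosomes, site, cut_offset, probe):
    """Sizes of the fragments that contain the probe, largest first.

    Sorting by size stands in for electrophoresis; keeping only fragments
    that contain the probe stands in for hybridization and detection.
    """
    sizes = {
        len(fragment)
        for dna in chromosomes
        for fragment in digest(dna, site, cut_offset)
        if probe in fragment
    }
    return sorted(sizes, reverse=True)


ECORI_SITE, ECORI_CUT = "GAATTC", 1  # EcoRI cuts G^AATTC
REPEAT_UNIT = "GACTA"                # hypothetical VNTR repeat unit


def toy_locus(copies):
    """A made-up region: a repeat array flanked by two EcoRI sites."""
    return "CCAT" + ECORI_SITE + "TTG" + REPEAT_UNIT * copies + "CAA" + ECORI_SITE + "GGTT"


# Each person carries two copies of the region, one from each parent.
person_a = [toy_locus(3), toy_locus(6)]
person_b = [toy_locus(2), toy_locus(6)]

print(band_pattern(person_a, ECORI_SITE, ECORI_CUT, REPEAT_UNIT))
print(band_pattern(person_b, ECORI_SITE, ECORI_CUT, REPEAT_UNIT))
```

The two people share one band and differ in the other. Comparing many such regions at once is what makes the overall pattern practically unique to an individual.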
Solved Example For You
Question: Which of the following statements is false?
- a) Less than 2% of the human genome codes for proteins.
- b) Chromosome Y has the most genes.
- c) An average gene in the human genome has 3000 bases.
- d) Repeated sequences make up a large part of the human genome.
Solution: Statement ‘b’ is false because chromosome Y has the fewest genes (231).