### DNA Structure & Packaging

#### 1. The DNA

- Long polymer of deoxyribonucleotides.
- **Length:** Defined by the number of nucleotides (or base pairs, bp).
    - Examples: Bacteriophage $\phi$X174 (5386 nucleotides), Bacteriophage $\lambda$ (48502 bp), *E. coli* (4.6 × 10^6 bp), human haploid content (3.3 × 10^9 bp).

#### 2. Structure of Polynucleotide Chain

- **Components of Nucleotide:**
    - Nitrogenous base (Purines: Adenine (A), Guanine (G); Pyrimidines: Cytosine (C), Uracil (U), Thymine (T)).
    - Pentose sugar (Ribose for RNA, Deoxyribose for DNA).
    - Phosphate group.
- **Linkages:**
    - **N-glycosidic linkage:** Nitrogenous base to 1'C of pentose sugar $\rightarrow$ nucleoside.
    - **Phosphoester linkage:** Phosphate group to 5'C of nucleoside $\rightarrow$ nucleotide (or deoxynucleotide).
    - **3'-5' Phosphodiester linkage:** Connects nucleotides to form a polynucleotide chain.
- **Polynucleotide Chain Features:**
    - Free 5' phosphate at one end (5'-end).
    - Free 3' hydroxyl at the other end (3'-end).
    - Backbone formed by sugar and phosphates; nitrogenous bases project from the backbone.
- **RNA Specific Features:**
    - Additional -OH group at 2'-position of ribose sugar.
    - Uracil (U) instead of Thymine (T).
- **Friedrich Miescher (1869):** Identified DNA as 'Nuclein'.
- **Watson & Crick (1953) Double Helix Model:**
    - Based on X-ray diffraction data (Maurice Wilkins & Rosalind Franklin).
    - **Chargaff's Rule:** In double-stranded DNA, the ratios A:T and G:C are constant and equal to one.
    - **Complementary Strands:** Sequence of one strand predicts the other; each strand can template a new strand.
    - **Salient Features:**
        - Two polynucleotide chains.
        - Sugar-phosphate backbone, bases project inside.
        - **Anti-parallel polarity:** One chain 5' $\rightarrow$ 3', other 3' $\rightarrow$ 5'.
        - **Base pairing via H-bonds:** A=T (2 H-bonds), G≡C (3 H-bonds).
        - Purine always opposite pyrimidine $\rightarrow$ uniform distance between strands.
        - Right-handed coiling.
        - Pitch of helix: 3.4 nm (10 bp/turn); distance between bp: 0.34 nm.
        - Plane of one base pair stacks over the other, which confers stability.
- **Central Dogma (Francis Crick):** Genetic information flows from DNA $\rightarrow$ RNA $\rightarrow$ Protein.

#### 3. Packaging of DNA Helix

- **DNA length:** ~2.2 meters in a human cell (6.6 × 10^9 bp × 0.34 × 10^-9 m/bp); nucleus dimension: ~10^-6 meters $\rightarrow$ DNA must be packaged.
- **Prokaryotic DNA Packaging (e.g., *E. coli*):**
    - No defined nucleus.
    - Negatively charged DNA held by positively charged proteins in a region called the 'nucleoid'.
    - DNA organized in large loops.
- **Eukaryotic DNA Packaging:** More complex.
    - **Histones:** Positively charged, basic proteins (rich in lysine and arginine).
        - Form a **histone octamer** (unit of eight molecules).
        - Negatively charged DNA wraps around the histone octamer $\rightarrow$ **nucleosome**.
        - Typical nucleosome: 200 bp of DNA helix.
    - **Chromatin:** Thread-like stained bodies in the nucleus, built from repeating nucleosome units.
    - **"Beads-on-string" structure:** Appearance of nucleosomes under the electron microscope (EM).
    - **Higher-level packaging:** Chromatin fibers coil and condense to form chromosomes.
        - Requires **Non-histone Chromosomal (NHC) proteins**.
    - **Chromatin Types:**
        - **Euchromatin:** Loosely packed, stains light, transcriptionally active.
        - **Heterochromatin:** Densely packed, stains dark, transcriptionally inactive.

### Search for Genetic Material

#### 1. Transforming Principle (Frederick Griffith, 1928)

- **Experiment:** *Streptococcus pneumoniae* (causes pneumonia).
    - S strain (smooth, virulent, polysaccharide coat) $\rightarrow$ mice die.
    - R strain (rough, non-virulent, no coat) $\rightarrow$ mice live.
    - Heat-killed S strain $\rightarrow$ mice live.
    - Heat-killed S strain + live R strain $\rightarrow$ mice die (live S bacteria recovered).
- **Conclusion:** The R strain was transformed by the heat-killed S strain; a "transforming principle" was transferred and made R virulent. Its biochemical nature remained undefined.

#### 2. Biochemical Characterization of Transforming Principle (Avery, MacLeod, McCarty, 1933-44)

- **Aim:** Determine the biochemical nature of Griffith's transforming principle.
- **Method:** Purified biochemicals (proteins, DNA, RNA) from heat-killed S cells and tested which could transform live R cells into S cells.
- **Discovery:** DNA alone from S bacteria caused R bacteria to become transformed.
- **Enzyme Treatments:**
    - Proteases (protein-digesting) and RNases (RNA-digesting) $\rightarrow$ no effect on transformation.
    - DNase (DNA-digesting) $\rightarrow$ inhibited transformation.
- **Conclusion:** DNA is the hereditary material.

#### 3. The Genetic Material is DNA (Hershey-Chase Experiment, 1952)

- **Unequivocal proof:** Used bacteriophages (viruses that infect bacteria).
- **Mechanism:** The phage injects its genetic material into the bacterium, which treats it as its own and produces more viruses.
- **Aim:** Discover whether viral protein or viral DNA enters the bacteria.
- **Method:**
    1. Grew phages in ^32^P (radioactive phosphorus) medium $\rightarrow$ DNA radioactive, protein non-radioactive.
    2. Grew phages in ^35^S (radioactive sulfur) medium $\rightarrow$ protein radioactive, DNA non-radioactive.
    3. **Infection:** Allowed radioactive phages to infect *E. coli*.
    4. **Blending:** Agitated to detach viral coats from bacteria.
    5. **Centrifugation:** Separated viral particles from bacteria.
- **Results:**
    - Bacteria infected with ^32^P-labeled (DNA-labeled) viruses: radioactive cells, no radioactivity in supernatant $\rightarrow$ DNA passed into bacteria.
    - Bacteria infected with ^35^S-labeled (protein-labeled) viruses: non-radioactive cells, radioactivity in supernatant $\rightarrow$ proteins did not enter bacteria.
- **Conclusion:** DNA is the genetic material passed from virus to bacteria.

#### 4. Properties of Genetic Material (DNA vs. RNA)

- **Criteria for Genetic Material:**
    1. **Replication:** Must be able to generate its replica. Both DNA & RNA can.
    2. **Stability:** Chemically and structurally stable.
        - DNA is more stable (less reactive, structurally more stable) due to:
            - Absence of 2'-OH group.
            - Presence of Thymine (T) instead of Uracil (U).
            - Double-stranded structure (allows repair).
        - RNA is less stable (reactive, easily degradable, catalytic) due to:
            - Presence of 2'-OH group.
            - Uracil (U) instead of Thymine (T).
    3. **Mutation:** Must provide scope for slow changes (mutation) required for evolution. Both DNA & RNA can mutate; RNA mutates faster because it is unstable.
    4. **Expression:** Must be able to express itself as "Mendelian Characters".
        - RNA can directly code for protein synthesis (easily expresses characters).
        - DNA depends on RNA for protein synthesis.
- **Conclusion:** DNA is preferred for storage of genetic information (more stable); RNA is better for transmission of genetic information.

### RNA World

- **Hypothesis:** RNA was the first genetic material.
- **Evidence:** Essential life processes (metabolism, translation, splicing) evolved around RNA.
- **Dual Role:** RNA acted as both genetic material and catalyst.
- **Problem:** Being a catalyst, RNA was reactive and therefore unstable.
- **Evolution:** DNA evolved from RNA (through chemical modifications) to be more stable. Its double-stranded nature with complementary strands allowed for repair processes.

### Replication

#### 1. Semiconservative DNA Replication

- **Watson & Crick Proposal (1953):** "The specific pairing we have postulated immediately suggests a possible copying mechanism for the genetic material."
- **Scheme:**
    1. Two DNA strands separate.
    2. Each strand acts as a template for a new complementary strand.
    3. Result: Each new DNA molecule has one parental and one newly synthesized strand.

#### 2. Experimental Proof (Meselson & Stahl, 1958)

- **Aim:** Prove DNA replicates semiconservatively.
- **Organism:** *E. coli*.
- **Method:**
    1. Grew *E. coli* for many generations in ^15^NH4Cl (heavy nitrogen) medium $\rightarrow$ DNA became heavy (^15^N^15^N).
    2. Transferred *E. coli* to ^14^NH4Cl (normal nitrogen) medium.
    3. Took samples at definite time intervals (after 20 min, 40 min).
    4. Extracted DNA and separated it on a CsCl density gradient.
- **Results:**
    - **Generation 0 (before transfer):** DNA was exclusively heavy (^15^N^15^N).
    - **Generation I (after 20 min/one replication):** DNA had hybrid/intermediate density (^14^N^15^N).
    - **Generation II (after 40 min/two replications):** Equal amounts of hybrid (^14^N^15^N) and light (^14^N^14^N) DNA.
- **Conclusion:** DNA replication is semiconservative.
- **Taylor & colleagues (1958):** Confirmed semiconservative replication in chromosomes using radioactive thymidine in *Vicia faba*.

#### 3. Machinery and Enzymes

- **Main Enzyme:** **DNA-dependent DNA polymerase**.
    - Uses a DNA template to polymerize deoxynucleotides.
    - Highly efficient (e.g., *E. coli* replicates 4.6 × 10^6 bp in 18 min, an average polymerization rate of about 2000 bp/sec).
    - High accuracy (prevents mutations).
- **Energetics:** Deoxyribonucleoside triphosphates serve as:
    1. Substrates.
    2. Energy providers (terminal phosphates are high-energy).
- **Replication Fork:** Small opening of the DNA helix where replication occurs (the strands of a long DNA molecule cannot separate along their entire length because of the very high energy required).
- **DNA Polymerase Directionality:**
    - Polymerizes only in 5' $\rightarrow$ 3' direction.
    - **Continuous replication:** On template strand with 3' $\rightarrow$ 5' polarity.
    - **Discontinuous replication:** On template strand with 5' $\rightarrow$ 3' polarity $\rightarrow$ short fragments joined by **DNA ligase**.
- **Origin of Replication (ori):** DNA polymerases cannot initiate replication on their own or at random sites; replication starts at definite regions.

### Transcription

#### 1. DNA Replication Context

- An origin of replication is needed to propagate a piece of DNA in recombinant DNA procedures (vectors provide the ori).
- In eukaryotes, DNA replication occurs during S-phase.
- Replication must be highly coordinated with cell division. Failure of cell division after replication $\rightarrow$ polyploidy.

#### 2. Transcription Process

- Copying genetic information from one DNA strand into RNA.
- **Principle of complementarity:** Governs the process, but adenine pairs with uracil (instead of thymine).
- **Differences from Replication:**
    - Only a segment of DNA is copied.
    - Only one strand is copied into RNA.
    - Requires defining boundaries for the region and the strand to be transcribed.

#### 3. Why Not Both Strands Transcribed?

- **Reason 1: Different Protein Sequences:** If both strands acted as templates, they would code for RNA molecules with different sequences, leading to different proteins from one DNA segment and complicating genetic information transfer.
- **Reason 2: Formation of Double-stranded RNA:** If both RNA molecules were produced, they would be complementary and form dsRNA, preventing translation.

#### 4. Transcription Unit

- Defined by three regions in DNA:
    1. **Promoter:**
        - Located towards 5'-end (upstream) of structural gene (with respect to coding strand polarity).
        - DNA sequence where RNA polymerase binds.
        - Defines template and coding strands.
    2. **Structural Gene:** Codes for the RNA.
    3. **Terminator:**
        - Located towards 3'-end (downstream) of coding strand.
        - Usually defines end of transcription.
- Additional regulatory sequences may be present.

#### 5. Template and Coding Strands

- DNA-dependent RNA polymerase polymerizes in 5' $\rightarrow$ 3' direction.
- **Template Strand:**
    - Has 3' $\rightarrow$ 5' polarity.
    - Acts as template for RNA synthesis.
- **Coding Strand:**
    - Has 5' $\rightarrow$ 3' polarity.
    - Sequence is the same as the RNA (except T replaced by U).
    - Displaced during transcription; called "coding" only because its sequence matches the RNA (it does not itself code for anything).

#### 6. Types of RNA and Process of Transcription

##### In Bacteria:

- **RNA Polymerase:** A single DNA-dependent RNA polymerase catalyzes transcription of all types of RNA.
- **Stages:**
    1. **Initiation:** RNA polymerase associates with the initiation-factor ($\sigma$) and binds to the promoter to initiate transcription.
    2. **Elongation:** Polymerase moves along the template, opening the helix and polymerizing RNA in 5' $\rightarrow$ 3' direction. Only a short stretch of RNA remains bound to the enzyme.
    3. **Termination:** Polymerase reaches the terminator region and associates with the termination-factor ($\rho$); the nascent RNA falls off and the polymerase detaches.
- **Coupling of transcription and translation:** In bacteria, mRNA needs no processing, and transcription and translation occur in the same compartment, so translation can begin before the mRNA is fully transcribed.

##### In Eukaryotes:

- **Additional Complexities:**
    - At least three RNA polymerases in nucleus:
        - **RNA polymerase I:** Transcribes rRNAs (28S, 18S, 5.8S).
        - **RNA polymerase II:** Transcribes hnRNA (heterogeneous nuclear RNA) $\rightarrow$ precursor of mRNA.
        - **RNA polymerase III:** Transcribes tRNA, 5S rRNA, snRNAs.
    - **Post-transcriptional processing (hnRNA $\rightarrow$ mRNA):**
        - **Splicing:** Introns (non-coding) are removed, exons (coding) are joined in a defined order.
        - **Capping:** Unusual nucleotide (methyl guanosine triphosphate, mGppp) added to 5'-end.
        - **Tailing:** 200-300 adenylate residues added at 3'-end (poly-A tail), template-independent.
    - Fully processed mRNA is transported out of the nucleus for translation.

### Genetic Code & Translation

#### 1. Genetic Code

- **Definition:** Code that directs amino acid sequence during protein synthesis.
- No direct complementarity between nucleotides and amino acids.
- **Historical Development:**
    - **George Gamow:** Proposed triplet code (4 bases must code for 20 amino acids $\rightarrow$ 4^3 = 64 codons).
    - **Har Gobind Khorana:** Developed chemical method to synthesize RNA with defined base combinations.
    - **Marshall Nirenberg:** Used cell-free system for protein synthesis to decipher the code.
    - **Severo Ochoa enzyme:** Polynucleotide phosphorylase, used for template-independent RNA polymerization.
- **Features of Genetic Code:**
    - **Triplet Codon:** Each codon (3 nucleotides) codes for one amino acid. 61 codons code for amino acids, 3 are stop codons.
    - **Degenerate:** Some amino acids are coded by more than one codon.
    - **Contiguous:** Read on mRNA in a continuous fashion, with no punctuation.
    - **Nearly Universal:** Code is largely the same from bacteria to humans (e.g., UUU codes for Phenylalanine). Exceptions: mitochondrial codons, some protozoans.
    - **AUG:** Dual function $\rightarrow$ codes for Methionine (Met) and acts as initiator codon.
    - **Stop/Terminator Codons:** UAA, UAG, UGA.

#### 2. Mutations and Genetic Code

- **Point Mutations:** Change of a single base pair.
    - Example: Sickle cell anemia (change of glutamate to valine in the beta globin chain).
- **Insertions/Deletions (Frameshift Mutations):** Insertion or deletion of one or two bases changes the reading frame from that point onwards.
    - Insertion/deletion of three bases (or a multiple of three) inserts/deletes one or more amino acids but leaves the reading frame unaltered.

#### 3. tRNA – the Adapter Molecule

- **Francis Crick's Postulate:** An adapter molecule reads the code and links to amino acids.
- **tRNA (soluble RNA, sRNA):** Identified as the adapter molecule.
- **Structure & Function:**
    - **Anticodon loop:** Contains bases complementary to the mRNA codon.
    - **Amino acid acceptor end:** Binds to a specific amino acid.
    - Each amino acid has a specific tRNA.
    - **Initiator tRNA:** Specific tRNA for initiation (recognizes AUG).
    - No tRNAs for stop codons.
    - Secondary structure: Clover-leaf. Actual structure: Compact, inverted L-shape.

#### 4. Translation

- **Definition:** Polymerization of amino acids to form a polypeptide (protein).
- Order and sequence of amino acids are defined by the mRNA base sequence.
- Amino acids are joined by a **peptide bond** (formation requires energy).
- **Ribosome:** Cellular factory for protein synthesis.
    - Consists of structural RNAs (rRNA) and ~80 proteins.
    - Inactive: exists as separate large and small subunits.
    - Active: small subunit encounters mRNA $\rightarrow$ translation begins. Large subunit has two sites where successive amino acids bind close enough for peptide bond formation.
    - Acts as a catalyst for peptide bond formation (e.g., 23S rRNA in bacteria is a **ribozyme**).
- **Translational Unit & UTRs:**
    - **Translational unit:** mRNA sequence flanked by start (AUG) and stop codon, coding for a polypeptide.
    - **Untranslated Regions (UTRs):** Additional sequences at 5'-end (before start codon) and 3'-end (after stop codon); required for efficient translation.
- **Steps of Translation:**
    1. **Activation/Charging of tRNA (Aminoacylation):** Amino acids are activated in the presence of ATP and linked to their cognate tRNA.
    2. **Initiation:** Ribosome binds to mRNA at the start codon (AUG); initiator tRNA recognizes AUG.
    3. **Elongation:** Ribosome moves along mRNA; amino acid-linked tRNAs bind to appropriate mRNA codons (complementary base pairing between tRNA anticodon and mRNA codon); amino acids are added one by one.
    4. **Termination:** Release factor binds to stop codon $\rightarrow$ translation terminates, polypeptide released.

### Regulation of Gene Expression

- Control of the broad process leading to polypeptide formation.
- **Levels of Regulation (in eukaryotes):**
    - **Transcriptional level:** Formation of primary transcript.
    - **Processing level:** Regulation of splicing.
    - **Transport level:** mRNA transport from nucleus to cytoplasm.
    - **Translational level.**

#### 1. Prokaryotic Gene Expression Control

- **Predominant site:** Rate of transcriptional initiation.
- RNA polymerase activity at the promoter is regulated by accessory proteins (activators/repressors).
- **Operators:** Specific DNA sequences, adjacent to promoter elements, that bind repressor proteins and so regulate promoter accessibility. Each operon has its specific operator and repressor.

#### 2. The Lac Operon

- **Elucidated by François Jacob and Jacques Monod.**
- **Definition:** An operon is a polycistronic structural gene regulated by a common promoter and regulatory genes (e.g., *lac* operon, *trp* operon, *ara* operon).
- **Components:**
    - **Regulatory gene (i gene):** Codes for the repressor of the *lac* operon.
    - **Structural genes (z, y, a):**
        - **z gene:** Codes for $\beta$-galactosidase (hydrolyzes lactose to galactose and glucose).
        - **y gene:** Codes for permease (increases cell permeability to $\beta$-galactosides).
        - **a gene:** Encodes transacetylase.
    - All gene products are needed for lactose metabolism.

#### 3. Regulation Mechanism

- **Inducer:** Lactose (substrate for $\beta$-galactosidase) regulates switching on/off of the operon.
- **In absence of inducer (lactose):**
    1. Repressor (from i gene) is constitutively synthesized.
    2. Repressor binds to operator region.
    3. Repressor prevents RNA polymerase from transcribing the operon.
- **In presence of inducer (lactose or allolactose):**
    1. Inducer interacts with repressor, inactivating it.
    2. Inactivated repressor cannot bind to operator.
    3. RNA polymerase gains access to promoter, transcription proceeds.
    4. Synthesis of $\beta$-galactosidase, permease, transacetylase occurs.
- **Negative Regulation:** Repressor-mediated control.

### Human Genome Project (HGP)

#### 1. Human Genome Project

- Ambitious project to sequence the entire human genome.
- **Basis:** DNA base sequence determines genetic information; differences between individuals are differences in DNA sequence.
- Launched in 1990, enabled by genetic engineering and fast sequencing techniques.

#### 2. Magnitude & Requirements

- **Size:** ~3 × 10^9 bp.
- **Estimated Cost (initial):** US$3/bp $\rightarrow$ total US$9 billion.
- **Data Storage:** Huge amount of data generated (about 3300 books for a single human cell's DNA sequence, at 1000 letters per page and 1000 pages per book).
- **Necessity:** High-speed computational devices for storage, retrieval, analysis.
- **Association:** Closely linked with **Bioinformatics**.

#### 3. Goals of HGP

1. Identify all 20,000-25,000 genes in human DNA.
2. Determine sequences of 3 billion chemical base pairs.
3. Store information in databases.
4. Improve tools for data analysis.
5. Transfer related technologies to other sectors.
6. Address ethical, legal, and social issues (ELSI).

#### 4. Coordination & Completion

- **Duration:** 13-year project.
- **Coordinated by:** U.S. Department of Energy (DoE) and National Institutes of Health (NIH).
- **Major partner:** Wellcome Trust (U.K.).
- **Completion:** 2003.
- **Impact:** Knowledge of DNA variations $\rightarrow$ revolutionary ways to diagnose, treat, prevent disorders; understanding human biology; applying knowledge to healthcare, agriculture, energy, environment.
- **Model organisms sequenced:** Bacteria, yeast, *C. elegans*, *Drosophila*, rice, *Arabidopsis*.

#### 5. Methodologies

- **Approaches:**
    1. **Expressed Sequence Tags (ESTs):** Identify all genes expressed as RNA.
    2. **Blind Approach (Sequence Annotation):** Sequence whole genome (coding and non-coding), then assign functions to regions.
- **Sequencing Process:**
    1. **DNA Isolation** from cells.
    2. **Fragmentation:** Convert DNA into random, smaller fragments.
    3. **Cloning:** Fragments cloned into suitable hosts (bacteria, yeast) using specialized vectors (BACs, YACs) for amplification.
    4. **Sequencing:** Fragments sequenced using automated DNA sequencers (Frederick Sanger method).
    5. **Arrangement:** Sequenced fragments arranged based on overlapping regions using computer programs.
    6. **Annotation:** Sequences assigned to chromosomes (chromosome 1 completed last, May 2006).

#### 6. Salient Features of Human Genome

1. **Total base pairs:** 3164.7 million bp.
2. **Gene size:** Average 3000 bases; largest is dystrophin (2.4 million bases).
3. **Total number of genes:** Estimated at 30,000 (lower than previous estimates).
4. **Similarity:** 99.9% of nucleotide bases are the same in all humans.
5. **Unknown functions:** >50% of discovered genes have unknown functions.
6. **Protein coding:** Less than 2% of the genome codes for proteins.

### DNA Fingerprinting

#### 1. DNA Fingerprinting

- **Technique:** Identifies differences in specific repetitive DNA sequences between individuals.
- **Basis:** 99.9% of the human base sequence is the same $\rightarrow$ the remaining differences make each individual unique.
- Quick way to compare DNA sequences without sequencing the entire genome.

#### 2. Repetitive DNA

- **Definition:** Regions in DNA where small stretches are repeated many times.
- **Separation:** Separated from bulk genomic DNA as different peaks during density gradient centrifugation (bulk DNA $\rightarrow$ major peak; small peaks $\rightarrow$ **satellite DNA**).
- **Classification of Satellite DNA:** Based on base composition (A:T rich or G:C rich), length, and number of repetitive units (e.g., micro-satellites, mini-satellites).
- **Function:** Normally do not code for proteins.
- **Importance:** High degree of polymorphism $\rightarrow$ basis of DNA fingerprinting.
- **Forensic Application:** DNA from any tissue (blood, hair follicle, skin, bone, saliva, sperm) of an individual shows the same polymorphism.
- **Paternity Testing:** Polymorphisms are inherited from parents.

#### 3. DNA Polymorphism

- **Variation at genetic level.**
- **Cause:** Arises due to mutations.
- **Definition (traditional):** If more than one variant (allele) at a locus occurs in the human population with a frequency >0.01, it is referred to as a DNA polymorphism.
- **Simple terms:** Inheritable mutation observed in a population at high frequency.
- **Occurrence:** Higher probability in non-coding DNA; mutations accumulate over generations and form the basis of variability/polymorphism.
- **Types:** Single nucleotide changes to very large scale changes.
- **Role:** Important for evolution and speciation.

#### 4. Technique of DNA Fingerprinting

- **Developed by:** Alec Jeffreys.
- **Probe Used:** Satellite DNA with high polymorphism, specifically **Variable Number of Tandem Repeats (VNTR)**.
    - VNTR belongs to the mini-satellite class.
    - Small DNA sequence arranged tandemly in many copy numbers.
    - Copy number varies from chromosome to chromosome.
    - High polymorphism in number of repeats.
    - VNTR size varies from 0.1 to 20 kb.
- **Original Technique (Southern blot hybridization using radiolabeled VNTR):**
    1. **Isolation of DNA.**
    2. **Digestion of DNA** by restriction endonucleases.
    3. **Separation of DNA fragments** by electrophoresis.
    4. **Transferring (blotting)** separated DNA fragments to synthetic membranes (nitrocellulose or nylon).
    5. **Hybridization** using labeled VNTR probe.
    6. **Detection** of hybridized DNA fragments by autoradiography.
- **Result:** Autoradiogram gives many bands of differing sizes, producing a characteristic pattern unique to each individual (except monozygotic twins).
- **Sensitivity Enhancement:** Increased by Polymerase Chain Reaction (PCR) $\rightarrow$ DNA from a single cell is sufficient.

#### 5. Applications

- Forensic science.
- Paternity testing.
- Determining population and genetic diversities.
- Many different probes can be used to generate DNA fingerprints.
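
The original Southern-blot technique in section 4 (digestion, separation by size, probe hybridization) and the paternity-testing application can be illustrated with a toy Python model. This is only a sketch under stated assumptions, not a laboratory protocol: the names `vntr_locus`, `individual`, `digest`, `fingerprint`, and `could_be_parents` are hypothetical, the repeat unit and flanking sequences are invented, and `GAATTC` (the EcoRI recognition sequence) is an arbitrary choice of restriction site. Real VNTRs span 0.1 to 20 kb, and a real fingerprint shows many bands from many loci; here each person carries a single locus, so each fingerprint has two bands.

```python
"""Toy model of VNTR-based DNA fingerprinting (illustrative assumptions only)."""

SITE = "GAATTC"     # assumed restriction site (EcoRI recognition sequence)
REPEAT = "AGGCTG"   # hypothetical VNTR repeat unit
PROBE = REPEAT * 2  # hypothetical probe; binds any fragment with >= 2 repeats


def vntr_locus(copies):
    """One allele: restriction sites flanking a tandem array of `copies` repeats."""
    return SITE + "TTACGT" + REPEAT * copies + "CATGCA" + SITE


def individual(copies_a, copies_b):
    """Two homologous chromosomes, each with one VNTR allele plus unrelated DNA."""
    return ["ACGTACGTAC" + vntr_locus(n) + "TTGGCCAA" for n in (copies_a, copies_b)]


def digest(dna, site=SITE):
    """Digestion step. Simplification: the site itself is removed from the
    fragments, whereas a real enzyme cuts at a fixed position inside it."""
    return [fragment for fragment in dna.split(site) if fragment]


def fingerprint(chromosomes):
    """Band sizes (in bases) that a VNTR probe would reveal, largest first."""
    bands = []
    for dna in chromosomes:
        for fragment in digest(dna):
            if PROBE in fragment:            # hybridization: unrelated DNA stays invisible
                bands.append(len(fragment))  # electrophoresis separates by length
    return sorted(bands, reverse=True)


def could_be_parents(child, mother, father):
    """True if the child's two bands can be assigned one to each parent."""
    first, second = child
    return (first in mother and second in father) or (second in mother and first in father)


if __name__ == "__main__":
    mother = fingerprint(individual(3, 5))
    father = fingerprint(individual(7, 9))
    child = fingerprint(individual(5, 7))      # one allele inherited from each parent
    stranger = fingerprint(individual(4, 11))
    print("mother:", mother, "father:", father, "child:", child)
    print("father plausible:  ", could_be_parents(child, mother, father))
    print("stranger plausible:", could_be_parents(child, mother, stranger))
```

Because the flanking restriction sites stay fixed while the repeat count varies, fragment length tracks copy number directly, which is why the band pattern differs between individuals yet is shared between a child and its parents.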