The question is about the genetic code, transcription and translation, and the use of molecular biology tools

Description

There are 14 questions; most of them are answered using websites (BLAST, Splign, Expasy, NEBcutter). References should be in Harvard style. Answers to the explanation questions should be 100 words.

all letters from the following lines as the protein / DNA sequence. It is important to include the > symbol with the
name, or the program will assume that the name is a protein sequence and it is likely to return an error.
The sequences provided for bioinformatics tasks in the ‘sequences and links’ document are in FASTA
format. You will need to provide FASTA format sequences for some of your answers – marks will be
deducted if you provide a sequence without the FASTA header.
1-letter  Amino acid      3-letter
G         Glycine         Gly
A         Alanine         Ala
L         Leucine         Leu
M         Methionine      Met
F         Phenylalanine   Phe
W         Tryptophan      Trp
K         Lysine          Lys
Q         Glutamine       Gln
E         Glutamic Acid   Glu
S         Serine          Ser
P         Proline         Pro
V         Valine          Val
I         Isoleucine      Ile
C         Cysteine        Cys
Y         Tyrosine        Tyr
H         Histidine       His
R         Arginine        Arg
N         Asparagine      Asn
D         Aspartic Acid   Asp
T         Threonine       Thr
Figure 1. The standard genetic code (mRNA codon) and the single letter amino acid code.
The genetic code, transcription and translation
1. The DNA sequence below shows the coding sequence of a very short hypothetical gene.
5’ ATGGCTGAAGGGGCGAGCCATATAAGAGCATAG 3’
i) Write out the complementary, non-coding DNA sequence (aka the template) underneath the
coding sequence and label the 5’ and 3’ ends of each strand. (2 marks)
(Align your non-coding sequence under the coding sequence in the answer template so that the
complementary bases in the non-coding sequence lie directly beneath the coding bases. Use
Courier New font to do this – see instructions above).
ii) Using the genetic code (Fig. 1), deduce the amino acid sequence of the peptide it encodes. (2 marks)
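The two steps in question 1 (pairing each base with its complement, then reading codons) can be checked with a short Python sketch. It assumes an uppercase DNA string of A, C, G and T that starts at a codon boundary; the example sequence is hypothetical rather than the one above, and the codon table is the standard genetic code (Fig. 1) written with T in place of U.

```python
# Sketch: write the template strand under a coding strand and translate it.
# Assumes an uppercase DNA string (A, C, G, T) that starts at a codon boundary.

# Standard genetic code, indexed in the conventional T, C, A, G order for the
# first, second and third codon positions (DNA letters, so T stands for U).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

PAIRS = str.maketrans("ACGT", "TGCA")


def complement(seq: str) -> str:
    """Base-by-base complement; reads 3'->5' when aligned under the coding strand."""
    return seq.translate(PAIRS)


def translate(coding: str) -> str:
    """Read codons from the first base and stop at the first stop codon."""
    peptide = []
    for i in range(0, len(coding) - 2, 3):
        aa = CODON_TABLE[coding[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


coding = "ATGAAACCCTGA"  # hypothetical example
print("5' " + coding + " 3'")
print("3' " + complement(coding) + " 5'")
print(translate(coding))  # MKP
```

Because the complement is printed without reversing it, each base sits directly beneath its partner, which is the aligned layout the question asks for.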
2. The sequence below shows the non-coding (aka template) strand from the whole of the transcribed
region of a very short hypothetical gene.
5’ GGCTTCTTTAGTACTGGCCAGTGGGATCCAAGTAGGCTGCCATTTCGT 3’
i) Write out the sequence of the mRNA from this gene in the orientation 5′ → 3′. (2 marks)
ii) Using the genetic code, deduce the amino acid sequence of the peptide it encodes. (3 marks)
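Going from a template strand to mRNA means taking its reverse complement (the mRNA then matches the coding strand, with U in place of T). A minimal Python sketch, assuming the template is given 5'→3' in uppercase; starting translation at the first AUG is a simplification that suits short exercises like this, and the example template is hypothetical.

```python
# Sketch: turn a template strand written 5'->3' into mRNA, then translate
# from the first AUG. Assumes an uppercase DNA string (A, C, G, T).

# Standard genetic code in U, C, A, G order for each codon position.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Each template base pairs with its complementary RNA base.
TO_RNA = str.maketrans("ACGT", "UGCA")


def transcribe_template(template_5to3: str) -> str:
    """mRNA (5'->3') is the reverse complement of the template, with U for T."""
    return template_5to3.translate(TO_RNA)[::-1]


def translate_from_first_aug(mrna: str) -> str:
    """Start at the first AUG and read codons until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


template = "TCAGGGTTTCATCC"  # hypothetical template strand, 5'->3'
mrna = transcribe_template(template)
print(mrna)                            # GGAUGAAACCCUGA
print(translate_from_first_aug(mrna))  # MKP
```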
3. A scientist is researching GS1, an enzyme with a size of 78,000 Da (78 kDa), present in a bacterium. The scientist
has isolated two mutant strains of the bacterium, as described below.
Strain A: In this strain the GS1 protein is completely non-functional.
Strain B: This strain produces functional GS1, but the stability of the protein is somewhat reduced.
The purified GS1 proteins from a wild-type strain and from strains A and B were analysed by SDS-PAGE to determine their sizes (Fig. 2).
Figure 2. Diagram of an SDS-PAGE gel of purified GS1 proteins. The
purified GS1 protein from 3 bacterial strains – wild type, strain A and strain
B are shown.
i) Using Figure 2, determine the approximate sizes (in kDa) of the GS1 proteins isolated from strains
A and B and indicate whether they are larger or smaller than the GS1 from the wild-type strain.
(NB. The sizes of the proteins cannot be determined accurately from the gel; a range of sizes
is all that is required for your answer.) (2 marks)
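Band positions are usually converted to sizes with a standard curve: log10 of the marker proteins' masses is plotted against the distance each migrated, a straight line is fitted, and unknown bands are read off it. Figure 2 is not reproduced here, so the marker values in this Python sketch are hypothetical; it only illustrates the calculation.

```python
import math

# Sketch: estimate a protein's size from an SDS-PAGE standard curve.
# Assumes log10(size) is roughly linear in migration distance over the
# range covered by the markers. All marker values here are hypothetical.


def fit_standard_curve(markers: list[tuple[float, float]]) -> tuple[float, float]:
    """Least-squares fit of log10(kDa) = slope * distance + intercept."""
    xs = [distance for distance, _ in markers]
    ys = [math.log10(kda) for _, kda in markers]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_kda(distance: float, slope: float, intercept: float) -> float:
    """Read an unknown band's size off the fitted curve."""
    return 10 ** (slope * distance + intercept)


# (migration distance in mm, marker size in kDa): a hypothetical ladder
ladder = [(10, 100), (18, 75), (30, 50), (38, 37), (50, 25)]
slope, intercept = fit_standard_curve(ladder)
print(round(estimate_kda(22, slope, intercept)))  # hypothetical band at 22 mm
```

Reading sizes this way is only reliable between the smallest and largest markers, which is one reason the question asks for a range rather than an exact value.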
4. The scientist determines the nucleotide sequence of the coding strand of the GS1 gene from strain A. It
is identical to the GS1 sequence from the wild type gene except for a single change occurring
approximately ⅓ of the way into the GS1 open reading frame. A small region of the GS1 sequence
(including the site where the mutation occurs) from the wild type and mutant strains is shown below.
Wild type  TGTCCTCGGCCACAAGTTCTCTATC
Strain A   TGTCCTCGGCCACTAGTTCTCTATC
i) How has this mutation produced the inactive GS1 protein in strain A? (2 marks)
ii) Using the genetic code, deduce the amino acid sequence of the wild type GS1 protein
corresponding to the short piece of DNA shown above. (3 marks)
5. Sequencing of the GS1 gene from strain B shows that it is identical to the wild type gene except for a
single alteration (the replacement of one nucleotide by another). How might this account for the features
of the GS1 protein produced by strain B? (3 marks)
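Questions 4 and 5 both turn on what a single base change does to a codon. A Python sketch that finds the changed position in two aligned sequences, translates all three forward frames, and classifies a codon change as silent, missense or nonsense. For a fragment taken from inside an open reading frame, a frame with no stop codon in the wild-type translation is the likely candidate. The example fragment is hypothetical, not the GS1 sequence.

```python
# Sketch: locate a point substitution, compare reading frames and classify
# the change. Assumes equal-length, aligned, uppercase DNA (coding strand).

BASES = "TCAG"  # standard genetic code, T/C/A/G order, DNA letters
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def point_differences(wt: str, mut: str) -> list[tuple[int, str, str]]:
    """1-based positions at which two aligned sequences differ."""
    return [(i + 1, a, b) for i, (a, b) in enumerate(zip(wt, mut)) if a != b]


def translate_frame(seq: str, frame: int) -> str:
    """Translate from offset 0, 1 or 2, showing stop codons as '*'."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(frame, len(seq) - 2, 3))


def classify(wt_codon: str, mut_codon: str) -> str:
    """Effect of a codon change on the encoded amino acid."""
    before, after = CODON_TABLE[wt_codon], CODON_TABLE[mut_codon]
    if before == after:
        return "silent"
    if after == "*":
        return "nonsense"
    if before == "*":
        return "stop-loss"
    return "missense"


wt, mut = "GCAAAGTTC", "GCATAGTTC"  # hypothetical fragment
print(point_differences(wt, mut))  # [(4, 'A', 'T')]
for frame in range(3):
    print(frame, translate_frame(wt, frame), translate_frame(mut, frame))
print(classify("AAG", "TAG"))  # nonsense
```

A missense change leaves a full-length protein with one altered residue, whereas a nonsense change truncates it, which is why the two strains in questions 4 and 5 behave so differently.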
Bioinformatics and molecular biology tools
A. Analysis of a eukaryotic gene
Most eukaryotic genes are composed of both exons (regions represented in the mRNA derived from the
gene) and introns (regions that are transcribed initially but are absent from the mRNA). Furthermore, not all
the nucleotides in an mRNA are translated into protein. Thus, gene sequences need careful analysis and
annotation to identify their functional regions. In this part of the assignment, DNA databases and other online
resources will be used to find the identity of a human gene (gene A) and determine its intron-exon structure
and the properties of the protein it encodes.
An example of a human gene (Human metallothionein 2A gene) showing the entire region that is transcribed
into RNA and the protein that results from translation of the exon sequences is shown in figure 3.
Figure 3. The Human metallothionein 2A gene. Exon sequences are in upper
case, intron sequences in lower case. The predicted amino acid sequence is
shown above the corresponding nucleotide sequence, * represents a
termination codon.
6. Below is the entire transcribed region of the chromosomal DNA of gene A (only the coding strand is
shown).
>gene_A_chromosomal_sequence
ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCATCTGACTCCTGAGGAGAAGTCTGCCGT
TACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGCAGGTTGGTATCAAGGTTACAAGACAGGTTT
AAGGAGACCAATAGAAACTGGGCATGTGGAGACAGAGAAGACTCTTGGGTTTCTGATAGGCACTGACTCTCTCTGCCTATTGGTC
TATTTTCCCACCCTTAGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATG
CTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGCTCACCTGGACAACCT
CAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGATCCTGAGAACTTCAGGGTGAGTCTATGGGAC
GCTTGATGTTTTCTTTCCCCTTCTTTTCTATGGTTAAGTTCATGTCATAGGAAGGGGATAAGTAACAGGGTACAGTTTAGAATGG
GAAACAGACGAATGATTGCATCAGTGTGGAAGTCTCAGGATCGTTTTAGTTTCTTTTATTTGCTGTTCATAACAATTGTTTTCTT
TTGTTTAATTCTTGCTTTCTTTTTTTTTCTTCTCCGCAATTTTTACTATTATACTTAATGCCTTAACATTGTGTATAACAAAAGG
AAATATCTCTGAGATACATTAAGTAACTTAAAAAAAAACTTTACACAGTCTGCCTAGTACATTACTATTTGGAATATATGTGTGC
TTATTTGCATATTCATAATCTCCCTACTTTATTTTCTTTTATTTTTAATTGATACATAATCATTATACATATTTATGGGTTAAAG
TGTAATGTTTTAATATGTGTACACATATTGACCAAATCAGGGTAATTTTGCATTTGTAATTTTAAAAAATGCTTTCTTCTTTTAA
TATACTTTTTTGTTTATCTTATTTCTAATACTTTCCCTAATCTCTTTCTTTCAGGGCAATAATGATACAATGTATCATGCCTCTT
TGCACCATTCTAAAGAATAACAGTGATAATTTCTGGGTTAAGGCAATAGCAATATCTCTGCATATAAATATTTCTGCATATAAAT
TGTAACTGATGTAAGAGGTTTCATATTGCTAATAGCAGCTACAATCCAGCTACCATTCTGCTTTTATTTTATGGTTGGGATAAGG
CTGGATTATTCTGAGTCCAAGCTAGGCCCTTTTGCTAATCATGTTCATACCTCTTATCTTCCTCCCACAGCTCCTGGGCAACGTG
CTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCACCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGTGTGG
CTAATGCCCTGGCCCACAAGTATCACTAAGCTCGCTTTCTTGCTGTCCAATTTCTATTAAAGGTTCCTTTGTTCCCTAAGTCCAA
CTACTAAACTGGGGGATATTATGAAGGGCCTTGAGCATCTGGATTCTGCCTAATAAAAAACATTTATTTTCATTGCAA
Use this sequence to search the DNA sequence database GenBank to find the identity of the gene.
GenBank includes numerous gene sequences from all classes of organisms and viruses. As of October
2021, it contained approximately 233 million individual sequence files, constituting a total of about 1,000
billion nucleotides.
The GenBank database can be searched using the Megablast algorithm – a version of BLAST (Basic
Local Alignment Search Tool). BLAST algorithms compare a query DNA sequence with all the sequences
deposited in a sequence database in order to find matches. Therefore, most DNA sequences can be
rapidly identified as, for example, representing a particular gene. To run a Megablast search:
• Open the website: http://blast.ncbi.nlm.nih.gov/
• Select “Nucleotide BLAST”
• Copy and paste the chromosomal sequence of gene A into the “Enter Query Sequence” box
• From the drop-down database options in the “choose search set” section, select “Human RefSeqGene
sequences” – this represents a reference collection of thoroughly analysed and annotated gene
sequences
• Click “BLAST” (at the bottom of the page)
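The search itself is done by BLAST. Its first stage, finding short exact word matches ("seeds") shared by the query and a database sequence, can be illustrated with a toy Python sketch. MegaBLAST uses long words (28 bases by default); the real program then extends and scores these seeds into alignments, which this sketch does not attempt.

```python
from collections import defaultdict

# Toy sketch of the seeding idea behind BLAST: index every k-letter word of a
# database sequence, then look up each word of the query. Real BLAST goes on
# to extend and score these hits; none of that is attempted here.


def find_seeds(query: str, subject: str, k: int) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs, 0-based, for shared k-mers."""
    index = defaultdict(list)
    for s in range(len(subject) - k + 1):
        index[subject[s:s + k]].append(s)
    seeds = []
    for q in range(len(query) - k + 1):
        for s in index.get(query[q:q + k], []):
            seeds.append((q, s))
    return seeds


print(find_seeds("ACGTAC", "TTACGTT", 4))  # [(0, 2)]
```

Indexing the database once and looking up query words is what lets BLAST avoid a full alignment against every stored sequence.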
The results under the “Descriptions” heading show closely matching sequences in order of decreasing
similarity, with the top hit representing the best match. It should show 100% sequence identity and 100%
query coverage, with a very low E value (look up the definition of this); together these indicate that the
hit is identical to gene A.
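For reference, the textbook (Karlin-Altschul) form of the E value is summarised below: m is the query length, n the total length of the database, S the raw alignment score, K and λ constants that depend on the scoring system, and S' the normalised bit score. BLAST substitutes effective lengths for m and n, so treat this as the underlying relationship rather than its exact computation.

```latex
E = K\,m\,n\,e^{-\lambda S}, \qquad
S' = \frac{\lambda S - \ln K}{\ln 2}, \qquad
E = m\,n\,2^{-S'}
```

The E value falls exponentially as the score rises, so a long, perfect match such as the top hit for gene A gives a value very close to zero.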
The gene name and symbol for each of the hits are shown. All genes have an official gene symbol
consisting of a few letters, with perhaps one or two numbers. For example, INS is the symbol for the
human insulin gene, TNNT2 represents the gene encoding the cardiac form of the muscle protein
troponin-T. Symbols for human genes are always shown in upper case and the name and symbol for a
gene, or its RNA (but not the corresponding protein) should be italicised.
i) What is the identity of gene A? Record the gene name and gene symbol on the answer template.
(3 marks)
Now left-click on the gene name, or alternatively go to the alignments section. This will show an
alignment of the gene A chromosomal sequence with the matching human genes. Left-click on the gene
ID for the best matching gene (Fig. 4).
Figure 4. A section of the output from a BLAST search. Follow the indicated link
to obtain detailed information on the matching sequence.
This will open a web page that describes the gene in detail (e.g. chromosomal location, sequence, protein
product, diseases linked to the gene etc.). Take a look at this page to answer the questions below.
ii) Mutations in gene A cause two different, very common genetic diseases, one of which is
subdivided into “zero” and “plus” subtypes (also commonly referred to as major and intermedia
respectively). Give the name and the genetic and biochemical basis of the disease that includes
the term “zero” (or major) in its name. (3 marks)
Find the mRNA sequence for gene A. Click on the mRNA link in the list of information down the
left-hand side of the page. This will highlight the transcribed regions of the gene (figure 5).
Figure 5. Highlighted mRNA regions in the Human metallothionein 2A gene. Click on
the FASTA display option at the bottom of the screen (indicated by the arrow) to obtain the
complete mRNA sequence.
At the bottom right of your screen you will see an option to display the FASTA sequence.
iii) Copy and paste the mRNA FASTA sequence into your answer document. (2 marks)
iv) Find the amino acid sequence encoded by gene A. This is found down the left-hand side of the
page labelled as CDS (which means coding sequence). Copy the single letter amino acid
sequence and paste it into your answer document. (1 mark)
7. This question involves the preparation of an annotated sequence of gene A. This should show the
chromosomal sequence for gene A but with exons in upper case, introns in lower case (entire intron
sequences should be included), and amino acids added above the middle base of the corresponding
codon (i.e. the same format as shown above for the metallothionein 2A gene in Fig. 3).
First, determine the intron-exon structure of gene A by comparing the cDNA sequence and the
chromosomal sequence. This can be done using a program called “Splign”, which can be accessed at
https://www.ncbi.nlm.nih.gov/sutils/splign/splign.cgi?textpage=online&level=form
Copy and paste the chromosomal DNA sequence of gene A into the ‘Genomic’ box and the cDNA
sequence (the mRNA sequence you obtained in Q6, part iii) into the ‘cDNA’ box (important: both
sequences must be in FASTA format).
Click on align. The result should show that the cDNA sequence is composed of several distinct regions
corresponding to individual exons from the chromosomal sequence (listed as “segments” in the Splign
output; see Figure 6 for an example). Each segment represents a separate exon, so inspect each segment in turn.
Figure 6. Example of Splign output (segment 2 from the Human metallothionein 2A gene). The
lower line of sequence is from chromosomal (or genomic) DNA, the top line is from cDNA. The regions
where the two sequences no longer overlap correspond to the beginning or end of an intron.
Use the search option in Microsoft Word to find the start and end of introns in the chromosomal DNA
sequence from gene A. Display your findings by showing the chromosomal sequence with exons in upper
case and introns in lower case as illustrated in Fig. 3 for the Human metallothionein 2A gene (NB in
Microsoft Word you can change the case of a block of text using Shift+F3).
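If you note each Splign segment's genomic start and end, the upper/lower-case annotation can also be generated rather than edited by hand. A minimal Python sketch, assuming 1-based, inclusive exon coordinates; the example sequence and coordinates are hypothetical.

```python
# Sketch: build the exon/intron case annotation from exon coordinates.
# Assumes each exon is recorded as a 1-based, inclusive (start, end) pair of
# genomic positions; the example values are hypothetical.


def annotate_exons(genomic: str, exons: list[tuple[int, int]]) -> str:
    """Upper-case the exon bases and lower-case everything else."""
    chars = list(genomic.lower())
    for start, end in exons:
        for i in range(start - 1, end):
            chars[i] = chars[i].upper()
    return "".join(chars)


def wrap(seq: str, width: int = 60) -> list[str]:
    """Split a long sequence into fixed-width lines for display."""
    return [seq[i:i + width] for i in range(0, len(seq), width)]


annotated = annotate_exons("AAAACCCCGGGG", [(1, 4), (9, 12)])
for line in wrap(annotated, 6):
    print(line)
```

Everything outside the listed exons is treated as intron, so the coordinates must cover the untranslated parts of the first and last exons as well as the coding region.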
Add the amino acid sequence (from your answer to Q6, part iv) so that each amino acid appears
above the central base of its codon. This can be done manually by copying and pasting a few
amino acids at a time. Take care to use Courier New font of the same size as the DNA sequence,
and insert spaces between the amino acids as appropriate. Give your figure a title and legend.
(15 marks)
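The fiddly part is that a codon can be split across an intron, so the middle base is not always one position after the first. A Python sketch that counts only exon (upper-case) bases from the start codon onwards and places each amino acid above the middle one. It assumes exons are upper case, introns lower case, and cds_start is the 1-based position of the A of ATG; the example gene is hypothetical.

```python
# Sketch: place each amino acid above the middle base of its codon, skipping
# intron (lower-case) bases so that codons split by an intron are handled.
# Assumes cds_start is the 1-based position of the A of the start codon.


def amino_acid_line(annotated: str, cds_start: int, protein: str) -> str:
    line = [" "] * len(annotated)
    # Exonic positions from the start codon onwards, in order.
    exonic = [i for i in range(cds_start - 1, len(annotated)) if annotated[i].isupper()]
    for n, aa in enumerate(protein):
        codon = exonic[3 * n:3 * n + 3]
        if len(codon) < 3:
            break  # ran out of exon sequence
        line[codon[1]] = aa  # middle base of the codon
    return "".join(line)


annotated = "CCATGAAggggACCCTGA"  # hypothetical gene with one short intron
print(amino_acid_line(annotated, 3, "MKP*"))
print(annotated)
```

In the example the second codon (AAA) is split by the intron, and K lands above its middle A just before the intron. When wrapping a long sequence, cut the amino acid line and the DNA line at the same positions so they stay aligned in Courier New.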
8. Mutations in gene A are associated with serious genetic diseases, including the “zero” (major) subtype of
the disease named in answer to Q6 ii). The chromosomal sequences of two mutant alleles of gene A
from patients with the zero subtype include the following mutation.
496G>A (a mutation from G to A in the mutant at position 496)
(NB. The position here relates to the chromosomal sequence of gene A)
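The notation 496G>A gives a 1-based position, the reference base and the new base (it resembles HGVS substitution syntax, without the g. prefix). A small Python sketch that checks the reference base before applying such a change and shows the surrounding bases, so you can see where the change falls relative to exon and intron boundaries. The example sequence and change are hypothetical.

```python
import re

# Sketch: apply a substitution written as '<position><ref>><alt>', checking
# the reference base first. Positions are 1-based, as in the question.


def apply_substitution(seq: str, change: str) -> str:
    match = re.fullmatch(r"(\d+)([ACGT])>([ACGT])", change)
    if match is None:
        raise ValueError(f"unrecognised change: {change!r}")
    pos, ref, alt = int(match.group(1)), match.group(2), match.group(3)
    i = pos - 1
    if seq[i].upper() != ref:
        raise ValueError(f"position {pos} is {seq[i]!r}, expected {ref!r}")
    # Keep the case of the original base so exon/intron annotation survives.
    new_base = alt.lower() if seq[i].islower() else alt
    return seq[:i] + new_base + seq[i + 1:]


def context(seq: str, pos: int, flank: int = 5) -> str:
    """Bases around a 1-based position, with that position in brackets."""
    i = pos - 1
    return seq[max(0, i - flank):i] + "[" + seq[i] + "]" + seq[i + 1:i + 1 + flank]


seq = "CCTGAGGAGAAgtaagtcc"  # hypothetical exon/intron junction
print(context(seq, 12))
print(apply_substitution(seq, "12G>A"))
```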
i) Highlight the affected nucleotide on your annotated sequence produced in answer to Q7. (1 mark)
ii) Briefly discuss why this mutant fails to produce functional protein. Note that none of the mRNA
transcribed from this gene is of the expected size; some of the mRNA molecules produced are
223 nucleotides shorter than expected whilst others are 47 nucleotides longer than expected. (5 marks)
B. Analysis of a bacterial gene
Bacterial genes do not usually contain introns. Therefore, coding sequences can be amplified directly from
genomic DNA for a variety of molecular biology applications including diagnostic testing, cloning into plasmids
or to enable DNA sequencing. The polymerase chain reaction (PCR) is used to amplify regions of double
stranded DNA (dsDNA) using genomic DNA or cDNA as a template. (You may find it helpful to read how PCR
is performed before answering the following question).
9. A researcher wants to determine the sequence of gene x from a bacterium. To do this they need to
amplify the region of DNA containing gene x using PCR. The amplified region can then be sequenced
by the Sanger method. To perform a PCR, 2 oligonucleotide primers are required. Below are the
sequences of the 5’ and 3’ flanking regions of the gene, with the start codon shaded in green and the
stop codon shaded in red.
>gene_x
Gacggcctcggaccataacggcttcctgttggacgaggcggaggtcatctactggggtctatgtcctgattgttcgatatccgacacttc
gcgatcacatccgtgatcacagcccgataacaccaactcctggaaggaatgctatg…………………………………………………………………………………………
………………………………………………………………………………………………………………………………………………………………………………………………………………………………………………
…………………………………………………………………………………………………………………………………………………………………………………………
tgattcgggttgatcggccctgcccgccgatcaaccacaacccgccgcagcaccccgcgagctgaccggctcgcggggtgctggtgtttg
cccggcgcgatttgtcagaccccgcgtgcatggtggtcgcaggcacgacgagacggggatgacgagacggggat
The forward primer has the same sequence as the coding strand upstream of the start codon (the region
underlined and shaded in grey), so it anneals to the complementary, non-coding strand. The reverse primer
is complementary to the coding strand downstream of the stop codon (the region underlined and shaded in
grey), so it anneals to the coding strand and its sequence matches the non-coding strand.
Write the 5’→3’ sequences of the primers required to carry out the PCR reaction. (NB. The sequence
shown is the coding strand; the reverse primer must therefore be written as the reverse complement of
the shaded downstream region.) (4 marks)
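The primer rule can be made concrete: the forward primer is copied from the coding strand as written, and the reverse primer is the reverse complement of its region, so both end up written 5'→3'. A Python sketch, assuming 1-based, inclusive coordinates on the coding strand; the sequence and coordinates are hypothetical (the grey shading in the question does not survive in plain text), and the Wallace-rule melting temperature is only a rough guide for short oligonucleotides.

```python
# Sketch: derive PCR primers from coding-strand coordinates and estimate Tm.
# The sequence and coordinates below are hypothetical.

PAIRS = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(PAIRS)[::-1]


def design_primers(coding: str, fwd: tuple[int, int], rev: tuple[int, int]) -> tuple[str, str]:
    """fwd/rev are 1-based, inclusive (start, end) positions on the coding strand.

    The forward primer is copied as written (5'->3'); the reverse primer is the
    reverse complement of its region, so it is also written 5'->3'.
    """
    forward = coding[fwd[0] - 1:fwd[1]].upper()
    reverse = reverse_complement(coding[rev[0] - 1:rev[1]]).upper()
    return forward, reverse


def wallace_tm(primer: str) -> int:
    """Rough melting temperature in deg C: 2*(A+T) + 4*(G+C)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


coding = "ttgacagctagcatgaaaccctgataggatccaa"  # hypothetical coding-strand fragment
forward, reverse = design_primers(coding, (1, 18), (17, 34))
print("5'-" + forward + "-3'", wallace_tm(forward))
print("5'-" + reverse + "-3'", wallace_tm(reverse))
```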
10. The sequence of the DNA amplified in Q9 is shown below.
>gene_x_sequence_result
ataacaccaactcctggaaggaatgctatgcccgagcaacacccacccattacagaaaccaccaccggagccgctagcaacggct
gtcccgtcgtgggtcatatgaaataccccgtcgagggcggcggaaaccaggactggtggcccaaccggctcaatctgaaggtact
gcaccaaaacccggccgtcgctgacccgatgggtgcggcgttcgactatgccgcggaggtcgcgaccatcgacgttgacgccctg
acgcgggacatcgaggaagtgatgaccacctcgcagccgtggtggcccgccgactacggccactacgggccgctgtttatccgga
tggcgtggcacgctgccggcacctaccgcatccacgacggccgcggcggcgccgggggcggcatgcagcggttcgcgccgcttaa
cagctggcccgacaacgccagcttggacaaggcgcgccggctgctgtggccggtcaagaagaagtacggcaagaagctctcatgg
gcggacctgattgttttcgccggcaactgcgcgctggaatcgatgggcttcaagacgttcgggttcggcttcggccgggtcgacc
agtgggagcccgatgaggtctattggggcaaggaagccacctggctcggcgatgagcgttacagcggtaagcgggatctggagaa
cccgctggccgcggtgcagatggggctgatctacgtgaacccggaggggccgaacggcaacccggaccccatggccgcggcggtc
gacattcgcgagacgtttcggcgcatggccatgaacgacgtcgaaacagcggcgctgatcgtcggcggtcacactttcggtaaga
cccatggcgccggcccggccgatctggtcggccccgaacccgaggctgctccgctggagcagatgggcttgggctggaagagctc
gtatggcaccggaaccggtaaggacgcgatcaccagcggcatcgaggtcgtatggacgaacaccccgacgaaatgggacaacagt
ttcctcgagatcctgtacggctacgagtgggagctgacgaagagccctgctggcgcttggcaatacaccgccaaggacggcgccg
gtgccggcaccatcccggacccgttcggcgggccagggcgctccccgacgatgctggccactgacctctcgctgcgggtggatcc
gatctatgagcggatcacgcgtcgctggctggaacaccccgaggaattggccgacgagttcgccaaggcctggtacaagctgatc
caccgagacatgggtcccgttgcgagataccttgggccgctggtccccaagcagaccctgctgtggcaggatccggtccctgcgg
tcagccacgacctcgtcggcgaagccgagattgccagccttaagagccagatccgggcatcgggattgactgtctcacagctagt
ttcgaccgcatgggcggcggcgtcgtcgttccgtggtagcgacaagcgcggcggcgccaacggtggtcgcatccgcctgcagcca
caagtcgggtgggaggtcaacgaccccgacggggatctgcgcaaggtcattcgcaccctggaagagatccaggagtcattcaact
ccgcggcgccggggaacatcaaagtgtccttcgccgacctcgtcgtgctcggtggctgtgccgccatagagaaagcagcaaaggc
ggctggccacaacatcacggtgcccttcaccccgggccgcacggatgcgtcgcaggaacaaaccgacgtggaatcctttgccgtg
ctggagcccaaggcagatggcttccgaaactacctcggaaagggcaacccgttgccggccgagtacatgctgctcgacaaggcga
acctgcttacgctcagtgcccctgagatgacggtgctggtaggtggcctgcgcgtcctcggcgcaaactacaagcgcttaccgct
gggcgtgttcaccgaggcctccgagtcactgaccaacgacttcttcgtgaacctgctcgacatgggtatcacctgggagccctcg
ccagcagatgacgggacctaccagggcaaggatggcagtggcaaggtgaagtggaccggcagccgcgtggacctggtcttcgggt
ccaactcggagttgcgggcgcttgtcgaggtctatggcgccgatgacgcgcagccgaagttcgtgcaggacttcgtcgctgcctg
ggacaaggtgatgaacctcgacaggttcgacgtgcgctgattcgggttgatcggccctgcccgccgat
Find the amino acid sequence of the protein that gene x encodes. Use the translate tool at Expasy:
http://web.expasy.org/translate/. Copy and paste the sequence of gene x into the query box at Expasy.
Select the standard genetic code, compact format, and forward strand only options and click
“TRANSLATE”.
The output will highlight open reading frames (NB 5’3’ or 3’5’ indicates the strand, and a hyphen indicates a
termination codon). These represent the proteins that could theoretically be translated from the input
sequence. Identify the most appropriate reading frame (based on your knowledge of how mRNA is
translated). Click on the start codon of your chosen translation to retrieve your amino acid sequence.
i) Copy and paste the FASTA sequence into your answer document. (2 marks)
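What the Expasy tool does with the forward-strand option can be approximated in a few lines: translate the three forward frames, keep the longest stretch running from ATG to a stop codon, and format it as FASTA. A Python sketch, assuming a DNA string containing only A, C, G and T (either case); ORFs with no stop codon before the end are ignored, and stops appear as '*' here whereas Expasy shows a hyphen. The example input and FASTA name are hypothetical.

```python
# Sketch: translate the three forward frames, keep the longest ORF
# (ATG ... stop) and format it as FASTA.

BASES = "TCAG"  # standard genetic code, T/C/A/G order, DNA letters
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def longest_orf(seq: str) -> str:
    seq = seq.upper()
    best = ""
    for frame in range(3):
        peptide = None  # None means "not currently inside an ORF"
        for i in range(frame, len(seq) - 2, 3):
            aa = CODON_TABLE[seq[i:i + 3]]
            if peptide is None:
                if aa == "M":
                    peptide = "M"
            elif aa == "*":
                if len(peptide) > len(best):
                    best = peptide
                peptide = None
            else:
                peptide += aa
    return best


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    lines = [">" + name] + [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines)


print(to_fasta("hypothetical_orf", longest_orf("CCATGAAACCCTGAGG")))
```

Choosing the longest ATG-initiated frame is a heuristic; it usually matches the reading frame a careful inspection of the Expasy output would pick.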
11. Find the identity of the protein encoded by gene x. Use BLAST at NCBI as you did in Q6, but this time
select Protein BLAST.
• Open the website: http://blast.ncbi.nlm.nih.gov/
• Select Protein BLAST
• Paste the FASTA-format protein sequence of gene x into the query box
• Leave all other settings as default
• Click BLAST (at the bottom of the page)
Use the best hit (most similar sequence) to find the following. NB Bacterial gene symbols typically consist
of three lower-case letters followed by a capital letter (and may include numbers to denote a specific allele).
Gene and mRNA symbols begin with a lower-case letter and are italicised, whereas protein symbols begin with
an upper-case letter and are not italicised, e.g. the β subunit of RNA polymerase is RpoB, and this protein is
encoded by the rpoB gene.
i) What organism is gene x from? (2 marks)
ii) What is the identity of gene x? Write the gene symbol on the answer template. (2 marks)
iii) What is the function of the enzyme encoded by gene x? (you will need to do a little research
to find the importance of this function) (3 marks)
12. Mutations that alter the encoded protein are written using the single-letter amino acid code. The
first letter is the original, wild-type amino acid; it is followed by a number, the residue number (i.e. the
position of the amino acid in the sequence), and then a second letter, the new amino acid.
For example, Y137F denotes a change from Tyrosine to Phenylalanine at position 137.
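The notation is regular enough to parse and check mechanically, which is useful before looking a mutation up. A Python sketch that splits a code such as Y137F into its parts, spells it out using the names from Figure 1, and confirms the wild-type residue sits at that position in a given protein sequence. The protein in the example is hypothetical.

```python
import re

# Sketch: parse single-letter substitution codes such as Y137F.
# One-letter codes and names as listed with Figure 1.
NAMES = {
    "G": "Glycine", "A": "Alanine", "L": "Leucine", "M": "Methionine",
    "F": "Phenylalanine", "W": "Tryptophan", "K": "Lysine", "Q": "Glutamine",
    "E": "Glutamic Acid", "S": "Serine", "P": "Proline", "V": "Valine",
    "I": "Isoleucine", "C": "Cysteine", "Y": "Tyrosine", "H": "Histidine",
    "R": "Arginine", "N": "Asparagine", "D": "Aspartic Acid", "T": "Threonine",
}
CODES = "".join(NAMES)
PATTERN = re.compile(f"([{CODES}])(\\d+)([{CODES}])")


def parse_substitution(code: str) -> tuple[str, int, str]:
    match = PATTERN.fullmatch(code)
    if match is None:
        raise ValueError(f"not a single-letter substitution: {code!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def describe(code: str) -> str:
    wt, pos, new = parse_substitution(code)
    return f"{NAMES[wt]} to {NAMES[new]} at position {pos}"


def matches_sequence(protein: str, code: str) -> bool:
    """Check that the wild-type residue really sits at that (1-based) position."""
    wt, pos, _ = parse_substitution(code)
    return 1 <= pos <= len(protein) and protein[pos - 1] == wt


print(describe("Y137F"))  # Tyrosine to Phenylalanine at position 137
print(matches_sequence("MSTY", "Y4F"))  # hypothetical protein: True
```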
Gene x sometimes acquires a mutation that produces a variant protein containing S315T.
i) What does S315T mean? (1 mark)
ii) What is the significance of a mutation resulting in S315T in the protein encoded by gene x?
(3 marks)
iii) Briefly speculate how you might study the effect of S315T and other mutations on the protein
encoded by gene x (with regard to your answer to Q12, part ii). (5 marks)
C. Restriction mapping
Plasmids are used in molecular biology for several purposes. There are various types including cloning
vectors used to increase the number of copies of target DNA, and expression vectors used to create a
protein product. Plasmids contain several features including an origin of replication (specific for the
species in which they will be propagated) and a selectable marker (such as an antibiotic resistance
marker). Genes are cloned into a vector by cutting the plasmid with restriction enzymes (usually two
different enzymes, which stops the cut vector from re-annealing to itself and ensures the correct orientation
of the inserted gene). To facilitate this, plasmids are engineered to contain a multiple cloning site (MCS),
a short segment of DNA containing many restriction sites, each of which usually occurs only once in the
plasmid. In expression vectors the MCS is located downstream of a promoter.
Restriction maps show the restriction enzyme recognition sites in a DNA sequence and are required in
order for gene cloning to be performed. (You may find it helpful to read about plasmids and the process
of gene cloning before answering the following questions.)
13. In this question you will produce a restriction map of a plasmid, pJEP (figure 7).
Figure 7. Diagram of expression plasmid pJEP. The origin of replication (ori),
kanamycin resistance gene (KanR), multiple cloning site (MCS) and promoter are
shown.
Generate a diagram showing the restriction enzymes that cut the plasmid only once. Copy the FASTA
sequence of plasmid pJEP (below) and paste it into the input box at
NEBcutter (https://nc3.neb.com/NEBcutter/) (Fig. 8).
>pJEP
gtggttaatt aatctagagc tagcgaattc ctgtgtgaaa ttgttatccg ctcacaattc
cacacaacat acgagccgga agcataaagt gtaaagcctg gggtgcctaa tgagtgagct
aactcacatt aattgcgttg cgctcactgc ccgctttcca gtcgggaaac ctgtcgtgcc
agctgcatta atgaatcggc caacgcgcgg ggagaggcgg tttgcgtatt gggcgctctt
ccgcttcctc gctcactgac tcgctgcgct cggtcgttcg gctgcggcga gcggtatcag
ctcactcaaa ggcggtaata cggttatcca cagaatcagg ggataacgca ggaaagaaca
tgtgagcaaa aggccagcaa aaggccagga accgtaaaaa ggccgcgttg ctggcgtttt
tccataggct ccgcccccct gacgagcatc acaaaaatcg acgctcaagt cagaggtggc
gaaacccgac aggactataa agataccagg cgtttccccc tggaagctcc ctcgtgcgct
ctcctgttcc gaccctgccg cttaccggat acctgtccgc ctttctccct tcgggaagcg
tggcgctttc tcatagctca cgctgtaggt atctcagttc ggtgtaggtc gttcgctcca
agctgggctg tgtgcacgaa ccccccgttc agcccgaccg ctgcgcctta tccggtaact
atcgtcttga gtccaacccg gtaagacacg acttatcgcc actggcagca gccactggta
acaggattag cagagcgagg tatgtaggcg gtgctacaga gttcttgaag tggtggccta
actacggcta cactagaagg acagtatttg gtatctgcgc tctgctgaag ccagttacct
tcggaaaaag agttggtagc tcttgatccg gcaaacaaac caccgctggt agcggtggtt
tttttgtttg caagcagcag attacgcgca gaaaaaaagg atctcaagaa gatcctttga
tcttttctac ggggtctgac gctcagtgga acgaaaactc acgttaaggg attttggtca
tgagattatc aaaaaggatc ttcacctaga tccttttaaa ttaaaaatga agttttaaat
caatctaaag tatatatgag taaaaatatt ccggaattgc cagctggggc gccctctggt
aaggttggga agccctgcaa agtaaactgg atggctttct tgccgccaag gatctgatgg
cgcaggggat caagatctga tcaagagaca ggatgaggat cgtttcgcat gattgaacaa
gatggattgc acgcaggttc tccggccgct tgggtggaga ggctattcgg ctatgactgg
gcacaacaga caatcggctg ctctgatgcc gccgtgttcc ggctgtcagc gcaggggcgc
ccggttcttt ttgtcaagac cgacctgtcc ggtgccctga atgaactgca ggacgaggca
gcgcggctat cgtggctggc cacgacgggc gttccttgcg cagctgtgct cgacgttgtc
actgaagcgg gaagggactg gctgctattg ggcgaagtgc cggggcagga tctcctgtca
tcccaccttg ctcctgccga gaaagtatcc atcatggctg atgcaatgcg gcggctgcat
acgcttgatc cggctacctg cccattcgac caccaagcga aacatcgcat cgagcgagca
cgtactcgga tggaagccgg tcttgtcgat caggatgatc tggacgaaga gcatcagggg
ctcgcgccag ccgaactgtt cgccaggctc aaggcgcgca tgcccgacgg cgaggatctc
gtcgtgaccc atggcgatgc ctgcttgccg aatatcatgg tggaaaatgg ccgcttttct
ggattcatcg actgtggccg gctgggtgtg gcggaccgct atcaggacat agcgttggct
acccgtgata ttgctgaaga gcttggcggc gaatgggctg accgcttcct cgtgctttac
ggtatcgccg ctcccgattc gcagcgcatc gccttctatc gccttcttga cgagttcttc
tgaaccggta atattattga agcatttatc agggttattg tctcatgagc ggatacatat
ttgaatgtat ttagaaaaat aaacaaatag gggttccgcg cacatttccc cgaaaagtgc
cacctgacgt ctaagaaacc attattatca tgacattaac ctataaaaat aggcgtatca
cgaggccctt tcgtctcgcg cgtttcggtg atgacggtga aaacctctga cacatgcagc
tcccggagac ggtcacagct tgtctgtaag cggatgccgg gagcagacaa gcccgtcagg
gcgcgtcagc gggtgttggc gggtgtcggg gctggcttaa ctatgcggca tcagagcaga
ttgtactgag agtgcaccat atggtcgacc tcgagttaat taacgta
Figure 8. The NEBcutter submission page. Paste your FASTA sequence in the box, select
your preferences and click submit.
NEBcutter will generate a diagram of your plasmid showing the position of restriction sites, with the name
of the plasmid and the total length displayed in the centre. Ensure you have selected the single cutter
and circular options and added the name of the plasmid to the ‘Name project’ box. Download the image
as a JPEG and paste it into your answer document and give your figure a title and a legend. (3 marks)
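NEBcutter does the site search itself. The underlying idea, scanning a circular sequence for each recognition site and keeping the enzymes found exactly once, can be sketched in Python as below. The six enzymes are a small subset chosen for illustration (their recognition sequences are the standard ones); the sketch reports the first base of each site, whereas NEBcutter reports cut positions.

```python
# Sketch: find enzymes that cut a circular sequence exactly once.
# Small illustrative subset of enzymes; every site here is palindromic,
# so scanning the top strand also covers the bottom strand.

ENZYMES = {
    "EcoRI": "GAATTC",
    "HindIII": "AAGCTT",
    "NheI": "GCTAGC",
    "PstI": "CTGCAG",
    "SspI": "AATATT",
    "XbaI": "TCTAGA",
}


def find_sites(seq: str, site: str, circular: bool = True) -> list[int]:
    """1-based start positions of every occurrence of site."""
    seq = seq.upper()
    n = len(seq)
    # For a circle, append the first len(site)-1 bases so sites spanning
    # the end-to-start junction are found (each is still reported once).
    search = seq + seq[:len(site) - 1] if circular else seq
    positions = []
    start = search.find(site)
    while start != -1 and start < n:
        positions.append(start + 1)
        start = search.find(site, start + 1)
    return positions


def single_cutters(seq: str, enzymes: dict[str, str]) -> dict[str, int]:
    """Enzymes with exactly one site, mapped to that site's position."""
    result = {}
    for name, site in enzymes.items():
        hits = find_sites(seq, site)
        if len(hits) == 1:
            result[name] = hits[0]
    return result


print(single_cutters("GAATTCAAAAAACTGCAGCTGCAG", ENZYMES))  # {'EcoRI': 1}
```

Selecting the circular option in NEBcutter matters for the same reason the sketch appends the first few bases: a site spanning position 1 would otherwise be missed.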
14. The restriction map you have generated in Q13 can be viewed in graphical form, or in the form of a list of
enzymes. If you hover the cursor over the enzyme name on your plasmid diagram, it shows the recognition
sequence of the enzyme and the position at which it cuts. Clicking on the enzyme name will give more
information about the enzyme activity. You can also view a list of all enzymes and how many times they
cut the sequence. The position of the cuts can be accessed by clicking on the grey triangle next to the
enzyme name (figure 9).
Figure 9. Example of NEBcutter restriction map enzyme list. Click on the grey triangle of
enzymes that cut the sequence to show the positions of the cut sites.
Use these features of your restriction map of pJEP to answer the following questions.
i) The multiple cloning site in pJEP runs from position 2478 to position 30 (i.e. it spans the point
where the numbering of the circular plasmid restarts). How many restriction enzymes have recognition
sites in the MCS? (1 mark)
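Because the MCS runs from 2478 past the end of the sequence to 30, a plain start ≤ position ≤ end test would miss every site in it. A small Python sketch of a wrap-aware check; site positions would come from NEBcutter, and plasmid_len is the total length it displays. Checking only a site's first and last base is adequate for sites much shorter than the range.

```python
# Sketch: test whether positions or whole sites fall inside a range on a
# circular plasmid, where the range may wrap past the last base to position 1.


def in_circular_range(pos: int, start: int, end: int) -> bool:
    """True if 1-based pos lies within start..end on a circle."""
    if start <= end:
        return start <= pos <= end
    return pos >= start or pos <= end  # range wraps past the origin


def site_in_range(site_start: int, site_len: int, plasmid_len: int,
                  start: int, end: int) -> bool:
    """True if both the first and last base of a site fall in the range."""
    site_end = (site_start - 1 + site_len - 1) % plasmid_len + 1
    return in_circular_range(site_start, start, end) and in_circular_range(site_end, start, end)


print(in_circular_range(2490, 2478, 30))  # True: before the origin
print(in_circular_range(10, 2478, 30))    # True: after the origin
print(in_circular_range(100, 2478, 30))   # False
```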
ii) What is the recognition sequence for EcoRI? Does digesting DNA with EcoRI result in ‘blunt’
or ‘sticky’ ends? (2 marks)
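Whether an enzyme leaves blunt or sticky ends follows from where it cuts within its recognition site. For a palindromic site the bottom-strand cut mirrors the top-strand one, which this Python sketch relies on; top_cut is the number of site bases 5' of the top-strand cut, read from the enzyme's entry at NEB. The examples use PstI (CTGCA^G) and SmaI (CCC^GGG).

```python
# Sketch: classify the ends left by an enzyme with a palindromic site.


def end_type(site: str, top_cut: int) -> tuple[str, str]:
    """Return the kind of end and the single-stranded overhang (if any).

    top_cut is the number of site bases 5' of the top-strand cut
    (PstI, CTGCA^G -> 5). For a palindrome the bottom-strand cut lies
    top_cut bases from the other end of the site.
    """
    bottom_cut = len(site) - top_cut
    if top_cut == bottom_cut:
        return "blunt", ""
    if top_cut < bottom_cut:
        return "5' overhang (sticky)", site[top_cut:bottom_cut]
    return "3' overhang (sticky)", site[bottom_cut:top_cut]


print(end_type("CTGCAG", 5))  # ("3' overhang (sticky)", 'TGCA')
print(end_type("CCCGGG", 3))  # ('blunt', '')
```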
iii) You have digested pJEP with two enzymes, SspI and PstI, and run the DNA sample on an
agarose gel along with some other plasmid digests (Fig. 10).
Figure 10. Diagram of an agarose gel. Six digested plasmid
samples are shown.
Use your restriction map and the list of enzymes on NEBcutter to determine what fragment sizes
you would expect after digesting pJEP with PstI and SspI. Record the expected fragment sizes and
indicate which sample in Fig. 10 represents pJEP digested with PstI and SspI. (4 marks)
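For a circular plasmid, n cuts give n fragments: the gaps between consecutive cut positions plus the piece that wraps past the origin. A minimal Python sketch, assuming each position is the coordinate of a cut as listed by NEBcutter and plasmid_len is the length shown in the centre of the map; the numbers in the example are hypothetical.

```python
# Sketch: fragment sizes from cutting a circular plasmid.


def circular_fragments(cut_positions: list[int], plasmid_len: int) -> list[int]:
    """Fragment lengths, largest first, for cuts at the given positions."""
    cuts = sorted(set(cut_positions))
    if not cuts:
        return []  # no cut: the plasmid stays circular
    sizes = [b - a for a, b in zip(cuts, cuts[1:])]
    # The last fragment runs from the final cut, past the origin, to the first cut.
    sizes.append(plasmid_len - cuts[-1] + cuts[0])
    return sorted(sizes, reverse=True)


print(circular_fragments([100, 400], 1000))  # [700, 300]
```

A single cut simply linearises the plasmid, giving one fragment of full length; with two enzymes that each cut once, expect two fragments that add up to the plasmid length.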