Design PCR Primers for the APOE (Apolipoprotein E) Gene

Description

In this assignment, you will design PCR primers that will let you clone a cDNA of your gene of interest into the pUC18 vector. We will build on the research you started in the in-class exercise with your DNA sequence.

Once you have found the boundaries of your gene of interest (start and stop codons), you will design PCR primers that bind on either side of the coding region, i.e. in the untranslated regions of your cDNA sequence.

Go to the GenBank entry for your gene and look for a link that says “Pick Primers”. You want primers that will allow you to amplify the whole coding region.

 Use the information in the database entry to help you figure out:

 The length of your coding region – this will be the minimum size of your product. The “maximum” length will be the full length of the mRNA/cDNA

 The beginning and end of the 5′ UTR

 The beginning and end of the 3′ UTR

These will allow you to indicate where you want the software to look for primer sequences
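
The bookkeeping above can be sketched in a few lines of Python. This is only an illustration: the function name `utr_regions` and the numbers passed to it are hypothetical, and it assumes you read the CDS start and end (1-based, stop codon included) from the GenBank entry.

```python
def utr_regions(mrna_length, cds_start, cds_end):
    """Work out UTR boundaries and product-size limits.

    Coordinates are 1-based and inclusive, as in a GenBank entry.
    cds_start is the A of the start codon; cds_end is the last base
    of the stop codon.
    """
    if not (1 <= cds_start < cds_end <= mrna_length):
        raise ValueError("CDS must lie inside the mRNA")
    cds_length = cds_end - cds_start + 1
    return {
        "cds_length": cds_length,
        # None means the entry has no UTR on that side.
        "five_prime_utr": (1, cds_start - 1) if cds_start > 1 else None,
        "three_prime_utr": (cds_end + 1, mrna_length) if cds_end < mrna_length else None,
        "min_product": cds_length,    # the whole coding region
        "max_product": mrna_length,   # the full mRNA/cDNA
    }

# Hypothetical numbers for illustration only (not from a real entry).
regions = utr_regions(mrna_length=1200, cds_start=85, cds_end=1038)
print(regions)
```

The forward primer search range would then be the 5′ UTR interval and the reverse primer search range the 3′ UTR interval.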

 Criteria for PCR primer design:

The primers should be between 16 and 21 nucleotides in length.

They should have a melting temperature between 55°C and 65°C

The primers are designed in pairs to make sure that their melting temperatures are similar (within 2 – 4 °C of each other), and that they are unlikely to anneal to each other.

The 3′ end of each primer should ideally end in a G or a C, and should have at least 2 of the last 3 nucleotides as a C or a G. This is called a GC-clamp and it ensures that the 3′ end binds very strongly to the template strand.

Long single-nucleotide runs should be avoided (i.e. you should not have “AAAAAA” or “GGGGG” in your primer).
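
The criteria above can be checked mechanically. The sketch below is a rough aid, not a replacement for primer design software. It assumes the simple Wallace rule for Tm (2 °C per A/T, 4 °C per G/C); online calculators usually use more accurate formulas and will report different values. It also assumes that a run of five or more identical bases counts as "too long". The function names are made up for this example, and the two sequences are the template-binding portions of the example primers given in the submission instructions.

```python
import re

def wallace_tm(primer):
    """Rough Tm estimate in degrees C: 2*(A+T) + 4*(G+C)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))

def check_primer(primer, min_len=16, max_len=21, tm_min=55, tm_max=65, max_run=4):
    """Return a list of criteria the primer fails (empty list = passes)."""
    p = primer.upper()
    problems = []
    if not min_len <= len(p) <= max_len:
        problems.append(f"length {len(p)} outside {min_len}-{max_len}")
    tm = wallace_tm(p)
    if not tm_min <= tm <= tm_max:
        problems.append(f"Tm {tm} outside {tm_min}-{tm_max}")
    # GC clamp: last base is G/C and at least 2 of the last 3 bases are G/C.
    if p[-1] not in "GC" or sum(base in "GC" for base in p[-3:]) < 2:
        problems.append("no GC clamp at the 3' end")
    # Flag any base repeated more than max_run times in a row.
    if re.search(r"(.)\1{%d,}" % max_run, p):
        problems.append(f"single-nucleotide run longer than {max_run}")
    return problems

forward = "ACCAGCAAGTTGTTTTCTTGC"
reverse = "TTTAATCCCGAGCGACACCG"
for name, primer in (("forward", forward), ("reverse", reverse)):
    print(name, wallace_tm(primer), check_primer(primer))
print("Tm difference:", abs(wallace_tm(forward) - wallace_tm(reverse)))
```

An empty list from `check_primer` means none of the listed criteria were violated under these assumptions; it says nothing about primer dimers or off-target binding.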

A useful thing to know:

Run your mRNA/cDNA sequence through WebCutter again, but this time identify any enzymes that do not cut your target DNA, and do cut the pUC18 plasmid in the MCS.

Pick two of those enzymes

Add a recognition sequence for one of the enzymes to the 5′ end of one of the primers

Add a recognition sequence for the other enzyme to the 5′ end of the other primer

Recalculate the Tm of your primer pair.
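
As a minimal sketch of the last three steps, the code below attaches a recognition sequence to the 5′ end of each primer and recalculates the Tm over the full primer. It uses the SpeI and EcoRI sites and the primer sequences from the example in the submission instructions, and it assumes the simple Wallace rule for Tm, so the numbers will not match an online calculator that uses a different formula. The helper names are hypothetical.

```python
# Recognition sequences used in the example primers (both are palindromic,
# so they read the same on either strand).
SITES = {"SpeI": "ACTAGT", "EcoRI": "GAATTC"}

def wallace_tm(primer):
    """Rough Tm estimate in degrees C: 2*(A+T) + 4*(G+C)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))

def add_site(primer, enzyme):
    """Prepend the enzyme's recognition sequence to the 5' end of a primer."""
    return SITES[enzyme] + primer.upper()

forward_full = add_site("ACCAGCAAGTTGTTTTCTTGC", "SpeI")
reverse_full = add_site("TTTAATCCCGAGCGACACCG", "EcoRI")

for name, primer in (("forward", forward_full), ("reverse", reverse_full)):
    print(name, primer, len(primer), "nt, Tm estimate:", wallace_tm(primer))
```

Because the added site changes both the length and the base composition, the Tm of the full primer differs from the Tm of the template-binding portion alone, which is why the recalculation is needed.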

Once you have found an appropriate set of primers, you will BLAST the primer sequences against the databases to see if it’s possible for them to amplify any sequences besides the DNA sequence of interest.

This is especially important for human DNA. This way, if you manage to contaminate the sample with your own DNA, you won’t be amplifying unwanted genes.

 Once you have completed all the above steps, each group will submit a sheet which includes the following information:

1. The Name of cDNA/mRNA, or the protein it encodes.

2. The Accession number of your mRNA/cDNA sequence (from the GenBank entry)

3. The two primer sequences including restriction enzyme sites.

◦ For example: product length = 1018

Forward primer 1 ACTAGTACCAGCAAGTTGTTTTCTTGC 21 Restriction Site: SpeI

Template 61 ………………… 81

Reverse primer 1 GAATTCTTTAATCCCGAGCGACACCG 20 Restriction Site: EcoRI

Template 1048 ……………….. 1029

4. Melting temperature calculations for both primers

◦ make sure you include the restriction site nucleotides in the calculations

You should be able to easily find a melting temperature calculator online
5. Include an appendix at the end with screenshots of the steps you followed to answer questions 3 and 4.
Note: when you’re trying to show the binding of your primers to the template DNA, you need to change the font used to display DNA sequences to “Courier” or “Courier New”


Lab 4
In this lab, we will be meeting in a computer lab and you will be performing some basic
sequence analysis on a genomic DNA sequence that will be provided for you by the TA.
The analysis you will be performing will simulate the sort of analysis you would have to
do before doing the experiments in the next few labs of this course.
Learning Objectives:
Students will:
Describe the basics of how a sequence similarity search works
Explain the difference between the GenBank and RefSeq databases
Perform a basic interpretation of a BLAST result
Identify various pieces of useful information in a GenBank Entry
Perform a virtual restriction digest
Design PCR primers using online tools
Pre-Lab Questions:
Please read the following brief page and answer the following questions:

http://www.bbc.co.uk/guides/z8yk87h

What is a database?

Why are databases useful?
Please watch the video in the link below and answer the following questions:


Why do we normally use computers to help us design PCR primers?

What is a GC clamp?

What is a primer dimer?

Once you have designed the primers, it’s always a good idea to BLAST
them against the target organism’s genome. Why?

Why should you also BLAST your primers against the human genome?
Bioinformatics
Bioinformatics is an interdisciplinary field that combines biology, computer science,
mathematics, and statistics to acquire, store, analyze, and interpret biological data. It
plays a crucial role in understanding the structure, function, and evolution of biological
molecules, such as DNA, RNA, and proteins, as well as the biological processes and
systems they participate in. It applies a computer’s ability to find patterns in large
amounts of data to help biologists analyze data. Bioinformatics arose because of the
need to compare and analyze an increasing number of DNA and protein sequences.
This sort of information is very simple (sequences of As, Ts, Cs and Gs) but is
very laborious for a single person to analyze by hand. Thus, as the amount of
sequence information kept growing, scientists began writing computer programs that
could take over the tedious job of direct sequence analysis.
Databases and Bioinformatics
Databases are central to the field of bioinformatics, as they serve as repositories for vast
amounts of biological data. These databases organize and make biological information
accessible to researchers, allowing them to retrieve, analyze, and compare data
efficiently. They can be categorized into two main types: primary databases and
secondary databases.
Primary Databases:
Primary Sequence Databases: These databases store raw biological sequence data,
including DNA, RNA, and protein sequences. They are considered primary because they
contain the original, unprocessed data. Examples of primary sequence databases
include:

GenBank: Maintained by the National Center for Biotechnology Information
(NCBI), it is one of the most comprehensive databases for DNA and RNA
sequences, including genomes from various organisms.

EMBL (European Molecular Biology Laboratory) Nucleotide Sequence Database.

DDBJ (DNA Data Bank of Japan): Stores DNA and RNA sequences primarily
from Japanese researchers.
Protein Databases: These databases store information about protein sequences and
their associated data, including functional annotations, 3D structures, and
post-translational modifications. Examples of primary protein databases include:

Swiss-Prot: Maintained by the Swiss Institute of Bioinformatics (SIB), it contains
manually curated, high-quality protein sequences with detailed annotations.

TrEMBL: Part of the UniProt database, it contains computationally annotated
protein sequences that have not yet been manually reviewed.
Secondary Databases:
Secondary databases are derived from primary databases and provide curated,
processed, or specialized subsets of the data. They offer additional information or focus
on specific aspects of biological research, simplifying the analysis and interpretation of
biological data. Researchers use these databases to extract knowledge and gain
insights into the functions, relationships, and evolutionary history of genes, proteins, and
other biological entities.
Here are some examples of secondary databases:

Gene Ontology (GO): A controlled vocabulary database that assigns functional
annotations to genes and proteins, categorizing them based on biological
processes, molecular functions, and cellular components.

KEGG (Kyoto Encyclopedia of Genes and Genomes): Provides information about
metabolic pathways, diseases, and functional hierarchies of genes and proteins.
It helps in understanding the biological significance of sequences.

InterPro: An integrated database of protein domains, families, and functional
sites, combining information from various sources to provide comprehensive
annotations.

ENSEMBL: A genome browser and database that provides detailed annotations
and comparative genomics information for various species.

STRING: Focuses on protein-protein interactions, providing data on known and
predicted interactions between proteins, helping researchers study cellular
processes and pathways.

HomoloGene: Identifies homologous genes across different species, facilitating
evolutionary and functional studies.

miRBase: Specializes in microRNA (miRNA) sequences and annotations,
essential for research in gene regulation and post-transcriptional control.
In summary, primary databases house raw biological sequence data, while secondary
databases provide curated, processed, and specialized subsets of data with specific
purposes. Both types of databases are essential tools in bioinformatics, enabling
researchers to access, analyze, and interpret biological information for various
applications in genetics, genomics, proteomics, and other areas of biological research.
Protein Bioinformatics
Proteins are linear polymers of amino acids, and it is the interactions between the
functional groups of these amino acids that drive the folding of the protein into its
final three-dimensional structure.
Bioinformatics plays a multifaceted role in the study of proteins, encompassing
sequence analysis, structure prediction, function annotation, interaction analysis, and
more. It helps researchers extract meaningful insights from the vast amount of
protein-related data and contributes to advancements in biology, medicine, and biotechnology.
Many years of research by numerous scientists have given us the ability to predict
secondary and tertiary structural features from the amino acid sequence. Many of the
rules that govern protein folding have been turned into computer
algorithms, which can be used to help us analyze our sequences. Thus, if you know the
amino acid sequence, you can often make a useful prediction of the 3D structure.
Scientists have also been able to correlate many of the secondary structures, and
combinations of these secondary structures, with specific functions. These regions of a
protein are known as domains, and have been identified in numerous different proteins.
Based on what you’ve just read, it should be clear that such functional
domains can often be identified from sequence information alone. This is useful because a particular
functional domain (e.g. a DNA-binding domain) may be found in a variety of proteins. Also,
a single protein can have several different domains. Thus knowing which domains are
likely to be found in a protein encoded by a particular sequence will help us make some
predictions about the potential functions of our protein of interest.
Ultimately, we want to find out what the protein product of a gene does. Bioinformatic
analysis of a particular sequence can give you a lot of information about it.
This will help you form hypotheses about your sequence and will help to guide the
direction of your research and the types of experiments you might want to perform on
your actual proteins.
Important: Bioinformatic sequence analysis by itself is just a starting point for research.
Any predictions made through such means need to be confirmed through actual
experiments.
Sequence Analysis
There is a wide variety of bioinformatic tools. One of the most commonly used tools,
and often one of the first to be used, is BLAST (Basic Local Alignment Search Tool).
This program takes your sequence and compares it to the sequences in
the world-wide databases. It scans your sequence for similarity to the billions of
sequences out there, and returns the sequences that are most similar to yours.
When it compares protein sequences, it also takes conservative substitutions into account
(substitutions that are known to be least likely to affect the function of a protein domain) and ranks
the matches based on their overall similarity. A match in BLAST could help identify
possible functions of your sequence, possible members of a gene family within an
organism, or possible related genes in other species (orthologs). The BLAST algorithm
does its best to help scientists find matches with biological significance.
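
To make the idea of a "local alignment" concrete, here is a minimal sketch of the classic Smith-Waterman dynamic programming score for two short DNA sequences. This is not how BLAST itself works: BLAST uses a faster heuristic based on short word matches so that it can search huge databases. The scoring values (match 2, mismatch -1, gap -2) and the toy sequences are arbitrary assumptions chosen for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (Smith-Waterman)."""
    cols = len(b) + 1
    prev = [0] * cols          # scores for the previous row of the table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A score never drops below 0: a local alignment may start anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

# Two toy sequences that share only the short stretch ACGT.
print(local_alignment_score("TTTACGTAAA", "GGGACGTCCC"))
# Two sequences with nothing in common.
print(local_alignment_score("AAAA", "CCCC"))
```

The shared stretch is found even though the flanking regions are unrelated, which is what "local" means here. A database search tool then ranks database sequences by scores of this kind.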
A simple online search will also uncover a wide variety of other online tools which will
allow you to do some very basic DNA sequence analysis and manipulation, perform
virtual restriction digests, find open reading frames, identify introns in genomic
sequences, translate the DNA sequence into a protein sequence, identify regulatory
elements in genes, design PCR primers, etc.
Similarly, there are numerous online software tools available for analysis of protein
sequences. With these you can perform multiple sequence alignment, predict
secondary structure, predict 3D structure, find transmembrane domains and perform
hydrophobicity plots, identify sites of post-translational modifications, etc.
All this from a simple DNA sequence!
Exercises
You will be provided with a genomic sequence from the spinach plant (Spinacia
oleracea). The spinach plant has had its genome sequenced fairly recently, and
the sequence you will receive has probably never been analyzed by anyone, so
you are likely the first scientists to try to find out anything about it.
Below, you will find some general instructions for this exercise. Your TA will walk you
through the details of the software to use for your analysis.
1. Using BLAST to Identify a cDNA Sequence of Interest
Before the start of the lab, your TA will share with you a genomic DNA sequence from
the spinach plant.
Procedure:
Use your browser to go to the NCBI BLAST tool. You can just search for it or click
this link.
1. Paste your sequence into the BLAST text field, and perform a search to try to
identify any known genes that are similar to yours. This may give you a clue
as to the function of your gene of interest.

Note which part of the sequence has the similarity (write down where
the similarity starts and ends).

The spinach plant is not yet well studied at the molecular level, so your
sequence may not yield many good matches in the
databases.
2. Try to find a matching sequence that belongs to an mRNA (cDNA) and go to
that GenBank entry. Try to select the best/longest possible sequence in the
database.

Write down the Accession Number here: ______________________

How long is the DNA sequence? ____________ bp

Name of protein/enzyme:
_______________________________________
3. Look at the information in the database entry and try to find features of
interest in the DNA sequence (i.e. the start and stop codons). Knowing this will be
helpful in later experiments when you are trying to amplify the full coding
region of your gene.

Location of START codon: _______

Location of STOP codon:
_______
4. Try to find a GenBank entry for the genomic DNA of your gene as well. This
will give you some idea of the structure of the gene (e.g. the number and locations
of exons).

How many exons does your gene have: _____________________

Does your sequence have more than one possible coding “variant”? __

How do the variants differ? ______________________________
________________________________________________________________
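
If you want to cross-check the START and STOP positions you read from the database entry, a simple open reading frame scan can help. The sketch below is a simplification: it searches only the forward strand, reports the longest ATG-to-stop stretch, and uses a made-up toy sequence. The annotation in the GenBank entry remains the authoritative source.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def longest_orf(cdna):
    """Return (start, end) of the longest forward-strand ORF, or None.

    Positions are 1-based and inclusive; end is the last base of the stop codon.
    """
    seq = cdna.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first ATG in this open stretch
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end)
                start = None                   # look for the next ORF in this frame
    if best is None:
        return None
    return best[0] + 1, best[1]

# Toy sequence for illustration: CC + ATG AAA TGG TAG + CC
toy = "CCATGAAATGGTAGCC"
print(longest_orf(toy))
```

On a real cDNA, the reported start and end should agree with the CDS feature in the GenBank entry; if they do not, trust the entry and check for alternative start codons or sequence variants.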
2. Using NCBI nucleotide database to find cDNA Sequence of
a Gene of Interest
Procedure:
Use your browser to go to the NCBI nucleotide database. You can just search for it
or click this link.
5. Write the name of the gene in the search field, and perform a search to try to
identify your gene of interest. This may give you a clue as to the function of
your gene of interest.

Go to the RefSeq transcripts for your gene of interest (write down the
number of variants).

Select the first variant and go to that GenBank entry. Write down the

Accession Number here: ______________________

How long is the DNA sequence? ____________ bp

Name of protein/enzyme:
_______________________________________
6. Look at the information in the database entry and try to find features of
interest in the DNA sequence (i.e. the start and stop codons). Knowing this will be
helpful in later experiments when you are trying to amplify the full coding
region of your gene.

Location of START codon: _______

Location of STOP codon:
_______
7. Try to find a GenBank entry for the genomic DNA of your gene as well. This
will give you some idea of the structure of the gene (e.g. the number and locations
of exons).

How many exons does your gene have: _____________________

Does your sequence have more than one possible coding “variant”? __

How do the variants differ? ______________________________
________________________________________________________________
3. Restriction Analysis of DNA Sequence
Use your browser to find a piece of software that can perform virtual restriction digests.
A tool called “Webcutter” is a fairly good one, but there are others around as well. One
version of “Webcutter” can be found here.
1. Enter the full length of your mRNA (technically it’s a cDNA) sequence and try
to identify any enzymes which will cut the sequence before the START codon
as well as any that will cut after the STOP codon (i.e. in the untranslated
regions).

In Webcutter, you can specify an ORF (Open Reading Frame) based
on the information you obtained from GenBank earlier – ie. START
and STOP codon locations
2. Once you’ve identified these restriction sites:
1. Check to make sure there aren’t any of these sites within the coding
region
2. Check the MCS of pUC18 to see if any of the same restriction enzymes
also cut the MCS.

The aim of this is to see if we will be able to use these enzymes to
clone the gene of interest into our plasmid vector.
3. Write down the names of these enzymes:
_________________________________

Ideally, they should be two different enzymes, since this will allow us to use
directional cloning.

Position of restriction site for first enzyme: __________

Position of restriction site for second enzyme: ____________
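
The search that Webcutter performs can be approximated with plain string matching. The sketch below lists where each recognition sequence occurs in a sequence, and which enzymes do not occur in it at all (the ones that would be safe to add to primer ends). The enzyme table is an assumption based on the commonly published pUC18 multiple cloning site and should be checked against the vector map you are given. The toy sequence is made up. Positions are where the recognition sequence starts, not where the enzyme cuts.

```python
# Assumed pUC18 MCS enzymes and their recognition sequences (verify against
# your vector map). All are palindromic, so one strand is enough to search.
PUC18_MCS_SITES = {
    "EcoRI": "GAATTC", "SacI": "GAGCTC", "KpnI": "GGTACC", "SmaI": "CCCGGG",
    "BamHI": "GGATCC", "XbaI": "TCTAGA", "SalI": "GTCGAC", "PstI": "CTGCAG",
    "SphI": "GCATGC", "HindIII": "AAGCTT",
}

def site_positions(seq, site):
    """1-based start positions of every occurrence of site in seq."""
    seq = seq.upper()
    positions = []
    i = seq.find(site)
    while i != -1:
        positions.append(i + 1)
        i = seq.find(site, i + 1)   # allow overlapping occurrences
    return positions

def non_cutters(seq):
    """Names of MCS enzymes whose site does not occur anywhere in seq."""
    return sorted(name for name, site in PUC18_MCS_SITES.items()
                  if not site_positions(seq, site))

# Toy sequence containing one EcoRI site and one BamHI site.
toy = "ATGGAATTCAAAGGATCCTAA"
print(site_positions(toy, "GAATTC"), site_positions(toy, "GGATCC"))
print(non_cutters(toy))
```

Run on your full cDNA, `non_cutters` gives candidate enzymes for primer tails, while `site_positions` restricted to the UTR coordinates shows sites that are already present outside the coding region.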
4. Identification of regulatory miRNAs
Use your browser to go to the miRDB database for miRNA target prediction.
You can just search for it or click this link.
1. Enter the name of your gene and try to identify the miRNAs that
target your gene. How many miRNAs are predicted to bind to your
gene __________
2. Choose the miRNA with the highest target score and record the following:

Name of the miRNA: __________

miRBase ID: __________

Where does it bind: __________

How many predicted targets are there for it __________
5. Protein Sequence Analysis
Use your browser to find the ExPASy server – it contains a large selection of tools used
for analysis of protein sequences (there are now many tools to do other things as well, but
most people still initially go there for protein sequence analysis). You can access it
directly from here.
1. Go back to your GenBank entry for your DNA sequence and find the protein
sequence. Copy that sequence somewhere to make it easily accessible.
2. Go to the ExPASy website. It may be helpful to start with the “Visual
Guidance” section of the website.
3. Use the “tag cloud” on the “Protein” page to help you find the appropriate
tools.
4. Find out the pI (isoelectric point) and molecular weight of your protein:
______________________
5. Find a tool to help you identify secondary structures of your protein
sequence.
6. Find out whether your protein has any trans-membrane domains.
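
To give a feel for what a transmembrane prediction tool does, here is a minimal hydropathy sketch using the Kyte-Doolittle scale. It averages hydropathy over a sliding window and flags windows above a cutoff. The window of 19 residues and the cutoff of 1.6 are commonly quoted rule-of-thumb values and are treated here as assumptions; dedicated tools on ExPASy use more refined methods, so use this only to understand the idea.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(protein, window=19):
    """Mean hydropathy of each window; empty if the protein is shorter than window.

    Non-standard residue letters (e.g. X) raise a KeyError.
    """
    scores = [KD[residue] for residue in protein.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]

def candidate_tm_windows(protein, window=19, threshold=1.6):
    """1-based start positions of windows whose mean hydropathy exceeds threshold."""
    return [i + 1 for i, score in enumerate(hydropathy_profile(protein, window))
            if score > threshold]

# Made-up test sequences: a purely hydrophobic stretch and a purely charged one.
print(candidate_tm_windows("L" * 19))
print(candidate_tm_windows("D" * 30))
```

A cluster of consecutive flagged windows suggests a possible membrane-spanning segment, which you would then confirm with a dedicated prediction tool.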
