Example 2: Generic Primer Design
Because the exact location of the files depends on where you installed the JCVI primer design software, we use the placeholder ${INSTALL_DIR} for your installation directory. For example, if you installed primer design in “/usr/local/primer_design”, then ${INSTALL_DIR} should be replaced with “/usr/local”. In other words, if we refer to:
${INSTALL_DIR}/primer_design/examples
You should actually be looking in the directory:
/usr/local/primer_design/examples
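One way to make the placeholder convenient in practice is to define it as a shell variable, so that the paths in this tutorial can be pasted as written. The following is a minimal sketch that assumes an installation under /usr/local, as in the example above. EXAMPLES_DIR is a name introduced here purely for illustration.

```shell
# Assumption: primer design was installed in /usr/local/primer_design.
# Change the default below if your installation lives elsewhere.
INSTALL_DIR="${INSTALL_DIR:-/usr/local}"

# EXAMPLES_DIR is an illustrative name, not something the software requires.
EXAMPLES_DIR="${INSTALL_DIR}/primer_design/examples"
echo "Examples directory: ${EXAMPLES_DIR}"
```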
Let us begin. In this example, let us assume you have selected a set of genes that you would like to design primers for. We have included a target specification file in:
${INSTALL_DIR}/primer_design/examples/eraf_sox4/target_regions_specification.txt
To confirm that the target specification file is in the right place, list it with the following command:
ls ${INSTALL_DIR}/primer_design/examples/eraf_sox4/target_regions_specification.txt
If the file exists, then please continue to step 1. If not, confirm that you have had a successful installation.
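The same check can be scripted so that it reports a clear result instead of an ls error. This is only a sketch: check_file is a hypothetical helper, and the default install location is an assumption.

```shell
# check_file PATH
# Prints "found" and returns 0 if PATH is a regular file; otherwise
# prints "missing" and returns 1. (Hypothetical helper.)
check_file() {
    if [ -f "$1" ]; then
        echo "found"
    else
        echo "missing"
        return 1
    fi
}

INSTALL_DIR="${INSTALL_DIR:-/usr/local}"   # assumed install location
TARGET_SPEC="${INSTALL_DIR}/primer_design/examples/eraf_sox4/target_regions_specification.txt"
echo "target specification: $(check_file "${TARGET_SPEC}")"
```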
If you are interested in how to modify the target_regions_specification.txt file, please see the documentation in:
${INSTALL_DIR}/primer_design/TemplateExtractor/Manuals/TemplateTargetSpecification.doc
STEP 0: Set Up the Reference
We label this step 0 because in most cases it will only need to be done once. The goal of this step is to create a reference genome so that primer design can look for alternative products across the entire genome. Since the example we are working with targets two human genes, the reference genome that we will work with is the human reference.
You can download a copy of the reference from SourceForge. This genome has already been prepared in the correct format. If you would like to use an alternative organism, or a more recent version of the human reference, you will need to follow the tutorial on how to prepare a reference genome with the scripts included with primer design. But before you embark on this path (of extra work), let us make sure you are able to get what you want just based on our example.
Step 0a: Acquiring a preformatted reference
In your ${INSTALL_DIR} directory, make a directory named “pd_references”, and then download the prepared human reference sequence from SourceForge into it. From UNIX, you can use the wget command without having to go through a browser. When the homo_sapiens.tar.gz file has fully downloaded, untar the package:
tar -xvf homo_sapiens.tar.gz
This will extract the 24 chromosomes (1-22, plus X and Y) and their associated formatdb’d results. A library of human repeats will also be extracted. You will see two directories after you untar the package:
${INSTALL_DIR}/pd_references/homo_sapiens/homo_sapiens_Reference
and:
${INSTALL_DIR}/pd_references/homo_sapiens/homo_sapiens_Repeats
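The unpack-and-check part of this step can be sketched as two small shell functions. Both function names are hypothetical, and the sketch assumes the tarball is gzip-compressed and contains a top-level homo_sapiens directory, as the two paths above suggest. The download itself is left out because the exact URL is not given here.

```shell
# unpack_reference TARBALL DEST_DIR
# Creates DEST_DIR if needed and extracts the gzip-compressed TARBALL into it.
unpack_reference() {
    mkdir -p "$2" && tar -xzf "$1" -C "$2"
}

# verify_reference DEST_DIR
# Succeeds only if both directories named in this tutorial are present.
verify_reference() {
    [ -d "$1/homo_sapiens/homo_sapiens_Reference" ] &&
    [ -d "$1/homo_sapiens/homo_sapiens_Repeats" ]
}

# Example usage (paths are assumptions; download the tarball first):
#   unpack_reference homo_sapiens.tar.gz "${INSTALL_DIR}/pd_references"
#   verify_reference "${INSTALL_DIR}/pd_references" && echo "reference ready"
```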
Step 0b: Editing the reference_data.ini config file
In order for primer design to know where your reference genome has been downloaded, you need to modify the reference_data.ini file, which is located at:
${INSTALL_DIR}/primer_design/config/reference_data.ini
Two settings need to be updated: the reference path (which takes two lines) and the repeat library path (one line). First change the variables primer_critique.subjectPath and primer_critique.referenceIndexPath so that they look like:
primer_critique.subjectPath=${INSTALL_DIR}/pd_references/homo_sapiens/homo_sapiens_Reference
primer_critique.referenceIndexPath=${INSTALL_DIR}/pd_references/homo_sapiens/homo_sapiens_Reference/RefGenomeIndex
Next, modify the path in the variable primer_critique.repeatPath:
primer_critique.repeatPath=${INSTALL_DIR}/pd_references/homo_sapiens/homo_sapiens_Repeats
If the file name extension of your references stays *.fa and the file name extension of your repeats stays *.fasta, then you can leave primer_critique.subjectPattern and primer_critique.repeatPattern unchanged.
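If you prefer to script the edit rather than use a text editor, the three settings can be rewritten with a small helper. This is a sketch under stated assumptions: set_ini_value is a hypothetical function, and it assumes each setting sits on its own line beginning with the variable name followed by "=". It changes only lines that already exist and never adds new ones.

```shell
# set_ini_value FILE KEY VALUE
# Rewrites the line that starts with "KEY=" so that it reads "KEY=VALUE".
# All other lines are copied through unchanged. (Hypothetical helper; it
# assumes VALUE contains no backslashes, which awk -v would interpret.)
set_ini_value() {
    ini_tmp="$1.tmp.$$"
    awk -v key="$2" -v value="$3" '
        index($0, key "=") == 1 { print key "=" value; next }
        { print }
    ' "$1" > "${ini_tmp}" && mv "${ini_tmp}" "$1"
}

# Example usage (assumed install location; keep a backup of the original):
#   INI="${INSTALL_DIR}/primer_design/config/reference_data.ini"
#   REF="${INSTALL_DIR}/pd_references/homo_sapiens"
#   cp "${INI}" "${INI}.bak"
#   set_ini_value "${INI}" primer_critique.subjectPath "${REF}/homo_sapiens_Reference"
#   set_ini_value "${INI}" primer_critique.referenceIndexPath "${REF}/homo_sapiens_Reference/RefGenomeIndex"
#   set_ini_value "${INI}" primer_critique.repeatPath "${REF}/homo_sapiens_Repeats"
```

Matching on the variable name plus "=" means that updating primer_critique.subjectPath leaves primer_critique.subjectPattern untouched.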
STEP 1: Set Up a Custom Primer Design Run
The purpose of this step is to inform the primer designer where you want the output to go and which target specification file you would like to use. If ${INSTALL_DIR}/primer_design is not in your PATH environment variable, then you will need to explicitly specify the entire path for the commands in the next several steps. Remember that in UNIX, the entire command needs to be placed on the same line unless a backslash (\) is placed before each line break. Go ahead and perform the first step, which is the setup:
${INSTALL_DIR}/primer_design/Start_Generic_Primer_Design.pl \
-d ${INSTALL_DIR}/primer_design/examples/eraf_sox4 \
-p trial1 \
-t ${INSTALL_DIR}/primer_design/examples/eraf_sox4/target_regions_specification.txt \
-J
In this command, the -d parameter specifies the directory in which to create all the files for primer design. This directory must already exist. The -p parameter tells primer design to create a new primer design run with the specified name. The -t parameter tells primer design which target region specification file to use. In this case, we have created a target specification file for you, and in step 0 we told you how to set up your human reference to match it. The -J option tells the setup script not to perform tasks that are specific to JCVI, such as producing files for our LIMS and for the downstream trace processing and variant classifier pipeline.
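Before launching, it can help to check the inputs that the parameters point at and to review the full command on one line. The sketch below only prints the command and does not run it. The variable names (PD_HOME, RUN_DIR, RUN_NAME, TARGET_SPEC, SETUP_CMD) are illustrative, and it assumes the run directory is the eraf_sox4 example directory used in the later steps.

```shell
INSTALL_DIR="${INSTALL_DIR:-/usr/local}"   # assumed install location
PD_HOME="${INSTALL_DIR}/primer_design"
RUN_DIR="${PD_HOME}/examples/eraf_sox4"    # value for -d; must already exist
RUN_NAME="trial1"                          # value for -p
TARGET_SPEC="${RUN_DIR}/target_regions_specification.txt"   # value for -t

# Report anything the setup command would need but cannot find.
[ -d "${RUN_DIR}" ] || echo "run directory not found: ${RUN_DIR}"
[ -f "${TARGET_SPEC}" ] || echo "target specification not found: ${TARGET_SPEC}"

# Print the command for review; paste it into the shell to run it.
SETUP_CMD="${PD_HOME}/Start_Generic_Primer_Design.pl -d ${RUN_DIR} -p ${RUN_NAME} -t ${TARGET_SPEC} -J"
echo "${SETUP_CMD}"
```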
When you run the command, you will see a flurry of commands and actions reported to the screen. The script will then pause and ask whether you want to modify the configuration file for the “strict” primer pairs, and after that whether you want to launch those jobs on your local computer. It will then ask whether you want to modify the configuration file for the “HighGC” primer pairs and prompt you for permission to launch those jobs locally or on a compute grid. In both cases the default configuration will suffice, so accept it by answering “Y” and pressing Return. When prompted for where to execute, launch the jobs locally.
The following is an example of when the setup script prompts you to modify the configuration file:
Please modify the following configuration file, if necessary.
${INSTALL_DIR}/primer_design/examples/eraf_sox4/trial1/primer_design/20101118/strict/global_strict.config
When finished, enter 'Y' to continue or 'N' to halt processing
The following is an example of when the setup script prompts you to submit the jobs locally:
Submitting strict jobs locally. Enter 'Y' to continue or 'N' to halt processing
When you have answered “Y” to all 4 prompts, the jobs will start running locally.
STEP 2: Monitor Your Run
Sometimes it may take a while to design primers, so it is useful to monitor your run. To monitor your run, use the following command:
${INSTALL_DIR}/primer_design/Monitor_Primer_Design_Run.pl \
-d ${INSTALL_DIR}/primer_design/examples/eraf_sox4/ -p trial1 -t 20101118 -r
The -d and -p options should refer to the same path and run name you specified in step 1. The -t option should be the date stamp corresponding to when you launched the run; in the example above, 20101118 refers to November 18, 2010. The -r option tells the monitor script not to prompt you to relaunch failed jobs.
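If you launched the run today, the date stamp can be generated rather than typed. This is a minimal sketch that assumes the stamp is simply the launch date in YYYYMMDD form, as the 20101118 example suggests. DATE_STAMP is an illustrative variable name.

```shell
# Assumption: the run was launched today, so today's date is the stamp.
# The format matches the tutorial's example (20101118 = November 18, 2010).
DATE_STAMP="$(date +%Y%m%d)"
echo "Date stamp for -t: ${DATE_STAMP}"

# If the run was launched on another day, the stamp also appears as a
# directory name under the run, as in the configuration file path shown
# in step 1 (install location assumed):
#   ls "${INSTALL_DIR}/primer_design/examples/eraf_sox4/trial1/primer_design"
```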
An example of the output from monitoring the run could be:
Job: highGC11 ERAF/00 running.
Job: strict ERAF/00 completed.
Job: highGC11 SOX4/00 completed.
Job: highGC11 SOX4/01 running.
Job: highGC11 SOX4/02 running.
When all the jobs have been completed you should see the following:
Job: highGC11 ERAF/00 completed.
Job: strict ERAF/00 completed.
Job: highGC11 SOX4/00 completed.
Job: highGC11 SOX4/01 completed.
Job: highGC11 SOX4/02 completed.
SUCCESS!! All primer design jobs are complete.
Please run the Aggregate_Primer_Design_Results.pl script to generate the results and reports files.
Depending on the number of cores/processors on your system, you may find that highGC11 SOX4/01 and SOX4/02 take a few hours to complete, and that highGC11 ERAF/00 takes the longest. The more difficult it is to find successful primers, the longer the process will take. Difficult-to-design regions may include those with a great deal of repetitive sequence or secondary structure. Please be patient!
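For long runs, it can be handy to tally how many jobs are still running. The sketch below assumes you have saved the monitor output to a text file (for example, by redirecting it) and that each job status ends with "running." or "completed." as in the examples above. count_jobs is a hypothetical helper.

```shell
# count_jobs STATE FILE
# Counts the lines in FILE that end with " STATE." (for example "running.").
count_jobs() {
    grep -c " $1\.\$" "$2"
}

# Example usage (monitor.log is an assumed file name):
#   echo "running:   $(count_jobs running monitor.log)"
#   echo "completed: $(count_jobs completed monitor.log)"
```

Note that the final "SUCCESS!!" line ends with "complete." rather than "completed.", so it is not counted as a job.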
STEP 3: Aggregate Your Run
When all jobs have completed, you will want to aggregate the separate compute results into a single result. Execute the following command:
${INSTALL_DIR}/primer_design/Aggregate_Primer_Design_Results.pl \
-d ${INSTALL_DIR}/primer_design/examples/eraf_sox4/ -p trial1 -t 20101118 -g
The -d, -p and -t options are the same as in step 2. The -g option should be specified to avoid assigning UIDs to the designed amplicons.
After the script has completed, you can find the aggregated results in:
${INSTALL_DIR}/primer_design/examples/eraf_sox4/trial1/reports
You can see the results in the *critique_report.csv, .pdf, and .ps files. Please see the PCR_Primer_Design-README.doc in ${INSTALL_DIR}/primer_design for additional documentation.
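To confirm that the aggregation produced its reports, you can list the matching files. The following sketch assumes the three report files share the *critique_report name and differ only in extension. list_reports is a hypothetical helper.

```shell
# list_reports REPORTS_DIR
# Prints the critique report files (.csv, .pdf, .ps) found in REPORTS_DIR.
list_reports() {
    for ext in csv pdf ps; do
        for report in "$1"/*critique_report."${ext}"; do
            # An unmatched pattern stays literal, so test that the file exists.
            [ -f "${report}" ] && echo "${report}"
        done
    done
    return 0
}

# Example usage (assumed install location):
#   list_reports "${INSTALL_DIR}/primer_design/examples/eraf_sox4/trial1/reports"
```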
Done.