Example 2: Generic Primer Design

Because the exact location of the files depends on where you installed the JCVI primer design software, we use the placeholder ${INSTALL_DIR} for the directory that contains your installation. For example, if you installed primer designer in “/usr/local/primer_design”, then ${INSTALL_DIR} should be replaced with “/usr/local”. In other words, if we refer to:

${INSTALL_DIR}/primer_design/examples

You should actually be looking in the directory:

/usr/local/primer_design/examples
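
In most UNIX shells you can avoid replacing the placeholder by hand: define a shell variable of the same name and the commands in this example can be pasted as written. The following is a minimal sketch that assumes the /usr/local installation used above; EXAMPLES_DIR is a name introduced here for illustration only.

```shell
# Assumption: primer design was installed in /usr/local/primer_design.
# Change the path to match your own installation.
INSTALL_DIR=/usr/local

# Hypothetical helper variable, used only to show how the placeholder expands.
EXAMPLES_DIR="${INSTALL_DIR}/primer_design/examples"
echo "${EXAMPLES_DIR}"
```

The variable is only expanded by the shell. In files that you edit by hand, such as configuration files, write the path out in full.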

Let us begin. In this example, let us assume you have selected a set of genes that you would like to design primers for. We have included a target specification file in:

${INSTALL_DIR}/primer_design/examples/eraf_sox4/target_regions_specification.txt

To confirm that the target specification file is in the right place, try to list it with the following command:

ls ${INSTALL_DIR}/primer_design/examples/eraf_sox4/target_regions_specification.txt

If the file exists, please continue with the steps below. If not, confirm that your installation was successful.

If you are interested in how to modify the target_regions_specification.txt file, please see the documentation in:

${INSTALL_DIR}/primer_design/TemplateExtractor/Manuals/TemplateTargetSpecification.doc

STEP 0: Set Up the Reference

We label this step 0 because in most cases it only needs to be done once. The goal of this step is to set up a reference genome so that primer design can look for alternative products across the entire genome. Since our example targets two human genes, the reference genome we will work with is the human reference. You can download a copy of the reference from Sourceforge. This genome has already been prepared in the correct format. If you would like to use a different organism, or a more recent version of the human reference, you will need to follow the tutorial on how to prepare a reference genome with the scripts included with primer design. But before you embark on that path (of extra work), let us make sure you can get what you want from our example.

Step 0a: Acquiring a preformatted reference

In your ${INSTALL_DIR} directory, make a directory named “pd_references”, change into it, and download the prepared human reference sequence from Sourceforge. From UNIX, you can use the wget command without having to go through a browser.

wget http://sourceforge.net/projects/primerdesigner/files/ReferenceGenomes/Homo%20Sapiens%20NCBI36/homo_sapiens.tar.gz/download

When you see the homo_sapiens.tar.gz file fully downloaded, go ahead and untar the package:

tar -xvf homo_sapiens.tar.gz

This will extract the 24 chromosomes (1-22, X and Y) and their associated formatdb’d results. A library of human repeats will also be extracted. After you untar the package, you will see two directories:

${INSTALL_DIR}/pd_references/homo_sapiens/homo_sapiens_Reference

and:

${INSTALL_DIR}/pd_references/homo_sapiens/homo_sapiens_Repeats
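
The whole of step 0a can be sketched as one shell function. This is an illustration under stated assumptions, not part of the distribution: fetch_reference is a name made up here, and INSTALL_DIR is assumed to be set as a variable in your shell. The -O option of wget is used so that the download is saved as homo_sapiens.tar.gz (without it, wget may name the file after the last part of the URL, "download"), and the -z option of tar states explicitly that the archive is gzip-compressed.

```shell
# Hypothetical helper; assumes INSTALL_DIR is set in your shell.
# The body runs in a subshell, so your current directory is not changed.
# Each command runs only if the previous one succeeded.
fetch_reference() (
    mkdir -p "${INSTALL_DIR}/pd_references" &&
    cd "${INSTALL_DIR}/pd_references" &&
    wget -O homo_sapiens.tar.gz \
        "http://sourceforge.net/projects/primerdesigner/files/ReferenceGenomes/Homo%20Sapiens%20NCBI36/homo_sapiens.tar.gz/download" &&
    tar -xzvf homo_sapiens.tar.gz &&
    ls -d homo_sapiens/homo_sapiens_Reference homo_sapiens/homo_sapiens_Repeats
)

# To run it, type:
#   fetch_reference
```

If the final ls reports that either directory is missing, the download or extraction did not complete.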

Step 0b: Editing the reference_data.ini config file

In order for primer design to know where your reference genome has been downloaded, you need to modify the reference_data.ini file, which is located at:

${INSTALL_DIR}/primer_design/config/reference_data.ini

Three lines need to be updated: two for the reference and one for the repeat library. First change the variables primer_critique.subjectPath and primer_critique.referenceIndexPath so that they look like:

primer_critique.subjectPath=${INSTALL_DIR}/pd_references/homo_sapiens/homo_sapiens_Reference

primer_critique.referenceIndexPath=${INSTALL_DIR}/pd_references/homo_sapiens/homo_sapiens_Reference/RefGenomeIndex

Next, modify the variable primer_critique.repeatPath:

primer_critique.repeatPath=${INSTALL_DIR}/pd_references/homo_sapiens/homo_sapiens_Repeats

If the file name extension of your references remains *.fa and the file name extension of your repeats remains *.fasta, then you can leave primer_critique.subjectPattern and primer_critique.repeatPattern unchanged.
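
After editing, it can help to print just the three path settings and check them by eye. The following is a minimal sketch; show_reference_paths is a hypothetical helper name, and the pattern assumes that each setting starts at the beginning of its line, as in the lines shown above.

```shell
# Hypothetical helper: print the three path settings from a reference_data.ini file.
# Usage: show_reference_paths /path/to/reference_data.ini
show_reference_paths() {
    grep -E '^primer_critique\.(subjectPath|referenceIndexPath|repeatPath)' "$1"
}

# Example call (assumes INSTALL_DIR is set in your shell):
#   show_reference_paths "${INSTALL_DIR}/primer_design/config/reference_data.ini"
```

For an installation under /usr/local, the first line printed would be expected to read primer_critique.subjectPath=/usr/local/pd_references/homo_sapiens/homo_sapiens_Reference.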

STEP 1: Set Up a Custom Primer Design Run

The purpose of this step is to inform the primer designer where you want the output to go and which target specification file you would like to use.

If ${INSTALL_DIR}/primer_design is not in your PATH environment variable, you will need to specify the full path of each command in the next several steps. Remember that in UNIX a command must be entered on a single line unless a backslash (\) is placed immediately before each line break. Go ahead and perform the first step, which is the setup:

${INSTALL_DIR}/primer_design/Start_Generic_Primer_Design.pl -d ${INSTALL_DIR}/primer_design/examples/eraf_sox4 -p trial1 -t ${INSTALL_DIR}/primer_design/examples/eraf_sox4/target_regions_specification.txt -J

In this command, the -d parameter specifies the directory in which to create all the files for primer design. This directory must already exist. The -p parameter tells primer design to create a new primer design run with the specified name. The -t parameter tells primer design which target region specification file to use. In this case, we have created a target specification file for you, and in step 0 we told you how to set up your human reference to match it. The -J option tells the setup script not to perform tasks that are specific to JCVI, such as producing files for our LIMS and for the downstream trace processing and variant classifier pipeline.
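
Because the setup command is long, it is a natural candidate for the backslash continuation mentioned above. The sketch below wraps it in a shell function so that it can be defined first and run later. The name start_trial1 is made up for this illustration, INSTALL_DIR is assumed to be set in your shell, and the -d path is the eraf_sox4 example directory used by the later steps.

```shell
# Hypothetical wrapper around the setup command; assumes INSTALL_DIR is set.
# Each backslash must be the very last character on its line.
start_trial1() {
    "${INSTALL_DIR}/primer_design/Start_Generic_Primer_Design.pl" \
        -d "${INSTALL_DIR}/primer_design/examples/eraf_sox4" \
        -p trial1 \
        -t "${INSTALL_DIR}/primer_design/examples/eraf_sox4/target_regions_specification.txt" \
        -J
}

# To run it, type:
#   start_trial1
```

Placing one option per line makes it easier to see which value belongs to which parameter.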

When you run the command, you will see a flurry of commands and actions reported to the screen. The script will then pause and ask whether you want to modify the configuration file for the “strict” primer pairs, and then whether you want to launch the jobs on your local computer. Next, the script will ask whether you want to modify the configuration file for the “HighGC” primer pairs and prompt you for permission to launch locally or on a compute grid. In both cases the default configuration will suffice, so accept it by answering “Y” and pressing the return key. When prompted for where to execute, launch the jobs locally.

The following is an example of when the setup script prompts you to modify the configuration file:

Please modify the following configuration file, if necessary.

${INSTALL_DIR}/primer_design/examples/eraf_sox4/trial1/primer_design/20101118/strict/global_strict.config

When finished, enter 'Y' to continue or 'N' to halt processing

The following is an example of when the setup script prompts you to submit the jobs locally:

Submitting strict jobs locally. Enter 'Y' to continue or 'N' to halt processing

When you have answered “Y” to all 4 prompts, the jobs will start running locally.

STEP 2: Monitor your Run

Sometimes it may take a while to design primers, so it is useful to monitor your run. To monitor your run, use the following command:

${INSTALL_DIR}/primer_design/Monitor_Primer_Design_Run.pl -d ${INSTALL_DIR}/primer_design/examples/eraf_sox4/ -p trial1 -t 20101118 -r

The -d and -p options should refer to the same path and run name you specified in step 1. The -t option should be the date stamp corresponding to when you launched the run. In the example above, 20101118 refers to November 18, 2010. The -r option tells the monitor script not to prompt you to relaunch failed jobs.
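
If you are unsure which date stamp to pass, the following sketch shows two ways to find it. It assumes that the stamp is the local calendar date of the launch in YYYYMMDD form, and that each launch creates a directory named after its stamp under trial1/primer_design, as the configuration file path shown in step 1 suggests. STAMP and STAMP_PARENT are names introduced here for illustration.

```shell
# Assumption: the stamp is the local calendar date on which the run was launched.
STAMP=$(date +%Y%m%d)
echo "Stamp for a run launched today: ${STAMP}"

# Assumption: INSTALL_DIR falls back to /usr/local if it is not set in your shell.
STAMP_PARENT="${INSTALL_DIR:-/usr/local}/primer_design/examples/eraf_sox4/trial1/primer_design"
if [ -d "${STAMP_PARENT}" ]; then
    # Each directory listed here corresponds to one launch date.
    ls "${STAMP_PARENT}"
fi
```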

An example of the output from monitoring the run could be:

Job: highGC11 ERAF/00 running.

Job: strict ERAF/00 completed.

Job: highGC11 SOX4/00 completed.

Job: highGC11 SOX4/01 running.

Job: highGC11 SOX4/02 running.

When all the jobs have been completed you should see the following:

Job: highGC11 ERAF/00 completed.

Job: strict ERAF/00 completed.

Job: highGC11 SOX4/00 completed.

Job: highGC11 SOX4/01 completed.

Job: highGC11 SOX4/02 completed.

SUCCESS!! All primer design jobs are complete.

Please run the Aggregate_Primer_Design_Results.pl script to generate the results and reports files.

Depending on the number of cores/processors on your system, you may find that highGC11 SOX4/01 and SOX4/02 take a few hours to complete, and that highGC11 ERAF/00 takes the longest. The more difficult it is to find successful primers, the longer the process will take. Difficult-to-design regions may include those with a great deal of repetitive sequence or secondary structure. Please be patient!
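
Since a run can take hours, you may prefer to let the shell repeat the monitor command for you. The function below is a sketch only: wait_until_complete is a hypothetical name, and it assumes that the monitor script prints its report and then exits, and that the completion message contains the text "All primer design jobs are complete" as in the output above. It does not detect failed jobs, so if a job fails the loop will not end by itself and must be interrupted with Ctrl-C.

```shell
# Hypothetical helper: rerun a status command until it reports completion.
# Usage: wait_until_complete SECONDS COMMAND [ARGUMENTS...]
wait_until_complete() {
    interval="$1"
    shift
    while :; do
        # Run the status command and keep a copy of everything it prints.
        status_report=$("$@" 2>&1)
        printf '%s\n' "${status_report}"
        case "${status_report}" in
            *"All primer design jobs are complete"*) return 0 ;;
        esac
        sleep "${interval}"
    done
}

# Example call, checking every 10 minutes (assumes INSTALL_DIR is set):
#   wait_until_complete 600 "${INSTALL_DIR}/primer_design/Monitor_Primer_Design_Run.pl" \
#       -d "${INSTALL_DIR}/primer_design/examples/eraf_sox4/" -p trial1 -t 20101118 -r
```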

STEP 3: Aggregate your Run

When all jobs have completed, you will want to aggregate the separate computations into a single result. Execute the following command:

${INSTALL_DIR}/primer_design/Aggregate_Primer_Design_Results.pl -d ${INSTALL_DIR}/primer_design/examples/eraf_sox4/ -p trial1 -t 20101118 -g

The -d, -p and -t options are the same as in step 2. The -g option should be specified to avoid assigning UIDs to the designed amplicons.

After the script has completed, you can find the aggregated results in:

${INSTALL_DIR}/primer_design/examples/eraf_sox4/trial1/reports

You can see the results in the *critique_report.csv, .pdf, and .ps files. Please see the PCR_Primer_Design-README.doc in ${INSTALL_DIR}/primer_design for additional documentation.

Done.