NGS alignment efficiency is highly affected by adapter / PCR primer contamination

In one of my recent projects, I’ve been analysing RNA-Seq libraries with highly variable alignment efficiency.

I decided to spend some time finding out why the fraction of reads that can be mapped varies so much. I ran FastQC on all .fastq.gz files, looked at overall library quality first, and then focused on some specific measures.

  1. I haven’t found any clear association between the number of warnings/failures and alignment efficiency.
  2. Similarly, there is no association between alignment efficiency and any group of quality measures.
  3. However, libraries with the highest fraction of uniquely aligned reads tend to pass the `Overrepresented sequences` check.
  4. Finally, I’ve realised that alignment efficiency anti-correlates with adapter / PCR primer contamination levels.
  5. Below, you can find some Bash code I’ve used.
    [bash]
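    # run fastqc using 4 threads; --extract makes sure the report directories
    # are unpacked (some FastQC versions leave only the .zip otherwise)
    mkdir -p fastqc
    fastqc -t 4 --extract -o fastqc *.fq.gz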

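    # get the percentage of reads covered by all over-represented sequences
    # (column 3 of the module is the Percentage; the glob matches both
    # sample.fq_fastqc and sample_fastqc directory names)
    for f in fastqc/*_fastqc/fastqc_data.txt; do
        echo "$f" "$(grep -A100 ">>Overrepresented sequences" "$f" |
            grep -m1 -B100 ">>END_MODULE" | awk '{sum+=$3} END {print sum}')"
    done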

    # get the percentage of reads covered by sequences flagged as Adapter or PCR primer
    for f in fastqc/*_fastqc/fastqc_data.txt; do
        echo "$f" "$(grep -A100 ">>Overrepresented sequences" "$f" |
            grep -m1 -B100 ">>END_MODULE" |
            grep -E "Adapter|PCR" | awk 'BEGIN {sum=0} {sum+=$3} END {print sum}')"
    done
    [/bash]
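The two loops above can also be folded into a single awk pass per report, which avoids the 100-line window of `grep -A100`. This is only a minimal sketch, not what I originally ran: `overrep_summary` is a hypothetical helper name, and it assumes the tab-separated layout of `fastqc_data.txt`, where the `Overrepresented sequences` module has the columns Sequence, Count, Percentage and Possible Source.

```bash
#!/usr/bin/env bash
# Sketch: summarise the "Overrepresented sequences" module of one
# fastqc_data.txt in a single awk pass.
# Prints: <file> TAB <total % over-represented> TAB <% flagged Adapter/PCR>
# overrep_summary is a hypothetical helper name, not part of FastQC.
overrep_summary() {
    awk -F'\t' '
        /^>>Overrepresented sequences/ { in_module = 1; next }  # module starts
        in_module && /^>>END_MODULE/   { in_module = 0 }        # module ends
        in_module && !/^#/ {                                    # skip the header row
            total += $3                                         # column 3: Percentage
            if ($4 ~ /Adapter|PCR/) adapter += $3               # column 4: Possible Source
        }
        END { printf "%s\t%.2f\t%.2f\n", FILENAME, total, adapter }
    ' "$1"
}

# Usage: one output line per library
for f in fastqc/*_fastqc/fastqc_data.txt; do
    [ -e "$f" ] || continue   # skip if the glob matched nothing
    overrep_summary "$f"
done
```

The resulting table can then be placed next to the per-library mapping rates reported by whichever aligner you use, to check whether the anti-correlation holds in your own data.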
