Learning Objective

Upon completion of this section on FASTQ quality scores, the learner will understand the following:

  • ASCII characters are used to encode quality scores in FASTQ files
  • Each character is converted to a numeric quality score, typically between -5 and 41, by subtracting an offset that depends on the encoding method (33 for Phred+33, 64 for Phred+64 and Solexa+64)
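As a minimal sketch of that conversion, the shell function below turns one quality string into numeric scores by taking each character's ASCII code (via `od`) and subtracting the offset. The name `decode_qual` is hypothetical, not a standard tool, and the sketch assumes a plain single-byte ASCII quality string. With offset 64 on Solexa+64 data the result is a Solexa score, which can be negative, rather than a Phred score.

```shell
# Hypothetical helper: decode one FASTQ quality string into numeric scores.
# $1 = quality string
# $2 = ASCII offset (33 for Phred+33, 64 for Phred+64 or Solexa+64)
decode_qual() {
  printf '%s' "$1" |
    od -A n -t u1 |   # one decimal ASCII code per character
    awk -v offset="$2" '
      # subtract the offset from every ASCII code, across all od output lines
      { for (i = 1; i <= NF; i++) scores[++n] = $i - offset }
      END {
        # print the scores space-separated on a single line
        sep = ""
        for (i = 1; i <= n; i++) {
          printf("%s%d", sep, scores[i])
          sep = " "
        }
        print ""
      }'
}

decode_qual 'II?+5' 33   # Phred+33; prints: 40 40 30 10 20
decode_qual ';@h' 64     # Solexa+64; prints: -5 0 40
```

In the first call, `I` (ASCII 73) decodes to 40; in the second, `;` (ASCII 59) decodes to -5, the bottom of the range mentioned above.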

This table was taken from Wikipedia, where more information on this topic can be found.

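To determine whether a file uses Phred+33, Phred+64 or Solexa+64 encoding, use the following one-liner. It inspects the quality lines of the first 2,500 reads (10,000 lines) and classifies the file by the smallest and largest ASCII codes it finds. If the FASTQ file is gzipped, read it with zcat instead: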

head -n 10000 input.fastq |   # gzipped input: zcat input.fastq.gz | head -n 10000 |
  awk 'NR % 4 == 0 { printf("%s", $0) }' |   # keep only the quality line of each record
  od -A n -t u1 |   # print each quality character as its decimal ASCII code
  awk 'BEGIN { min = 100; max = 0 }
       # track the smallest and largest ASCII code seen
       { for (i = 1; i <= NF; i++) { if ($i > max) max = $i; if ($i < min) min = $i } }
       END {
         if (max <= 74 && min < 59)
           print "Phred+33"
         else if (max > 73 && min >= 64)
           print "Phred+64"
         else if (min >= 59 && min < 64 && max > 73)
           print "Solexa+64"
         else
           print "Unknown score encoding!"
       }'
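The one-liner above can be wrapped in a reusable function that picks `cat` or gzip decompression automatically. This is a sketch under two assumptions: the function name `detect_qual_encoding` is hypothetical, and gzipped input is recognised only by a `.gz` file extension. The thresholds are the same as in the one-liner.

```shell
# Hypothetical wrapper around the one-liner: guess the quality encoding of a
# plain or gzipped FASTQ file (gzip detection assumes a ".gz" extension).
detect_qual_encoding() {
  local fq=$1
  case $fq in
    *.gz) gzip -dc -- "$fq" ;;   # decompress to stdout
    *)    cat -- "$fq" ;;
  esac |
    head -n 10000 |                            # first 2,500 reads
    awk 'NR % 4 == 0 { printf("%s", $0) }' |   # quality lines only
    od -A n -t u1 |                            # decimal ASCII codes
    awk 'BEGIN { min = 100; max = 0 }
         { for (i = 1; i <= NF; i++) { if ($i > max) max = $i; if ($i < min) min = $i } }
         END {
           if (max <= 74 && min < 59) print "Phred+33"
           else if (max > 73 && min >= 64) print "Phred+64"
           else if (min >= 59 && min < 64 && max > 73) print "Solexa+64"
           else print "Unknown score encoding!"
         }'
}

# Usage: a tiny two-read FASTQ whose quality codes span 35 ('#') to 74 ('J')
tmp=$(mktemp)
printf '@r1\nACGT\n+\nII?5\n@r2\nACGT\n+\nJ#AF\n' > "$tmp"
detect_qual_encoding "$tmp"   # prints: Phred+33
rm -f "$tmp"
```

`gzip -dc` is used rather than `zcat` because on some systems (for example macOS) `zcat` expects `.Z` files rather than `.gz` files.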