FASTQ files are the starting point of most bioinformatic pipelines. They derive from the traditional FASTA files that store sequence information; the difference is that FASTQ files also store a quality score for each base, encoded as ASCII characters. The most common format today is the one produced by Illumina, but bear in mind that different platforms (Sanger, Solexa) output slightly different files.
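Below is an example of one of our FASTQ files. It contains information on all reads from a single run, and for each read there are 4 lines of information:
Line 1 starts with @ and holds the read identifier and an optional description.
Line 2 is the raw sequence.
Line 3 starts with + and may optionally repeat the identifier.
Line 4 holds the quality values, one ASCII character per base; ! represents the lowest quality while ~ is the highest.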
$ sed -n 21,24p SN4570283_14827_P3F08_5220_1_sequence.fq
@D6L3XBQ1:283:C3UTFACXX:5:1101:1468:1865 1:N:0:GGCTAC
NTCTCACCTGAATGCCCCAACAGCTCTCTCTTAAACCTTCACCTACACGCCCTGCAGCCAGAAGACTCAGCCCTGTATCTCTGCGCCAGCAGCCAAGACAC
+
#1=DDFFFHHHHHJJJJJJJJJGIJIJJJJJJJGJIJJJJIJJJGJJJJIJJHIJJHHIIIIHHHHHFFFFFDCECCDDFDEDDDDDDDDD<BDBDDDDDD
$ sed -n 21,24p SN4570283_14827_P3F08_5220_2_sequence.fq
@D6L3XBQ1:283:C3UTFACXX:5:1101:1468:1865 2:N:0:GGCTAC
GGGGCTCTTGGAGGAAATGTTCACCCGAGCCCTCCGTGGCCCCCACGGCTTCCTGGCAGGCCCCGAAGGTTTCTGCACAGGAAAGCGGTGACTCTGCAAGG
+
CCCFFFFFGHHGHJHIJJIIIIJJJGIIIJJJJJJIJJJJJJJIJJGIHFFFFEEDEEDDDDDDB?@BD9>CDDCDCDDDD?DC<CBD<B@CDCCCC@CDD
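As a minimal sketch of the four-line layout, the Python snippet below splits one record into its parts and reads the mate number (1 or 2) from the header. The names `FastqRecord`, `parse_record` and `read_number` are illustrative and do not belong to any library, and the sketch assumes unwrapped records of exactly four lines with headers in the style shown above.

```python
from typing import NamedTuple


class FastqRecord(NamedTuple):
    header: str    # line 1, without the leading '@'
    sequence: str  # line 2
    quality: str   # line 4, one ASCII character per base


def parse_record(lines):
    """Build a record from four consecutive lines of a FASTQ file."""
    header, sequence, separator, quality = (line.rstrip("\n") for line in lines)
    if not header.startswith("@"):
        raise ValueError("header line must start with '@'")
    if not separator.startswith("+"):
        raise ValueError("third line must start with '+'")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    return FastqRecord(header[1:], sequence, quality)


def read_number(record):
    """Return the mate number from a header such as '... 1:N:0:GGCTAC'."""
    # Assumption: the part after the first space starts with the read number.
    comment = record.header.split(" ", 1)[1]
    return int(comment.split(":")[0])


# Header from the first example read; sequence and quality truncated
# to their first ten characters for brevity.
example = [
    "@D6L3XBQ1:283:C3UTFACXX:5:1101:1468:1865 1:N:0:GGCTAC\n",
    "NTCTCACCTG\n",
    "+\n",
    "#1=DDFFFHH\n",
]
record = parse_record(example)
print(record.sequence, read_number(record))
```

The two example reads share the same identifier and differ only in this read number, which is how the mates of a pair are matched across the _1 and _2 files.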
Each ASCII character has a code between 33 and 126, which translates into a Phred score (the standard Sanger measure of the reliability of a base call) between 0 and 93. However, not all platforms use the full range of ASCII symbols:
  SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS.....................................................
  ..........................XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX......................
  ...............................IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII......................
  .................................JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ......................
  LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLL....................................................
  !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
  |                         |    |        |                              |                     |
 33                        59   64       73                            104                   126
  0........................26...31.......40
                           -5....0........9.............................40
                                 0........9.............................40
                                    3.....9.............................40
  0.2......................26...31........41

 S - Sanger        Phred+33,  raw reads typically (0, 40)
 X - Solexa        Solexa+64, raw reads typically (-5, 40)
 I - Illumina 1.3+ Phred+64,  raw reads typically (0, 40)
 J - Illumina 1.5+ Phred+64,  raw reads typically (3, 40), with 0 and 1 unused and 2 used as the Read Segment Quality Control Indicator
 L - Illumina 1.8+ Phred+33,  raw reads typically (0, 41)
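The chart suggests a simple heuristic for telling the encodings apart in raw reads: characters below ASCII 59 only occur with an offset of 33, while characters above ASCII 74 only occur with an offset of 64. The following is a rough sketch of that idea, not a tool's actual method. `guess_offset` and `decode` are hypothetical helper names, and the thresholds are read off the chart, so processed reads with unusually high scores could be misclassified.

```python
def guess_offset(quality_strings):
    """Guess the ASCII offset (33 or 64) from observed quality characters.

    Returns None when every character lies in the range shared by both
    encodings (ASCII 59 to 74), in which case the data cannot decide.
    """
    codes = [ord(ch) for q in quality_strings for ch in q]
    if not codes:
        return None
    if min(codes) < 59:
        return 33  # below the lowest Solexa / Illumina 1.3+ character
    if max(codes) > 74:
        return 64  # above the typical raw-read maximum for offset 33
    return None


def decode(quality, offset=33):
    """Convert a quality string into a list of integer scores.

    With offset 64 on Solexa data the result is a Solexa score,
    which is not identical to a Phred score at low values.
    """
    return [ord(ch) - offset for ch in quality]


print(guess_offset(["#1=DDFFFHHHHHJJJJ"]))
print(decode("#1=DJ"))
```

In practice it is safer to scan many reads before deciding, since a handful of high-quality reads may fall entirely inside the ambiguous range.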
Wikipedia has a detailed explanation of the FASTQ format, Phred quality scores, and the software available to work with them.
For raw reads, the range of scores depends on the technology and the base caller used, but typically goes up to 41 for recent Illumina chemistry. As a rule of thumb, reads with an average score above 30 are considered to be of good quality, which can be assessed with FastQC.
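To make the threshold of 30 concrete: a Phred score Q corresponds to an estimated error probability of 10^(-Q/10), so Q30 means roughly one wrong base call in a thousand. The sketch below computes the mean score of a single read, assuming the Phred+33 encoding. `error_probability` and `mean_quality` are illustrative names, and FastQC reports much richer statistics than this.

```python
def error_probability(q):
    """Estimated probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def mean_quality(quality, offset=33):
    """Arithmetic mean of the Phred scores encoded in a quality string."""
    scores = [ord(ch) - offset for ch in quality]
    return sum(scores) / len(scores)


# First 20 quality characters of the second example read
qual = "CCCFFFFFGHHGHJHIJJII"
print(round(mean_quality(qual), 1))
print(error_probability(30))
```

Averaging Phred scores is a convenient summary, but because the scale is logarithmic it is not the same as averaging the error probabilities themselves.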
Bear in mind that other platforms, like Roche, do not directly produce FASTQ files but SFF files, which store signal strengths in addition to sequence and quality information. There is software designed to work with Roche's SFF files directly, but one can also convert them to FASTQ files using scripts provided by Roche (sff.extract) or user-created software such as seq_crumbs. There are useful discussions about this topic at SeqAnswers I, SeqAnswers II, and Biostars.