



Evaluating the data quality of the fastq files is the first of several Quality Control (QC) steps in the analysis of Next Generation Sequencing (NGS) data. For this purpose we will use the software FastQC.

The process is quite simple:

  1. Download and install FastQC on your local server following its installation instructions. On detritus it can be found at /opt/exoma/bin/fastqc.
  2. Create a directory in which to save the FastQC outputs, for example fastqcRawdata
  3. Check the quality by typing: $ fastqc -o fastqcRawdata *_1_sequence.fq.gz
    1. the files do not need to be decompressed to run FastQC
  4. This generates a folder for each file analyzed, containing several files:
    1. fastqc_data.txt - contains the quality statistics in txt format.
    2. summary.txt - contains a summary of the quality statistics for this file, one line per test, flagged as PASS, WARN or FAIL
    3. fastqc_report.html - same information as above, but it can be opened in a browser with $ firefox fastqc_report.html, which allows viewing the graphs
    4. Icons - folder with the icons used in the HTML report
    5. Images - folder with graphs as png
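
The steps above can be sketched as a small shell function. This is only a minimal sketch: it assumes fastqc is on the PATH, or that the FASTQC variable (an illustrative name, not part of FastQC) points to it, e.g. /opt/exoma/bin/fastqc on detritus. The function name run_fastqc is also hypothetical.

```shell
# Minimal sketch: run FastQC on a set of FASTQ files, writing all reports to one directory.
# FASTQC and run_fastqc are illustrative names; only `fastqc -o DIR FILES...` is FastQC's own syntax.
FASTQC="${FASTQC:-fastqc}"

run_fastqc() {
    outdir="$1"
    shift
    # FastQC expects the output directory to exist already, so create it first.
    mkdir -p "$outdir" || return 1
    # Compressed .fq.gz files can be passed directly; no need to decompress them.
    "$FASTQC" -o "$outdir" "$@"
}

# Usage (commented out so that sourcing this snippet has no side effects):
# run_fastqc fastqcRawdata *_1_sequence.fq.gz
```

Using mkdir -p makes the function safe to re-run, since it does not fail if the directory is already there.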

To understand the output, there is a helpful explanatory video by the Babraham Institute, the developers of FastQC.


Example of running FastQC on one of our samples:

Bonn's fastq files are stored in the directory Bonn_0_fastq, under different folders according to their plate of origin, hence:

[vifehe@detritus bonn_data]$ ls Bonn_0_fastq/
P1_001-040  P1_041-080  P1_081-095  P2_001-040  P2_041-080  P2_081-095  P3_001-040  P3_041-080  P3_081-095  P4_001-040  P4_081-095  P4_041-080  P5_001-017

# we create the directory where we will save FastQC output:

[vifehe@detritus Bonn_0_fastq]$ mkdir fastqcRawdata

# and we further create directories for each of the plates

[vifehe@detritus fastqcRawdata]$ mkdir fastqcRawdata_P1
[vifehe@detritus fastqcRawdata]$ mkdir fastqcRawdata_P2
[vifehe@detritus fastqcRawdata]$ mkdir fastqcRawdata_P3
[vifehe@detritus fastqcRawdata]$ mkdir fastqcRawdata_P4
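
The per-plate directories can also be created in a single loop instead of one command per plate. The following is a minimal sketch, assuming the plate folders are named like P1_001-040 as in the listing above; the function name make_plate_dirs is hypothetical. Note that it creates a directory for every plate it finds, so with the listing above it would also create one for P5.

```shell
# Sketch: create one fastqcRawdata_<plate> directory per plate found in a FASTQ directory.
# Assumes plate folders are named <plate>_<range>, e.g. P1_001-040.
make_plate_dirs() {
    fastq_dir="$1"
    out_root="$2"
    for d in "$fastq_dir"/P*_*/; do
        [ -d "$d" ] || continue          # skip if the glob matched nothing
        name=$(basename "$d")            # e.g. P1_001-040
        plate=${name%%_*}                # e.g. P1
        mkdir -p "$out_root/fastqcRawdata_$plate"
    done
}

# Usage:
# make_plate_dirs Bonn_0_fastq Bonn_0_fastq/fastqcRawdata
```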

[vifehe@detritus bonn_data]$ cat Bonn_0_fastq/fastqcRawdata/fastqcRawdata_P1/SN7640211_14074_P1A01_MND1014_1_sequence.fq_fastqc_folder/summary.txt
PASS	Basic Statistics	SN7640211_14074_P1A01_MND1014_1_sequence.fq.gz
PASS	Per base sequence quality	SN7640211_14074_P1A01_MND1014_1_sequence.fq.gz
PASS	Per sequence quality scores	SN7640211_14074_P1A01_MND1014_1_sequence.fq.gz
PASS	Per base sequence content	SN7640211_14074_P1A01_MND1014_1_sequence.fq.gz
PASS	Per base GC content	SN7640211_14074_P1A01_MND1014_1_sequence.fq.gz
WARN	Per sequence GC content	SN7640211_14074_P1A01_MND1014_1_sequence.fq.gz
PASS	Per base N content	SN7640211_14074_P1A01_MND1014_1_sequence.fq.gz
PASS	Sequence Length Distribution	SN7640211_14074_P1A01_MND1014_1_sequence.fq.gz
WARN	Sequence Duplication Levels	SN7640211_14074_P1A01_MND1014_1_sequence.fq.gz
PASS	Overrepresented sequences	SN7640211_14074_P1A01_MND1014_1_sequence.fq.gz
PASS	Kmer Content	SN7640211_14074_P1A01_MND1014_1_sequence.fq.gz
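
With many samples, reading each summary.txt by hand becomes tedious. The following is a minimal sketch that lists only the tests that did not pass. It assumes summary.txt is tab-separated with three columns (status, test name, file name), as in the output above; the function name flag_modules is hypothetical.

```shell
# Sketch: print every test that did not PASS from one or more summary.txt files.
# Assumes tab-separated columns: status, test name, file name.
flag_modules() {
    # Output columns: file name, status, test name
    awk -F '\t' '$1 != "PASS" { printf "%s\t%s\t%s\n", $3, $1, $2 }' "$@"
}

# Usage:
# flag_modules Bonn_0_fastq/fastqcRawdata/fastqcRawdata_P1/*/summary.txt
```

For the sample shown above, this should report the two WARN lines (Per sequence GC content and Sequence Duplication Levels).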

genetica/bioinf_process/fastqc.1426161080.txt.gz · Last modified: 2020/08/04 10:48 (external edit)