genetica:bioinf_process:fastqc
Compared revisions: 2015/03/12 11:51 (vifehe) and 2023/01/02 15:31, current (osotolongo).
The evaluation of the data quality of the __fastq__ files is the first of several Quality Control **(QC)** steps in the analysis of Next Generation Sequencing (NGS) data. For this purpose we will use the software [[http:// …
[…]
== Example … ==
Bonn's fastq files are stored in the directory Bonn_0_fastq, …
<code>
[vifehe@detritus bonn_data]$ ls Bonn_0_fastq/
P1_001-040
P2_001-040
P3_001-040
P4_001-040
</code>
# we create the directory where we will save FastQC output:
[…]
# and we further create directories for each of the plates
<code>
[vifehe@detritus Bonn_0_fastq]$ cd fastqcRawdata
[vifehe@detritus fastqcRawdata]$ mkdir fastqcRawdata_P1 fastqcRawdata_P2 fastqcRawdata_P3 fastqcRawdata_P4
</code>
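The same directory layout can also be created with a single command. This is a minimal sketch, assuming bash (for the ''{1..4}'' brace expansion) and that it is run from ''Bonn_0_fastq''. The ''-p'' flag creates any missing parent directory and does not complain if a directory already exists, so the command is safe to rerun:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Create fastqcRawdata/ plus one output sub-directory per plate (P1..P4).
# Brace expansion turns fastqcRawdata_P{1..4} into four separate arguments.
mkdir -p fastqcRawdata/fastqcRawdata_P{1..4}

# Show the result.
ls fastqcRawdata
```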
# to run fastqc on a single file, return to the folder where we have our fastq files
<code>
[vifehe@detritus fastqcRawdata]$ …
[vifehe@detritus …
# to see just the first two files
[vifehe@detritus …
SN7640211_14074_P1A01_MND1014_1_sequence.fq.gz
SN7640211_14074_P1A01_MND1014_2_sequence.fq.gz
[vifehe@detritus …
Started analysis of SN7640211_14074_P1A01_MND1014_1_sequence.fq.gz
Approx 5% complete for SN7640211_14074_P1A01_MND1014_1_sequence.fq.gz
Approx 10% complete for SN7640211_14074_P1A01_MND1014_1_sequence.fq.gz
....
Approx 95% complete for SN7640211_14074_P1A01_MND1014_1_sequence.fq.gz
Approx 100% complete for SN7640211_14074_P1A01_MND1014_1_sequence.fq.gz
Analysis complete for SN7640211_14074_P1A01_MND1014_1_sequence.fq.gz # finished at 14:09
# the process takes about 12 minutes per sample
# this can be run in a loop
[vifehe@detritus P1_001-040]$ for x in P1_001-040/ …
</code>
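The loop can be written in several ways. Here is a minimal sketch, assuming bash, the ''P<n>_001-040'' input and ''fastqcRawdata_P<n>'' output directories used in this example, and FastQC's ''-o'' option for choosing the output directory. ''run_fastqc_plate'' and the ''FASTQC'' variable are hypothetical names introduced only for this sketch; setting ''FASTQC=echo'' gives a dry run that only prints the arguments:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Program used to run FastQC (hypothetical variable; FASTQC=echo for a dry run).
FASTQC=${FASTQC:-fastqc}

# run_fastqc_plate IN_DIR OUT_DIR  (hypothetical helper)
# Runs FastQC on every *.fq.gz file in IN_DIR and writes the reports to OUT_DIR.
run_fastqc_plate() {
    local in_dir=$1 out_dir=$2
    local -a files
    # nullglob: an empty directory yields an empty array, not a literal '*.fq.gz'.
    shopt -s nullglob
    files=("$in_dir"/*.fq.gz)
    shopt -u nullglob
    if (( ${#files[@]} == 0 )); then
        echo "run_fastqc_plate: no .fq.gz files in $in_dir" >&2
        return 1
    fi
    mkdir -p "$out_dir"
    # FastQC accepts several input files per call; -o sets the output directory.
    "$FASTQC" -o "$out_dir" "${files[@]}"
}

# Example, run from Bonn_0_fastq (paths follow the layout of this page):
#   for p in 1 2 3 4; do
#       run_fastqc_plate "P${p}_001-040" "fastqcRawdata/fastqcRawdata_P${p}"
#   done
```

If the server has several cores, FastQC's ''-t'' option sets how many files it processes simultaneously. This can reduce the total time for a plate, given the roughly 12 minutes per sample noted above.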
<code>
# to examine the output, move to the FastQC output directory of the plate
[vifehe@detritus P1_001-040]$ cd ../ …
# the program has created a folder named after each sequence file, plus a compressed (.zip) copy of that folder
[vifehe@detritus fastqcRawdata_P1]$ ls | head -n2
SN7640211_14074_P1A01_MND1014_1_sequence.fq_fastqc
SN7640211_14074_P1A01_MND1014_1_sequence.fq_fastqc.zip
# list the contents of the folder created
[vifehe@detritus fastqcRawdata_P1]$ cd SN7640211_14074_P1A01_MND1014_1_sequence.fq_fastqc
[vifehe@detritus SN7640211_14074_P1A01_MND1014_1_sequence.fq_fastqc]$ ls
fastqc_data.txt …
# examine the summary.txt output
[vifehe@detritus …
PASS Basic Statistics SN7640211_14074_P1A01_MND1014_1_sequence.fq.gz
PASS Per base sequence quality SN7640211_14074_P1A01_MND1014_1_sequence.fq.gz
....
PASS Kmer Content SN7640211_14074_P1A01_MND1014_1_sequence.fq.gz
</code>
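Since every sample has its own ''summary.txt'', we can collect the modules that did not pass for a whole plate at once. Here is a minimal sketch, assuming bash with the usual ''find'' and ''awk'', and a tab-separated ''STATUS<tab>module<tab>file'' layout for each line of ''summary.txt''. ''fastqc_nonpass'' is a hypothetical helper name, not part of FastQC:

```shell
#!/usr/bin/env bash
set -euo pipefail

# fastqc_nonpass DIR  (hypothetical helper)
# Prints every WARN or FAIL line from all summary.txt files found under DIR.
# Each summary.txt line is assumed to be: STATUS <tab> module <tab> file name.
fastqc_nonpass() {
    find "$1" -name summary.txt -exec cat {} + |
        awk -F'\t' '$1 != "PASS"'
}

# Example, run from fastqcRawdata:
#   fastqc_nonpass fastqcRawdata_P1
```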
<code>
# examine the fastqc_data.txt output
[vifehe@detritus SN7640211_14074_P1A01_MND1014_1_sequence.fq_fastqc]$ more -n20 fastqc_data.txt
##
>>
#
Filename SN7640211_14074_P1A01_MND1014_1_sequence.fq.gz
File type Conventional base calls
Encoding Sanger / Illumina 1.9
Total Sequences 44012752
Filtered Sequences 0
Sequence length 101
%GC 49
>>
>>Per base sequence quality pass
#
1 31.64506284451379 33.0 31.0 34.0 28.0 34.0
2 31.880190722906853 34.0 31.0 34.0 28.0 34.0
3 31.972653289210363 34.0 31.0 34.0 28.0 34.0
4 35.39369340049448 37.0 35.0 37.0 32.0 37.0
5 35.09201710449735 37.0 35.0 37.0 32.0 37.0
6 35.08697933726116 37.0 35.0 37.0 32.0 37.0
7 35.06162818448617 37.0 35.0 37.0 32.0 37.0
....
>>
#Total Duplicate Percentage 33.90859348891959
#
1 100.0
2 29.769972680482393
3 10.634235430848415
4 4.525832792477321
5 1.99009856157457
6 1.1335335758380753
7 0.6905347682369101
8 0.447526904442063
9 0.296590343078804
10++ 1.4482363062804704
>>
>>
>>
>>
>>

# explore the html file
[vifehe@detritus SN7640211_14074_P1A01_MND1014_1_sequence.fq_fastqc]$ firefox fastqc_report.html
# this opens the report in firefox, where the following pictures can be seen
</code>
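Single values or whole modules can also be extracted from ''fastqc_data.txt'' without paging through it. Here is a minimal sketch, assuming bash and awk and the tab-separated layout of this file: a ''>>Module name<tab>status'' line, then the module rows, then ''>>END_MODULE''. ''fastqc_module'' and ''fastqc_basic_stats'' are hypothetical helper names:

```shell
#!/usr/bin/env bash
set -euo pipefail

# fastqc_module DATA_FILE MODULE  (hypothetical helper)
# Prints the rows of one module of a fastqc_data.txt file: the lines between
# ">>MODULE<tab>status" and the next ">>END_MODULE".
fastqc_module() {
    awk -F'\t' -v mod="$2" '
        $1 == (">>" mod) { inside = 1; next }
        /^>>END_MODULE/  { inside = 0 }
        inside
    ' "$1"
}

# fastqc_basic_stats DATA_FILE  (hypothetical helper)
# Prints "file name <tab> total sequences <tab> %GC" from Basic Statistics.
fastqc_basic_stats() {
    fastqc_module "$1" "Basic Statistics" |
        awk -F'\t' '
            $1 == "Filename"        { f = $2 }
            $1 == "Total Sequences" { n = $2 }
            $1 == "%GC"             { gc = $2 }
            END { print f "\t" n "\t" gc }
        '
}

# Example, one line per sample of plate 1 (run from fastqcRawdata_P1):
#   for d in *_fastqc; do fastqc_basic_stats "$d/fastqc_data.txt"; done
```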

[Figures: screenshots of the FastQC HTML report]

----
Because examining each file individually is time-consuming, I've created a couple of scripts to extract the information we are interested in:
[[genetica: …
[[genetica: …
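For a quick overview across all plates, we can count how many files got PASS, WARN or FAIL for each module. This is a minimal sketch and not one of the scripts linked above. It assumes bash with ''find'', ''awk'' and ''sort'', and a tab-separated ''STATUS<tab>module<tab>file'' layout for each line of ''summary.txt''. ''fastqc_status_counts'' is a hypothetical helper name:

```shell
#!/usr/bin/env bash
set -euo pipefail

# fastqc_status_counts DIR  (hypothetical helper)
# For every FastQC module, counts how many summary.txt files under DIR report
# PASS, WARN or FAIL. Output lines: module <tab> status <tab> count, sorted.
fastqc_status_counts() {
    find "$1" -name summary.txt -exec cat {} + |
        awk -F'\t' '{ n[$2 "\t" $1]++ } END { for (k in n) print k "\t" n[k] }' |
        LC_ALL=C sort
}

# Example, run from Bonn_0_fastq to cover all four plates:
#   fastqc_status_counts fastqcRawdata
```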
genetica/bioinf_process/fastqc.1426161080.txt.gz · Last modified: 2020/08/04 10:48 (external edit)