MAGs Generation Workflow
Published on: 2025-03-06 | By: EDITOR
(1) Data Preprocessing (Trimmomatic)
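Trim adapters (ILLUMINACLIP with TruSeq3-PE.fa) and low-quality bases: leading and trailing bases below Phred 20 are removed, each read is cut where the mean quality in a 4-bp sliding window drops below 25, and reads shorter than 100 bp are discarded. Host-derived reads are then removed with bowtie2.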
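The cut-offs in this step are Phred quality scores Q, which encode the estimated probability P that a base call is wrong. A short worked example of what the LEADING/TRAILING threshold (20) and the SLIDINGWINDOW threshold (25) correspond to; note that the window rule averages the Q values themselves over 4 bases and clips the read at the first window whose mean falls below 25:

```latex
\begin{aligned}
Q &= -10\,\log_{10} P \quad\Longleftrightarrow\quad P = 10^{-Q/10} \\
Q = 20 &\;\Rightarrow\; P = 10^{-2} = 0.01 \quad (\text{1 error per 100 bases, 99\% accuracy}) \\
Q = 25 &\;\Rightarrow\; P = 10^{-2.5} \approx 0.0032 \quad (\text{about 1 error per 316 bases}) \\
\text{window kept if} &\;\; \tfrac{1}{4}\,(Q_i + Q_{i+1} + Q_{i+2} + Q_{i+3}) \ge 25
\end{aligned}
```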
java -jar ~/trimmomatic-0.39-2/trimmomatic.jar PE -threads 64 ${id}_1.fq ${id}_2.fq ${id}_forward_paired.fq.gz ${id}_forward_unpaired.fq.gz ${id}_reverse_paired.fq.gz ${id}_reverse_unpaired.fq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:20 TRAILING:20 SLIDINGWINDOW:4:25 MINLEN:100
(2) Single-sample Assembly (SPAdes)
Assemble each sample de novo with SPAdes in --meta mode (metaSPAdes), which is optimized for metagenomic data.
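The per-sample commands in this workflow are written in terms of ${id}. A minimal sketch for generating those IDs, assuming (hypothetically) that the raw reads for all samples sit in one directory as <id>_1.fq and <id>_2.fq, matching the Trimmomatic input names. list_sample_ids is an illustrative name, not part of any tool; samples whose reverse-read file is missing are reported on stderr and skipped.

```bash
# Hypothetical helper: print the ID of every <id>_1.fq in directory $1
# (default: current directory) that has a matching <id>_2.fq mate.
list_sample_ids() {
    local dir=${1:-.}
    local r1 id
    for r1 in "$dir"/*_1.fq; do
        [ -e "$r1" ] || continue                  # no match: glob stays literal
        id=$(basename "$r1" _1.fq)                # strip directory and suffix
        if [ -e "$dir/${id}_2.fq" ]; then
            printf '%s\n' "$id"
        else
            printf 'warning: %s has no _2.fq mate, skipped\n' "$id" >&2
        fi
    done
}
```

Usage, with a hypothetical raw_reads directory: `for id in $(list_sample_ids raw_reads); do ...; done`, with the step's command as the loop body. The loop relies on word splitting, so sample IDs must not contain spaces.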
spades.py --meta -1 ${id}_paired_1.fastq -2 ${id}_paired_2.fastq --threads 50 --memory 256 -k 33,55,77 -o ${id}_spades
(3) Multi-sample Co-assembly (MEGAHIT)
Pool reads from all samples and co-assemble them to improve recovery of low-abundance species.
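MEGAHIT receives the pooled samples as comma-separated file lists for -1 and -2, and the two lists must follow the same sample order so that mates stay paired. A minimal sketch that builds such a list from sample IDs, assuming the per-sample files follow the <id>_paired_1.fastq / <id>_paired_2.fastq pattern seen in the command; join_reads is a hypothetical helper name.

```bash
# Hypothetical helper: join "<id><suffix>" for every ID given after the
# suffix into one comma-separated string (e.g. for MEGAHIT -1 / -2).
join_reads() {
    local suffix=$1; shift
    local out="" id
    for id in "$@"; do
        out="${out:+$out,}${id}${suffix}"   # comma only between items
    done
    printf '%s\n' "$out"
}

# Example (sample list abbreviated; add the remaining IDs):
#   ids="CF02 CF05"
#   megahit [options as above] -1 "$(join_reads _paired_1.fastq $ids)" -2 "$(join_reads _paired_2.fastq $ids)"
```

Calling the helper twice with the same ID list is what guarantees that the nth -1 file and the nth -2 file belong to the same sample.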
megahit --kmin-1pass --k-list 21,33,55,77 --min-contig-len 500 -t 128 -m 2048000000000 -o Coassembly_home_megahit --out-prefix Coassembly -1 CF02_paired_1.fastq,CF05_paired_1.fastq... -2 CF02_paired_2.fastq,CF05_paired_2.fastq...
(4) Quality Assessment (CheckM2)
Estimate the completeness and contamination of each genome bin in ./Genomes.
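CheckM2 writes its per-genome estimates to quality_report.tsv inside --output-directory, with Name, Completeness and Contamination columns. A minimal sketch for tallying genomes into tiers once it has run. The cut-offs are the completeness/contamination parts of the MIMAG standard (high: >90% and <5%; medium: ≥50% and <10%); the full MIMAG high-quality tier also requires rRNA and tRNA genes, which this sketch does not check. checkm2_tiers is a hypothetical name, and columns are located by header name rather than position.

```bash
# Hypothetical helper: count genomes per completeness/contamination tier
# in a CheckM2 quality_report.tsv ($1, tab-separated, header on line 1).
checkm2_tiers() {
    awk -F'\t' '
        NR == 1 {
            for (i = 1; i <= NF; i++) {
                if ($i == "Completeness")  c = i
                if ($i == "Contamination") k = i
            }
            if (!c || !k) { print "missing Completeness/Contamination column" > "/dev/stderr"; bad = 1; exit 1 }
            next
        }
        {
            if ($c > 90 && $k < 5)        n["high"]++     # MIMAG high (CheckM part only)
            else if ($c >= 50 && $k < 10) n["medium"]++   # MIMAG medium
            else                          n["other"]++    # low completeness or >=10% contamination
        }
        END {
            if (bad) exit 1
            printf "high\t%d\nmedium\t%d\nother\t%d\n", n["high"], n["medium"], n["other"]
        }
    ' "$1"
}
```

Usage: `checkm2_tiers ./Genomes_checkM2/quality_report.tsv`.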
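checkm2 predict --threads 64 -x fasta --database_path ~/CheckM2_database/uniref100.KO.1.dmnd --remove_intermediates --input ./Genomes/ --output-directory ./Genomes_checkM2
(5) Taxonomic Classification (GTDB-Tk)
Assign each genome a GTDB taxonomy; --skip_ani_screen bypasses the initial ANI screen against GTDB reference genomes.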
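classify_wf writes bacterial results to gtdbtk.bac120.summary.tsv in --out_dir (archaeal genomes, if any, go to a separate ar53 table, ar122 in older releases; check the exact location for your GTDB-Tk version). Its classification column holds a semicolon-separated lineage of the form d__...;p__...;...;s__... . A minimal sketch for counting genomes per phylum from that table once classification has finished; gtdbtk_phyla is a hypothetical name.

```bash
# Hypothetical helper: count genomes per GTDB phylum in a GTDB-Tk summary
# table ($1), e.g. gtdbtk.bac120.summary.tsv; output is sorted by count.
gtdbtk_phyla() {
    awk -F'\t' '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "classification") col = i; next }
        {
            phylum = "unclassified"
            n = split($col, rank, ";")
            for (j = 1; j <= n; j++)
                if (rank[j] ~ /^p__./) phylum = substr(rank[j], 4)   # strip "p__"
            count[phylum]++
        }
        END { for (p in count) printf "%s\t%d\n", p, count[p] }
    ' "$1" | LC_ALL=C sort -k2,2nr -k1,1
}
```

Usage: `gtdbtk_phyla ./Genomes_GTDBtk_results2/gtdbtk.bac120.summary.tsv`. Genomes without a named phylum (e.g. "Unclassified Bacteria") are grouped as unclassified.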
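gtdbtk classify_wf --genome_dir ./Genomes --out_dir ./Genomes_GTDBtk_results2 --extension fasta --cpus 64 --skip_ani_screen
(6) Genome Dereplication (dRep)
Discard genomes below 50% completeness or above 10% contamination, cluster the rest at 95% ANI (-sa 0.95, the commonly used species-level threshold), and keep the best-scoring genome of each cluster. --SkipMash skips the Mash pre-clustering, so all remaining genomes are compared in a single secondary clustering (more thorough, but slower).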
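The dRep command in this step reads genome quality from --genomeInfo, a CSV with the columns genome (the genome file name including its extension), completeness and contamination. A minimal sketch for deriving it from CheckM2's quality_report.tsv, assuming the genomes carry the .fasta extension used throughout this workflow; CheckM2's Name column omits the extension, so it is appended. checkm2_to_drep is a hypothetical name, and the report should cover every genome passed to dRep with -g.

```bash
# Hypothetical helper: convert a CheckM2 quality_report.tsv ($1) into the
# genome,completeness,contamination CSV read by dRep --genomeInfo.
# $2 = genome file extension to append (default: fasta)
checkm2_to_drep() {
    awk -F'\t' -v ext="${2:-fasta}" '
        NR == 1 {
            for (i = 1; i <= NF; i++) {
                if ($i == "Name")          g = i
                if ($i == "Completeness")  c = i
                if ($i == "Contamination") k = i
            }
            if (!g || !c || !k) { print "missing Name/Completeness/Contamination column" > "/dev/stderr"; exit 1 }
            print "genome,completeness,contamination"
            next
        }
        { printf "%s.%s,%s,%s\n", $g, ext, $c, $k }   # re-append the extension
    ' "$1"
}
```

Usage: `checkm2_to_drep ./Genomes_checkM2/quality_report.tsv fasta > Pbac_v2_Quality.csv`.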
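dRep dereplicate --completeness 50 --contamination 10 -sa 0.95 --SkipMash -p 64 ./Pbac_v2_cluster_95 -g ./All_Genomes_MAGs/*fasta --genomeInfo Pbac_v2_Quality.csv
(7) rRNA Prediction (Barrnap)
Predict rRNA genes in each genome with barrnap's bacterial models; -q suppresses progress messages.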
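barrnap prints its predictions to stdout as GFF3, so they need to be redirected to a file to be kept; each feature's Name attribute gives the gene type (e.g. Name=16S_rRNA). A minimal sketch for counting 5S, 16S and 23S predictions per genome from such a file; count_rrna and the file name ${genome}.rRNA.gff are illustrative assumptions.

```bash
# Hypothetical helper: count barrnap rRNA predictions by type in a GFF3 file ($1).
# Prints 5S, 16S and 23S counts (0 when a type is absent); partial hits count too.
count_rrna() {
    awk -F'\t' '
        /^#/ { next }                                     # GFF header lines
        $3 == "rRNA" && match($9, /Name=[0-9]+S_rRNA/) {
            n[substr($9, RSTART + 5, RLENGTH - 10)]++     # "Name=16S_rRNA" -> "16S"
        }
        END { printf "5S\t%d\n16S\t%d\n23S\t%d\n", n["5S"], n["16S"], n["23S"] }
    ' "$1"
}
```

Usage: `barrnap -q -k bac ${genome}.fasta > ${genome}.rRNA.gff && count_rrna ${genome}.rRNA.gff`.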
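barrnap -q -k bac ${genome}.fasta > ${genome}.rRNA.gff
(8) tRNA Annotation (tRNAscan-SE)
Search each genome for tRNA genes using the bacterial model (-B); -o writes the predictions and -m a summary of the run.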
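The -o file written by tRNAscan-SE is a tab-separated table whose header block ends with a row of dashes; in the default layout, the fifth column holds the predicted isotype (Ala, Gly, ..., or Pseudo/Undet for pseudogenes and undetermined hits). A minimal sketch for counting predicted tRNAs and distinct isotypes per genome, assuming that layout (check the header of your tRNAscan-SE version, since extra output options add columns); trna_summary is a hypothetical name.

```bash
# Hypothetical helper: summarise a tRNAscan-SE -o table ($1).
# Assumes the default layout: header lines end with a row of dashes and the
# isotype (Ala, Gly, ..., Pseudo, Undet) is in column 5.
trna_summary() {
    awk -F'\t' '
        /^--------/ { body = 1; next }       # data rows follow the dashed line
        body {
            total++                         # every predicted tRNA, incl. pseudogenes
            t = $5; gsub(/ /, "", t)        # drop any space padding
            if (t != "Pseudo" && t != "Undet") iso[t] = 1
        }
        END {
            n = 0
            for (t in iso) n++
            printf "tRNAs\t%d\nisotypes\t%d\n", total, n
        }
    ' "$1"
}
```

Usage: `trna_summary ${genome}_tRNA_result.txt`.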
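tRNAscan-SE -B -o ${genome}_tRNA_result.txt -m ${genome}_tRNA_statistic.txt ${genome}.fasta
(9) Gene Prediction (Prodigal)
Predict protein-coding genes in metagenomic mode (-p meta), writing protein (-a) and nucleotide (-d) sequences plus GFF coordinates (-f gff -o); -m stops genes from being built across runs of N.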
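Prodigal records a partial= code in the FASTA headers of its -a and -d outputs: 00 means both gene ends were found, while 10, 01 and 11 mark a gene running off the left end, the right end, or both ends of its contig. Fragmented assemblies tend to yield more partial genes, so the ratio is a quick sanity check. A minimal sketch; prodigal_partial is a hypothetical name.

```bash
# Hypothetical helper: count complete (partial=00) and partial genes from the
# headers of a Prodigal -a or -d FASTA file ($1).
prodigal_partial() {
    awk '
        /^>/ && match($0, /partial=[01][01]/) {
            if (substr($0, RSTART + 8, 2) == "00") complete++   # both ends present
            else partial++                                     # 10, 01 or 11
        }
        END { printf "complete\t%d\npartial\t%d\n", complete, partial }
    ' "$1"
}
```

Usage: `prodigal_partial ${genome}.protein.fa`.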
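prodigal -m -p meta -i ${genome}.fasta -a ${genome}.protein.fa -d ${genome}.nucl.fa -f gff -o ${genome}.gff
(10) Gene Annotation (eggNOG-mapper)
Functionally annotate the predicted coding sequences (nucleotide input, --itype CDS) with DIAMOND searches (-m diamond) against the local eggNOG database.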
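eggNOG-mapper v2 writes its main results to <output prefix>.emapper.annotations, a tab-separated file whose metadata lines start with ## and whose column header line starts with #query. The COG_category column holds one or more single-letter COG functional categories per gene (e.g. "KT"), or "-" when none was assigned. A minimal sketch for tallying categories per genome, counting multi-letter values once per letter; cog_counts is a hypothetical name, and older (v1) output uses different column names.

```bash
# Hypothetical helper: count COG functional categories in an eggNOG-mapper v2
# .emapper.annotations file ($1); multi-letter entries such as "KT" are split.
cog_counts() {
    awk -F'\t' '
        /^##/ { next }                                            # metadata lines
        /^#query/ { for (i = 1; i <= NF; i++) if ($i == "COG_category") col = i; next }
        col && $col != "-" && $col != "" {
            for (j = 1; j <= length($col); j++) n[substr($col, j, 1)]++
        }
        END { for (c in n) printf "%s\t%d\n", c, n[c] }
    ' "$1" | LC_ALL=C sort
}
```

Usage: `cog_counts ${genome}.nucl.eggnog.emapper.annotations`.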
emapper.py --cpu 64 --itype CDS -m diamond --data_dir ~/eggNOG_database -i ${genome}.nucl.fa -o ${genome}.nucl.eggnog