
MAGs Generation Workflow

Published on: 2025-03-06 | By: EDITOR


1. MAGs Generation Workflow

2. Taxonomic Relative Abundance Estimation via Kraken2 and Bracken

3. Enterotyping Analysis Pipeline


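(1) Data Preprocessing (Trimmomatic and Bowtie2)
Adapters and low-quality bases are trimmed with Trimmomatic. Leading and trailing bases below Phred 20 are clipped, reads are cut where the average quality in a 4-base sliding window drops below 25, and reads shorter than 100 bp are discarded. Host-derived reads are then removed by mapping against the host genome with Bowtie2.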

java -jar ~/trimmomatic-0.39-2/trimmomatic.jar PE -threads 64 ${id}_1.fq ${id}_2.fq ${id}_forward_paired.fq.gz ${id}_forward_unpaired.fq.gz ${id}_reverse_paired.fq.gz ${id}_reverse_unpaired.fq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:20 TRAILING:20 SLIDINGWINDOW:4:25 MINLEN:100
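Because the command above is written for a single sample (${id}), it is usually run in a loop over all samples. A minimal sketch, assuming raw reads are named <id>_1.fq and <id>_2.fq in one directory as in the command; the function name list_sample_ids is hypothetical and not part of the original workflow:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original workflow):
# print every sample ID in directory $1 that has both <id>_1.fq and <id>_2.fq.
list_sample_ids() {
  local dir="${1:-.}" r1 id
  for r1 in "$dir"/*_1.fq; do
    [ -e "$r1" ] || continue              # glob matched nothing
    id="$(basename "$r1" _1.fq)"          # strip the directory and the _1.fq suffix
    if [ -e "$dir/${id}_2.fq" ]; then
      printf '%s\n' "$id"
    else
      printf 'warning: no mate file for %s\n' "$id" >&2
    fi
  done
}
```

Placing the Trimmomatic command unchanged inside for id in $(list_sample_ids .); do <command>; done runs it once per complete read pair; samples missing a mate are reported on stderr and skipped.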

(2) Single-sample Assembly (SPAdes)
Each sample is assembled de novo with metaSPAdes (--meta mode, optimized for metagenomic data), using k-mer sizes of 33, 55 and 77 and a memory limit of 256 GB.

spades.py --meta -1 ${id}_paired_1.fastq -2 ${id}_paired_2.fastq --threads 50 --memory 256 -k 33,55,77 -o ${id}_spades
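metaSPAdes writes contigs.fasta and scaffolds.fasta into the -o directory. A quick contiguity check is N50: the length of the contig at which the running sum of contig lengths, sorted longest first, reaches half of the total assembly length. An illustrative sketch using only awk and sort; the function name fasta_assembly_stats is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: report contig count, total length and N50 of a FASTA file.
fasta_assembly_stats() {
  awk '
    # Pass 1: print the length of every sequence (multi-line FASTA is handled).
    /^>/ { if (inrec) print len; len = 0; inrec = 1; next }
    { len += length($0) }
    END { if (inrec) print len }' "$1" |
    sort -nr |
    awk '
      # Pass 2: lengths arrive longest first; N50 is the length at which
      # the cumulative sum first reaches half of the total.
      { lens[NR] = $1; total += $1 }
      END {
        cum = 0; n50 = 0
        for (i = 1; i <= NR; i++) {
          cum += lens[i]
          if (cum * 2 >= total) { n50 = lens[i]; break }
        }
        printf "contigs=%d total_bp=%.0f N50=%d\n", NR, total, n50
      }'
}
```

For example, fasta_assembly_stats ${id}_spades/contigs.fasta prints one summary line per assembly, which is convenient for comparing samples.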

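(3) Multi-sample Co-assembly (MEGAHIT)
Reads from all samples are pooled and co-assembled, which increases the combined coverage of low-abundance species and improves their recovery. Contigs shorter than 500 bp are discarded (--min-contig-len 500). Note that MEGAHIT's -m value is given in bytes, so 2048000000000 requests about 2 TB of memory.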

megahit --kmin-1pass --k-list 21,33,55,77 --min-contig-len 500 -t 128 -m 2048000000000 -o Coassembly_home_megahit --out-prefix Coassembly -1 CF02_paired_1.fastq,CF05_paired_1.fastq... -2 CF02_paired_2.fastq,CF05_paired_2.fastq...
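The read lists after -1 and -2 are abbreviated above. One way to build them, assuming each sample's reads sit in one directory as <sample>_paired_1.fastq and <sample>_paired_2.fastq (the naming used in the command), is sketched below; both helper names are hypothetical. The R1 and R2 lists must name samples in the same order, which holds here because both globs expand in the same sorted order.

```shell
#!/usr/bin/env bash
# Hypothetical helper: join all arguments with commas
# (MEGAHIT accepts comma-separated file lists for -1 and -2).
join_by_comma() {
  local IFS=,
  printf '%s\n' "$*"
}

# Hypothetical helper: print the R1 list on line 1 and the R2 list on line 2
# for every *_paired_1.fastq / *_paired_2.fastq pair in directory $1.
# Both globs expand in the same sorted order, so the i-th entries match.
paired_read_lists() {
  local dir="${1:-.}"
  join_by_comma "$dir"/*_paired_1.fastq
  join_by_comma "$dir"/*_paired_2.fastq
}
```

For example, mapfile -t lists < <(paired_read_lists .) stores both lines, which can then be passed as -1 "${lists[0]}" -2 "${lists[1]}" to megahit.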

(4) Quality Assessment (CheckM2)

checkm2 predict --threads 64 -x fasta --database_path ~/CheckM2_database/uniref100.KO.1.dmnd --remove_intermediates --input ./Genomes/ --output-directory ./Genomes_checkM2
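CheckM2 writes its per-genome estimates to quality_report.tsv in the output directory, including Name, Completeness and Contamination columns. As an illustration, the sketch below labels each genome using MIMAG-style thresholds (high: >90% completeness and <5% contamination; medium: ≥50% completeness and <10% contamination; everything else "other"). It checks only these two values, not the rRNA/tRNA part of the high-quality tier, and the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: tag each genome in a CheckM2 quality_report.tsv with a
# tier based ONLY on completeness/contamination (rRNA/tRNA are not checked).
# Columns are located by header name, so extra columns are ignored.
checkm2_tiers() {
  awk -F'\t' '
    NR == 1 {
      for (i = 1; i <= NF; i++) col[$i] = i
      if (!("Name" in col) || !("Completeness" in col) || !("Contamination" in col)) {
        print "missing expected CheckM2 columns" > "/dev/stderr"
        exit 1
      }
      next
    }
    {
      comp = $(col["Completeness"]) + 0
      cont = $(col["Contamination"]) + 0
      if (comp > 90 && cont < 5)        tier = "high"
      else if (comp >= 50 && cont < 10) tier = "medium"
      else                              tier = "other"
      print $(col["Name"]) "\t" tier
    }' "$1"
}
```

For example, checkm2_tiers ./Genomes_checkM2/quality_report.tsv | cut -f2 | sort | uniq -c counts genomes per tier.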

(5) Taxonomic Classification (GTDB-Tk)

gtdbtk classify_wf --genome_dir ./Genomes --out_dir ./Genomes_GTDBtk_results2 --extension fasta --cpus 64 --skip_ani_screen
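GTDB-Tk summarizes bacterial results in gtdbtk.bac120.summary.tsv in the output directory (in recent versions, archaeal genomes go to a separate ar53 summary file). Its classification column holds a semicolon-separated lineage such as d__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Bacteroides;s__Bacteroides fragilis, where an empty rank appears as just its prefix (e.g. s__). A small sketch for extracting one rank, assuming that layout; the function name is hypothetical and columns are found by header name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<genome>\t<name at rank>" from a GTDB-Tk summary.
# $1 = rank prefix (d, p, c, o, f, g or s), $2 = summary TSV.
# Empty ranks (e.g. "s__") print "unassigned"; genomes without a lineage
# (e.g. "Unclassified Bacteria") print "unclassified".
gtdb_rank() {
  local prefix="$1" file="$2"
  awk -F'\t' -v pre="$prefix" '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    {
      name = "unclassified"
      n = split($(col["classification"]), ranks, ";")
      for (i = 1; i <= n; i++)
        if (index(ranks[i], pre "__") == 1) name = substr(ranks[i], length(pre) + 3)
      if (name == "") name = "unassigned"
      print $(col["user_genome"]) "\t" name
    }' "$file"
}
```

For example, gtdb_rank p ./Genomes_GTDBtk_results2/gtdbtk.bac120.summary.tsv | cut -f2 | sort | uniq -c tallies genomes per phylum.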

(6) Genome Dereplication (dRep)

dRep dereplicate --completeness 50 --contamination 10 -sa 0.95 --SkipMash -p 64 ./Pbac_v2_cluster_95 -g ./All_Genomes_MAGs/*fasta --genomeInfo Pbac_v2_Quality.csv
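dRep's --genomeInfo file is a CSV with genome, completeness and contamination columns, where genome must match the genome file name including its extension. Assuming the quality values come from a CheckM2 run on the same genomes, its report can be converted as sketched below. The function name is hypothetical; since CheckM2's Name column omits the file extension, the helper appends one (fasta by default, matching -g *fasta):

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a CheckM2 quality_report.tsv into the
# genome,completeness,contamination CSV read by dRep --genomeInfo.
# $1 = CheckM2 report, $2 = genome file extension (default: fasta).
checkm2_to_drep_info() {
  local report="$1" ext="${2:-fasta}"
  awk -F'\t' -v ext="$ext" '
    NR == 1 {
      for (i = 1; i <= NF; i++) col[$i] = i
      print "genome,completeness,contamination"
      next
    }
    { print $(col["Name"]) "." ext "," $(col["Completeness"]) "," $(col["Contamination"]) }' "$report"
}
```

For example, checkm2_to_drep_info ./Genomes_checkM2/quality_report.tsv fasta > Pbac_v2_Quality.csv produces a file in the shape the dRep command expects, provided the report covers every genome passed with -g.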

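(7) rRNA Prediction (Barrnap)
Barrnap writes its GFF3 predictions to standard output, so redirect them to a file.

barrnap -q -k bac ${genome}.fasta > ${genome}_rRNA.gff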
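Barrnap reports each predicted gene as an rRNA feature whose attributes include a Name= tag such as Name=16S_rRNA. To summarize how many 5S, 16S and 23S genes were found in a genome, a sketch like the following reads the GFF3 from a file or from a pipe; the function name is hypothetical, and partial hits are counted as well:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count 5S/16S/23S rRNA features in barrnap GFF3 output.
# Reads the file given as $1, or standard input when no argument is given.
count_rrna() {
  awk -F'\t' '
    # Skip GFF comment/header lines; count rRNA features by their Name= tag.
    !/^#/ && $3 == "rRNA" && match($9, /Name=[^;]+/) {
      n[substr($9, RSTART + 5, RLENGTH - 5)]++
    }
    END { printf "5S=%d 16S=%d 23S=%d\n", n["5S_rRNA"], n["16S_rRNA"], n["23S_rRNA"] }
  ' "${1:--}"
}
```

For example, barrnap -q -k bac ${genome}.fasta | count_rrna prints the three counts directly.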

(8) tRNA Annotation (tRNAscan-SE)

tRNAscan-SE -B -o ${genome}_tRNA_result.txt -m ${genome}_tRNA_statistic.txt ${genome}.fasta
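The tRNAscan-SE result file (-o) is a tab-separated table with a three-line header, where the fifth column is the tRNA type (amino acid). Counting how many of the 20 standard amino acids are covered is a frequently used MAG-quality check (for example toward the 18-tRNA requirement of the MIMAG high-quality tier). A sketch, assuming that default layout and that pseudogenes are flagged with "pseudo" in the last (Note) column; the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize a tRNAscan-SE result table.
# Assumes the default layout: 3 header lines, tRNA type in column 5,
# and "pseudo" in the last column for pseudogenes (those rows are skipped).
count_trna_isotypes() {
  awk -F'\t' '
    BEGIN {
      split("Ala Arg Asn Asp Cys Gln Glu Gly His Ile Leu Lys Met Phe Pro Ser Thr Trp Tyr Val", aa, " ")
      for (i in aa) std[aa[i]] = 1
    }
    NR <= 3 || $NF ~ /pseudo/ { next }
    {
      type = $5
      gsub(/[ \t]/, "", type)    # strip fixed-width padding
      total++                    # labels such as Undet or SeC count here only
      if (type in std) seen[type] = 1
    }
    END {
      k = 0
      for (t in seen) k++
      printf "tRNAs=%d amino_acids=%d/20\n", total, k
    }' "$1"
}
```

For example, count_trna_isotypes ${genome}_tRNA_result.txt prints the number of non-pseudo tRNAs and how many standard amino acids they cover.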

(9) Gene Prediction (Prodigal)

prodigal -m -p meta -i ${genome}.fasta -a ${genome}.protein.fa -d ${genome}.nucl.fa -f gff -o ${genome}.gff
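Prodigal uses the same FASTA header layout in the protein (-a) and nucleotide (-d) files: >ID # start # end # strand # ID=...;partial=..;..., where partial=00 marks a gene with both ends inside the contig. A short sketch, assuming that header layout, that reports the number of genes, the number of complete genes and the mean gene length; the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize Prodigal genes from the FASTA headers of
# its -a or -d output (fields are separated by " # ").
prodigal_gene_stats() {
  awk -F' # ' '
    /^>/ {
      n++
      len += $3 - $2 + 1                 # gene length in bp from start/end
      if ($5 ~ /partial=00/) complete++  # both gene ends inside the contig
    }
    END {
      if (n == 0) { print "genes=0"; exit }
      printf "genes=%d complete=%d mean_len_bp=%.1f\n", n, complete, len / n
    }' "$1"
}
```

For example, prodigal_gene_stats ${genome}.protein.fa gives a quick per-genome overview before annotation.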

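(10) Gene Annotation (eggNOG-mapper)
Functionally annotate the predicted genes with eggNOG-mapper in DIAMOND mode, using the nucleotide CDS file written by Prodigal (-d) as input.

emapper.py --cpu 64 --itype CDS -m diamond --data_dir ~/eggNOG_database -i ${genome}.nucl.fa -o ${genome}.nucl.eggnog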
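eggNOG-mapper writes its results to <output prefix>.emapper.annotations, here ${genome}.nucl.eggnog.emapper.annotations. In eggNOG-mapper 2.x, comment lines start with ##, the column header line starts with #query, and the KEGG_ko column holds comma-separated entries such as ko:K00001, or - when nothing was assigned. A sketch that tallies KEGG orthologs per genome, locating the column by name; the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count KEGG orthologs (KOs) in an eggNOG-mapper
# .emapper.annotations file; prints "<KO>\t<count>" sorted by KO.
count_kos() {
  awk -F'\t' '
    /^##/ { next }                                   # comment lines
    /^#query/ { for (i = 1; i <= NF; i++) col[$i] = i; next }
    ("KEGG_ko" in col) {
      v = $(col["KEGG_ko"])
      if (v == "-" || v == "") next                  # no KO assigned
      n = split(v, kos, ",")
      for (i = 1; i <= n; i++) { sub(/^ko:/, "", kos[i]); count[kos[i]]++ }
    }
    END { for (k in count) print k "\t" count[k] }' "$1" | sort
}
```

For example, count_kos ${genome}.nucl.eggnog.emapper.annotations > ${genome}.KO_counts.tsv produces a per-genome KO profile that can be merged across genomes.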