Taxonomic Relative Abundance Estimation via Kraken2 and Bracken

Published on: 2025-03-15 | By: EDITOR



1. Build a Custom Kraken2/Bracken Database Using the Reference Genomes in FelMGDB

(1) Generate .dmp Files for Kraken2

Use gtdb_to_taxdump to create the taxonomy dump (.dmp) files that Kraken2 needs (names.dmp and nodes.dmp). Move the generated files to the ${db_path}/taxonomy directory, where ${db_path} is the path to your database directory.

# Installation: https://github.com/nick-youngblut/gtdb_to_taxdump

gtdb_to_taxdump.py taxonomy_file.txt > taxID_info.tsv

The taxonomy_file.txt follows the structure below:

GPISO0023    d__Bacteria;p__Bacillota;c__Bacilli;o__Lactobacillales;f__Enterococcaceae;g__Enterococcus_B;s__Enterococcus_B hirae
MAGPD00321    d__Bacteria;p__Bacillota;c__Bacilli;o__Lactobacillales;f__Streptococcaceae;g__Lactococcus;s__Lactococcus lactis
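Before running gtdb_to_taxdump, it can help to check that every line of taxonomy_file.txt has the expected shape. The following is a minimal Python sketch, not part of the original workflow; the function name parse_taxonomy_line is hypothetical, and it assumes each line holds a genome ID, whitespace, and a seven-rank lineage with the d__ to s__ prefixes shown above.

```python
# Hypothetical helper: validate and split one line of taxonomy_file.txt.
# Assumes the two-column layout shown above (genome ID, then lineage).
RANK_PREFIXES = {
    "d": "domain", "p": "phylum", "c": "class", "o": "order",
    "f": "family", "g": "genus", "s": "species",
}

def parse_taxonomy_line(line):
    """Return (genome_id, {rank_name: taxon_name}) for one input line."""
    # Split only on the first whitespace run, because species names
    # such as "Enterococcus_B hirae" contain a space themselves.
    genome_id, lineage = line.rstrip("\n").split(None, 1)
    ranks = {}
    for field in lineage.split(";"):
        prefix, _, name = field.strip().partition("__")
        if prefix not in RANK_PREFIXES:
            raise ValueError(f"unexpected rank prefix in {field!r}")
        ranks[RANK_PREFIXES[prefix]] = name
    return genome_id, ranks

example = ("GPISO0023    d__Bacteria;p__Bacillota;c__Bacilli;"
           "o__Lactobacillales;f__Enterococcaceae;g__Enterococcus_B;"
           "s__Enterococcus_B hirae")
genome_id, ranks = parse_taxonomy_line(example)
print(genome_id, ranks["genus"], "|", ranks["species"])
```

A line with a missing or misspelled rank prefix raises a ValueError, which makes formatting problems visible before the .dmp files are generated.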

(2) Replace Sequence Names in Genome Files and Add IDs

  • Action: Extract the genome names and their tax IDs from the generated .dmp files. Save this information in a file named MAG_to_taxid.txt and place it in the genome directory.
  • Format Requirements: The file must list one genome per line, with the genome name followed by its tax ID:
GPISO0001    98
GPISO0003    220
GPISO0004    222
GPISO0007    297
GPISO0010    185
... ...
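The manual extraction can also be scripted. The sketch below is an illustration only and rests on two assumptions: that names.dmp uses the usual NCBI taxdump layout (fields separated by "|", tax ID first, name second), and that the genome identifiers appear as names in that file. The helper names read_names_dmp and mag_to_taxid_rows are hypothetical, and the demo lines are made up to match the sample above.

```python
# Hypothetical helpers: derive MAG_to_taxid.txt rows from names.dmp.
# Assumes NCBI-style names.dmp lines: "taxid | name | unique name | class |".

def read_names_dmp(lines):
    """Return {name: taxid} from names.dmp-style lines."""
    name_to_taxid = {}
    for line in lines:
        fields = [f.strip() for f in line.rstrip("\n").split("|")]
        if len(fields) < 2 or not fields[0]:
            continue  # skip blank or malformed lines
        name_to_taxid[fields[1]] = int(fields[0])
    return name_to_taxid

def mag_to_taxid_rows(genome_ids, name_to_taxid):
    """Return 'genome<TAB>taxid' rows; fail loudly on unknown genomes."""
    rows = []
    for genome in genome_ids:
        if genome not in name_to_taxid:
            raise KeyError(f"{genome} not found in names.dmp")
        rows.append(f"{genome}\t{name_to_taxid[genome]}")
    return rows

# Illustrative input only; real files come from gtdb_to_taxdump.
demo_names = [
    "98\t|\tGPISO0001\t|\t\t|\tscientific name\t|\n",
    "220\t|\tGPISO0003\t|\t\t|\tscientific name\t|\n",
]
lookup = read_names_dmp(demo_names)
rows = mag_to_taxid_rows(["GPISO0001", "GPISO0003"], lookup)
print("\n".join(rows))
```

With a real file, the lines would come from open("names.dmp") and the rows would be written to MAG_to_taxid.txt.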

Run the Renaming Script in the Genome Directory:

python replace_genome_sequence_name.py -i MAG_to_taxid.txt
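The script replace_genome_sequence_name.py is not reproduced in this post. As a rough idea of what such a step might do, the sketch below tags each FASTA header with the kraken:taxid|<taxid> marker that Kraken2 documents for sequences in custom databases. This is an assumption about the script's behavior, not its actual code, and the function name tag_fasta_headers is hypothetical.

```python
# Hypothetical sketch of a header-renaming step for Kraken2.
# Assumption: each sequence header should carry "|kraken:taxid|<taxid>"
# appended to the sequence ID, as described in the Kraken2 manual.

def tag_fasta_headers(lines, taxid):
    """Yield FASTA lines with the tax ID embedded in every header."""
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].rstrip("\n").split(None, 1)
            if not parts:
                raise ValueError("FASTA header without a sequence ID")
            seq_id = parts[0]
            # Keep any free-text description after the sequence ID.
            rest = f" {parts[1]}" if len(parts) > 1 else ""
            yield f">{seq_id}|kraken:taxid|{taxid}{rest}\n"
        else:
            yield line  # sequence lines pass through unchanged

# Illustrative records; a real run would stream a genome FASTA file.
records = [">contig_1 length=12\n", "ACGTACGTACGT\n", ">contig_2\n", "GGCC\n"]
tagged = list(tag_fasta_headers(records, 98))
print("".join(tagged), end="")
```

Writing the result to a file whose name ends in taxid.fasta would match the glob used when the genomes are added to the library in the next step.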

(3) Build Kraken2 and Bracken Databases

Incorporate your custom genome sequences into the database:

db="${db_path}"
# Add every renamed genome (*taxid.fasta) to the Kraken2 library
for file in ./*taxid.fasta
do
    kraken2-build --add-to-library "$file" --db "${db}"
done

Construct the Kraken2 database using the added genomes and taxonomy:

kraken2-build --build --db ${db}

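Generate the Bracken database for accurate species-level abundance estimation. Here -t sets the number of threads and -l the read length, which should match the length of the reads you will classify: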

bracken-build -d ${db} -t 4 -l 150
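If you are unsure which read length to pass to -l, you can estimate it from your own data. The following is a minimal sketch, assuming standard four-line FASTQ records (no wrapped sequences); the function name modal_read_length is hypothetical.

```python
# Hypothetical helper: find the most common read length in a FASTQ stream.
# Assumes four lines per record (header, sequence, "+", qualities).
from collections import Counter
from itertools import islice

def modal_read_length(fastq_lines, max_reads=10000):
    """Return the most frequent sequence length among the first reads."""
    # The sequence is the 2nd line of each 4-line record.
    seq_lines = islice(fastq_lines, 1, None, 4)
    lengths = Counter(len(s.strip()) for s in islice(seq_lines, max_reads))
    if not lengths:
        raise ValueError("no reads found")
    return lengths.most_common(1)[0][0]

# Tiny in-memory example; real use would pass an open file handle,
# e.g. gzip.open(path, "rt") for compressed FASTQ.
demo = [
    "@r1\n", "ACGT\n", "+\n", "IIII\n",
    "@r2\n", "ACGTA\n", "+\n", "IIIII\n",
    "@r3\n", "TTGCA\n", "+\n", "IIIII\n",
]
read_length = modal_read_length(demo)
print(read_length)
```

The same value is then the natural choice for the -r option when running Bracken on the classified samples.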

2. Abundance Estimation with Kraken2 and Bracken

Note: You do not have to build the Kraken2 and Bracken databases yourself with the workflow above. You can download our pre-built databases and use them directly for your analysis. If you use our databases, your relative abundance results will be directly comparable with the values reported in this study.

(1) Run Kraken2 for Taxonomic Classification

Use Kraken2 to classify metagenomic reads against the custom database.

kraken2 --paired --db /public/home/zzs000190/Giant_Panda/Kraken2_Result_50_10 --use-names --threads 8 --report-zero-counts --report ${id}_kraken2.report ${id}_nonrRNA_fwd.fq ${id}_nonrRNA_rev.fq > ${id}.out
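A quick sanity check after classification is to see what fraction of reads was classified at all. The sketch below reads a Kraken2 report and assumes the standard six-column, tab-separated layout (percentage, clade reads, direct reads, rank code, tax ID, name), with unclassified reads reported under tax ID 0 and the root under tax ID 1. The function name classification_summary is hypothetical and the demo lines are invented for illustration.

```python
# Hypothetical helper: summarize classified vs. unclassified reads
# from a Kraken2 report (standard 6-column, tab-separated format).

def classification_summary(report_lines):
    """Return read counts and the classified fraction for one report."""
    unclassified = 0
    classified = 0
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip malformed lines
        clade_reads, taxid = int(fields[1]), fields[4].strip()
        if taxid == "0":      # "unclassified" row
            unclassified = clade_reads
        elif taxid == "1":    # root row: all classified reads
            classified = clade_reads
    total = classified + unclassified
    fraction = classified / total if total else 0.0
    return {"classified": classified,
            "unclassified": unclassified,
            "classified_fraction": fraction}

# Illustrative report lines, not real results.
demo_report = [
    " 20.00\t200\t200\tU\t0\tunclassified\n",
    " 80.00\t800\t5\tR\t1\troot\n",
    " 79.50\t795\t0\tD\t2\t  d__Bacteria\n",
]
summary = classification_summary(demo_report)
print(summary)
```

A very low classified fraction may suggest that the database and the samples do not match well, or that the wrong database path was used.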

(2) Estimate Species-Level Abundance with Bracken

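Process the Kraken2 report with Bracken to refine abundance estimates at the species level. Here -r is the read length (matching the -l value used with bracken-build), -l S selects the species level, and -t is the minimum number of reads a taxon needs before its abundance is re-estimated (unlike in bracken-build, it is not a thread count).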

bracken -d /public/home/zzs000190/Giant_Panda/Kraken2_Result_50_10 -i ${id}_kraken2.report -o ${id}.bracken -r 150 -l S -t 4
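To compare samples, the per-sample Bracken outputs can be combined into one relative abundance table. The sketch below assumes Bracken's usual tab-separated output with a header row containing the columns name and fraction_total_reads. The function names read_bracken and merge_samples are hypothetical, and the sample names and numbers are invented for illustration only.

```python
# Hypothetical helpers: merge Bracken outputs into one abundance table.
# Assumes a header row with "name" and "fraction_total_reads" columns.
import csv
import io

def read_bracken(handle):
    """Return {taxon name: relative abundance} for one Bracken output."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["name"]: float(row["fraction_total_reads"]) for row in reader}

def merge_samples(per_sample):
    """Return (header, rows) with one row per taxon, one column per sample."""
    samples = sorted(per_sample)
    taxa = sorted({taxon for ab in per_sample.values() for taxon in ab})
    header = ["taxon"] + samples
    # Taxa missing from a sample get an abundance of 0.0.
    rows = [[t] + [per_sample[s].get(t, 0.0) for s in samples] for t in taxa]
    return header, rows

COLUMNS = ("name\ttaxonomy_id\ttaxonomy_lvl\tkraken_assigned_reads\t"
           "added_reads\tnew_est_reads\tfraction_total_reads\n")
# Made-up values; real input would be open(f"{sample_id}.bracken").
s1 = io.StringIO(COLUMNS + "Enterococcus_B hirae\t98\tS\t70\t5\t75\t0.75\n"
                           "Lactococcus lactis\t220\tS\t20\t5\t25\t0.25\n")
s2 = io.StringIO(COLUMNS + "Lactococcus lactis\t220\tS\t90\t10\t100\t1.0\n")

header, rows = merge_samples({"S1": read_bracken(s1), "S2": read_bracken(s2)})
print(header)
for row in rows:
    print(row)
```

Filling absent taxa with 0.0 keeps the table rectangular, which is convenient for downstream steps such as diversity or enterotyping analyses.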