
Lineage annotation

The lineage subcommand allows you to add full lineage information to Bracken quantifications or mapping analyses.

Lineage information can currently be added to the following file types:

  1. Bracken output, for instance my_sample.b2
  2. Bracken outputs merged by architeuthis, for instance merged.csv
  3. Mapping analyses generated by architeuthis, for instance mappings.csv

Info

architeuthis will automatically recognize and validate the file type and tell you if your file is not supported.

Usage

architeuthis lineage my_file.b2 -o my_file_lineage.csv

Here, the -o or --out option specifies the path of the output file. The output is written in CSV format.

Add lineage or merge first?

If you want a merged Bracken output with lineage information, it is more efficient to merge first and then add the lineage, because architeuthis uses a taxonomy hash for faster assignment.

Specifying the NCBI Taxonomy dump

You can use any downloaded NCBI Taxonomy dump for lineage annotation by specifying the --data-dir option, for instance:

architeuthis lineage --data-dir /my/taxdump/ my_file.b2 -o my_file_lineage.csv

Specifying the lineage format

You can specify the lineage format using the taxonkit syntax. The default lineage format is {k};{p};{c};{o};{f};{g};{s}, which covers the canonical ranks down to species level. You can change this, however. For instance, to keep only genus and species:

architeuthis lineage --format "{g};{s}" my_file.b2 -o my_file_lineage.csv
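To make the placeholder syntax concrete, here is a minimal Python sketch of how a format string such as {g};{s} could be expanded into a lineage string. It is not the architeuthis or taxonkit implementation. The function name fill_format and the dictionary layout are hypothetical, and replacing an unknown or missing rank with an empty string is an assumption made for this illustration.

```python
import re

def fill_format(fmt, names_by_placeholder):
    """Expand {x} placeholders in fmt using a dict keyed by placeholder letter.

    Assumption for this sketch: a rank that is missing from the dict
    is replaced by an empty string.
    """
    def replace(match):
        return names_by_placeholder.get(match.group(1), "")
    return re.sub(r"\{(\w+)\}", replace, fmt)

# Lineage of Escherichia coli, keyed by the placeholder letters used above.
ecoli = {
    "k": "Bacteria",
    "p": "Pseudomonadota",
    "c": "Gammaproteobacteria",
    "o": "Enterobacterales",
    "f": "Enterobacteriaceae",
    "g": "Escherichia",
    "s": "Escherichia coli",
}

print(fill_format("{k};{p};{c};{o};{f};{g};{s}", ecoli))
print(fill_format("{g};{s}", ecoli))
```

The second call keeps only genus and species, mirroring the --format "{g};{s}" command above.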

Why does lineage not separate ranks into their own CSV columns?

This keeps the output flexible across the many supported organisms, because some lack specific canonical ranks. For instance, many eukaryotes do not have a phylum. This strategy is also similar to what QIIME 2 does.
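If you do need one column per rank downstream, the lineage string can be split after the fact. The following is a minimal Python sketch under several assumptions: the helper name split_lineage is hypothetical, the separator is taken from the format string you passed to --format, taxon names are assumed not to contain the separator, and a missing rank is assumed to appear as an empty field. The taxon names in the second example are placeholders, not real taxa.

```python
def split_lineage(lineage, fmt="{k};{p};{c};{o};{f};{g};{s}", sep=";"):
    """Map each placeholder letter in fmt to its value in the lineage string.

    Assumes names never contain sep and that a missing rank
    shows up as an empty field (both are assumptions of this sketch).
    """
    keys = [part.strip("{}") for part in fmt.split(sep)]
    values = lineage.split(sep)
    if len(keys) != len(values):
        raise ValueError(
            f"expected {len(keys)} fields for format {fmt!r}, got {len(values)}"
        )
    return dict(zip(keys, values))

# A complete bacterial lineage.
full = split_lineage(
    "Bacteria;Pseudomonadota;Gammaproteobacteria;Enterobacterales;"
    "Enterobacteriaceae;Escherichia;Escherichia coli"
)
print(full["g"], "|", full["s"])

# A hypothetical eukaryote without a phylum: the second field is empty.
no_phylum = split_lineage(
    "Eukaryota;;SomeClass;SomeOrder;SomeFamily;Somegenus;Somegenus species"
)
print(repr(no_phylum["p"]))
```

Applying such a helper to the lineage column of the output CSV (whose column name you would need to check in your own file) gives per-rank fields while the file on disk stays in the single-column form.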