Lineage annotation
The lineage subcommand allows you to add full lineage information to Bracken quantifications or mapping analyses.
Lineage information can currently be added to the following file types:
- Bracken output, for instance my_sample.b2
- Bracken outputs merged by architeuthis, for instance merged.csv
- Mapping analyses generated by architeuthis, for instance mappings.csv
Info
architeuthis will automatically recognize and validate the file type and tell you if your file is not supported.
Usage
architeuthis lineage my_file.b2 -o my_file_lineage.csv
Here the -o (or --out) option specifies the path of the output file. The output will be in CSV format.
Add lineage or merge first?
If you want merged Bracken output with lineage information, it is more efficient to merge first and then add the lineage, because architeuthis uses a taxonomy hash for faster assignment.
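Why a taxonomy hash favors annotating once can be illustrated with a small cache: each distinct taxon ID is resolved up the tree only once, and repeated IDs are served from the hash. The Python sketch below is a hypothetical illustration of that general caching technique, not architeuthis's actual code; annotate, toy_resolve, and the example taxon IDs are made up for the demonstration.

```python
"""Sketch: resolving each taxon ID once and reusing it from a hash map.

Hypothetical illustration of the general caching idea, not architeuthis's code.
"""
from typing import Callable, Dict, Iterable, List


def annotate(taxids: Iterable[int], resolve: Callable[[int], str]) -> List[str]:
    """Return one lineage per row, calling `resolve` once per distinct taxid."""
    cache: Dict[int, str] = {}  # the "taxonomy hash": taxid -> lineage string
    out: List[str] = []
    for taxid in taxids:
        if taxid not in cache:
            cache[taxid] = resolve(taxid)  # costly tree walk, done once per taxid
        out.append(cache[taxid])
    return out


calls = 0  # counts how often the (stand-in) tree walk runs


def toy_resolve(taxid: int) -> str:
    """Stand-in for a real lineage lookup; only records that it was called."""
    global calls
    calls += 1
    return f"lineage-of-{taxid}"


# Rows from several samples that share taxa, as in a merged table.
rows = [562, 1280, 562, 562, 1280, 9606]
lineages = annotate(rows, toy_resolve)
print(len(lineages), calls)  # prints "6 3": six rows, three lookups
```

One plausible reading of the advice above: annotating a single merged table lets all samples share one lookup table, whereas annotating each Bracken file separately repeats the lookups for taxa that occur in several samples.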
Specifying the NCBI Taxonomy dump
You can use any downloaded NCBI Taxonomy dump for lineage annotation by passing its directory to the --data-dir option, for instance:
architeuthis lineage --data-dir /my/taxdump/ my_file.b2 -o my_file_lineage.csv
Specifying the lineage format
You can specify the lineage format with the --format option, using the taxonkit syntax. The default lineage format is {k};{p};{c};{o};{f};{g};{s}, which covers the canonical ranks down to the species level. You can change this, for instance to keep only genus and species:
architeuthis lineage --format "{g};{s}" my_file.b2 -o my_file_lineage.csv
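To illustrate what a format string does, the Python sketch below fills each placeholder with the name found at the corresponding rank. The placeholder-to-rank mapping (for instance, {k} to superkingdom) follows taxonkit's reformat convention as an assumption, and leaving missing ranks empty is likewise assumed; the taxonkit documentation and your actual output are the authoritative references.

```python
"""Sketch: filling a taxonkit-style lineage format string.

Hypothetical illustration; the placeholder-to-rank mapping is an assumption.
"""
import re
from typing import Dict

# Assumed placeholder -> NCBI rank mapping, following taxonkit's reformat letters.
PLACEHOLDERS: Dict[str, str] = {
    "k": "superkingdom", "p": "phylum", "c": "class", "o": "order",
    "f": "family", "g": "genus", "s": "species",
}


def format_lineage(fmt: str, ranks: Dict[str, str]) -> str:
    """Replace each {x} with the name at that rank ("" if the rank is missing)."""

    def fill(match):
        rank = PLACEHOLDERS.get(match.group(1))
        if rank is None:
            return match.group(0)  # unknown placeholder: leave it untouched
        return ranks.get(rank, "")

    return re.sub(r"\{(\w)\}", fill, fmt)


# Ranks found for one taxon (intermediate ranks deliberately missing here).
ranks = {"superkingdom": "Bacteria", "genus": "Escherichia", "species": "Escherichia coli"}
print(format_lineage("{k};{p};{c};{o};{f};{g};{s}", ranks))  # Bacteria;;;;;Escherichia;Escherichia coli
print(format_lineage("{g};{s}", ranks))  # Escherichia;Escherichia coli
```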
Why does lineage not separate ranks into their own CSV columns?
This keeps the output flexible across the many supported organisms, some of which lack specific canonical ranks. For instance, many eukaryotes do not have a phylum. This strategy is also similar to what QIIME 2 does.
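If you do need one column per rank, you can split the lineage string after the fact. The Python sketch below assumes a CSV column named lineage written with the default {k};{p};{c};{o};{f};{g};{s} format and empty fields for missing ranks; the column names (taxid, lineage), the rank labels, and the example row are hypothetical, so adjust them to your actual file.

```python
"""Sketch: splitting lineage strings into one column per rank after the fact.

Assumptions (hypothetical): the CSV has a "lineage" column in the default
{k};{p};{c};{o};{f};{g};{s} format, and missing ranks are empty fields.
"""
import csv
import io
from typing import Dict, List

# Column labels chosen for this sketch, one per placeholder in the default format.
RANK_COLUMNS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def split_lineage(lineage: str) -> Dict[str, str]:
    """Map each rank label to its field; pad with "" if the string is short."""
    parts = lineage.split(";")
    parts += [""] * (len(RANK_COLUMNS) - len(parts))
    return dict(zip(RANK_COLUMNS, parts))


def add_rank_columns(csv_text: str, column: str = "lineage") -> List[Dict[str, str]]:
    """Parse CSV text and add one key per rank to every row."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row.update(split_lineage(row[column]))
        rows.append(row)
    return rows


# Made-up example input in which the intermediate ranks are empty.
example = (
    "taxid,lineage\n"
    "562,Bacteria;;;;;Escherichia;Escherichia coli\n"
)
for row in add_rank_columns(example):
    print(row["genus"], repr(row["phylum"]))
```

With pandas, df["lineage"].str.split(";", expand=True) produces the same split in one call, leaving the choice of column names to you.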