Usage#

Quickstart (for the impatient)#

To type polysaccharide loci from genome assemblies:

kaptive assembly <database> <path_to_assemblies> -o kaptive_results.tsv

This will run kaptive assembly with the default parameters, and produce a table detailing the best match locus, predicted phenotype, confidence score and detailed typing information for each input genome assembly in the file called kaptive_results.tsv.

Detailed usage#

We designed Kaptive 3 to be easier to use on the command-line than previous versions by structuring the program as a series of sub-commands that follow the general pattern of kaptive <mode> <database> <input>. There are three modes:

  • assembly: type assemblies

  • extract: extract features from Kaptive databases in different formats

  • convert: convert Kaptive results to different formats

Note

To see the full list of commands and options, run kaptive -h/--help.

kaptive assembly#

Given a Kaptive database and a bacterial genome assembly, kaptive assembly will perform 3 main tasks:

  • Determines the most likely locus type of the genome assembly.

  • Reconstructs the biosynthetic gene cluster from the assembly contig sequences.

  • Predicts the corresponding serotype/phenotype of the genome assembly.

Note

As of version 3, Kaptive no longer supports allelic (wzi, wzc) typing.

To perform K locus typing on a directory of Klebsiella pneumoniae assemblies, you would run:

kaptive assembly kpsc_k assemblies/*.fasta -o kaptive_results.tsv

Here we have told Kaptive to perform typing of assemblies with assembly and used the database keyword kpsc_k to specify the Klebsiella pneumoniae K locus database. All other parameters are set to the default.

Database keywords are a handy short-cut for using the databases distributed with Kaptive and located in the reference_databases directory. Alternatively, you can specify the full path to your own database.

You may also want to specify the locations and/or filenames of the output files using the following options:

Note, text outputs accept '-' for stdout

-o , --out           Output file to write/append tabular results to (default: stdout)
-f [], --fasta []    Turn on fasta output
                     Accepts a single file or a directory (default: cwd)
-j [], --json []     Turn on JSON lines output
                     Optionally choose file (can be existing) (default: kaptive_results.json)
-s [], --scores []   Dump locus score matrix to tsv (typing will not be performed!)
                     Optionally choose file (can be existing) (default: stdout)
-p [], --plot []     Plot results to "./{assembly}_kaptive_results.{fmt}"
                     Optionally choose a directory (default: cwd)
--plot-fmt png/svg   Format for locus plots (default: png)
--no-header          Suppress header line

Example:

kaptive assembly kpsc_k assemblies/*.fasta -o kaptive_results.tsv -f -j -p

This will output a tabular file called kaptive_results.tsv, a fasta file for each assembly called {assembly}_kaptive_results.fna, a JSON lines file called kaptive_results.json and a plot for each assembly called {assembly}_kaptive_results.{png,svg}.

Warning

It is possible to write all text formats (TSV, JSON and FASTA) to the same file (including stdout), however this is not recommended for downstream analysis.

Advanced options#

Advanced users may wish to customise Kaptive’s scoring options (for picking the best match locus), confidence options (for marking matches as ‘Typeable’ or ‘Untypeable’) or database parsing options. We recommend keeping the default options for standard typing using the Klebsiella and/or A. baumanii databases distributed with Kaptive.

Scoring options:

--min-cov            Minimum gene %coverage (blen/q_len*100) to be used for scoring (default: 50.0)
--score-metric       Metric for scoring each locus (default: 0)
                       0: AS (alignment score of genes found)
                       1: mlen (matching bases of genes found)
                       2: blen (aligned bases of genes found)
                       3: q_len (query length of genes found)
--weight-metric      Weighting for the 1st stage of the scoring algorithm (default: 3)
                       0: No weighting
                       1: Number of genes found
                       2: Number of genes expected
                       3: Proportion of genes found
                       4: blen (aligned bases of genes found)
                       5: q_len (query length of genes found)
--n-best             Number of best loci from the 1st round of scoring to be
                     fully aligned to the assembly (default: 2)

Confidence options:

--gene-threshold     Species-level locus gene identity threshold (default: database specific)
--max-other-genes    Typeable if <= other genes (default: 1)
--percent-expected   Typeable if >= % expected genes (default: 50)
--below-threshold    Typeable if any genes are below threshold (default: False)

See database options here and other options:

-V, --verbose         Print debug messages to stderr
-v , --version        Show version number and exit
-h , --help           Show this help message and exit
-t , --threads        Number of threads for alignment (default: maximum available CPUs / 32)

kaptive convert#

The convert command allows you to convert the Kaptive results JSON file into a range of useful formats, including:

  • tsv: Tabular output (tsv)

  • json: JSON lines format (same as input but optionally filtered)

  • fna: Locus nucleotide sequences in fasta format.

  • ffn: Gene nucleotide sequences in fasta format.

  • faa: Protein sequences in fasta format.

  • plot: Locus plots as PNG or SVG

Warning

The convert command is only compatible with JSON files from Kaptive v3.0.0 onwards.

Usage#

General usage is as follows:

kaptive convert <db> <json> [formats] [options]

Inputs:

db path/keyword       Kaptive database path or keyword
json                  Kaptive JSON lines file or - for stdin

Formats:

Note, text outputs accept '-' for stdout

-t [], --tsv []       Convert to tabular format in file (default: stdout)
-j [], --json []      Convert to JSON lines format in file (default: stdout)
--fna []              Convert to locus nucleotide sequences in fasta format
                      Accepts a single file or a directory (default: cwd)
--ffn []              Convert to locus gene nucleotide sequences in fasta format
                      Accepts a single file or a directory (default: cwd)
--faa []              Convert to locus gene protein sequences in fasta format
                      Accepts a single file or a directory (default: cwd)
-p [], --plot []      Plot results to "./{assembly}_kaptive_results.{fmt}"
                      Optionally choose a directory (default: cwd)
--plot-fmt png/svg    Format for locus plots (default: png)
--no-header           Suppress header line

Filter options:

-r , --regex          Python regular-expression to select JSON lines (default: All)
-l  [ ...], --loci  [ ...]
                    Space-separated list to filter locus names (default: All)
-s  [ ...], --samples  [ ...]
                    Space-separated list to filter sample names (default: All)

Note

Filters take precedence in descending order

For example, to convert the JSON file to a tabular format, run either of the following commands:

kaptive convert kpsc_k kaptive_results.json --tsv kaptive_results.tsv

cat *.json | kaptive convert kpsc_k - --tsv - > kaptive_results.tsv

To output multiple formats, you can run:

kaptive convert kpsc_k kaptive_results.json --tsv kaptive_results.tsv --fna - --faa proteins/

Where the tabular results will be written to kaptive_results.tsv, the locus nucleotide sequences will be written to stdout, and the protein sequences will be written to the directory proteins/ with the filenames {assembly}_kaptive_results.faa.

Warning

It is possible to write all text formats (TSV, JSON, FNA, FAA and FFN) to the same file (including stdout), however this is not recommended for downstream analysis.

API#

Whilst Kaptive isn’t designed to be a fully-fledged API, it is possible to use it as a module in your own Python scripts. For typing assemblies, you can use the kaptive.assembly.typing_pipeline function, which takes an assembly and a kaptive.database.Database object as input and returns a kaptive.typing.TypingResult object.

from kaptive.assembly import typing_pipeline
from kaptive.database import load_database
from pathlib import Path

db = load_database('kpsc_k')  # Load the Klebsiella K locus database once and pass it to the typing pipeline
for result in map(lambda a: typing_pipeline(a, db), Path('assemblies').glob('*.fna.gz')):
    if result:  # If the assembly was successfully typed
        print(result.format('tsv'), end='')  # TSV format will end in a newline, so we set end to ''

For example, if you wanted to perform K and O locus typing on a single assembly, you could do the following:

# Here, we pass the keyword arguments for the database, they will be loaded inside the typing pipeline
for result in map(lambda d: typing_pipeline('test/kpsc/2018-01-389.fasta', d), ['kpsc_k', 'kpsc_o']):
    if result:  # If the assembly was successfully typed
        print(result.format('tsv'), end='')  # TSV format will end in a newline, so we set end to ''

Note

By default the typing_pipeline runs minimap2 on a all available CPUs, however this can be controlled with the threads parameter.