Outputs

Contents

Outputs#

Tabular#

The main output of the assembly typing mode is a tab-delimited table of the results with the following columns:

Column name

Description

Assembly

The name of the input assembly, taken from the assembly filename.

Best match locus

The locus type which most closely matches the assembly.

Best match type

The predicted serotype/phenotype of the assembly.

Match confidence

A categorical measure of locus call quality (see confidence score).

Problems

Characters indicating issues with the locus match (see problems).

Identity

Weighted percent identity of the best matching locus to the assembly.

Coverage

Weighted percent coverage of the best matching locus in the assembly.

Length discrepancy

If the locus was found in a single piece, this is the difference between the locus length and the assembly length.

Expected genes in locus

A fraction indicating how many of the genes in the best matching locus were found in the locus part of the assembly.

Expected genes in locus, details

Gene names for the expected genes found in the locus part of the assembly.

Missing expected genes

A string listing the gene names of expected genes that were not found.

Other genes in locus

The number of unexpected genes (genes from loci other than the best match) which were found in the locus part of the assembly.

Other genes in locus, details

Gene names for the other genes found in the locus part of the assembly.

Expected genes outside locus

A fraction indicating how many of the expected genes which were found in the assembly but not in the locus part of the assembly (usually zero)

Expected genes outside locus, details

Gene names for the expected genes found outside the locus part of the assembly.

Other genes outside locus

The number of unexpected genes (genes from loci other than the best match) which were found outside the locus part of the assembly.

Other genes outside locus, details

Gene names for the other genes found outside the locus part of the assembly.

Truncated genes, details

Gene names for the truncated genes found in the assembly.

Extra genes, details

Gene names for the extra genes found in the assembly.

Note

Numbers beside gene names indicate the percent identity and percent coverage of the gene in the assembly.

Note

You may sometimes see two copies of the same gene in the ``Expected genes in locus, details `` column. These represent (likely) parts of the same gene which have usually been split over contigs. In Kaptive v3.0.0 onwards, we adopted this behaviour to allow users to see where locus splitting as occurred, and determine the total percent identity of a gene that has been split.

The default is to print this table to stdout. You can use UNIX redirection operators (> or >>) or the -o/--out flag to write to a file.

If the summary table already exists and is not empty, Kaptive will append to it (not overwrite it) and suppress the header line. This allows you to run Kaptive in succession on sets of assemblies, all outputting to the same table file.

To disable the tabular output, simply redirect the output to /dev/null.

Fasta#

The -f/--fasta flag produces a fasta file of the region(s) of the assembly which correspond to the best locus match. This may be a single piece (in cases of a good assembly and a strong match) or it may be in multiple pieces (in cases of poor assembly and/or a novel locus).

You can specify either a directory, which will write one file per assembly named as {assembly}_kaptive_results.fna, or a single file (“-” for stdout), which will write all the sequences to that file.

For example:

kaptive assembly kpsc_k assembly.fasta -f

This result in default behaviour which will produce one file per assembly in the current directory. However, to specify a directory:

kaptive assembly kpsc_k assembly.fasta -f kaptive_results/

or for a single file, both are valid:

kaptive assembly kpsc_k assembly.fasta -f kaptive_results.fna
kaptive assembly kpsc_k assembly.fasta -f - > kaptive_results.fna

Note

This is the same as the --fna flag in kaptive convert.

JSON#

The -j/--json flag produces a JSON file of the results which allows Kaptive to reconstruct the TypingResult objects after a run which can be used with kaptive convert. Unlike previous version (2 and below), this is a JSON lines file (or “-” for stdout), where each line is a JSON object representing the results for a single assembly. If the file already exists, Kaptive will append to it (not overwrite it).

The default is to write this file to: kaptive_results.json, however the path can be specified after the flag, for example:

kaptive assembly kpsc_k assembly.fasta -j kaptive_results.json

Warning

It is possible to write all text formats (TSV, JSON and FASTA) to the same file (including stdout), however this is not recommended for downstream analysis.

Plot#

Kaptive can now produce a visual representation of the locus match in the assembly. This is done using the -p/--plot flag, which produces a plot in the format specified by the --plot-fmt flag (default: png).

The default is to write files to the current directory with the name: {assembly}_kaptive_results.{fmt}, however the output directory can be specified after the flag, for example:

kaptive assembly kpsc_k assembly.fasta -p kaptive_plots
_images/example_plot.png

The plot is a visual representation of the locus genes and the corresponding assembly contig pieces.

  • The gene transparencies are based on the protein percent identity.

  • The contig transparencies are based on the final percent identity.

  • Unexpected genes are shown in orange.

  • Genes are blunt and outlined in yellow if they are truncated.

  • Genes are outlined in red if they are below the gene identity threshold.