Export SHM Trees

MiXCR provides commands for exporting SHM trees, either as one row per tree or with all nodes, in tab-delimited format. Additionally, SHM trees may be exported in NEWICK format.

The .shmt file is produced by findShmTrees and holds the SHM trees and the clonotypes that are included in those trees.

SHM trees tables

mixcr exportShmTrees [-f]
    [--filter-min-nodes <n>] 
    [--filter-min-height <n>] 
    [--ids <id>[,<id>...]]...
    [--chains <chains>] 
    [--preset <preset>]
    [--preset-file <presetFile>]
    [--no-header]
    [--not-covered-as-empty] 
    [<exportField>]...
    [--force-overwrite] 
    [--no-warnings] 
    [--verbose] 
    [--help] 
    [[--filter-in-feature <gene_feature>] [--pattern-max-errors <n>] (--filter-aa-pattern <pattern> | --filter-nt-pattern <pattern>)] 
    trees.shmt [trees.tsv]

Exports tab-delimited info from .shmt file for found SHM trees, without information about each node.

Command line options:

trees.shmt: Input file produced by 'findShmTrees' command.
[trees.tsv]: Path to the output table. Printed to stdout if omitted.
--filter-min-nodes <n>: Minimal number of nodes in tree
--filter-min-height <n>: Minimal height of the tree
--ids <id>[,<id>...]: Filter specific trees by id
--chains <chains>: Export only trees that contain clones with the specified chain (e.g. TRA or IGH).
-p, --preset <preset>: Specify preset of export fields (possible values: 'full', 'min'; 'full' by default)
-pf, --preset-file <presetFile>: Specify preset file of export fields
--no-header: Don't print column names
--not-covered-as-empty: Export not covered regions as empty text.
-f, --force-overwrite: Force overwrite of output file(s).
-nw, --no-warnings: Suppress all warning messages.
--verbose: Verbose messages.
-h, --help: Show this help message and exit.
<exportField>: A list of export fields

Filter by pattern options:

--filter-in-feature <gene_feature>: Match pattern inside specified gene feature. Default: CDR3
--pattern-max-errors <n>: Max allowed subs & indels. Default: 0
--filter-aa-pattern <pattern>: Filter specific trees by aa pattern.
--filter-nt-pattern <pattern>: Filter specific trees by nt pattern.
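The options above can also be assembled programmatically. The following is a minimal sketch in Python, assuming mixcr is available on the PATH; the helper name build_export_cmd and its parameters are hypothetical, and only flags documented above are used.

```python
from typing import List, Optional, Sequence


def build_export_cmd(
    shmt: str,
    out_tsv: Optional[str] = None,
    min_nodes: Optional[int] = None,
    min_height: Optional[int] = None,
    ids: Sequence[str] = (),
    chains: Optional[str] = None,
    aa_pattern: Optional[str] = None,
    nt_pattern: Optional[str] = None,
    in_feature: Optional[str] = None,
    max_errors: Optional[int] = None,
) -> List[str]:
    """Build an argument list for `mixcr exportShmTrees` (hypothetical helper)."""
    if aa_pattern is not None and nt_pattern is not None:
        # The synopsis lists the two pattern options as alternatives.
        raise ValueError("use either aa_pattern or nt_pattern, not both")

    cmd = ["mixcr", "exportShmTrees"]
    if min_nodes is not None:
        cmd += ["--filter-min-nodes", str(min_nodes)]
    if min_height is not None:
        cmd += ["--filter-min-height", str(min_height)]
    if ids:
        cmd += ["--ids", ",".join(ids)]
    if chains is not None:
        cmd += ["--chains", chains]

    # Pattern filter options are only meaningful together with a pattern.
    if aa_pattern is not None or nt_pattern is not None:
        if in_feature is not None:
            cmd += ["--filter-in-feature", in_feature]
        if max_errors is not None:
            cmd += ["--pattern-max-errors", str(max_errors)]
        if aa_pattern is not None:
            cmd += ["--filter-aa-pattern", aa_pattern]
        else:
            cmd += ["--filter-nt-pattern", nt_pattern]

    cmd.append(shmt)
    if out_tsv is not None:
        cmd.append(out_tsv)
    return cmd


cmd = build_export_cmd("trees.shmt", "trees.tsv", min_nodes=5, chains="IGH")
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) to actually run the export; that step is left out here because it requires a MiXCR installation and a real .shmt file.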

SHM trees with nodes tables

mixcr exportShmTreesWithNodes [-f]
    [--only-observed] 
    [--filter-min-nodes <n>] 
    [--filter-min-height <n>] 
    [--ids <id>[,<id>...]]... 
    [--chains <chains>] 
    [--preset <preset>] 
    [--preset-file <presetFile>] 
    [--no-header]
    [--not-covered-as-empty]
    [<exportField>]...
    [--force-overwrite] 
    [--no-warnings] 
    [--verbose]  
    [--help] 
    [[--filter-in-feature <gene_feature>] [--pattern-max-errors <n>] (--filter-aa-pattern <pattern> | --filter-nt-pattern <pattern>)] 
    trees.shmt [trees.tsv]

Exports tab-delimited info from .shmt file for every node of SHM trees.

Command line options:

trees.shmt: Input file produced by 'findShmTrees' command.
[trees.tsv]: Path to the output table. Printed to stdout if omitted.
--only-observed: Exclude nodes that were reconstructed by the algorithm
--filter-min-nodes <n>: Minimal number of nodes in tree
--filter-min-height <n>: Minimal height of the tree
--ids <id>[,<id>...]: Filter specific trees by id
--chains <chains>: Export only trees that contain clones with the specified chain (e.g. TRA or IGH).
-p, --preset <preset>: Specify preset of export fields (possible values: 'min', 'full'; 'full' by default)
-pf, --preset-file <presetFile>: Specify preset file of export fields
--no-header: Don't print column names
--not-covered-as-empty: Export not covered regions as empty text.
-f, --force-overwrite: Force overwrite of output file(s).
-nw, --no-warnings: Suppress all warning messages.
--verbose: Verbose messages.
-h, --help: Show this help message and exit.
<exportField>: A list of export fields

Filter by pattern options:

--filter-in-feature <gene_feature>: Match pattern inside specified gene feature. Default: CDR3
--pattern-max-errors <n>: Max allowed subs & indels. Default: 0
--filter-aa-pattern <pattern>: Filter specific trees by aa pattern.
--filter-nt-pattern <pattern>: Filter specific trees by nt pattern.
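Because exportShmTreesWithNodes writes one row per node, downstream code usually has to regroup rows by tree. The following is a minimal sketch in Python using only the standard library. The column names treeId, nodeId and parentId are an assumption (field names without the leading dash) and should be checked against the header of an actual export; the sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List


def nodes_by_tree(tsv_text: str, tree_col: str = "treeId") -> Dict[str, List[dict]]:
    """Group rows of a node table by tree id (the column name is an assumption)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for row in reader:
        grouped[row[tree_col]].append(row)
    return dict(grouped)


def children_of(rows: List[dict]) -> Dict[str, List[str]]:
    """Map each parent node id to the ids of its children within one tree."""
    children: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        if row["parentId"]:  # an empty value is treated as "no parent"
            children[row["parentId"]].append(row["nodeId"])
    return dict(children)


# Made-up rows for illustration only; real exports contain more columns.
sample = (
    "treeId\tnodeId\tparentId\n"
    "1\t0\t\n"
    "1\t1\t0\n"
    "1\t2\t0\n"
    "2\t0\t\n"
)
trees = nodes_by_tree(sample)
print({tree_id: len(rows) for tree_id, rows in trees.items()})
print(children_of(trees["1"]))
```

For a real file, the text can be read with open("trees_with_nodes.tsv").read() and passed to nodes_by_tree in the same way.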

NEWICK

mixcr exportShmTreesNewick 
    [--filter-min-nodes <n>] 
    [--filter-min-height <n>] 
    [--ids <id>[,<id>...]]... 
    [--chains <chains>]
    [--force-overwrite] 
    [--no-warnings] 
    [--verbose] 
    [--help] 
    [[--filter-in-feature <gene_feature>] [--pattern-max-errors <n>] (--filter-aa-pattern <pattern> | --filter-nt-pattern <pattern>)] 
    trees.shmt outputDir

Export SHM trees in NEWICK format.

Trees are written in NEWICK format with branch distances and leaves. Each node is labeled with its nodeId.

Command line options:

trees.shmt: Input file produced by 'findShmTrees' command.
outputDir: Output directory for the NEWICK files. A separate file is created for every tree.
--filter-min-nodes <n>: Minimal number of nodes in tree
--filter-min-height <n>: Minimal height of the tree
--ids <id>[,<id>...]: Filter specific trees by id
--chains <chains>: Export only trees that contain clones with the specified chain (e.g. TRA or IGH).
-f, --force-overwrite: Force overwrite of output file(s).
-nw, --no-warnings: Suppress all warning messages.
--verbose: Verbose messages.
-h, --help: Show this help message and exit.

Filter by pattern options:

--filter-in-feature <gene_feature>: Match pattern inside specified gene feature. Default: CDR3
--pattern-max-errors <n>: Max allowed subs & indels. Default: 0
--filter-aa-pattern <pattern>: Filter specific trees by aa pattern.
--filter-nt-pattern <pattern>: Filter specific trees by nt pattern.
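To illustrate how the exported NEWICK text can be consumed, here is a minimal recursive-descent parser sketch in Python. It handles only plain NEWICK (labels, branch lengths, nesting) and not quoted labels or comments. The example string is made up, and the exact layout of the files written by exportShmTreesNewick is an assumption, so treat this as a starting point rather than a reference reader.

```python
from typing import List, Optional, Tuple


class Node:
    def __init__(self, label: str, length: Optional[float], children: List["Node"]):
        self.label = label        # node label (expected to be a nodeId)
        self.length = length      # distance to the parent, if given
        self.children = children  # empty list for leaves


def parse_newick(text: str) -> Node:
    """Parse a simple NEWICK string (no quoted labels or comments)."""
    text = text.strip()
    if text.endswith(";"):
        text = text[:-1]
    node, pos = _parse_subtree(text, 0)
    if pos != len(text):
        raise ValueError(f"unexpected text at position {pos}")
    return node


def _parse_subtree(s: str, i: int) -> Tuple[Node, int]:
    children: List[Node] = []
    if i < len(s) and s[i] == "(":
        i += 1  # skip "("
        while True:
            child, i = _parse_subtree(s, i)
            children.append(child)
            if i >= len(s):
                raise ValueError("unbalanced parentheses")
            if s[i] == ",":
                i += 1  # another sibling follows
            elif s[i] == ")":
                i += 1  # end of this child list
                break
            else:
                raise ValueError(f"unexpected character {s[i]!r} at position {i}")
    # Optional label, up to the next structural character.
    start = i
    while i < len(s) and s[i] not in ",():":
        i += 1
    label = s[start:i]
    # Optional branch length after ":".
    length = None
    if i < len(s) and s[i] == ":":
        i += 1
        start = i
        while i < len(s) and s[i] not in ",()":
            i += 1
        length = float(s[start:i])
    return Node(label, length, children), i


def leaves(node: Node) -> List[str]:
    """Return leaf labels in left-to-right order."""
    if not node.children:
        return [node.label]
    result: List[str] = []
    for child in node.children:
        result.extend(leaves(child))
    return result


tree = parse_newick("((1:2.0,2:1.0)3:0.5,4:3.0)0;")
print(leaves(tree))
```

In practice one would read each file from outputDir and pass its content to parse_newick; established libraries with NEWICK support are a more robust choice for production use.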

Examples

For convenience, MiXCR provides two predefined sets of fields for exporting SHM trees: min (will export minimal required information about tree or tree nodes) and full (used by default); one can use these sets by specifying the --preset option:

> mixcr exportShmTrees --preset min trees.shmt trees.tsv

One can add additional columns to the preset in the following way:

> mixcr exportShmTreesWithNodes --preset min -qFeature CDR2 trees.shmt trees_with_nodes.tsv

One can also put all export fields in a separate file and pass it with --preset-file:

> cat myFields.txt
-vHits
-dHits
-nFeature CDR3

> mixcr exportShmTreesWithNodes --preset-file myFields.txt trees.shmt trees_with_nodes.tsv
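When the field list is generated by a script, the preset file can be written programmatically. The sketch below, in Python, writes one field per line as in the myFields.txt example above and builds the matching command line. The helper name write_preset_file is hypothetical, and the temporary directory is used only to keep the example self-contained.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def write_preset_file(path: Path, fields: Iterable[str]) -> None:
    """Write one export field (with its arguments) per line."""
    path.write_text("".join(f"{field}\n" for field in fields))


fields = ["-vHits", "-dHits", "-nFeature CDR3"]
preset_path = Path(tempfile.mkdtemp()) / "myFields.txt"
write_preset_file(preset_path, fields)

# Argument list for the export; run it with subprocess.run(cmd, check=True)
# on a machine where mixcr is installed.
cmd: List[str] = [
    "mixcr", "exportShmTreesWithNodes",
    "--preset-file", str(preset_path),
    "trees.shmt", "trees_with_nodes.tsv",
]
print(preset_path.read_text(), end="")
print(" ".join(cmd))
```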

Export fields

SHM tree-specific fields

The following fields are available for both exportShmTrees and exportShmTreesWithNodes:

-treeId: SHM tree id
-numberOfClonesInTree: Number of unique clones in the SHM tree
-totalReadsCountInTree: Total sum of read counts of clones in the SHM tree
-totalUniqueTagCountInTree <(Molecule|Cell|Sample)>: Total count of unique tags in the SHM tree with specified type

The following fields are only available for exportShmTrees:

-vHit: Export best V hit
-jHit: Export best J hit
-nFeature <gene_feature> [(germline|mrca|parent)]: Export nucleotide sequence of specified gene feature of specified node type. If second arg is omitted, then feature will be printed for current node. Otherwise - for corresponding parent, germline or mrca.
-allNFeatures [(germline|mrca|parent)]: Export nucleotide sequences for all covered gene features. If first arg is omitted, then feature will be printed for current node. Otherwise - for corresponding parent, germline or mrca.
-aaFeature <gene_feature> [(germline|mrca|parent)]: Export amino acid sequence of specified gene feature of specified node type. If second arg is omitted, then feature will be printed for current node. Otherwise - for corresponding parent, germline or mrca.
-allAAFeatures [(germline|mrca|parent)]: Export amino acid sequences for all covered gene features. If first arg is omitted, then feature will be printed for current node. Otherwise - for corresponding parent, germline or mrca.

SHM tree node-specific fields

The following fields are available only for exportShmTreesWithNodes:

-nodeId: Node id in SHM tree
-isObserved: Whether the node has clones. All other nodes are reconstructed by the algorithm
-parentId: Parent node id in SHM tree
-distance <(germline|mrca|parent)>: Distance from another node in number of mutations.
-fileName: Name of clns file with sample
-nFeature <gene_feature> [(germline|mrca|parent)]: Export nucleotide sequence of specified gene feature. If second arg is omitted, then feature will be printed for a current node. Otherwise - for corresponding parent, germline or mrca.
-allNFeatures [(germline|mrca|parent)]: Export nucleotide sequences for all covered gene features. If first arg is omitted, then feature will be printed for a current node. Otherwise - for corresponding parent, germline or mrca.
-aaFeature <gene_feature> [(germline|mrca|parent)]: Export amino acid sequence of specified gene feature. If second arg is omitted, then feature will be printed for a current node. Otherwise - for corresponding parent, germline or mrca.
-allAAFeatures [(germline|mrca|parent)]: Export amino acid sequences for all covered gene features. If first arg is omitted, then feature will be printed for a current node. Otherwise - for corresponding parent, germline or mrca.
-nLength <gene_feature> [(germline|mrca|parent)]: Export length of specified gene feature in nucleotides. If second arg is omitted, then feature length will be printed for a current node. Otherwise - for corresponding parent, germline or mrca.
-allNLength [(germline|mrca|parent)]: Export lengths for all covered gene features in nucleotides. If first arg is omitted, then feature will be printed for a current node. Otherwise - for corresponding parent, germline or mrca.
-aaLength <gene_feature> [(germline|mrca|parent)]: Export length of specified gene feature in amino acids. If second arg is omitted, then feature length will be printed for a current node. Otherwise - for corresponding parent, germline or mrca. If second option is germline then corresponding top genes will be counted excluding CDR3 (VJJunction germline is unknown). It's recommended to run findAlleles before exporting -aaLength <gene_feature> germline because otherwise germline sequence will not incorporate allelic mutations. (only for nodes with clones)
-allAALength [<(germline|mrca|parent)>]: Export lengths for all covered gene features in amino acids. If the first arg is omitted, then feature will be printed for a current node. Otherwise - for corresponding parent, germline or mrca.
-nMutations <gene_feature> [(germline|mrca|parent)] [(substitutions|indels|inserts|deletions)]: Extract nucleotide mutations from specific node for specific gene feature. If the second arg is omitted, then mutations will be calculated from germline. Otherwise - for corresponding parent, germline or mrca. By default, exports all types of mutations. One can specify a mutation type using the third arg.
-allNMutations [(germline|mrca|parent)] [(substitutions|indels|inserts|deletions)]: Extract nucleotide mutations from specific node for all covered gene features. If the first arg is omitted, then mutations will be calculated from germline. Otherwise - for corresponding parent, germline or mrca. By default, exports all types of mutations. One can specify a mutation type using the second arg.
-nMutationsRelative <gene_feature> <relative_to_gene_feature> [(germline|mrca|parent)] [(substitutions|indels|inserts|deletions)]: Extract nucleotide mutations from specific node for specific gene feature relative to another feature. If the third arg is omitted, mutations will be calculated from the germline. Otherwise - for corresponding parent, germline or mrca. By default, exports all types of mutations. One can specify a mutation type using the fourth arg.
-aaMutations <gene_feature> [(germline|mrca|parent)] [(substitutions|indels|inserts|deletions)]: Extract amino acid mutations from specific node for specific gene feature. If the second arg is omitted, mutations will be calculated from the germline. Otherwise - for corresponding parent, germline or mrca. By default, exports all types of mutations. One can specify a mutation type using the third arg.
-allAAMutations [(germline|mrca|parent)] [(substitutions|indels|inserts|deletions)]: Extract amino acid mutations from specific node for all covered gene features. If the first arg is omitted, then mutations will be calculated from germline. Otherwise - for corresponding parent, germline or mrca. By default, exports all types of mutations. One can specify a mutation type using the second arg.
-aaMutationsRelative <gene_feature> <relative_to_gene_feature> [(germline|mrca|parent)] [(substitutions|indels|inserts|deletions)]: Extract amino acid mutations from specific node for specific gene feature relative to another feature. If the third arg is omitted, mutations will be calculated from the germline. Otherwise - for corresponding parent, germline or mrca. By default, exports all types of mutations. One can specify a mutation type using the fourth arg.
-mutationsDetailed <gene_feature> [(germline|mrca|parent)]: Detailed list of nucleotide and corresponding amino acid mutations from specific node. Format <nt_mutation>:<aa_mutation_individual>:<aa_mutation_cumulative>, where <aa_mutation_individual> is an expected amino acid mutation given no other mutations have occurred, and <aa_mutation_cumulative> amino acid mutation is the observed amino acid mutation combining effect from all others. If the second arg is omitted, mutations will be calculated from the germline. Otherwise - for corresponding parent, germline or mrca.
-allMutationsDetailed [(germline|mrca|parent)]: Detailed list of nucleotide and corresponding amino acid mutations from specific node for all covered gene features. If the first arg is omitted, then mutations will be calculated from germline. Otherwise - for corresponding parent, germline or mrca.
-mutationsDetailedRelative <gene_feature> <relative_to_gene_feature> [(germline|mrca|parent)]: Detailed list of nucleotide and corresponding amino acid mutations, with positions relative to the specified gene feature. Format <nt_mutation>:<aa_mutation_individual>:<aa_mutation_cumulative>, where <aa_mutation_individual> is an expected amino acid mutation given no other mutations have occurred, and <aa_mutation_cumulative> is the observed amino acid mutation combining the effect of all others. WARNING: format may change in following versions.
-nMutationsCount [<gene_feature>] [(germline|mrca|parent)] [(substitutions|indels|inserts|deletions)]: Number of nucleotide mutations. By default, all covered features are used. Resolutions of wildcards in VJJunction are excluded from calculation. If second arg is omitted, then mutations will be calculated from germline. Otherwise - for corresponding parent, germline or mrca.
-allNMutationsCount [(germline|mrca|parent)] [(substitutions|indels|inserts|deletions)]: Number of nucleotide mutations for all covered gene features. Resolutions of wildcards in VJJunction are excluded from calculation. If first arg is omitted, then mutations will be calculated from germline. Otherwise - for corresponding parent, germline or mrca.
-aaMutationsCount [<gene_feature>] [(germline|mrca|parent)] [(substitutions|indels|inserts|deletions)]: Number of amino acid mutations. By default, all covered features are used. Resolutions of wildcards in CDR3 are excluded from calculation. If second arg is omitted, then mutations will be calculated from the germline. Otherwise - for corresponding parent, germline or mrca.
-allAAMutationsCount [(germline|mrca|parent)] [(substitutions|indels|inserts|deletions)]: Number of amino acid mutations for all covered gene features. Resolutions of wildcards in CDR3 are excluded from calculation. If first arg is omitted, then mutations will be calculated from germline. Otherwise - for corresponding parent, germline or mrca.
-nMutationRate [<gene_feature>] [(substitutions|indels|inserts|deletions)]: Number of nucleotide mutations from germline divided by target sequence size. By default, all covered features are used. Resolutions of wildcards in VJJunction are excluded from calculation.
-aaMutationRate [<gene_feature>] [(substitutions|indels|inserts|deletions)]: Number of amino acid mutations from germline divided by target sequence size. By default, all covered features are used. Resolutions of wildcards in CDR3 are excluded from calculation.
-biochemicalProperty <gene_feature> <property> [(germline|mrca|parent)]: Biochemical property of specified gene feature normalized by AA sequence size. Possible values: Hydropathy, Charge, Polarity, Volume, Strength, MjEnergy, Kf1, Kf2, Kf3, Kf4, Kf5, Kf6, Kf7, Kf8, Kf9, Kf10, Rim, Surface, Turn, Alpha, Beta, Core, Disorder, N2Strength, N2Hydrophobicity, N2Volume, N2Surface
-baseBiochemicalProperties <gene_feature> [(germline|mrca|parent)]: Base biochemical properties of specified gene feature normalized by AA sequence size: N2Strength, N2Hydrophobicity, N2Surface, N2Volume, Charge
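The <nt_mutation>:<aa_mutation_individual>:<aa_mutation_cumulative> format described for -mutationsDetailed can be split into its three parts with a few lines of Python. This is a sketch under assumptions: the separator between entries within one cell is assumed to be a comma (it is not specified above), the mutation tokens are treated as opaque strings, and the sample value uses placeholder tokens rather than real MiXCR output.

```python
from typing import List, NamedTuple


class DetailedMutation(NamedTuple):
    nt_mutation: str
    aa_mutation_individual: str
    aa_mutation_cumulative: str


def parse_mutations_detailed(cell: str, sep: str = ",") -> List[DetailedMutation]:
    """Split a cell into (nt, aa individual, aa cumulative) triples.

    `sep` is the assumed separator between entries; adjust it after
    inspecting a real export.
    """
    result: List[DetailedMutation] = []
    for item in cell.split(sep):
        item = item.strip()
        if not item:
            continue  # tolerate empty cells and trailing separators
        parts = item.split(":")
        if len(parts) != 3:
            raise ValueError(f"expected 3 colon-separated parts, got {item!r}")
        result.append(DetailedMutation(*parts))
    return result


# Placeholder tokens, not real mutation encodings.
cell = "nt1:aaInd1:aaCum1,nt2:aaInd2:aaCum2"
for mutation in parse_mutations_detailed(cell):
    print(mutation.nt_mutation, mutation.aa_mutation_individual, mutation.aa_mutation_cumulative)
```

Keeping the tokens opaque avoids baking in assumptions about the mutation encoding, which the field description warns may change between versions for the relative variant.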

The following fields are available only for exportShmTreesWithNodes on nodes with clones:

-targets: Export number of targets
-dHit: Export best D hit
-cHit: Export best C hit
-vGene: Export best V hit gene name (e.g. TRBV12-3 for TRBV12-3*00)
-dGene: Export best D hit gene name (e.g. TRBV12-3 for TRBV12-3*00)
-jGene: Export best J hit gene name (e.g. TRBV12-3 for TRBV12-3*00)
-cGene: Export best C hit gene name (e.g. TRBV12-3 for TRBV12-3*00)
-vFamily: Export best V hit family name (e.g. TRBV12 for TRBV12-3*00)
-dFamily: Export best D hit family name (e.g. TRBV12 for TRBV12-3*00)
-jFamily: Export best J hit family name (e.g. TRBV12 for TRBV12-3*00)
-cFamily: Export best C hit family name (e.g. TRBV12 for TRBV12-3*00)
-vHitScore: Export score for best V hit
-dHitScore: Export score for best D hit
-jHitScore: Export score for best J hit
-cHitScore: Export score for best C hit
-vHitsWithScore: Export all V hits with score
-dHitsWithScore: Export all D hits with score
-jHitsWithScore: Export all J hits with score
-cHitsWithScore: Export all C hits with score
-vHits: Export all V hits
-dHits: Export all D hits
-jHits: Export all J hits
-cHits: Export all C hits
-vGenes: Export all V gene names (e.g. TRBV12-3 for TRBV12-3*00)
-dGenes: Export all D gene names (e.g. TRBV12-3 for TRBV12-3*00)
-jGenes: Export all J gene names (e.g. TRBV12-3 for TRBV12-3*00)
-cGenes: Export all C gene names (e.g. TRBV12-3 for TRBV12-3*00)
-vFamilies: Export all V gene family names (e.g. TRBV12 for TRBV12-3*00)
-dFamilies: Export all D gene family names (e.g. TRBV12 for TRBV12-3*00)
-jFamilies: Export all J gene family names (e.g. TRBV12 for TRBV12-3*00)
-cFamilies: Export all C gene family names (e.g. TRBV12 for TRBV12-3*00)
-vAlignment: Export best V alignment
-dAlignment: Export best D alignment
-jAlignment: Export best J alignment
-cAlignment: Export best C alignment
-vAlignments: Export all V alignments
-dAlignments: Export all D alignments
-jAlignments: Export all J alignments
-cAlignments: Export all C alignments
-qFeature <gene_feature>: Export quality string of specified gene feature
-nFeatureImputed <gene_feature>: Export nucleotide sequence of specified gene feature using letters from germline (marked lowercase) for uncovered regions
-allNFeaturesImputed [<from_reference_point> <to_reference_point>]: Export nucleotide sequence using letters from germline (marked lowercase) for uncovered regions for all gene features between specified reference points (in separate columns).

For example, -allNFeaturesImputed FR3Begin FR4End will export -nFeatureImputed FR3, -nFeatureImputed CDR3, -nFeatureImputed FR4.
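Since imputed sequences mark germline-derived letters in lowercase, the covered and imputed parts can be separated after export. The following Python sketch locates the lowercase runs and computes the imputed fraction; the function names and the sample sequence are made up for illustration.

```python
import re
from typing import List, Tuple


def imputed_runs(seq: str) -> List[Tuple[int, int]]:
    """Return (start, end) half-open intervals of lowercase (imputed) letters."""
    return [(m.start(), m.end()) for m in re.finditer(r"[a-z]+", seq)]


def imputed_fraction(seq: str) -> float:
    """Fraction of letters that are lowercase, i.e. taken from the germline."""
    letters = [c for c in seq if c.isalpha()]
    if not letters:
        return 0.0
    return sum(c.islower() for c in letters) / len(letters)


seq = "acgTTGCAgg"  # made-up example: 5 imputed letters out of 10
print(imputed_runs(seq))      # [(0, 3), (8, 10)]
print(imputed_fraction(seq))  # 0.5
```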