Export SHM Trees
MiXCR provides commands for exporting SHM trees, with or without information about every node, as tab-delimited tables. Additionally, SHM trees may be exported in NEWICK format.
The .shmt file is produced by findShmTrees and holds SHM trees and the clonotypes that are included in the trees.
SHM trees tables
mixcr exportShmTrees [-f]
[--filter-min-nodes <n>]
[--filter-min-height <n>]
[--ids <id>[,<id>...]]...
[--chains <chains>]
[--preset <preset>]
[--preset-file <presetFile>]
[--no-header]
[--not-covered-as-empty]
[<exportField>]...
[--force-overwrite]
[--no-warnings]
[--verbose]
[--help]
[[--filter-in-feature <gene_feature>] [--pattern-max-errors <n>] (--filter-aa-pattern <pattern> | --filter-nt-pattern <pattern>)]
trees.shmt [trees.tsv]
Exports tab-delimited info from a .shmt file for found SHM trees, without information about each node.
Command line options:
trees.shmt
- Input file produced by 'findShmTrees' command.
[trees.tsv]
- Path to the output table. Prints to stdout if omitted.
--filter-min-nodes <n>
- Minimal number of nodes in tree
--filter-min-height <n>
- Minimal height of the tree
--ids <id>[,<id>...]
- Filter specific trees by id
--chains <chains>
- Export only trees that contain clones with the specified chain (e.g. TRA or IGH).
-p, --preset <preset>
- Specify preset of export fields (possible values: 'full', 'min'; 'full' by default)
-pf, --preset-file <presetFile>
- Specify preset file of export fields
--no-header
- Don't print column names
--not-covered-as-empty
- Export not covered regions as empty text.
-f, --force-overwrite
- Force overwrite of output file(s).
-nw, --no-warnings
- Suppress all warning messages.
--verbose
- Verbose messages.
-h, --help
- Show this help message and exit.
<exportField>
- A list of export fields
Filter by pattern options:
--filter-in-feature <gene_feature>
- Match pattern inside specified gene feature. Default: CDR3
--pattern-max-errors <n>
- Max allowed subs & indels. Default: 0
--filter-aa-pattern <pattern>
- Filter specific trees by aa pattern.
--filter-nt-pattern <pattern>
- Filter specific trees by nt pattern.
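The options above can be combined in one call. The following is a minimal sketch, not a recommended setup: the file names and the threshold of 5 nodes are hypothetical, and the command is executed only if mixcr and the input file are actually present; otherwise it is just printed.

```shell
#!/bin/sh
# Hypothetical file names; replace with your own.
input="trees.shmt"
output="igh_trees.tsv"

# Build the argument list from options documented above:
# minimal preset, IGH chain only, trees with at least 5 nodes (arbitrary threshold).
set -- exportShmTrees --force-overwrite \
    --preset min \
    --chains IGH \
    --filter-min-nodes 5 \
    "$input" "$output"

# Run only when mixcr and the input file are available; otherwise show the command.
if command -v mixcr >/dev/null 2>&1 && [ -f "$input" ]; then
    mixcr "$@"
else
    echo "would run: mixcr $*"
fi
```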
SHM trees with nodes tables
mixcr exportShmTreesWithNodes [-f]
[--only-observed]
[--filter-min-nodes <n>]
[--filter-min-height <n>]
[--ids <id>[,<id>...]]...
[--chains <chains>]
[--preset <preset>]
[--preset-file <presetFile>]
[--no-header]
[--not-covered-as-empty]
[<exportField>]...
[--force-overwrite]
[--no-warnings]
[--verbose]
[--help]
[[--filter-in-feature <gene_feature>] [--pattern-max-errors <n>] (--filter-aa-pattern <pattern> | --filter-nt-pattern <pattern>)]
trees.shmt [trees.tsv]
Exports tab-delimited info from a .shmt file for every node of SHM trees.
Command line options:
trees.shmt
- Input file produced by 'findShmTrees' command.
[trees.tsv]
- Path to the output table. Prints to stdout if omitted.
--only-observed
- Exclude nodes that were reconstructed by the algorithm
--filter-min-nodes <n>
- Minimal number of nodes in tree
--filter-min-height <n>
- Minimal height of the tree
--ids <id>[,<id>...]
- Filter specific trees by id
--chains <chains>
- Export only trees that contain clones with the specified chain (e.g. TRA or IGH).
-p, --preset <preset>
- Specify preset of export fields (possible values: 'min', 'full'; 'full' by default)
-pf, --preset-file <presetFile>
- Specify preset file of export fields
--no-header
- Don't print column names
--not-covered-as-empty
- Export not covered regions as empty text.
-f, --force-overwrite
- Force overwrite of output file(s).
-nw, --no-warnings
- Suppress all warning messages.
--verbose
- Verbose messages.
-h, --help
- Show this help message and exit.
<exportField>
- A list of export fields
Filter by pattern options:
--filter-in-feature <gene_feature>
- Match pattern inside specified gene feature. Default: CDR3
--pattern-max-errors <n>
- Max allowed subs & indels. Default: 0
--filter-aa-pattern <pattern>
- Filter specific trees by aa pattern.
--filter-nt-pattern <pattern>
- Filter specific trees by nt pattern.
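Because the table has one row per node, simple per-tree summaries can be computed with standard tools. The sketch below counts rows per tree with awk on a small synthetic table. The column names (treeId, nodeId) and the values are assumptions made for illustration; check the header line of your own export before relying on them.

```shell
#!/bin/sh
# Synthetic stand-in for an exportShmTreesWithNodes table (tab-delimited, with header).
# Column names and values are assumptions, not real MiXCR output.
tsv="$(mktemp)"
printf 'treeId\tnodeId\n' >  "$tsv"
printf '0\t0\n'           >> "$tsv"
printf '0\t1\n'           >> "$tsv"
printf '0\t2\n'           >> "$tsv"
printf '1\t0\n'           >> "$tsv"
printf '1\t1\n'           >> "$tsv"

# Find the column by its header name, then count the rows (nodes) for each tree id.
node_counts="$(awk -F '\t' -v col="treeId" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) idx = i; next }
    idx     { count[$idx]++ }
    END     { for (t in count) print t "\t" count[t] }
' "$tsv" | sort -n)"

echo "$node_counts"
rm -f "$tsv"
```

Looking the column up by name rather than by position keeps the script working when extra export fields are added to the table.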
NEWICK
mixcr exportShmTreesNewick
[--filter-min-nodes <n>]
[--filter-min-height <n>]
[--ids <id>[,<id>...]]...
[--chains <chains>]
[--force-overwrite]
[--no-warnings]
[--verbose]
[--help]
[[--filter-in-feature <gene_feature>] [--pattern-max-errors <n>] (--filter-aa-pattern <pattern> | --filter-nt-pattern <pattern>)]
trees.shmt outputDir
Export SHM trees in NEWICK format.
NEWICK is printed with distances and leaves. The nodeId is printed as the node content.
Command line options:
trees.shmt
- Input file produced by 'findShmTrees' command.
outputDir
- Output directory for NEWICK files. A separate file is created for every tree.
--filter-min-nodes <n>
- Minimal number of nodes in tree
--filter-min-height <n>
- Minimal height of the tree
--ids <id>[,<id>...]
- Filter specific trees by id
--chains <chains>
- Export only trees that contain clones with the specified chain (e.g. TRA or IGH).
-f, --force-overwrite
- Force overwrite of output file(s).
-nw, --no-warnings
- Suppress all warning messages.
--verbose
- Verbose messages.
-h, --help
- Show this help message and exit.
Filter by pattern options:
--filter-in-feature <gene_feature>
- Match pattern inside specified gene feature. Default: CDR3
--pattern-max-errors <n>
- Max allowed subs & indels. Default: 0
--filter-aa-pattern <pattern>
- Filter specific trees by aa pattern.
--filter-nt-pattern <pattern>
- Filter specific trees by nt pattern.
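A quick sanity check on an exported tree is to count its leaves. The sketch below uses a hypothetical NEWICK string with node ids as labels and branch lengths as distances; the exact content of the files written by exportShmTreesNewick may differ. It relies on the general property that a rooted tree written in NEWICK without quoted labels has one more leaf than it has commas.

```shell
#!/bin/sh
# Hypothetical NEWICK string: node ids as labels, distances after the colons.
newick='((1:2,2:1)3:1,4:3)0;'

# Count commas; in a NEWICK tree without quoted labels, leaves = commas + 1.
commas="$(printf '%s' "$newick" | tr -cd ',' | wc -c | tr -d ' ')"
leaves=$((commas + 1))

echo "leaves: $leaves"
```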
Examples
For convenience, MiXCR provides two predefined sets of fields for exporting SHM trees: min (exports the minimal required information about trees or tree nodes) and full (used by default). These sets are selected with the --preset option:
> mixcr exportShmTrees --preset min trees.shmt trees.tsv
One can add additional columns to the preset in the following way:
> mixcr exportShmTreesWithNodes --preset min -qFeature CDR2 trees.shmt trees_with_nodes.tsv
One can also put all export fields in a separate file:
> cat myFields.txt
-vHits
-dHits
-nFeature CDR3
> mixcr exportShmTreesWithNodes --preset-file myFields.txt trees.shmt trees_with_nodes.tsv
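The preset file from the example above can also be generated from a script. This is a sketch under the assumption that the file holds one export field per line, as shown in the cat output; the export itself runs only if mixcr and a trees.shmt file are available.

```shell
#!/bin/sh
# Write the preset file with one export field per line (same fields as in the example above).
fields_file="myFields.txt"
cat > "$fields_file" <<'EOF'
-vHits
-dHits
-nFeature CDR3
EOF

# Count the field lines as a quick check of the file contents.
field_count="$(grep -c '^-' "$fields_file")"
echo "fields in preset file: $field_count"

# Run the export only when mixcr and the input file are present.
if command -v mixcr >/dev/null 2>&1 && [ -f trees.shmt ]; then
    mixcr exportShmTreesWithNodes --preset-file "$fields_file" trees.shmt trees_with_nodes.tsv
fi
```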
Export fields
SHM tree-specific fields
The following fields are available for both exportShmTrees and exportShmTreesWithNodes:
-treeId
- SHM tree id
-numberOfClonesInTree
- Number of unique clones in the SHM tree
-totalReadsCountInTree
- Total sum of read counts of clones in the SHM tree
-totalUniqueTagCountInTree <(Molecule|Cell|Sample)>
- Total count of unique tags in the SHM tree with specified type
The following fields are only available for exportShmTrees:
-vHit
- Export best V hit
-jHit
- Export best J hit
-nFeature <gene_feature> <germline|mrca>
- Export nucleotide sequence of specified gene feature of specified node type.
-allNFeatures <(germline|mrca)>
- Export nucleotide sequences for all covered gene features.
-aaFeature <gene_feature> <germline|mrca>
- Export amino acid sequence of specified gene feature of specified node type.
-allAAFeatures <(germline|mrca)>
- Export amino acid sequences for all covered gene features.
SHM tree node-specific fields
The following fields are available only for exportShmTreesWithNodes:
-nodeId
- Node id in SHM tree
-isObserved
- Whether the node has clones. All other nodes are reconstructed by the algorithm
-parentId
- Parent node id in SHM tree
-distance <(germline|mrca|parent)>
- Distance from another node in number of mutations.
-fileName
- Name of clns file with sample
-nFeature <gene_feature> [<(germline|mrca|parent)>]
- Export nucleotide sequence of specified gene feature. If the second argument is omitted, the feature is printed for the current node. Otherwise, it is printed for the corresponding parent, germline or mrca.
-allNFeatures [<(germline|mrca|parent)>]
- Export nucleotide sequences for all covered gene features. If the argument is omitted, the features are printed for the current node. Otherwise, they are printed for the corresponding parent, germline or mrca.
-aaFeature <gene_feature> [<(germline|mrca|parent)>]
- Export amino acid sequence of specified gene feature. If the second argument is omitted, the feature is printed for the current node. Otherwise, it is printed for the corresponding parent, germline or mrca.
-allAAFeatures [<(germline|mrca|parent)>]
- Export amino acid sequences for all covered gene features. If the argument is omitted, the features are printed for the current node. Otherwise, they are printed for the corresponding parent, germline or mrca.
-lengthOf <gene_feature> [<(germline|mrca|parent)>]
- Export length of specified gene feature. If the second argument is omitted, the feature length is printed for the current node. Otherwise, it is printed for the corresponding parent, germline or mrca.
-allLengthOf [<(germline|mrca|parent)>]
- Export lengths for all covered gene features. If the argument is omitted, the lengths are printed for the current node. Otherwise, they are printed for the corresponding parent, germline or mrca.
-nMutations <gene_feature> <(germline|mrca|parent)>
- Extract nucleotide mutations from specific node for specific gene feature.
-allNMutations <(germline|mrca|parent)>
- Extract nucleotide mutations from specific node for all covered gene features.
-nMutationsRelative <gene_feature> <relative_to_gene_feature> <(germline|mrca|parent)>
- Extract nucleotide mutations from specific node for specific gene feature relative to another feature.
-aaMutations <gene_feature> <(germline|mrca|parent)>
- Extract amino acid mutations from specific node for specific gene feature.
-allAAMutations <(germline|mrca|parent)>
- Extract amino acid mutations from specific node for all covered gene features.
-aaMutationsRelative <gene_feature> <relative_to_gene_feature> <(germline|mrca|parent)>
- Extract amino acid mutations from specific node for specific gene feature relative to another feature.
-mutationsDetailed <gene_feature> <(germline|mrca|parent)>
- Detailed list of nucleotide and corresponding amino acid mutations from specific node. Format <nt_mutation>:<aa_mutation_individual>:<aa_mutation_cumulative>, where <aa_mutation_individual> is the expected amino acid mutation given that no other mutations have occurred, and <aa_mutation_cumulative> is the observed amino acid mutation combining the effect of all other mutations.
-allMutationsDetailed <(germline|mrca|parent)>
- Detailed list of nucleotide and corresponding amino acid mutations from specific node for all covered gene features.
-mutationsDetailedRelative <gene_feature> <relative_to_gene_feature> <(germline|mrca|parent)>
- Detailed list of nucleotide and corresponding amino acid mutations, with positions relative to the specified gene feature. Format <nt_mutation>:<aa_mutation_individual>:<aa_mutation_cumulative>, where <aa_mutation_individual> is the expected amino acid mutation given that no other mutations have occurred, and <aa_mutation_cumulative> is the observed amino acid mutation combining the effect of all other mutations. WARNING: format may change in following versions.
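A single entry in the <nt_mutation>:<aa_mutation_individual>:<aa_mutation_cumulative> format can be split into its three parts on the colon. The sketch below uses placeholder text instead of real mutation strings and assumes that the parts themselves contain no colon; how several entries are separated within one column is not covered here.

```shell
#!/bin/sh
# Placeholder entry; the three parts are illustrative names, not real MiXCR output.
entry='NT_MUT:AA_INDIVIDUAL:AA_CUMULATIVE'

# Split on ':' into the three documented parts
# (assumes the parts themselves contain no colon).
IFS=: read -r nt_mutation aa_individual aa_cumulative <<EOF
$entry
EOF

echo "nucleotide mutation:      $nt_mutation"
echo "amino acid (individual):  $aa_individual"
echo "amino acid (cumulative):  $aa_cumulative"
```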
The following fields are available only for exportShmTreesWithNodes on nodes with clones:
-targets
- Export number of targets
-dHit
- Export best D hit
-cHit
- Export best C hit
-vGene
- Export best V hit gene name (e.g. TRBV12-3 for TRBV12-3*00)
-dGene
- Export best D hit gene name (e.g. TRBV12-3 for TRBV12-3*00)
-jGene
- Export best J hit gene name (e.g. TRBV12-3 for TRBV12-3*00)
-cGene
- Export best C hit gene name (e.g. TRBV12-3 for TRBV12-3*00)
-vFamily
- Export best V hit family name (e.g. TRBV12 for TRBV12-3*00)
-dFamily
- Export best D hit family name (e.g. TRBV12 for TRBV12-3*00)
-jFamily
- Export best J hit family name (e.g. TRBV12 for TRBV12-3*00)
-cFamily
- Export best C hit family name (e.g. TRBV12 for TRBV12-3*00)
-vHitScore
- Export score for best V hit
-dHitScore
- Export score for best D hit
-jHitScore
- Export score for best J hit
-cHitScore
- Export score for best C hit
-vHitsWithScore
- Export all V hits with score
-dHitsWithScore
- Export all D hits with score
-jHitsWithScore
- Export all J hits with score
-cHitsWithScore
- Export all C hits with score
-vHits
- Export all V hits
-dHits
- Export all D hits
-jHits
- Export all J hits
-cHits
- Export all C hits
-vGenes
- Export all V gene names (e.g. TRBV12-3 for TRBV12-3*00)
-dGenes
- Export all D gene names (e.g. TRBV12-3 for TRBV12-3*00)
-jGenes
- Export all J gene names (e.g. TRBV12-3 for TRBV12-3*00)
-cGenes
- Export all C gene names (e.g. TRBV12-3 for TRBV12-3*00)
-vFamilies
- Export all V gene family names (e.g. TRBV12 for TRBV12-3*00)
-dFamilies
- Export all D gene family names (e.g. TRBV12 for TRBV12-3*00)
-jFamilies
- Export all J gene family names (e.g. TRBV12 for TRBV12-3*00)
-cFamilies
- Export all C gene family names (e.g. TRBV12 for TRBV12-3*00)
-vAlignment
- Export best V alignment
-dAlignment
- Export best D alignment
-jAlignment
- Export best J alignment
-cAlignment
- Export best C alignment
-vAlignments
- Export all V alignments
-dAlignments
- Export all D alignments
-jAlignments
- Export all J alignments
-cAlignments
- Export all C alignments
-qFeature <gene_feature>
- Export quality string of specified gene feature
-nFeatureImputed <gene_feature>
- Export nucleotide sequence of specified gene feature using letters from germline (marked lowercase) for uncovered regions
-allNFeaturesImputed [<from_reference_point> <to_reference_point>]
- Export nucleotide sequence using letters from germline (marked lowercase) for uncovered regions for all gene features between specified reference points (in separate columns).
For example, -allNFeaturesImputed FR3Begin FR4End will export -nFeatureImputed FR3, -nFeatureImputed CDR3, -nFeatureImputed FR4.
By default, boundaries are taken from the analysis parameters if possible, or FR1Begin FR4End otherwise.
-aaFeatureImputed <gene_feature>
- Export amino acid sequence of specified gene feature using letters from germline (marked lowercase) for uncovered regions
-allAAFeaturesImputed [<from_reference_point> <to_reference_point>]
- Export amino acid sequence using letters from germline (marked lowercase) for uncovered regions for all gene features between specified reference points (in separate columns).
For example, -allAAFeaturesImputed FR3Begin FR4End will export -aaFeatureImputed FR3, -aaFeatureImputed CDR3, -aaFeatureImputed FR4.
By default, boundaries are taken from the analysis parameters if possible, or FR1Begin FR4End otherwise.
-minFeatureQuality <gene_feature>
- Export minimal quality of specified gene feature
-allMinFeaturesQuality [<from_reference_point> <to_reference_point>]
- Export minimal quality for all gene features between specified reference points (in separate columns).
For example, -allMinFeaturesQuality FR3Begin FR4End will export -minFeatureQuality FR3, -minFeatureQuality CDR3, -minFeatureQuality FR4.
By default, boundaries are taken from the analysis parameters if possible, or FR1Begin FR4End otherwise.
-allNFeaturesWithMinQuality [<from_reference_point> <to_reference_point>]
- Export nucleotide sequences and minimal quality for all gene features between specified reference points (in separate columns).
For example, -allNFeaturesWithMinQuality FR3Begin FR4End will export -nFeature FR3, -minFeatureQuality FR3, -nFeature CDR3, -minFeatureQuality CDR3, -nFeature FR4, -minFeatureQuality FR4.
By default, boundaries are taken from the analysis parameters if possible, or FR1Begin FR4End otherwise.
-allNFeaturesImputedWithMinQuality [<from_reference_point> <to_reference_point>]
- Export nucleotide sequences and minimal quality for all gene features between specified reference points (in separate columns).
For example, -allNFeaturesImputedWithMinQuality FR3Begin FR4End will export -nFeatureImputed FR3, -minFeatureQuality FR3, -nFeatureImputed CDR3, -minFeatureQuality CDR3, -nFeatureImputed FR4, -minFeatureQuality FR4.
By default, boundaries are taken from the analysis parameters if possible, or FR1Begin FR4End otherwise.
-avrgFeatureQuality <gene_feature>
- Export average quality of specified gene feature
-allAvrgFeaturesQuality [<from_reference_point> <to_reference_point>]
- Export average quality for all gene features between specified reference points (in separate columns).
For example, -allAvrgFeaturesQuality FR3Begin FR4End will export -avrgFeatureQuality FR3, -avrgFeatureQuality CDR3, -avrgFeatureQuality FR4.
By default, boundaries are taken from the analysis parameters if possible, or FR1Begin FR4End otherwise.
-positionInReferenceOf <reference_point>
- Export position of specified reference point inside reference sequences (clonal sequence / read sequence).
-allPositionsInReference [<from_reference_point> <to_reference_point>]
- Export position inside reference sequences (clonal sequence / read sequence) for all reference points between specified (in separate columns).
For example, -allPositionsInReference FR3Begin FR4End will export -positionInReferenceOf FR3Begin, -positionInReferenceOf CDR3Begin, -positionInReferenceOf CDR3End and -positionInReferenceOf FR4End.
By default, boundaries are taken from the analysis parameters if possible, or FR1Begin FR4End otherwise.
-positionOf <reference_point>
- Export position of specified reference point inside target sequences (clonal sequence / read sequence).
-allPositions [<from_reference_point> <to_reference_point>]
- Export position inside target sequences (clonal sequence / read sequence) for all reference points between specified (in separate columns).
For example, -allPositions FR3Begin FR4End will export -positionOf FR3Begin, -positionOf CDR3Begin, -positionOf CDR3End and -positionOf FR4End.
By default, boundaries are taken from the analysis parameters if possible, or FR1Begin FR4End otherwise.
-defaultAnchorPoints
- Outputs a list of default reference points (like CDR2Begin, FR4End, etc.; see documentation below for the full list and formatting)
-cloneId
- Unique clone identifier
-readCount
- Number of reads assigned to the clonotype
-readFraction
- Fraction of reads assigned to the clonotype
-targetSequences
- Export aligned sequences (targets), separated with comma
-targetQualities
- Export aligned sequence (target) qualities, separated with comma
-vIdentityPercents
- V alignment identity percents
-dIdentityPercents
- D alignment identity percents
-jIdentityPercents
- J alignment identity percents
-cIdentityPercents
- C alignment identity percents
-vBestIdentityPercent
- V best alignment identity percent
-dBestIdentityPercent
- D best alignment identity percent
-jBestIdentityPercent
- J best alignment identity percent
-cBestIdentityPercent
- C best alignment identity percent
-chains
- Chains
-topChains
- Top chains
-geneLabel <label>
- Export gene label (e.g. ReliableChain)
-tagCounts
- All tags with counts
-tags <(Molecule|Cell|Sample)>
- All tag values (e.g. CELL barcode or UMI sequence).
-uniqueTagCount <(Molecule|Cell|Sample)>
- Unique tag count
Tag type will be used for filtering tags for export.
-uniqueTagFraction <(Molecule|Cell|Sample)>
- Fraction of unique tags (UMI, CELL, etc.) the clone or alignment collected.
Tag type will be used for filtering tags for export.
-cellGroup
- Cell group (for single cell analysis)