mixcr findAlleles

Finds V- and J-gene allelic variants in the given sample(s). As a result, MiXCR creates a new reference library and re-aligns clonotypes against it.

Note that clonotypes passed as input must be cut by, and fully covered by, the same gene feature. For example, .clns files with contigs must be assembled using assembleContigs with the --assemble-contigs-by option.

All inputs must also share the same alignment library, the same scoring of V and J genes, and the same features to align.

The allele inference algorithms apply different strategies to identify allelic variants with sufficient statistical significance. The algorithm for B cells reliably discriminates between somatic hypermutations (including those in hot spot positions) and real allelic variants.

Command line options

mixcr findAlleles 
   (--output-template <template.clns> | --no-clns-output) 
   [--export-library <path.(json|fasta)>] 
   [--export-alleles-mutations <path>] 
   [-O <key=value>]... 
   [--report <path>] 
   [--json-report <path>] 
   [--threads <n>] 

The command returns a highly compressed, memory- and CPU-efficient binary .clns (clones) file that holds exhaustive information about clonotypes re-aligned to the newly discovered allelic variants. The resulting reference library is embedded in the .clns file and may also be exported directly with the --export-library option. Clonotype tables can be further extracted in tabular form using exportClones or in human-readable form using exportClonesPretty. Additionally, MiXCR produces a comprehensive report with a detailed summary of the allele search.

Basic command line options are:

Input files or a directory with files for allele search.

If a directory is given, no filtering by file type is applied.

--output-template <template.clns>
Output template may contain {file_name} and {file_dir_path}.

Outputs for '-o /output/folder/{file_name}_with_alleles.clns input_file.clns input_file2.clns' will be /output/folder/input_file_with_alleles.clns and /output/folder/input_file2_with_alleles.clns.

Outputs for '-o {file_dir_path}/{file_name}_with_alleles.clns /some/folder1/input_file.clns /some/folder2/input_file2.clns' will be /some/folder1/input_file_with_alleles.clns and /some/folder2/input_file2_with_alleles.clns.

The resulting output paths must be unique.

--no-clns-output
The command will not realign input .clns files. Must be specified if --output-template is omitted.
--export-library <path.(json|fasta)>
Path to write the library with the found alleles and the other genes that exist in the inputs.

For .json, the library will be written in repseqio format.

For .fasta, the library will be written in FASTA format with the gene name and reliable range in the description. There will be several records for one gene if the clnx files were assembled by a composite gene feature.

--export-alleles-mutations <path>
Path to write descriptions and stats for all resulting alleles, both existing and new (see below).
-O <key=value>
Overrides default find alleles parameter values (see below).
-r, --report <path>
Report file (human readable version, see -j / --json-report for machine readable report).
-j, --json-report <path>
JSON formatted report file.
Put temporary files in the same folder as the output files.
-t, --threads <n>
Number of processing threads.
-f, --force-overwrite
Force overwrite of output file(s).
-nw, --no-warnings
Suppress all warning messages.
Verbose messages.
-h, --help
Show this help message and exit.


mixcr findAlleles \
    --output-template {file_name}.allelic.clns \
    --export-library alleles.repseqio.json \
    --export-alleles-mutations allele_stats.tsv \
    donor1_t1.clns donor1_t2.clns donor1_t3.clns
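
To preview which files a template would produce before running the command, the placeholder expansion described for --output-template can be sketched in plain shell. This is only an illustration, not MiXCR code: the helper names `expand_template` and `duplicate_outputs` are hypothetical, and the sketch assumes {file_name} is the input file name without its last extension, as the documented examples suggest.

```shell
# Hypothetical helpers, not part of MiXCR.
# Expand {file_name} and {file_dir_path} in a template for one input path.
# Assumes the file name and directory contain no '|' or '&' characters.
expand_template() {
    template=$1
    input=$2
    dir=$(dirname -- "$input")
    name=$(basename -- "$input")
    name=${name%.*}    # drop the last extension, e.g. ".clns"
    printf '%s\n' "$template" |
        sed -e "s|{file_name}|$name|g" -e "s|{file_dir_path}|$dir|g"
}

# Print every expanded output path that occurs more than once.
# Empty output means all resulting paths are unique.
duplicate_outputs() {
    template=$1
    shift
    for input in "$@"; do
        expand_template "$template" "$input"
    done | sort | uniq -d
}

expand_template '{file_name}.allelic.clns' donor1_t1.clns
# prints: donor1_t1.allelic.clns
duplicate_outputs '{file_name}.clns' /a/x.clns /b/x.clns
# prints: x.clns
```

The second call shows why a template without {file_dir_path} can collide when inputs from different folders share a file name.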

Allelic variants summary table

The summary table produced with --export-alleles-mutations contains the following columns:

allele name in the resulting library; for novel allelic variants it contains the count of mutations from the known allele and the number of mutations in CDR3
gene name; the same for heterozygous alleles
V or J
whether there was enough information to infer an allele or its absence
gene features inside which the allele was found (including the CDR3 part that was used for the search)
ranges in the genome of alleleMutationsReliableGeneFeatures
allele mutations from germline
count of clones that were aligned to this allele
count of clones with no mutations in V and J
lower bound of diversity of clones
total count of clones for this allele and its zygotes (the same geneName)
count of clones that align better to the original library than to the built one
count of clones with no mutations in V and J after the useClonesWithCountGreaterThen filter
count of clones after the useClonesWithCountGreaterThen filter
count of clones that align better to the original library than to the built one after the useClonesWithCountGreaterThen filter
stats of the score change of clones (size, sum, min, max, avg, quadraticMean, stdDeviation)
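
To match the descriptions above to the actual header of an exported table, one option is to print the header as a numbered list. The sketch below assumes the file is tab-separated with a single header row (an assumption based on the .tsv name used in the example command); `list_columns` is a hypothetical helper, not a MiXCR command.

```shell
# Hypothetical helper, not part of MiXCR.
# Print the header row of a tab-separated table as "<index><TAB><column name>".
list_columns() {
    head -n 1 -- "$1" | tr '\t' '\n' | awk '{ printf "%d\t%s\n", NR, $0 }'
}
```

For the example command shown earlier, this would be called as `list_columns allele_stats.tsv`.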

Allele inference algorithm parameters

Below one can find parameters of inference algorithms that may be tuned.

If the data has a UMI tag, search for alleles using only clones with a count greater than or equal to this value (default 0).
If the data has no UMI tag, search for alleles using only clones with a count greater than or equal to this value (default 1).
Percentage used to take the top alleles by lower bound of diversity (default 0.25).
When deciding whether a clone matches an allele, the relation between the score penalties of the best and the next-best alleles is checked against this value (default 0.15).
After an allele is found, it is enriched with mutations that exist in this portion of the clones aligned to the allele (default 0.95).
Alleles are filtered by the minimum count of clones that are naive by the complementary gene (default 2).
Minimum percentage of max diversity (the count of different CDR3 lengths multiplied by the count of unique complementary genes) that a mutation must reach to be considered a candidate allele mutation (default 0.03).
Allele candidates whose percentage of max diversity (the count of different CDR3 lengths multiplied by the count of unique complementary genes) is less than this parameter are filtered out (default 0.03).
If the percentage of max diversity (the count of different CDR3 lengths multiplied by the count of unique complementary genes) of the zero allele is greater than or equal to this value, then it will not be tested by diversity ratio (default 0.1).
A letter must be represented in no fewer than minClonesCount clones (default 5).
Portion of clones in a group that must have the same letter (default 0.7).
A letter must be represented by no less than the minDiversity percentage of diversity by complementary gene (default 0.5).
If searchMutationsInCDR3 is set to null, there will be no search for mutations in CDR3.
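
Several of the thresholds above are expressed as a percentage of max diversity, defined as the count of different CDR3 lengths multiplied by the count of unique complementary genes. The arithmetic can be sketched as follows. This is not MiXCR's implementation: `diversity_share` is a hypothetical helper, the input format is invented for illustration, and counting observed diversity as the number of distinct (CDR3 length, complementary gene) combinations is an assumption.

```shell
# Hypothetical helper, not MiXCR code.
# stdin: one clone per line, "<CDR3 length><TAB><complementary gene name>".
# Prints: observed distinct pairs, max diversity, and their ratio (tab-separated).
diversity_share() {
    awk -F '\t' '
        NF >= 2 {
            lengths[$1] = 1        # distinct CDR3 lengths
            genes[$2] = 1          # distinct complementary genes
            pairs[$1 FS $2] = 1    # distinct (length, gene) combinations (assumed numerator)
        }
        END {
            nl = 0; for (k in lengths) nl++
            ng = 0; for (k in genes) ng++
            np = 0; for (k in pairs) np++
            max = nl * ng          # max diversity as defined in the parameter descriptions
            if (max == 0) { print "0\t0\t0"; exit }
            printf "%d\t%d\t%.2f\n", np, max, np / max
        }'
}

printf '36\tIGHJ4\n36\tIGHJ6\n45\tIGHJ4\n' | diversity_share
# prints: 3	4	0.75
```

Here two CDR3 lengths and two complementary genes give a max diversity of 4, and three of the four combinations are observed. Under the stated assumption, a ratio like this is what would be compared against thresholds such as the 0.03 default.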