# mixcr refineTagsAndSort

Corrects sequencing and PCR errors inside barcode sequences and sorts the resulting file by tags. This step is extremely important because it removes the artificial diversity caused by errors in barcodes.

## Command line options

mixcr refineTagsAndSort
[--dont-correct]
[--power <d>]
[--substitution-rate <d>]
[--indel-rate <d>]
[--min-quality <n>]
[--max-substitutions <n>]
[--max-indels <n>]
[--max-errors <n>]
[--memory-budget <n>]
[--set-whitelist <key=value>]
[--reset-whitelist <tag>]
[--report <path>]
[--json-report <path>]
[--use-local-temp]
[--force-overwrite]
[--no-warnings]
[--verbose]
[--help]
alignments.vdjca alignments.corrected.vdjca

The command takes the input .vdjca file produced by the align step and writes a .vdjca file with corrected barcode sequences. It also produces a detailed report on the correction performance.
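
As a minimal sketch of a typical invocation: the input and output names follow the synopsis above, while the report file names are hypothetical. Only flags listed on this page are used. The snippet prints the assembled command and runs it only if `mixcr` is found on the `PATH` and the input file exists.

```shell
# File names are illustrative; only flags documented on this page are used.
input="alignments.vdjca"
output="alignments.corrected.vdjca"

# Assemble the command in the positional parameters so quoting is preserved.
set -- mixcr refineTagsAndSort \
    --report refine.report.txt \
    --json-report refine.report.json \
    "$input" "$output"

# Keep a printable copy of the command for logging.
cmd="$*"
echo "$cmd"

# Run only when mixcr is installed and the input file is present.
if command -v mixcr >/dev/null 2>&1 && [ -f "$input" ]; then
    "$@"
fi
```

Writing both the text and JSON reports in one run keeps a human-readable summary next to a machine-readable copy for downstream scripts.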

Basic command line options are:

alignments.vdjca
Path to input alignments
alignments.corrected.vdjca
Path where to write corrected alignments
--dont-correct
Don't correct barcodes, only sort alignments by tags. Default value determined by the preset.
-p, --power <d>
Determines how thoroughly the procedure eliminates variants that look like errors. Smaller values leave fewer erroneous variants, at the cost of accidentally correcting true variants. The value approximates the fraction of erroneous variants the algorithm will miss (type II errors). Default value determined by the preset.
-s, --substitution-rate <d>
Expected background non-sequencing-related substitution rate. Default value determined by the preset.
-i, --indel-rate <d>
Expected background non-sequencing-related indel rate. Default value determined by the preset.
-q, --min-quality <n>
Minimum Phred33 quality score for the tag. Tags containing positions with a lower quality score are discarded unless they are corrected. Default value determined by the preset.
--max-substitutions <n>
Maximal number of substitutions to search for. Default value determined by the preset.
--max-indels <n>
Maximal number of indels to search for. Default value determined by the preset.
--max-errors <n>
Maximal number of substitutions and indels combined to search for. Default value determined by the preset.
--memory-budget <n>
Memory budget in bytes. Default: 4 GB.
--set-whitelist <key=value>
Sets the whitelist for a specific tag to guide the tag refinement procedure. Usage:
--set-whitelist CELL=preset:737K-august-2016

or
--set-whitelist UMI=file:my_umi_whitelist.txt

--reset-whitelist <tag>
Resets the whitelist for a specific tag so that the unguided refinement procedure is applied to it.
-r, --report <path>
Report file (human-readable version; see -j / --json-report for the machine-readable report).
-j, --json-report <path>
JSON formatted report file.
--use-local-temp
Put temporary files in the same folder as the output files.
-f, --force-overwrite
Force overwrite of output file(s).
-nw, --no-warnings
Suppress all warning messages.
--verbose
Verbose messages.
-h, --help
Show this help message and exit.
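
The options above can be combined as in the following sketch. It assumes a budget of 8 binary gigabytes (the page does not say whether the default is binary or decimal), reuses the `CELL` whitelist preset from the usage example, and passes `UMI` to `--reset-whitelist` on the assumption that the option takes a bare tag name, as the synopsis suggests. The snippet only prints the commands; it does not run MiXCR.

```shell
# --memory-budget expects bytes, so convert from GiB first (8 GiB is an example value).
budget_gib=8
budget_bytes=$((budget_gib * 1024 * 1024 * 1024))

# Guided refinement for the CELL tag using the whitelist preset named on this page.
set -- mixcr refineTagsAndSort \
    --memory-budget "$budget_bytes" \
    --set-whitelist CELL=preset:737K-august-2016 \
    alignments.vdjca alignments.corrected.vdjca
guided_cmd="$*"
echo "$guided_cmd"

# Unguided refinement for the UMI tag (argument form assumed from the synopsis).
set -- mixcr refineTagsAndSort \
    --reset-whitelist UMI \
    alignments.vdjca alignments.corrected.vdjca
unguided_cmd="$*"
echo "$unguided_cmd"
```

Computing the byte count in the shell avoids typing a ten-digit literal by hand.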

## Hardware recommendations

The barcode correction step is memory-intensive. When barcode correction is enabled, the amount of memory required is proportional to the number of unique barcodes; in the extreme case of millions of unique barcodes, MiXCR may require up to 32 GB of RAM. When barcode correction is switched off (with the --dont-correct option), there is no such requirement, because MiXCR offloads sorting to disk when there is not enough RAM.
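
On a machine with limited RAM, a sort-only run might look like the sketch below. The output file name is hypothetical, and adding `--use-local-temp` rests on the assumption that the disk-based sort uses the temporary files this option relocates; the page itself states only that the option places temporary files next to the output. The snippet prints the command rather than running it.

```shell
# Sort alignments by tags without correcting barcodes (low-memory mode).
set -- mixcr refineTagsAndSort \
    --dont-correct \
    --use-local-temp \
    alignments.vdjca alignments.sorted.vdjca
sort_only_cmd="$*"
echo "$sort_only_cmd"
```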