The cellranger-arc reanalyze command reruns secondary analysis performed on the peak-barcode matrix (dimensionality reduction, clustering and visualization) using different parameter settings.
These are the required command line arguments:
Argument | Description |
---|---|
--id=ID | Required. A unique run ID, which is also used as the output folder name. It must match [a-zA-Z0-9_-]+ and be at most 64 characters long. |
--matrix=H5 | Required. Path to a feature-barcode matrix H5 generated by cellranger-arc count or aggr. If you intend to subset to a set of barcodes, use the raw matrix; otherwise use the filtered feature-barcode matrix. |
--atac-fragments=TSV.GZ | Required. Path to the atac_fragments.tsv.gz generated by cellranger-arc count or aggr. The tabix index file atac_fragments.tsv.gz.tbi is assumed to be present in the same directory. |
--reference=PATH | Required. Path to the folder containing a cellranger-arc-compatible reference. Reference packages can be downloaded from support.10xgenomics.com or constructed using the cellranger-arc mkref command. This reference must match the reference used for the initial cellranger-arc count run. |
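Two of the constraints above are easy to check before launching: the run ID pattern and the presence of the tabix index next to the fragments file. The following is a minimal sketch of such a pre-flight check; the function names valid_run_id and fragments_indexed are hypothetical helpers, not part of cellranger-arc.

```shell
#!/bin/sh
# Illustrative pre-flight checks; these helper names are hypothetical.

# Succeeds if the run ID uses only [a-zA-Z0-9_-] and is 1 to 64 characters long.
valid_run_id() {
    printf '%s\n' "$1" | grep -Eq '^[a-zA-Z0-9_-]{1,64}$'
}

# Succeeds if the fragments file and its .tbi index sit in the same directory.
fragments_indexed() {
    [ -f "$1" ] && [ -f "$1.tbi" ]
}
```

For example, running valid_run_id AGG123_reanalysis followed by fragments_indexed /home/jdoe/runs/AGG123/outs/atac_fragments.tsv.gz before the real command would catch a mistyped ID or a missing index early.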
Optional command line parameters are listed below (also available through cellranger-arc reanalyze --help):
Option | Description |
---|---|
--description=TXT | Sample description to embed in output files [default: ] |
--barcodes=LIST | Specify barcodes to use in analysis. The barcodes can be specified in a text file that contains one barcode per line (blank lines are ignored). CSV (with or without a header) is also accepted; only the first column of the CSV is used, which is the format of exports from Loupe Browser. Required if neither --peaks nor --params has been specified. |
--min-atac-count=NUM | Cell caller override: define the minimum number of ATAC transposition events in peaks (ATAC counts) for a cell barcode. Note: this option must be specified in conjunction with --min-gex-count. With --min-atac-count=X and --min-gex-count=Y, a barcode is defined as a cell if it contains at least X ATAC counts AND at least Y GEX UMI counts. |
--min-gex-count=NUM | Cell caller override: define the minimum number of GEX UMI counts for a cell barcode. Note: this option must be specified in conjunction with --min-atac-count. With --min-atac-count=X and --min-gex-count=Y, a barcode is defined as a cell if it contains at least X ATAC counts AND at least Y GEX UMI counts. |
--peaks=BED | Override peak caller: specify peaks to use in secondary analyses from supplied 3-column BED file. The supplied peaks file must be sorted by position and not contain overlapping peaks; comment lines beginning with # are allowed. Required if neither --barcodes nor --params has been specified. |
--params=CSV | Specify key-value pairs in CSV format for analysis: any subset of random_seed, k_means_max_clusters, feature_linkage_max_dist_mb, num_gex_pcs, num_atac_pcs. For example, to set the number of GEX principal components to 15 and the distance threshold for feature linkage computation to 2.5 Mb, the CSV would contain two rows: num_gex_pcs,15 on the first and feature_linkage_max_dist_mb,2.5 on the second. Required if neither --peaks nor --barcodes has been specified. |
--agg=AGGREGATION_CSV | If the input matrix was produced by cellranger-arc aggr, it is possible to pass the same aggregation CSV in order to retain per-library tag information in the resulting .cloupe file. |
--jobmode=MODE | Job manager to use. Valid options: local (default), sge, lsf, slurm, or path to a .template file. Search for help on "Cluster Mode" at support.10xgenomics.com for more details on configuring the pipeline to use a compute cluster [default: local]. |
--localcores=NUM | Set max cores the pipeline may request at one time. Only applies to local jobs. |
--localmem=NUM | Set max GB the pipeline may request at one time. Only applies to local jobs. |
--localvmem=NUM | Set max virtual address space in GB for the pipeline. Only applies to local jobs. |
--mempercore=NUM | Reserve enough threads for each job to ensure enough memory will be available, assuming each core on your cluster has at least this much memory available. Only applies to cluster jobmodes. |
--maxjobs=NUM | Set max jobs submitted to cluster at one time. Only applies to cluster jobmodes. |
--jobinterval=NUM | Set delay between submitting jobs to cluster, in ms. Only applies to cluster jobmodes. |
--overrides=PATH | The path to a JSON file that specifies stage-level overrides for cores and memory. Finer-grained than --localcores, --mempercore, and --localmem. |
--uiport=PORT | Serve web UI at http://localhost:PORT |
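The --peaks option expects a 3-column BED file that is sorted by position and free of overlapping peaks. The sketch below shows one way to check a file against those rules with awk before passing it in. The function name check_peaks_bed is hypothetical. It also makes two assumptions that go beyond the description above: peaks that merely touch (the start equals the previous end) are treated as non-overlapping, and each chromosome is expected to appear in a single contiguous block.

```shell
#!/bin/sh
# Illustrative validator for a --peaks BED file; not part of cellranger-arc.
# Prints one message per problem and returns non-zero if any were found.
check_peaks_bed() {
    awk -F '\t' '
        /^#/ { next }    # comment lines are allowed
        NF != 3 {
            printf "line %d: expected 3 columns, found %d\n", NR, NF
            bad = 1
            next
        }
        $2 + 0 >= $3 + 0 {
            printf "line %d: start is not less than end\n", NR
            bad = 1
        }
        $1 == chrom && $2 + 0 < prev_end {
            printf "line %d: out of order or overlaps the previous peak\n", NR
            bad = 1
        }
        $1 != chrom && ($1 in seen) {
            # Assumption: a sorted file lists each chromosome only once.
            printf "line %d: %s appears in more than one block\n", NR, $1
            bad = 1
        }
        { chrom = $1; prev_end = $3 + 0; seen[$1] = 1 }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

Calling check_peaks_bed my_peaks.bed (a hypothetical file name) prints nothing and returns 0 when the file passes every check.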
After determining input arguments and options, run cellranger-arc reanalyze. This example reanalyzes the results of an aggregation named AGG123 in order to filter out doublet barcodes:
$ cd /home/jdoe/runs
$ ls -1 AGG123/outs/*.gz # verify the input file exists
AGG123/outs/atac_fragments.tsv.gz
$ cellranger-arc reanalyze --id=AGG123_reanalysis \
--barcodes=no_doublets.csv \
--matrix=/home/jdoe/runs/AGG123/outs/raw_feature_bc_matrix.h5 \
--reference=/home/jdoe/refs/hg19 \
--atac-fragments=/home/jdoe/runs/AGG123/outs/atac_fragments.tsv.gz
The pipeline will begin to run, creating a new folder named with the reanalysis ID specified with the --id argument (e.g. /home/jdoe/runs/AGG123_reanalysis) for its output. If this output folder already exists, cellranger-arc will assume it is an existing pipestance and attempt to resume running it.
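Because an existing folder triggers a resume rather than a fresh run, it can be useful to check for one before launching. The following is a minimal sketch with a hypothetical helper named launch_mode. It only tests whether a folder with the run ID exists in the current directory and does not inspect the folder's contents.

```shell
#!/bin/sh
# Illustrative helper; not part of cellranger-arc.
# Prints "resume" if a folder named after the run ID already exists in the
# current directory, and "new" otherwise.
launch_mode() {
    if [ -d "$1" ]; then
        echo "resume"   # the folder would be treated as an existing pipestance
    else
        echo "new"      # the folder would be created for this run
    fi
}
```

Running launch_mode AGG123_reanalysis from /home/jdoe/runs before the reanalyze command shows which of the two behaviors to expect.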
A successful run should conclude with a message similar to this:
2021-04-26 03:30:46 [runtime] (update) ID.AGG123_reanalysis.SC_ATAC_GEX_REANALYZER_CS.ATAC_GEX_CLOUPE_PREPROCESS.fork0 join_running
2021-04-26 03:36:05 [runtime] (join_complete) ID.AGG123_reanalysis.SC_ATAC_GEX_REANALYZER_CS.ATAC_GEX_CLOUPE_PREPROCESS
Outputs:
- Secondary analysis outputs:
clustering:
atac: {
...
}
gex: {
...
}
dimensionality_reduction:
atac: {
...
}
gex: {
...
}
feature_linkage:
...
tf_analysis:
...
- Filtered feature barcode matrix HDF5: /home/jdoe/runs/AGG123_reanalysis/outs/filtered_feature_bc_matrix.h5
- Loupe browser visualization file: /home/jdoe/runs/AGG123_reanalysis/outs/cloupe.cloupe
- ATAC peak annotations based on proximal genes: /home/jdoe/runs/AGG123_reanalysis/outs/atac_peak_annotation.tsv
- Secondary analysis summary: /home/jdoe/runs/AGG123_reanalysis/outs/summary.json
Pipestance completed successfully!
Refer to the Multiome ATAC + Gene Expression Analysis page for an explanation of the output.
The CSV file passed to --params should have one row for every parameter that you want to customize. There is no header row. If a parameter is not specified in your CSV, its default value will be used.
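As an illustration of that layout, the sketch below writes a two-row parameter file that sets num_gex_pcs to 15 and feature_linkage_max_dist_mb to 2.5. The function name write_params_csv and the output file name are hypothetical.

```shell
#!/bin/sh
# Illustrative helper; not part of cellranger-arc.
# Writes a parameter file with no header and one name,value pair per row.
write_params_csv() {
    cat > "$1" <<'EOF'
num_gex_pcs,15
feature_linkage_max_dist_mb,2.5
EOF
}
```

After write_params_csv params.csv, the file can be supplied to the pipeline with --params=params.csv.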
Here are detailed descriptions of each parameter. For parameters that subset the data, a default value of null indicates that no subsetting happens by default.
Parameter | Type | Default Value | Recommended Range | Description |
---|---|---|---|---|
feature_linkage_max_dist_mb | float | 1 | 0.1-5, depending on what length scale is biologically meaningful for the organism | Change the distance over which pairs of features are considered for feature linkage estimation. Increasing this number will increase the number of linkages, but features that are very far apart on the genome are less likely to be causally linked. |
num_atac_pcs | int | 15 | 10-100, depending on the number of cell populations / clusters you expect to see. | Compute N principal components for LSA. Setting this too high may cause spurious clusters to be called. |
num_gex_pcs | int | 10 | 10-100, depending on the number of cell populations / clusters you expect to see. | Compute N principal components for PCA. Setting this too high may cause spurious clusters to be called. |
k_means_max_clusters | int | 5 | 2-5, depending on the number of cell populations / clusters you expect to see. | Compute K-means clustering using K values of 2 to N. Setting this too high may cause spurious clusters to be called. |
random_seed | int | 0 | any 64-bit integer | Due to the randomized nature of the algorithms, changing this will produce slightly different results. |
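A parameter file can be checked against the names and types in the table above before a run. The sketch below is one possible check, using a hypothetical function named check_params_csv. It verifies only that each row has a known parameter name and a value of the listed type. It does not enforce the recommended ranges, and it assumes values are written as plain decimal numbers with no surrounding whitespace.

```shell
#!/bin/sh
# Illustrative validator for a --params CSV; not part of cellranger-arc.
# Prints one message per problem and returns non-zero if any were found.
check_params_csv() {
    awk -F ',' '
        NF != 2 {
            printf "line %d: expected name,value\n", NR
            bad = 1
            next
        }
        $1 == "feature_linkage_max_dist_mb" {
            # float: digits with an optional fractional part
            if ($2 !~ /^[0-9]+(\.[0-9]+)?$/) {
                printf "line %d: %s needs a float\n", NR, $1
                bad = 1
            }
            next
        }
        $1 == "num_atac_pcs" || $1 == "num_gex_pcs" || $1 == "k_means_max_clusters" {
            # int: digits only
            if ($2 !~ /^[0-9]+$/) {
                printf "line %d: %s needs an integer\n", NR, $1
                bad = 1
            }
            next
        }
        $1 == "random_seed" {
            # any integer, so a leading minus sign is accepted
            if ($2 !~ /^-?[0-9]+$/) {
                printf "line %d: %s needs an integer\n", NR, $1
                bad = 1
            }
            next
        }
        {
            printf "line %d: unknown parameter %s\n", NR, $1
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

Calling check_params_csv params.csv (a hypothetical file name) returns 0 for a file such as the two-row example with num_gex_pcs,15 and feature_linkage_max_dist_mb,2.5.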