The Xenium platform aims to support the FAIR principles of data findability, accessibility, interoperability, and reusability, so that newly generated Xenium data is easy to share for collaborative analysis and findings from published Xenium data are easy to reproduce.
What Xenium output should I keep for archival storage for reanalysis and grant funding requirements?
We recommend archiving Xenium raw data outputs, which consist of:
- Decoded transcripts with assigned Phred-scaled Q-Scores
- High-resolution morphology images
Decoded transcripts are provided in .zarr, .parquet, and .csv formats. Morphology images are provided in .ome.tif format. These data should be archived to fulfill grant funding requirements and for reanalysis, and may be submitted to repositories such as GEO. All other Xenium outputs are derived from these raw data in Xenium Onboard Analysis, can be rederived after a Xenium instrument run, and are not strictly necessary for long-term archival and reproducibility.
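As a minimal sketch of how the raw outputs might be picked out of an output directory for archival, the Python below matches files by the formats named above. The `transcripts` and `morphology` filename prefixes and the helper names are assumptions made for illustration; check them against the files in your own output bundle before relying on this.

```python
from pathlib import Path

# Formats named in the text above for the raw outputs.
TRANSCRIPT_EXTENSIONS = (".zarr", ".parquet", ".csv")
IMAGE_EXTENSION = ".ome.tif"


def is_raw_output(filename):
    """Return True if a filename looks like a raw transcript or morphology file.

    Assumption: transcript files start with "transcripts" and image files
    start with "morphology". Substring matching lets compressed variants
    (for example a .csv.gz file) match as well.
    """
    name = filename.lower()
    if name.startswith("transcripts"):
        return any(ext in name for ext in TRANSCRIPT_EXTENSIONS)
    if name.startswith("morphology"):
        return IMAGE_EXTENSION in name
    return False


def list_raw_outputs(bundle_dir):
    """List the files under bundle_dir that is_raw_output() accepts."""
    return sorted(
        p for p in Path(bundle_dir).rglob("*")
        if p.is_file() and is_raw_output(p.name)
    )
```

Calling `list_raw_outputs("path/to/output_directory")` would return the candidate files to copy to archival storage. Review the resulting list by hand, since this only matches on file names.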
Additional detail on Xenium raw data output:
- A Xenium Q-Score indicates the probability that the detected object exists and was correctly identified by the decoding algorithm. All decoded transcript Q-Scores are output in the transcripts files. The cells and cell-feature matrix output files in the Xenium output bundle are filtered to Q-Score ≥ 20. For more details, see our Overview of Xenium Algorithms support page.
- Xenium morphology images will always be provided at the same resolution that our onboard segmentation algorithm uses as input. This ensures that you can benefit from improvements to our segmentation model as we add to its training over time, or run your own segmentation if you choose. Our off-instrument reanalysis package, Xenium Ranger, enables you to easily rerun segmentation or import your own segmentation results to generate derived outputs (e.g., cell-feature matrix) and view them in Xenium Explorer.
- We will stand by these FAIR principles with future capabilities. High-resolution morphology images will continue to be included in the Xenium output bundle for our onboard multimodal segmentation method.
- Other outputs from Xenium Onboard Analysis (XOA) are derived data from these raw outputs, and the community can recapitulate them from Xenium raw data.
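To make the Q-Score threshold concrete: under the standard Phred convention, a score Q corresponds to an estimated error probability of 10^(-Q/10), so Q20 means roughly a 1% chance that a call is wrong. The sketch below assumes that convention applies and that each transcript record carries its Q-Score in a field called `qv`. The field name, gene names, and values are illustrative assumptions, not taken from a real dataset.

```python
Q_THRESHOLD = 20  # threshold applied to the cells and cell-feature matrix outputs


def phred_to_error_probability(q):
    """Convert a Phred-scaled quality score to an estimated error probability."""
    return 10 ** (-q / 10)


def filter_by_q(transcripts, threshold=Q_THRESHOLD, key="qv"):
    """Keep transcript records whose Q-Score is at or above the threshold."""
    return [t for t in transcripts if t[key] >= threshold]


# Hypothetical records standing in for rows of a transcripts file.
example = [
    {"feature_name": "GeneA", "qv": 40.0},
    {"feature_name": "GeneB", "qv": 12.5},
    {"feature_name": "GeneC", "qv": 20.0},
]
kept = filter_by_q(example)  # GeneA and GeneC pass; GeneB is dropped
```

Because all Q-Scores are retained in the transcripts files, a different threshold can be applied during reanalysis by passing another `threshold` value.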
Xenium raw data is a reduction of low-level internal sensor data, as described in the Overview of Xenium Algorithms. It preserves the details needed to assess decoded transcript quality while abstracting away low-level details of the instrumentation and assay, which require calibration and specialized methods that will change over time as the platform improves and gains new capabilities.
On-instrument processing of Xenium internal sensor data — i.e., the 3D per-pixel values that Xenium Analyzer’s internal image sensor captures across multiple FOVs, multiple fluorescence channels, and multiple cycles of chemistry and imaging processing — is closely tied to Xenium optics. Consequently, Xenium internal sensor data cannot be reanalyzed after processing with Xenium Onboard Analysis.
Internal sensor data is not practical to store or reanalyze, because it amounts to tens of terabytes per sample. In the spirit of scientific reproducibility, it is more useful to store the Xenium decoded transcripts with assigned Phred-scaled Q-Scores and the morphology images for reanalysis (see the typical output directory sizes below).
To add further transparency and to supplement existing methods to QC Xenium data, downsampled RNA diagnostic images are available in the Xenium auxiliary output directory in Xenium Onboard Analysis v1.6 and later. In XOA v1.7 and later, these images are also available in the Analysis Summary. These images are not needed for raw data archival, but should be useful in gaining confidence in the robustness of Xenium's decoding algorithm.
Each tissue region selected on the Xenium Analyzer produces a separate output directory with images, decoded transcripts, cell-feature count matrices, and more.
The file formats were deliberately designed and chosen to balance compatibility, performance, and file size. There is no simple formula for calculating the output directory size from the Xenium Analyzer region area alone. Output size also depends on sample-specific factors like tissue shape, number of cells, number of decoded transcripts, and percent of high quality transcripts.
To help budget for data storage, here are some examples based on estimates and on 10x Genomics public datasets.
The table below shows estimated output directory sizes as a function of tissue area, assuming the sample has similar properties to a model mouse brain coronal section with the following metrics:
- 0.72 cm² tissue area
- 11 Z-slices
- 162k cells
- 62.4M transcripts
- 0.25 cells per 100 µm²
- 107 transcripts > Q20 per 100 µm²
- 80% of transcripts > Q20
Tissue | Tissue area (cm²) | Estimated output directory size (GB), XOA v1.0-1.9 | Estimated output directory size (GB), XOA v2.0* |
---|---|---|---|
Core needle biopsy | 0.01 | 0.2 | 0.23 |
Hemisphere of coronal mouse brain | 0.5 | 10 | 11 |
Full coronal mouse brain | 1 | 20 | 23 |
Tissue section covering entire sample area | 2.35 | 60 | 68 |
* Estimates based on data generated with cell segmentation staining workflow and multimodal cell segmentation.
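For areas between the tabulated values, one rough approach is to interpolate linearly between neighboring rows. This is only a sketch: the interpolation is an assumption made here, not a formula from 10x Genomics, and it inherits the table's premise that the sample resembles the model mouse brain section. The function name and `xoa_v2` parameter are hypothetical.

```python
# (tissue area in cm², GB for XOA v1.0-1.9, GB for XOA v2.0), copied from the table.
SIZE_TABLE = [
    (0.01, 0.2, 0.23),
    (0.5, 10.0, 11.0),
    (1.0, 20.0, 23.0),
    (2.35, 60.0, 68.0),
]


def estimate_output_gb(area_cm2, xoa_v2=False):
    """Linearly interpolate an output directory size (GB) from the table rows.

    Raises ValueError outside the tabulated range rather than extrapolating.
    """
    col = 2 if xoa_v2 else 1
    if not SIZE_TABLE[0][0] <= area_cm2 <= SIZE_TABLE[-1][0]:
        raise ValueError("area is outside the tabulated range")
    for lo, hi in zip(SIZE_TABLE, SIZE_TABLE[1:]):
        if lo[0] <= area_cm2 <= hi[0]:
            frac = (area_cm2 - lo[0]) / (hi[0] - lo[0])
            return lo[col] + frac * (hi[col] - lo[col])


# Example: a section of 0.75 cm², halfway between two table rows.
v1_estimate = estimate_output_gb(0.75)
v2_estimate = estimate_output_gb(0.75, xoa_v2=True)
```

Treat the result as a planning figure only, since actual sizes also depend on sample-specific factors such as cell count and transcript density.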
The 10x Genomics public datasets page provides additional examples of several sample configurations. For example:
Dataset | Tissue area (cm²) | Output directory size (GB) |
---|---|---|
Mouse brain tiny subset | ~0.17 | 3.5 |
Mouse brain full coronal section | 0.66 | 13.0 |
FFPE human breast, Tissue 1 | 0.90 | 24.4 |
FFPE human breast using the entire sample area, Replicate 1 | 2.28 | 51.9 |
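Dividing each output directory size by its tissue area gives a quick sense of how much storage per unit area varies between samples. The sketch below does this arithmetic for the rows above. The variable names are illustrative, and the first row's area is approximate in the source, so its ratio is approximate as well.

```python
# (dataset, tissue area in cm², output directory size in GB), copied from the table.
PUBLIC_DATASETS = [
    ("Mouse brain tiny subset", 0.17, 3.5),
    ("Mouse brain full coronal section", 0.66, 13.0),
    ("FFPE human breast, Tissue 1", 0.90, 24.4),
    ("FFPE human breast using the entire sample area, Replicate 1", 2.28, 51.9),
]

# Storage density in GB per cm² for each dataset.
gb_per_cm2 = {name: size_gb / area for name, area, size_gb in PUBLIC_DATASETS}

for name, density in gb_per_cm2.items():
    print(f"{name}: {density:.1f} GB per cm²")
```

For these four datasets the ratio falls roughly between 20 and 27 GB per cm², which illustrates why area alone does not determine output size.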