7 Types of input data
7.1 10x Cell Ranger output
The 10x Cell Ranger processing pipeline produces output in two standard formats. The first consists of a matrix .mtx file, a features .tsv file, and a barcodes .tsv file. These files are found in the outs folder of the Cell Ranger run.
For example, if the Cell Ranger pipeline was run with the cellranger count command and the sample id set to sample345, the three files are in the sample345/outs/filtered_feature_bc_matrix folder.
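Before loading, it can help to confirm that the folder really contains all three files. The following is a minimal sketch, not part of Natian: the function name `find_10x_files` is hypothetical, and the list of accepted file names is an assumption based on the names Cell Ranger commonly writes (gzip-compressed `features.tsv.gz` in newer versions, uncompressed `genes.tsv` in older ones).

```python
from pathlib import Path

# Assumed file names (not taken from Natian): newer Cell Ranger versions write
# gzip-compressed files, older versions write plain files and name the
# feature table genes.tsv.
CANDIDATE_NAMES = {
    "matrix": ("matrix.mtx", "matrix.mtx.gz"),
    "features": ("features.tsv", "features.tsv.gz", "genes.tsv", "genes.tsv.gz"),
    "barcodes": ("barcodes.tsv", "barcodes.tsv.gz"),
}


def find_10x_files(folder):
    """Return {role: Path} for the three 10x files, or raise if one is missing."""
    folder = Path(folder)
    found = {}
    for role, names in CANDIDATE_NAMES.items():
        matches = [folder / name for name in names if (folder / name).is_file()]
        if not matches:
            raise FileNotFoundError(f"no {role} file found in {folder}")
        found[role] = matches[0]
    return found
```

For the example above, the call would be `find_10x_files("sample345/outs/filtered_feature_bc_matrix")`.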
To load this data, run Natian and click the Create Seurat from 10x output button.
Use the Browse… button to navigate to and select all three files.
Ensure that all three files are loaded before proceeding to create the Seurat object.
- Click on proceed to create a Seurat file.
The files are copied to a temporary folder and then loaded into the R environment in RStudio. This does not affect the original files. Any edits, processing, or changes made to the Seurat object do not affect the raw data.
- Provide a name for the Seurat object. Spaces and special characters are not allowed. Allowed characters are A-Z, a-z, 0-9, and _ [underscore]. Click create Seurat object.
Do not use copy-paste to enter special characters into this text box. Doing so will lead to errors and crash the app!
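To check a candidate name before typing it into the app, the allowed character set listed above can be expressed as a regular expression. This is an illustrative sketch only; `is_valid_object_name` is a hypothetical helper, not a function of Natian.

```python
import re

# Allowed characters as listed in the text: A-Z, a-z, 0-9 and underscore.
VALID_NAME = re.compile(r"[A-Za-z0-9_]+")


def is_valid_object_name(name):
    """True if name is non-empty and contains only the allowed characters."""
    return VALID_NAME.fullmatch(name) is not None
```

With this check, `sample345` and `Sample_1` pass, while `sample 345` and `sample-345` are rejected.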
7.2 Gene count matrix output
Gene count matrix outputs generated by other alignment/counting pipelines can also be used. These data sets must be .csv or .txt files, or an .rds file holding the data in dgCMatrix format. The matrix has genes in rows and cells in columns:
| Gene/Cell | Cell_1 | Cell_2 | Cell_3 | … |
|---|---|---|---|---|
| Gene1 | 5 | 0 | 2 | … |
| Gene2 | 10 | 0 | 0 | … |
| Gene3 | 0 | 1 | 0 | … |
| … | … | … | … | … |
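The layout in the table above can be checked with a short parser before loading the file. This is a sketch under stated assumptions: the file is comma-separated, the first column holds gene names, the first row holds cell names, and all counts are integers. The function name `read_count_matrix` is hypothetical.

```python
import csv
import io


def read_count_matrix(handle, delimiter=","):
    """Parse a genes-in-rows, cells-in-columns table into (genes, cells, rows)."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    cells = header[1:]  # the first header field labels the gene column
    genes, rows = [], []
    for record in reader:
        if not record:  # skip blank lines
            continue
        if len(record) != len(cells) + 1:
            raise ValueError(
                f"row {record[0]!r} has {len(record) - 1} values, expected {len(cells)}"
            )
        genes.append(record[0])
        rows.append([int(value) for value in record[1:]])
    return genes, cells, rows


# The three example rows from the table above, written as CSV text.
example = io.StringIO(
    "Gene/Cell,Cell_1,Cell_2,Cell_3\n"
    "Gene1,5,0,2\n"
    "Gene2,10,0,0\n"
    "Gene3,0,1,0\n"
)
genes, cells, rows = read_count_matrix(example)
```

A row with a different number of values than the header raises a `ValueError`, which is a quick way to spot a misaligned file.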
To load this data, run Natian and click the Load a Gene count matrix file button.
Use the Browse… button to navigate to and select the file.
Provide a name for the Seurat object in the Enter Sample name input box. Spaces and special characters are not allowed. Allowed characters are A-Z, a-z, 0-9, and _ [underscore]. Click create Seurat object.
Do not use copy-paste to enter special characters into this text box. Doing so will lead to errors and crash the app!
A default filtration step is applied when Seurat objects are created using Natian. Cells in which fewer than 200 genes are detected are filtered out, and genes that are not detected in at least 3 cells are removed. These thresholds can be changed by editing the min.features and min.cells values on lines 679 and 844 of the app_Natian.R file.
NOTE: When resuming the processing of a Seurat object, this filtration is not performed.
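The effect of the two thresholds can be illustrated on a small matrix. The sketch below is not Natian's or Seurat's code: `filter_counts` is a hypothetical function, its parameters `min_cells` and `min_features` only mirror the names min.cells and min.features, and the order of the two steps (cells first, then genes) is an assumption.

```python
def filter_counts(rows, min_cells=3, min_features=200):
    """Filter a genes x cells count matrix given as a list of per-gene lists.

    Returns (kept_gene_indices, kept_cell_indices).
    """
    n_cells = len(rows[0]) if rows else 0
    # Step 1 (assumed order): keep cells in which at least min_features
    # genes are detected, i.e. have a count above zero.
    kept_cells = [
        j for j in range(n_cells)
        if sum(1 for row in rows if row[j] > 0) >= min_features
    ]
    # Step 2: among the kept cells, keep genes detected in at least min_cells cells.
    kept_genes = [
        i for i, row in enumerate(rows)
        if sum(1 for j in kept_cells if row[j] > 0) >= min_cells
    ]
    return kept_genes, kept_cells


# Toy data with toy thresholds; real data would use the defaults of 3 and 200.
toy = [
    [5, 0, 2, 1],   # gene 0
    [10, 0, 0, 3],  # gene 1
    [0, 1, 0, 0],   # gene 2
]
kept_genes, kept_cells = filter_counts(toy, min_cells=2, min_features=2)
```

In the toy example, cells 1 and 2 each express only one gene and are dropped, after which gene 2 is no longer detected in any remaining cell and is dropped as well.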
7.3 Gene Expression Omnibus (GEO)
Single-cell data from previously published works can be accessed using the Gene Expression Omnibus (GEO). We provide a way to download the supplementary files associated with the publication using the GSE ID of the corresponding GEO submission.
There is no standard format in which supplementary files are submitted to GEO. Therefore, Natian does not load the data automatically. Instead, it provides a quick way to access GEO data sets and, if one is available, to download a supplementary file.
- To download the file, get the GSEXXXX ID from the publication that reports the data submitted to the GEO database.
- Click on Download data from GSE file to open the GSE ID input box.
- Enter the GSE ID in the input box and select Create a weblink.
Optional: If the entered GSE ID is wrong or the supplementary data could not be downloaded, then a web link based on the GSE ID is produced. You can choose to use this link.
To use the link, click on Take me to the link. This opens the link for the GSE ID in a new browser window.
Once the download is complete, or if you were able to download the data using the link, click on I have the downloaded data.
This will allow you to open supplementary files from GEO databases that are gene count matrices or .RDS or .RData files.
- Alternatively, use the Start Panel button to go back to the main page.
This will allow you to open supplementary files from GEO databases that are 10x Cell Ranger pipeline outputs.
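A web link based on a GSE ID, as mentioned above, can be sketched as follows. The text does not state which link Natian produces, so this is an assumption: the sketch builds the address of the public GEO accession page for a series. The function name `geo_series_link` is hypothetical, and `GSE12345` in the usage note is only a placeholder ID.

```python
import re
from urllib.parse import urlencode

# Public GEO accession page; whether Natian links to this exact page is an assumption.
GEO_ACCESSION_PAGE = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi"


def geo_series_link(gse_id):
    """Validate a GSE ID and return the address of its GEO accession page."""
    gse_id = gse_id.strip().upper()
    if not re.fullmatch(r"GSE\d+", gse_id):
        raise ValueError(f"not a GSE ID: {gse_id!r}")
    return f"{GEO_ACCESSION_PAGE}?{urlencode({'acc': gse_id})}"
```

For example, `geo_series_link("GSE12345")` returns the accession page address with `acc=GSE12345` as its query string, while an ID that does not start with GSE raises a `ValueError`.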
7.4 Partially processed Seurat objects
Natian can resume the processing of partially processed single-cell data stored as a Seurat object.
To load a partially processed Seurat object, or a processed one that requires re-processing, click on Resume processing Seurat file.
Use the Browse… button to load an .RDS, .RData, or .RObj file. This adds the Seurat object to the environment and populates the drop down list with the name of the object.
.RData files can contain more than one Seurat object. All of these objects will be loaded into the environment. .RDS and .RObj files can contain only one Seurat object.
- Use the drop down list to select any Seurat object already in the environment to re-process or resume processing.
When loading 10x Cell Ranger output, the order in which the files are selected is not important. The files can also be compressed in gzip format, as produced by some versions of Cell Ranger. Natian will load these files and subsequently extract them.
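Handling plain and gzip-compressed files with one code path can be sketched as below. This is not how Natian does it: the text says Natian extracts the files after loading, whereas this sketch decompresses on the fly while reading. The function name `open_text_maybe_gzip` is hypothetical, and UTF-8 text is assumed.

```python
import gzip

# The first two bytes of every gzip stream.
GZIP_MAGIC = b"\x1f\x8b"


def open_text_maybe_gzip(path):
    """Open path as text, decompressing transparently if it is gzip-compressed."""
    with open(path, "rb") as raw:
        is_gzip = raw.read(2) == GZIP_MAGIC
    if is_gzip:
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")
```

Checking the leading bytes rather than the file extension means a compressed file is still recognized if it has been renamed.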