seurat subset downsample

williamsdrake commented on Jun 4, 2020 (edited): Hi Seurat Team, Error in CellsByIdentities(object = object, cells = cells) : Cannot find cells provided. Any help or guidance would be appreciated. timoast closed this as completed on Jun 5, 2020.

If you are going to use idents like that, make sure that you have told the software what your default ident category is. The slice_sample() function in the dplyr package is useful here.

I am just worried it is just picking the first 600 and not randomizing, see https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/sample. This can be misleading. If I verify the subsetted object, it does have the number of cells I asked for in max.cells.per.ident (only one ident in one starting object).

I guess you can randomly sample your cells from that cluster using sample() (from base R).

Argument descriptions from the documentation: Default is all identities. Low cutoff for the parameter (default is -Inf). High cutoff for the parameter (default is Inf). Returns all cells with the subset name equal to this value.

The first step is to select the genes Monocle will use as input for its machine learning approach.

RandomSubsetData. Description: Randomly subset (cells) Seurat object by a rate. Usage: RandomSubsetData(object, rate, random.subset.seed = NULL, ...)
If ident.use = NULL, then Seurat looks at your actual object@ident (see Seurat::WhichCells, l.6). Yes, it does randomly sample (using the sample() function from base). So if you want to randomly sample 1000 cells, independent of the clusters to which those cells belong, you can simply provide a vector of cell names to the cells.use argument.

just "BC03"?

Related question: can "SubsetData" not be used directly to randomly sample 1000 cells (let's say) from a larger object? My question is: is this randomized?

Downsample number of cells in Seurat object by specified factor. Number of cells to subsample. Identity classes to subset. Subset a Seurat object. Parameter example: e.g., the name of a gene, PC1, a

If you use the default subset function there is a risk that images

I have two Seurat objects, one with about 40k cells and another with around 20k cells. However, one of the clusters has ~10-fold more cells than the other one. Should the larger one be downsampled? These genes can then be used for dimensional reduction on the original data including all cells.
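The cluster-independent sampling described above comes down to drawing cell names with base R's sample(). The following is a minimal sketch that uses a toy vector of barcodes in place of a real object; the names cell_names and sampled_cells are illustrative only and do not come from the thread.

```r
# Toy stand-in for colnames(seurat_object): 5000 hypothetical cell barcodes.
cell_names <- paste0("cell_", seq_len(5000))

# Fix the seed so the same draw can be reproduced later.
set.seed(42)

# Draw 1000 distinct cells, ignoring any cluster labels.
sampled_cells <- sample(cell_names, size = 1000, replace = FALSE)

length(sampled_cells)

# With a Seurat v2 object, this vector is what would be handed to cells.use,
# as suggested in the thread:
# SubsetData(object, cells.use = sampled_cells)
```

Because sample() draws without replacement by default, no barcode can appear twice in the result.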
You can then create a vector of cells including the sampled cells and the remaining cells, then subset your Seurat object using SubsetData() and compute the variable genes on this new Seurat object. The final variable genes vector can be used for dimensional reduction.

So if you clustered your cells (e.g.

I actually did not need to randomly sample clusters; instead I wanted to randomly sample an object, which for me was my starting object after filtering. However, if you have not run FindClusters() yet, all your cells would show the information stored in object@meta.data$orig.ident in the object@ident slot.

You can subset from the counts matrix. Below I use the pbmc_small dataset from the package, and I get cells that are CD14+ and CD14-: This vector contains the counts for CD14 and also the names of the cells: Getting the ids can be done using which: A bit dumb, but I guess this is one way to check whether it works: I am using this code to actually add the information directly on the meta.data.

Can be used to downsample the data to a certain max per cell ident. If there are insufficient cells to achieve the target min.group.size, only the available cells are retained.

Also, please provide reproducible example data for testing, dput(myData).

Is there a way to pick a set number of cells (but randomly) from the larger cluster so that I am comparing a similar number of cells?

It's a closed issue, but I stumbled across the same question as well, and went on to find the answer.
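The idea of combining the sampled cells from the large cluster with all remaining cells can be sketched in base R. This is an illustration under assumptions: idents is a hypothetical named vector standing in for the cluster labels of a Seurat object, and the cluster names "big" and "small" are made up.

```r
# Hypothetical cluster labels, named by cell barcode (stand-in for object@ident).
idents <- c(rep("big", 10000), rep("small", 1000))
names(idents) <- paste0("cell_", seq_along(idents))

set.seed(1)  # store the seed to make the draw reproducible

# Split the barcodes into the oversized cluster and everything else.
big_cells   <- names(idents)[idents == "big"]
other_cells <- names(idents)[idents != "big"]

# Keep 1000 random cells from the big cluster plus all remaining cells.
keep <- c(sample(big_cells, 1000), other_cells)

table(idents[keep])

# The vector `keep` is what would then be passed to SubsetData() in Seurat v2.
```

After this step both clusters contribute a similar number of cells, which is the comparison the question asks for.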
Step 1: choosing genes that define progress.

Setup the Seurat Object: For this tutorial, we will be analyzing a dataset of Peripheral Blood Mononuclear Cells (PBMC) freely available from 10X Genomics.

Identity classes to remove. targetCells: The desired cell number to retain per unit of data.

Does it even make sense to subsample like this? I managed to reduce the vignette pbmc from 2700 to 600 cells. I don't have much choice; it's either that or my R crashes with so many cells.

It first does all the selection and potential inversion of cells, and then this is the bit concerning downsampling: So indeed, it groups it into the identity classes (e.g.

You can subset from the counts matrix. Below I use the pbmc_small dataset from the package, and I get cells that are CD14+ and CD14-:
library(Seurat)
CD14_expression = GetAssayData(object = pbmc_small, assay = "RNA", slot = "data")["CD14", ]
This vector contains the counts for CD14 and also the names of the cells:
head(CD14_expression, 30)

Otherwise, if you'd like to have an equal number of cells (optimally) per cluster in your final dataset after subsetting, then what you proposed would do the job.

I checked the active.ident to make sure the identity has not shifted to any other column, but I am still getting the error. What parameters are excluding these cells?

You can however change the seed value and end up with a different dataset. This is pretty much what Jean-Baptiste was pointing out.

Here we present an example analysis of 65k peripheral blood mononuclear cells (PBMCs) using the R package Seurat.

My analysis is helped by the fact that the larger cluster is very homogeneous, so random sampling of ~1000 cells is still very representative.
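The CD14+ / CD14- selection quoted above relies on which() applied to a named expression vector. Here is a small sketch with a toy vector in place of the GetAssayData() result; the cell names and values are invented for illustration.

```r
# Toy stand-in for GetAssayData(pbmc_small, assay = "RNA", slot = "data")["CD14", ]:
# a named numeric vector with one expression value per cell.
CD14_expression <- c(cellA = 0, cellB = 2.3, cellC = 0, cellD = 1.1, cellE = 0)

# which() returns the positions that satisfy the condition and keeps the names,
# so names() recovers the cell ids directly.
pos_ids <- names(which(CD14_expression > 0))
neg_ids <- names(which(CD14_expression == 0))

pos_ids
neg_ids
```

The two id vectors can then be used to label cells in the metadata or to subset the object by cell name.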
The integration method that is available in the Seurat package utilizes canonical correlation analysis (CCA). We start by reading in the data.

I have a Seurat object with 5 conditions and 9 cell types defined. I would like to randomly downsample each cell type for each condition.

Parameter to subset on.

At the moment you are getting the index from a row comparison, then using that index to subset columns.

If I have an input of 2000 cells and downsample to 500, how are the 1500 cells excluded? Therefore I wanted to confirm: does SubsetData blindly randomly sample?

WhichCells: Identify cells matching certain criteria. Usage: WhichCells(object, ident = NULL, ident.remove = NULL, cells.use = NULL,

To use subset on a Seurat object (see ?subset.Seurat), you have to provide: What you have should work, but try calling the actual function (in case there are packages that clash):

So if you repeat your subsetting several times with the same max.cells.per.ident, you will always end up having the same cells. If NULL, does not set a seed.

Subsets a Seurat object containing Spatial Transcriptomics data while

But it didn't work. Subsetting from a Seurat object based on orig.ident?

1) The downsampled percentage of cells in WT and KO is more or less the same compared to the actual % of cells in WT and KO. 2) In each version, I have highlighted the KO cells for clusters 1, 4, 5, 6 and 7 where the downsampled number is less than the WT cells.
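The seed behaviour described above (same seed and same per-identity maximum give the same cells) can be reproduced with a few lines of base R. The function downsample_per_ident below is a hypothetical helper written for this illustration, not Seurat's own code; it only mimics the described logic of grouping cells by identity and sampling up to a maximum per group.

```r
# Hypothetical helper: keep at most `max_cells` cells per identity class.
# `idents` is a named vector (names = cell barcodes, values = identity class).
downsample_per_ident <- function(idents, max_cells, seed = 1) {
  # A NULL seed means the random number generator is left untouched.
  if (!is.null(seed)) set.seed(seed)
  groups <- split(names(idents), idents)
  kept <- lapply(groups, function(cells) {
    # Classes that are already small enough are kept whole.
    if (length(cells) > max_cells) sample(cells, max_cells) else cells
  })
  unlist(kept, use.names = FALSE)
}

# Toy identities: class A has 2000 cells, class B only 300.
idents <- c(rep("A", 2000), rep("B", 300))
names(idents) <- paste0("cell_", seq_along(idents))

run1 <- downsample_per_ident(idents, 500, seed = 1)
run2 <- downsample_per_ident(idents, 500, seed = 1)
run3 <- downsample_per_ident(idents, 500, seed = 2)

identical(run1, run2)   # same seed, same cells
table(idents[run1])     # A is capped at 500, B keeps all 300
```

Changing the seed, as in run3, is what yields a different random subset; the excluded cells are simply the ones that were not drawn.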
This subset also has the exact same mean and median as the original object I'm subsetting from.

This method expects "correspondences" or shared biological states among at least a subset of single cells across the groups.

which, let's suppose, gives you 8 clusters), and would like to subset your dataset using the code you wrote, and assuming that all clusters are formed of at least 1000 cells, your final Seurat object will include 8000 cells.

Downsample Seurat. Description: Downsample a Seurat object, either globally or subset by a field. The desired cell number to retain per unit of data. Character. Factor to downsample data by.

Developed by Rahul Satija, Andrew Butler, Paul Hoffman, Tim Stuart.

ctrl1 Astro 1000 cells

However, when I try to do any of the following: seurat_object <- subset(seurat_object, subset = meta .

They actually both fail due to syntax errors, yours included @williamsdrake.

Examples
## Not run:
# Subset using meta data to keep spots with more than 1000 unique genes
se.subset <- SubsetSTData(se, expression = nFeature_RNA >= 1000)
# Subset by a .

So, it's just a random selection. Thanks, downsample is an input parameter from WhichCells: Maximum number of cells per identity class, default is Inf; downsampling will happen after all other operations, including inverting the cell selection.

What would be the best way to do it? For example, 50k or 60k.
are kept in the output Seurat object which will make the STUtility functions

Conditions: ctrl1, ctrl2, ctrl3, exp1, exp2

column name in object@meta.data, etc. Additional arguments to be passed to FetchData. Subset of cell names. Returns a list of cells that match a particular set of criteria such as

exp1 Astro 1000 cells

Hi, I guess you can randomly sample your cells from that cluster using sample() (from base R).

This is more of a general R question than a question directly related to Seurat, but I will try to give you an idea. For instance, you might do something like this:

you may need to wrap feature names in backticks (``) if dashes - zx8754.

For this application, using SubsetData is fine, it seems from your answers.

Thanks for this, but I really want to understand more about how the downsample function actually works.

exp2 Micro 1000 cells

I want to create a subset of cells expressing certain genes only.

For your last question, I suggest you read this bioRxiv paper.
SampleUMI(data, max.umi = 1000, upsample = FALSE, verbose = FALSE)
Arguments: data: Matrix with the raw count data. max.umi: Number of UMIs to sample to. upsample: Upsamples all cells with fewer than max.umi. verbose

I can figure out what it is by doing the following: meta_data = colnames(seurat_object@meta.data)[grepl("DF.classification", colnames(seurat_object@meta.data))], where meta_data = 'DF.classifications_0.25_0.03_252' and is of class character.

ctrl3 Astro 1000 cells

This is called feature selection, and it has a major impact on the shape of the trajectory.

Thanks for the wonderful package.

But this is something you can test by minimally subsetting your data (i.e.

downsampled.obj <- large.obj[, sample(colnames(large.obj), size = ncol(small.obj), replace = FALSE)]

If I always end up with the same mean and median (UMI), then is it truly random sampling? Does it not?

Of course, your case does not exactly match theirs, since they have ~1.3M cells and, therefore, more chance to maximally enrich in rare cell types, and the tissues you're studying might be very different.

ctrl2 Astro 1000 cells

# Subset Seurat object based on identity class, also see ?SubsetData
subset(x = pbmc, idents = "B cells")
subset(x = pbmc, idents = c("CD4 T cells", "CD8 T cells"), invert = TRUE)
subset(x = pbmc, subset = MS4A1 > 3)
subset(x = pbmc, subset = MS4A1 > 3 & PC1 > 5)
subset(x = pbmc, subset = MS4A1 > 3, idents = "B cells")
subset(x = pbmc,

Can you tell me, when I use the downsample function, how does Seurat exclude or choose cells?
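The one-liner that downsamples the larger object to the size of the smaller one works by sampling column names, since cells are columns. A minimal sketch with plain matrices standing in for the two objects follows; large.mat and small.mat are toy data, and the same indexing is assumed to carry over to Seurat objects as in the quoted line.

```r
set.seed(123)

# Toy count matrices (genes x cells) standing in for the two objects.
large.mat <- matrix(rpois(20 * 400, lambda = 1), nrow = 20,
                    dimnames = list(paste0("gene", 1:20), paste0("L", 1:400)))
small.mat <- matrix(rpois(20 * 150, lambda = 1), nrow = 20,
                    dimnames = list(paste0("gene", 1:20), paste0("S", 1:150)))

# Sample as many cell names from the large matrix as the small one has cells.
keep <- sample(colnames(large.mat), size = ncol(small.mat), replace = FALSE)

# Cells are columns, so the sampled names go in the column position.
downsampled.mat <- large.mat[, keep]

dim(downsampled.mat)
```

Sampling names rather than positions keeps the selection explicit and makes it easy to store `keep` for later reuse.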
I would rather use the sample function directly.

accept.value = NULL, max.cells.per.ident = Inf, random.seed = 1, ...)

If this new subset is not randomly sampled, then on what criteria is it sampled?

Returns a list of cells that match a particular set of criteria such as identity class, high/low values for particular PCs, etc.

DoHeatmap(subset(pbmc3k.final, downsample = 100), features = features, size = 3)

New additions to FeaturePlot
FeaturePlot(pbmc3k.final, features = "MS4A1")
FeaturePlot(pbmc3k.final, features = "MS4A1", min.cutoff = 1, max.cutoff = 3)
FeaturePlot(pbmc3k.final, features = c("MS4A1", "PTPRCAP"), min.cutoff = "q10", max.cutoff = "q90")

Hi Leon, happy to hear that.

It won't necessarily pick the expected number of cells.

However, for robustness, I would try to resample from obj1 several times using different seed values (which you can store for reproducibility), compute variable genes at each step as described above, and then get either the union or the intersection of those variable genes.

I think this is basically what you did, but I think this looks a little nicer.
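The resampling strategy above (several seeds, variable genes per draw, then union or intersection) can be outlined as follows. This sketch makes a loud assumption: the hypothetical helper top_variable ranks genes by plain variance as a stand-in for Seurat's variable gene selection, and the count matrix is simulated.

```r
set.seed(10)

# Simulated counts: 50 genes x 3000 cells, each gene with its own Poisson rate.
rates <- seq(0.5, 5, length.out = 50)
counts <- matrix(rpois(50 * 3000, lambda = rep(rates, times = 3000)), nrow = 50,
                 dimnames = list(paste0("gene", 1:50), paste0("cell_", 1:3000)))

# Hypothetical stand-in for variable gene selection: top n genes by variance.
top_variable <- function(mat, n = 10) {
  v <- apply(mat, 1, var)
  names(sort(v, decreasing = TRUE))[seq_len(n)]
}

# Stored seeds make every resampling round reproducible.
seeds <- c(1, 2, 3)
gene_sets <- lapply(seeds, function(s) {
  set.seed(s)
  cells <- sample(colnames(counts), 600)
  top_variable(counts[, cells], n = 10)
})

genes_union     <- Reduce(union, gene_sets)      # found in any round
genes_intersect <- Reduce(intersect, gene_sets)  # found in every round
```

The intersection is the conservative choice, while the union keeps genes that any single draw considered variable.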
If you make a data frame containing the barcodes, conditions, and cell types, you can sample 1000 cells within each condition/cell type.

inplace: bool (default: True). subset: bool (default: False) Inplace subset to highly-variable genes if True, otherwise merely indicate highly variable genes.

Thanks again for any help!

If a subsetField is provided, the string 'min' can also be used, in which case, If provided, data will be grouped by these fields, and up to targetCells will be retained per group.

Thank you for the suggestion.

to a point where your R doesn't crash, but where you lose the fewest cells), and then decreasing the number of sampled cells to see if the results remain consistent and are recapitulated by a lower number of cells.

by default, throws an error. A predicate expression for feature/variable expression.

however, when I use subset(), it returns with Error.

Hello All, I would like to randomly downsample the larger object to have the same number of cells as the smaller object, however I am getting an error when trying to subset.

Here, the GEX = pbmc_small, for example.

downsample: Maximum number of cells per identity class, default is Inf; downsampling will happen after all other operations, including inverting the cell selection. seed: Random seed for downsampling.
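The data frame approach for sampling within each condition and cell type can be sketched in base R with split(). The metadata below is simulated and the column names barcode, condition and celltype are assumptions for the example; dplyr's group_by() followed by slice_sample() would be the tidyverse route to the same result.

```r
set.seed(7)

# Simulated metadata: one row per cell, with hypothetical column names.
meta <- data.frame(
  barcode   = paste0("cell_", 1:6000),
  condition = sample(c("ctrl1", "exp1"), 6000, replace = TRUE),
  celltype  = sample(c("Astro", "Micro"), 6000, replace = TRUE),
  stringsAsFactors = FALSE
)

n_per_group <- 1000

# One group of barcodes per condition/cell type combination.
groups <- split(meta$barcode, list(meta$condition, meta$celltype), drop = TRUE)

# Sample up to n_per_group barcodes in each group; small groups are kept whole.
sampled <- unlist(lapply(groups, function(b) {
  if (length(b) > n_per_group) sample(b, n_per_group) else b
}), use.names = FALSE)

# Check the resulting group sizes.
picked <- meta[match(sampled, meta$barcode), ]
table(picked$condition, picked$celltype)
```

The vector `sampled` holds the barcodes to keep and can be used to subset the object by cell name.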
random.seed: Random seed for downsampling. Value: Returns a Seurat object containing only the relevant subset of cells.
Examples:
pbmc1 <- SubsetData(object = pbmc_small, cells = colnames(x = pbmc_small)[1:40])
pbmc1

If a subsetField is provided, the string 'min' can also be .

This is what worked for me:

exp2 Astro 1000 cells.

Numeric [0,1]. identity class, high/low values for particular PCs, etc. Meta data grouping variable in which min.group.size will be enforced.

Creates a Seurat object containing only a subset of the cells in the original object.

If NULL, does not set a seed. Value: A vector of cell names. See also: FetchData.

You can check lines 714 to 716 in interaction.R.

How to make a subset of cells expressing a certain gene in Seurat (R):

I meant for you to try your original code for Dbh.pos, but alter Dbh.neg to,

Still shows the same problem:
Dbh.pos <- Idents(my.data, WhichCells(my.data, expression = Dbh > 0, slot = "data"))
Error in CheckDots() : No named arguments passed
Dbh.neg <- Idents(my.data, WhichCells(my.data, expression = Dbh == 0, slot = "data"))
Error in CheckDots() : No named arguments passed

Hmm... Easier to troubleshoot if you would post a,
Error in CellsByIdentities(object = object, cells = cells) :

A stupid suggestion, but did you try to give it as a string?

If anybody happens upon this in the future, there was a missing ')' in the above code.

between numbers are present in the feature name, Maximum number of cells per identity class, default is

But using a union of the variable genes might be even more robust.

seuratObj: The Seurat object.

Here is the slightly modified code I tried with the error: The error after the last line is:

Logical expression indicating features/variables to keep. Extra parameters passed to WhichCells, such as slot, invert, or downsample.

You can see the code that is actually called as such: SeuratObject:::subset.Seurat, which in turn calls SeuratObject:::WhichCells.Seurat (as @yuhanH mentioned).

If no cells are requested, return a NULL; identity class, high/low values for particular PCs, etc. If specified, overrides subsample.factor.

Downsample a Seurat object, either globally or subset by a field. Usage: DownsampleSeurat(seuratObj, targetCells, subsetFields = NULL, seed = GetSeed())

Can be used to downsample the data to a certain

@del2007: What you showed as an example allows you to randomly sample a maximum of 1000 cells from each cluster whose information is stored in object@ident.

This tutorial is meant to give a general overview of each step involved in analyzing a digital gene expression (DGE) matrix generated from a Parse Biosciences single cell whole transcriptome experiment.

Thank you Tim.
ctrl2 Micro 1000 cells

Seurat has four tests for differential expression which can be set with the test.use parameter: ROC test ("roc"), t-test ("t"), LRT test based on zero-inflated data ("bimod", default), LRT test based on tobit-censoring models ("tobit"). The ROC test returns the 'classification power' for any individual marker (ranging from 0 - random, to 1 -