Segmentation and Cell Type Mapping¶

Cell Segmentation¶

Cell segmentation associates mRNA spot locations with individual cells, creating transcriptomic profiles that make up the cell-by-gene table.

SpaceTx Cell Segmentation Pipeline

Feature-based nucleus segmentation based on DAPI is applied to stitched 2D volumes and consists of two steps: (i) foreground segmentation and (ii) its consecutive splitting into separate instances (nuclei). For the first task, simple thresholding, either by employing one of the many available thresholding methods or by fine-tuning the cutoff intensity value, was sufficient to segment the foreground in all the available data sets. Splitting of the foreground into separate instances is performed per connected component, which allows efficient parallelizable implementation of our method. Based on the observation that most nuclei have rather regular elliptical shape, we developed an approach inspired by the work of Bilgin et al. [1] than employs elliptic features to extract two types of information: (i) curvature maps whose local minima correspond to locations of separation lines between touching nuclei and (ii) markers cognitively describing shapes of the nuclei and defined as the regions with positive Gaussian curvature and negative mean curvature. Calculation of the curvature maps and the markers is guided by a scale parameter, one for each, the value of which is chosen experimentally based on the average nucleus size. The markers are consecutively turned into super-pixels by performing skeletonization on the foreground with subtracted markers region. Calculated super-pixels are progressively merged based on the inter-pixel similarity score for obtaining the final segmentation. The similarity between two neighbouring super-pixels is defined based on the average value of the curvature map in the region between the corresponding markers and is recalculated after each merging iteration. This process stops after the maximum value of the inter-pixel similarity does not exceed the predefined threshold (whose value was kept fixed for all the test data sets). The described algorithm that worked equally well on the entire variety of SpaceTX data, including DAPI channels of the fluorescence microscopy data as well as 10x Visium histopathology images.
Augmented Cell Segmentation using Baysor. Using the nuclear segementation as a prior for cell location, Baysor assigned mRNA spots to cells probabilistically, including cell size and gene composition.

Cell Type Mapping Methods¶

This is the first time spatial transcriptomics data has been analyzed and compared across methods for cell type determination. We developed approaches to combine multiple cell type mapping methods and applied them to data sets from five experimental methods.

A primary output of segmentation in each of the image-based experimental methods is a table of putative cells which each have a count of the number of molecules per gene as well as a soma location. Cells located outside of the primary visual cortex (VISp) based on expert annotation or that do not pass quality control filters are excluded from further analysis.

We next aimed to assign a cell type and associated confidence to each putative cell based on gene molecule counts. This was done using a combined analysis approach, where the strategies below were used to map cells to cell types from the taxonomy defined for SpaceTx, and the results of these strategies were then combined to produce a single combined cell type call with associated confidence score for each remaining cell.

Annotation of VISp area for mapping Cell locations were visualized in Napari https://napari.org and manually annotated to contain the largest possible area of primary visual cortex. Layers were manually annotated based on cell density and gene expression.

FR-Match cell2cluster [results] [example code]
pciSEQ [results] [example code]
mFISHtools [results] [example code]
Baysor [results] [example code]
Tangram [results] [example code]

Cell Type Mapping Consensus¶

The SpaceTx cell type mapping strategies and combination of these results into consensus mapping results are described in detail in Reference-based cell type matching of spatial transcriptomics data.

Visium Data Single Cell Mapping¶

Visium is often described as unbiased and transcriptome-wide, since it’s not based on a priori selection of genes to be examined and targets all poly-adenlated transcripts. However, the current Visium platform does not provide single cell resolution and, though dependent on tissue, the common estimate is that approximately 1-10 cells contribute to the observed gene expression at each spot. The “mini-bulk” property of spots means that spots rarely can be assigned to a single cell type, and more common is to characterize these w.r.t. cell type compositions (e.g., proportion of cell types present at each location). For such purposes, several methods where single cell/nuclei data is used as a guide reference have been proposed, with the shared objective of decomposing the joint expression profiles observed in a spot into contributions from the cell types defined in the single cell/nuclei data.

stereoscope and Tangram were used to deconvolve the spatial Visium data using non-spatial single cell data, a procedure we refer to as “mapping” of the latter onto the former. In short, once presented with the Visium data, both these approaches will provide a proportion estimate of each cell type within every spot, thus allowing the spatial distribution of each cell type to be charted. Although the final products are similar, the two methods differ substantially in design; stereoscope relying on a probabilistic framework where both data modalities are modelled with a negative binomial distribution, while Tangram cast the the task more as a optimization problem where a map between single cells and spatial locations.

For consistency, the cell type mapping analysis focuses on the small region that constitutes the VISp as shown in Figure 1; even though spatial gene expression data was generated from a larger part of the tissues (about half of a coronal mouse brain section fit onto the capture area). We aim to evaluate the mapping from each method, as well as comparing these two to each other. Nevertheless, establishing a ground truth for the proportion values is arguably even harder than for the cell type identities in the image-based methods, hence concurrence with well-known locations of cell types (e.g., Layer associated) will serve as the basis of our evaluation.

Figure 1. A) zoom in on the regions of interests. Spots included in the analysis are marked on the tissue with black circles. The tissue edged to which distance is measured is indicated by a dashed red line. B) Pearson correlation values between cell type proportion estimates from stereoscope and Tangram. The star (*) on Meis2 indicates that this correlation did not have a significant p-value. C) Smoothed curves (loess smoothing) of the cell type proportions when plotted as a function of distance to the tissue edge (red in A).

By computing the correlation (Pearsons’s r) between proportion estimates for each cell type it’s possible to quantitatively assess how results from the two methods relate. A significant positive correlation between the proportion estimates could be observed for all cell types except Meis2, where the correlation was negative but also non-significant (p = 0.36), see Figure 1B. High correlation values were observed for several of the layer types as well as Macrophages and Astrocytes. Most of the cell types with poor correlation were - according to the proportion estimates - lowly abundant in the tissue, implying mapping of rare cell types likely are more challenging to map and the result associated with higher uncertainty.

Next, we were interested in how the different layer cell types were distributed along the axis orthogonal to the tissue edge, i.e., when travelling further into the tissue (blue arrow Figure 1A). We thus measured the shortest distance for every spot to the tissue edge and modelled the cell type proportion values as a function of this distance, loess (locally estimated scatterplot smoothing) curve smoothing was used to get a more continuous graph, and to better capture the general trends in the data, see Methods. The two methods by en large agreed; albeit not always unimodal, the cell type distributions had one dominant major mode - overlapping well across methods - and exhibited the expected right shift trend (layer types with higher numbers being more prevalent deeper into the cortex and vice versa), see Figure 1C. For some cell types, the distributions were multimodal indicating potential “mismapping”, still in the cases where both methods independently located these peaks at almost identical positions the explanation might be biological or experimental.