Multi-sample tissue compartment detection

1. Integrate samples with Scanorama

[1]:
import scanorama
import numpy as np
import scanpy as sc
import matplotlib.pyplot as plt
import chrysalis as ch

We begin by loading the anterior and posterior mouse brain samples. We proceed by removing the low-quality capture spots and genes that are expressed in fewer than 10 spots and appending samples to the adatas list. We then run scanorama using the dedicated scanpy function to correct the expression matrices.

[2]:
samples = ['V1_Mouse_Brain_Sagittal_Anterior_Section_2', 'V1_Mouse_Brain_Sagittal_Posterior_Section_2']

adatas = []

for sample in samples:
    ad = sc.datasets.visium_sge(sample_id=sample)
    ad.var_names_make_unique()

    sc.pp.calculate_qc_metrics(ad, inplace=True)
    sc.pp.filter_cells(ad, min_counts=1000)
    sc.pp.filter_genes(ad, min_cells=10)

    sc.pp.normalize_total(ad, inplace=True, target_sum=1e4)
    sc.pp.log1p(ad)

    ch.detect_svgs(ad)

    adatas.append(ad)

Calculating SVGs: 100%|██████████| 12429/12429 [01:53<00:00, 109.57it/s]
Calculating SVGs: 100%|██████████| 12206/12206 [02:20<00:00, 86.96it/s]
[3]:
adatas_cor = scanorama.correct_scanpy(adatas, return_dimred=True)
Found 16415 genes among all datasets
[[0.         0.50372208]
 [0.         0.        ]]
Processing datasets (0, 1)

ch.integrate_adatas can be used to calculate spatially variable genes (SVGs) for each sample in the input list or dictionary and concatenate the identified SVG gene sets across samples. Sample IDs are automatically assigned [0, 1, …] if not stated otherwise.

2. Run chrysalis on the integrated samples

[4]:
adata = ch.integrate_adatas(adatas_cor, sample_col='sample')

A single AnnData object is created with a .obs columns storing the union of SVGs. Sample-wise Moran’s I values and the identified SVGs are stored in adata.varm. adata.obs['sample'] stores the sample IDs for each observation.

[5]:
print(adata)
AnnData object with n_obs × n_vars = 6104 × 16415
    obs: 'in_tissue', 'array_row', 'array_col', 'n_genes_by_counts', 'log1p_n_genes_by_counts', 'total_counts', 'log1p_total_counts', 'pct_counts_in_top_50_genes', 'pct_counts_in_top_100_genes', 'pct_counts_in_top_200_genes', 'pct_counts_in_top_500_genes', 'n_counts', 'sample'
    var: 'gene_ids', 'feature_types', 'genome', 'n_cells_by_counts', 'mean_counts', 'log1p_mean_counts', 'pct_dropout_by_counts', 'total_counts', 'log1p_total_counts', 'n_cells', "Moran's I", 'spatially_variable', 'gene_symbols'
    uns: 'spatial', 'log1p'
    obsm: 'spatial', 'X_scanorama'
    varm: 'spatially_variable', "Moran's I"

As calculatting SVGs might take longer depending on the dataset size, it is recommended to save the AnnData object after this step, or to save the samples individually before appending them to adatas in cell 1.

adata.write('mouse_ap_integrated.h5ad')

adata = sc.read_h5ad('mouse_ap_integrated.h5ad')

We can proceed by calling ch.pca and ch.aa as shown in the previous tutorial. By looking at the explained variance curve, we can select 10 tissue compartments, however, the number of compartments can be increased depending on the resolution that is suitable for the specific dataset or inquiry.

[6]:
adata.var.index = adata.var['gene_symbols']
adata.var_names = adata.var['gene_symbols']
ch.pca(adata)

ch.plot_explained_variance(adata)
plt.show()
../_images/tutorials_mouse_brain_integration_tutorial_13_0.png

After tissue compartment inference, we may visualize the samples stored in the integrated AnnData object simultaniously using ch.plot_samples.

[7]:
ch.aa(adata, n_pcs=20, n_archetypes=10)
[8]:
ch.plot_samples(adata, 1, 2, dim=10, suptitle='Mouse brain', show_title=False)
plt.show()
../_images/tutorials_mouse_brain_integration_tutorial_16_0.png

If we would like to examine the spatial distribution of a specific tissue compartment, we can visualize it using the selected_comp parameter.

[9]:
ch.plot_samples(adata, 1, 2, dim=10, selected_comp=9, suptitle='Mouse brain', show_title=False)
plt.show()
../_images/tutorials_mouse_brain_integration_tutorial_18_0.png

We can observe clear continuity between tissue compartments across samples indicating that the integration was successful.

[10]:
for i in np.unique(adata.obs['sample']):
    ch.plot_compartments(adata, sample_id=i, ncols=5, rasterized=True, title_size=12)
    plt.show()
../_images/tutorials_mouse_brain_integration_tutorial_20_0.png
../_images/tutorials_mouse_brain_integration_tutorial_20_1.png

Finally we can take a look at the integrated tissue compartment signatures.

[11]:
ch.plot_weights(adata, ncols=5)
plt.show()
../_images/tutorials_mouse_brain_integration_tutorial_22_0.png