Integrating multiple scRNA-seq data

This tutorial shows loading, preprocessing, DAVAE integration and visualization of 293T and Jurkat cells in three different batches (Mixed Cell Lines).

Importing scbean package

Here, we’ll import scbean along with other popular packages.

[1]:
import pandas as pd
import scbean.model.davae as davae
import scbean.tools.utils as tl
import scanpy as sc
import matplotlib
from numpy.random import seed
seed(2021)
matplotlib.use('TkAgg')

# Command for Jupyter notebooks only
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')
from matplotlib.axes._axes import _log as matplotlib_axes_logger
matplotlib_axes_logger.setLevel('ERROR')
/Users/wangyuwei/opt/anaconda3/lib/python3.8/site-packages/scipy/__init__.py:138: UserWarning: A NumPy version >=1.16.5 and <1.23.0 is required for this version of SciPy (detected version 1.23.0)
  warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion} is required for this version of "
/Users/wangyuwei/opt/anaconda3/lib/python3.8/site-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.9) or chardet (3.0.4) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "

Loading data

This tutorial uses Mixed Cell Line datasets from 10xgenomics with non-overlapping populations from three batches, two of which contain 293t (2885 cells) and jurkat (3258 cells) cells respectively, and the third batch contains a 1:1 mixture of 293t and jurkat cells (3388 cells).

  • Read from 10x mtx file The file in 10x mtx format can be downloaded here. Set the fmt parameter of pp.read_sc_data() function to ‘10x_mtx’ to read the data downloaded from 10XGenomics. If the file downloaded from 10XGenomics is in h5 format, the dataset can be loaded by setting the fmt parameter to ‘10x_h5’.

[2]:
file1 = './data/' + "293t/hg19/"
file2 = './data/' + "jurkat/hg19/"
file3 = './data/' + "jurkat_293t/hg19/"

adata_b1 = tl.read_sc_data(file1, fmt='10x_mtx', batch_name="293t")
adata_b2 = tl.read_sc_data(file2, fmt='10x_mtx', batch_name="jurkat")
adata_b3 = tl.read_sc_data(file3, fmt='10x_mtx', batch_name="mixed")

Data preprocessing

Here, we filter and normalize each data separately and concatenate them into one AnnData object. For more details, please check the preprocessing API.

[3]:
adata_all = tl.davae_preprocessing([adata_b1, adata_b2, adata_b3], index_unique="-")
Trying to set attribute `.obs` of view, copying.
Trying to set attribute `.obs` of view, copying.
Trying to set attribute `.obs` of view, copying.
Trying to set attribute `.obs` of view, copying.
Trying to set attribute `.obs` of view, copying.

DAVAE Integration

The code for integration using davae is as following:

[4]:
# Command for Jupyter notebooks only
import os
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"

adata_integrate = davae.fit_integration(
    adata_all,
    batch_num=3,
    domain_lambda=0.7,
    epochs=25,
    sparse=True,
    hidden_layers=[64, 32, 6]
)
Model: "vae_mlp"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
inputs (InputLayer)             [(None, 2000)]       0
__________________________________________________________________________________________________
inputs_batch (InputLayer)       [(None, 3)]          0
__________________________________________________________________________________________________
encoder_hx (Functional)         [(None, 6), (None, 6 131020      inputs[0][0]
                                                                 inputs_batch[0][0]
                                                                 inputs[0][0]
                                                                 inputs_batch[0][0]
__________________________________________________________________________________________________
inputs_weights (InputLayer)     [(None, 1)]          0
__________________________________________________________________________________________________
decoder_x (Functional)          (None, 2000)         132720      encoder_hx[0][2]
                                                                 inputs_batch[0][0]
__________________________________________________________________________________________________
domain_classifier (Functional)  (None, 3)            211         encoder_hx[1][2]
__________________________________________________________________________________________________
concatenate (Concatenate)       (None, 2003)         0           inputs[0][0]
                                                                 inputs_batch[0][0]
__________________________________________________________________________________________________
dense (Dense)                   (None, 64)           128256      concatenate[0][0]
__________________________________________________________________________________________________
batch_normalization (BatchNorma (None, 64)           192         dense[0][0]
__________________________________________________________________________________________________
activation (Activation)         (None, 64)           0           batch_normalization[0][0]
__________________________________________________________________________________________________
dropout (Dropout)               (None, 64)           0           activation[0][0]
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 32)           2080        dropout[0][0]
__________________________________________________________________________________________________
tf.math.multiply_1 (TFOpLambda) (None, 2000)         0           decoder_x[0][0]
                                                                 inputs_weights[0][0]
__________________________________________________________________________________________________
tf.math.multiply (TFOpLambda)   (None, 2000)         0           inputs[0][0]
                                                                 inputs_weights[0][0]
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 32)           96          dense_1[0][0]
__________________________________________________________________________________________________
tf.math.subtract (TFOpLambda)   (None, 2000)         0           tf.math.multiply[0][0]
                                                                 tf.math.multiply_1[0][0]
__________________________________________________________________________________________________
activation_1 (Activation)       (None, 32)           0           batch_normalization_1[0][0]
__________________________________________________________________________________________________
tf.math.reduce_mean_1 (TFOpLamb (1, 1)               0           tf.math.subtract[0][0]
__________________________________________________________________________________________________
dropout_1 (Dropout)             (None, 32)           0           activation_1[0][0]
__________________________________________________________________________________________________
tf.math.subtract_1 (TFOpLambda) (None, 2000)         0           tf.math.subtract[0][0]
                                                                 tf.math.reduce_mean_1[0][0]
__________________________________________________________________________________________________
hx_log_var (Dense)              (None, 6)            198         dropout_1[0][0]
__________________________________________________________________________________________________
hx_mean (Dense)                 (None, 6)            198         dropout_1[0][0]
__________________________________________________________________________________________________
tf.convert_to_tensor (TFOpLambd (None, 2000)         0           tf.math.multiply_1[0][0]
__________________________________________________________________________________________________
tf.cast (TFOpLambda)            (None, 2000)         0           tf.math.multiply[0][0]
__________________________________________________________________________________________________
tf.math.square (TFOpLambda)     (None, 2000)         0           tf.math.subtract_1[0][0]
__________________________________________________________________________________________________
tf.__operators__.add_1 (TFOpLam (None, 6)            0           hx_log_var[0][0]
__________________________________________________________________________________________________
tf.math.square_1 (TFOpLambda)   (None, 6)            0           hx_mean[0][0]
__________________________________________________________________________________________________
tf.math.squared_difference (TFO (None, 2000)         0           tf.convert_to_tensor[0][0]
                                                                 tf.cast[0][0]
__________________________________________________________________________________________________
tf.math.reduce_mean_2 (TFOpLamb ()                   0           tf.math.square[0][0]
__________________________________________________________________________________________________
tf.math.subtract_2 (TFOpLambda) (None, 6)            0           tf.__operators__.add_1[0][0]
                                                                 tf.math.square_1[0][0]
__________________________________________________________________________________________________
tf.math.exp (TFOpLambda)        (None, 6)            0           hx_log_var[0][0]
__________________________________________________________________________________________________
tf.math.reduce_mean (TFOpLambda (None,)              0           tf.math.squared_difference[0][0]
__________________________________________________________________________________________________
tf.math.truediv (TFOpLambda)    ()                   0           tf.math.reduce_mean_2[0][0]
__________________________________________________________________________________________________
tf.math.truediv_1 (TFOpLambda)  ()                   0           tf.math.reduce_mean_2[0][0]
__________________________________________________________________________________________________
tf.math.log (TFOpLambda)        ()                   0           tf.math.reduce_mean_2[0][0]
__________________________________________________________________________________________________
tf.math.subtract_3 (TFOpLambda) (None, 6)            0           tf.math.subtract_2[0][0]
                                                                 tf.math.exp[0][0]
__________________________________________________________________________________________________
tf.math.multiply_2 (TFOpLambda) (None,)              0           tf.math.reduce_mean[0][0]
                                                                 tf.math.truediv[0][0]
__________________________________________________________________________________________________
tf.math.multiply_3 (TFOpLambda) ()                   0           tf.math.truediv_1[0][0]
                                                                 tf.math.log[0][0]
__________________________________________________________________________________________________
tf.math.reduce_sum (TFOpLambda) (None,)              0           tf.math.subtract_3[0][0]
__________________________________________________________________________________________________
tf.keras.backend.categorical_cr (None,)              0           inputs_batch[0][0]
                                                                 domain_classifier[0][0]
__________________________________________________________________________________________________
tf.__operators__.add (TFOpLambd (None,)              0           tf.math.multiply_2[0][0]
                                                                 tf.math.multiply_3[0][0]
__________________________________________________________________________________________________
tf.math.multiply_4 (TFOpLambda) (None,)              0           tf.math.reduce_sum[0][0]
__________________________________________________________________________________________________
tf.math.multiply_5 (TFOpLambda) (None,)              0           tf.keras.backend.categorical_cros
__________________________________________________________________________________________________
tf.__operators__.add_2 (TFOpLam (None,)              0           tf.__operators__.add[0][0]
                                                                 tf.math.multiply_4[0][0]
__________________________________________________________________________________________________
tf.math.multiply_6 (TFOpLambda) (None,)              0           tf.math.multiply_5[0][0]
__________________________________________________________________________________________________
tf.__operators__.add_3 (TFOpLam (None,)              0           tf.__operators__.add_2[0][0]
                                                                 tf.math.multiply_6[0][0]
__________________________________________________________________________________________________
tf.math.reduce_mean_3 (TFOpLamb ()                   0           tf.__operators__.add_3[0][0]
__________________________________________________________________________________________________
add_loss (AddLoss)              ()                   0           tf.math.reduce_mean_3[0][0]
==================================================================================================
Total params: 263,951
Trainable params: 263,535
Non-trainable params: 416
__________________________________________________________________________________________________
Epoch 1/25
75/75 [==============================] - 3s 10ms/step - loss: 1101.9201
Epoch 2/25
75/75 [==============================] - 1s 9ms/step - loss: 203.6247
Epoch 3/25
75/75 [==============================] - 1s 10ms/step - loss: 130.9191
Epoch 4/25
75/75 [==============================] - 1s 9ms/step - loss: -35.7390
Epoch 5/25
75/75 [==============================] - 1s 9ms/step - loss: 386.6588
Epoch 6/25
75/75 [==============================] - 1s 9ms/step - loss: -184.9128
Epoch 7/25
75/75 [==============================] - 1s 9ms/step - loss: -161.0310
Epoch 8/25
75/75 [==============================] - 1s 9ms/step - loss: 24.7651
Epoch 9/25
75/75 [==============================] - 1s 10ms/step - loss: 9.3653
Epoch 10/25
75/75 [==============================] - 1s 10ms/step - loss: -378.8922
Epoch 11/25
75/75 [==============================] - 1s 9ms/step - loss: -208.0573
Epoch 12/25
75/75 [==============================] - 1s 9ms/step - loss: -341.3141
Epoch 13/25
75/75 [==============================] - 1s 10ms/step - loss: -302.1562
Epoch 14/25
75/75 [==============================] - 1s 9ms/step - loss: -294.0502
Epoch 15/25
75/75 [==============================] - 1s 10ms/step - loss: -192.3871
Epoch 16/25
75/75 [==============================] - 1s 9ms/step - loss: -190.0139
Epoch 17/25
75/75 [==============================] - 1s 10ms/step - loss: -264.8898
Epoch 18/25
75/75 [==============================] - 1s 9ms/step - loss: -178.6996
Epoch 19/25
75/75 [==============================] - 1s 10ms/step - loss: -209.5481
Epoch 20/25
75/75 [==============================] - 1s 10ms/step - loss: -222.0254
Epoch 21/25
75/75 [==============================] - 1s 10ms/step - loss: -158.8237
Epoch 22/25
75/75 [==============================] - 1s 10ms/step - loss: -395.8481
Epoch 23/25
75/75 [==============================] - 1s 9ms/step - loss: -261.4448
Epoch 24/25
75/75 [==============================] - 1s 10ms/step - loss: -252.9768
Epoch 25/25
75/75 [==============================] - 1s 10ms/step - loss: -279.0494
[5]:
adata_integrate
[5]:
AnnData object with n_obs × n_vars = 9530 × 2000
    obs: '_batch', 'n_genes', 'percent_mito', 'n_counts', 'size_factor', 'loss_weight', 'batch_label', 'batch'
    var: 'gene_ids', 'n_cells-0-0', 'highly_variable-0-0', 'means-0-0', 'dispersions-0-0', 'dispersions_norm-0-0', 'n_cells-1-0', 'highly_variable-1-0', 'means-1-0', 'dispersions-1-0', 'dispersions_norm-1-0', 'n_cells-1', 'highly_variable-1', 'means-1', 'dispersions-1', 'dispersions_norm-1'
    obsm: 'X_davae'

1.The meta.data of each cell has been saved in adata.obs

2.The embedding representation of davae for each cell have been saved in adata.obsm(‘X_davae’)

Loading result from h5ad file: You can also download and use the integrated results. The output.h5ad file of the DAVAE result can be downloaded here

[28]:
adata_integrate = sc.read_h5ad('./adata_integrate.h5ad')
[29]:
adata_integrate
[29]:
AnnData object with n_obs × n_vars = 9530 × 2000
    obs: '_batch', 'n_genes', 'percent_mito', 'n_counts', 'size_factor', 'loss_weight', 'batch_label', 'batch', 'leiden', 'celltype'
    var: 'gene_ids', 'n_cells-0-0', 'highly_variable-0-0', 'means-0-0', 'dispersions-0-0', 'dispersions_norm-0-0', 'n_cells-1-0', 'highly_variable-1-0', 'means-1-0', 'dispersions-1-0', 'dispersions_norm-1-0', 'n_cells-1', 'highly_variable-1', 'means-1', 'dispersions-1', 'dispersions_norm-1'
    uns: '_batch_colors', 'celltype_colors', 'leiden', 'leiden_colors', 'neighbors'
    obsm: 'X_davae', 'X_umap'
    obsp: 'connectivities', 'distances'

UMAP Visualization

We use UMAP to reduce the embedding feature output by DAVAE in 2 dimensions.

[30]:
import umap
adata_integrate.obsm['X_umap']=umap.UMAP().fit_transform(adata_integrate.obsm['X_davae'])
adata_integrate.uns['_batch_colors'] = ['#FF34FF', '#4FC601', '#3B5DFF']
sc.pl.umap(adata_integrate, color=['_batch', 'celltype'], s=3)
../_images/tutorials_davae_mcl_tutorial_16_0.png