Integrating multiple scRNA-seq data

This tutorial shows loading, preprocessing, DAVAE integration and visualization of 293T and Jurkat cells in three different batches (Mixed Cell Lines).

Importing scbean package

Here, we’ll import scbean along with other popular packages.

[1]:

import pandas as pd
import scbean.model.davae as davae
import scbean.tools.utils as tl
import scanpy as sc
import matplotlib
from numpy.random import seed
seed(2021)
matplotlib.use('TkAgg')

# Command for Jupyter notebooks only
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')
from matplotlib.axes._axes import _log as matplotlib_axes_logger
matplotlib_axes_logger.setLevel('ERROR')

/Users/wangyuwei/opt/anaconda3/lib/python3.8/site-packages/scipy/__init__.py:138: UserWarning: A NumPy version >=1.16.5 and <1.23.0 is required for this version of SciPy (detected version 1.23.0)
  warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion} is required for this version of "
/Users/wangyuwei/opt/anaconda3/lib/python3.8/site-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.9) or chardet (3.0.4) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "

Loading data

This tutorial uses Mixed Cell Line datasets from 10xgenomics with non-overlapping populations from three batches, two of which contain 293t (2885 cells) and jurkat (3258 cells) cells respectively, and the third batch contains a 1:1 mixture of 293t and jurkat cells (3388 cells).

Read from 10x mtx file The file in 10x mtx format can be downloaded here. Set the fmt parameter of pp.read_sc_data() function to ‘10x_mtx’ to read the data downloaded from 10XGenomics. If the file downloaded from 10XGenomics is in h5 format, the dataset can be loaded by setting the fmt parameter to ‘10x_h5’.

[2]:

file1 = './data/' + "293t/hg19/"
file2 = './data/' + "jurkat/hg19/"
file3 = './data/' + "jurkat_293t/hg19/"

adata_b1 = tl.read_sc_data(file1, fmt='10x_mtx', batch_name="293t")
adata_b2 = tl.read_sc_data(file2, fmt='10x_mtx', batch_name="jurkat")
adata_b3 = tl.read_sc_data(file3, fmt='10x_mtx', batch_name="mixed")

Data preprocessing

Here, we filter and normalize each data separately and concatenate them into one AnnData object. For more details, please check the preprocessing API.

[3]:

adata_all = tl.davae_preprocessing([adata_b1, adata_b2, adata_b3], index_unique="-")

Trying to set attribute `.obs` of view, copying.
Trying to set attribute `.obs` of view, copying.
Trying to set attribute `.obs` of view, copying.
Trying to set attribute `.obs` of view, copying.
Trying to set attribute `.obs` of view, copying.

DAVAE Integration

The code for integration using davae is as following:

[4]:

# Command for Jupyter notebooks only
import os
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"

adata_integrate = davae.fit_integration(
    adata_all,
    batch_num=3,
    domain_lambda=0.7,
    epochs=25,
    sparse=True,
    hidden_layers=[64, 32, 6]
)

Model: "vae_mlp"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
inputs (InputLayer)             [(None, 2000)]       0
__________________________________________________________________________________________________
inputs_batch (InputLayer)       [(None, 3)]          0
__________________________________________________________________________________________________
encoder_hx (Functional)         [(None, 6), (None, 6 131020      inputs[0][0]
                                                                 inputs_batch[0][0]
                                                                 inputs[0][0]
                                                                 inputs_batch[0][0]
__________________________________________________________________________________________________
inputs_weights (InputLayer)     [(None, 1)]          0
__________________________________________________________________________________________________
decoder_x (Functional)          (None, 2000)         132720      encoder_hx[0][2]
                                                                 inputs_batch[0][0]
__________________________________________________________________________________________________
domain_classifier (Functional)  (None, 3)            211         encoder_hx[1][2]
__________________________________________________________________________________________________
concatenate (Concatenate)       (None, 2003)         0           inputs[0][0]
                                                                 inputs_batch[0][0]
__________________________________________________________________________________________________
dense (Dense)                   (None, 64)           128256      concatenate[0][0]
__________________________________________________________________________________________________
batch_normalization (BatchNorma (None, 64)           192         dense[0][0]
__________________________________________________________________________________________________
activation (Activation)         (None, 64)           0           batch_normalization[0][0]
__________________________________________________________________________________________________
dropout (Dropout)               (None, 64)           0           activation[0][0]
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 32)           2080        dropout[0][0]
__________________________________________________________________________________________________
tf.math.multiply_1 (TFOpLambda) (None, 2000)         0           decoder_x[0][0]
                                                                 inputs_weights[0][0]
__________________________________________________________________________________________________
tf.math.multiply (TFOpLambda)   (None, 2000)         0           inputs[0][0]
                                                                 inputs_weights[0][0]
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 32)           96          dense_1[0][0]
__________________________________________________________________________________________________
tf.math.subtract (TFOpLambda)   (None, 2000)         0           tf.math.multiply[0][0]
                                                                 tf.math.multiply_1[0][0]
__________________________________________________________________________________________________
activation_1 (Activation)       (None, 32)           0           batch_normalization_1[0][0]
__________________________________________________________________________________________________
tf.math.reduce_mean_1 (TFOpLamb (1, 1)               0           tf.math.subtract[0][0]
__________________________________________________________________________________________________
dropout_1 (Dropout)             (None, 32)           0           activation_1[0][0]
__________________________________________________________________________________________________
tf.math.subtract_1 (TFOpLambda) (None, 2000)         0           tf.math.subtract[0][0]
                                                                 tf.math.reduce_mean_1[0][0]
__________________________________________________________________________________________________
hx_log_var (Dense)              (None, 6)            198         dropout_1[0][0]
__________________________________________________________________________________________________
hx_mean (Dense)                 (None, 6)            198         dropout_1[0][0]
__________________________________________________________________________________________________
tf.convert_to_tensor (TFOpLambd (None, 2000)         0           tf.math.multiply_1[0][0]
__________________________________________________________________________________________________
tf.cast (TFOpLambda)            (None, 2000)         0           tf.math.multiply[0][0]
__________________________________________________________________________________________________
tf.math.square (TFOpLambda)     (None, 2000)         0           tf.math.subtract_1[0][0]
__________________________________________________________________________________________________
tf.__operators__.add_1 (TFOpLam (None, 6)            0           hx_log_var[0][0]
__________________________________________________________________________________________________
tf.math.square_1 (TFOpLambda)   (None, 6)            0           hx_mean[0][0]
__________________________________________________________________________________________________
tf.math.squared_difference (TFO (None, 2000)         0           tf.convert_to_tensor[0][0]
                                                                 tf.cast[0][0]
__________________________________________________________________________________________________
tf.math.reduce_mean_2 (TFOpLamb ()                   0           tf.math.square[0][0]
__________________________________________________________________________________________________
tf.math.subtract_2 (TFOpLambda) (None, 6)            0           tf.__operators__.add_1[0][0]
                                                                 tf.math.square_1[0][0]
__________________________________________________________________________________________________
tf.math.exp (TFOpLambda)        (None, 6)            0           hx_log_var[0][0]
__________________________________________________________________________________________________
tf.math.reduce_mean (TFOpLambda (None,)              0           tf.math.squared_difference[0][0]
__________________________________________________________________________________________________
tf.math.truediv (TFOpLambda)    ()                   0           tf.math.reduce_mean_2[0][0]
__________________________________________________________________________________________________
tf.math.truediv_1 (TFOpLambda)  ()                   0           tf.math.reduce_mean_2[0][0]
__________________________________________________________________________________________________
tf.math.log (TFOpLambda)        ()                   0           tf.math.reduce_mean_2[0][0]
__________________________________________________________________________________________________
tf.math.subtract_3 (TFOpLambda) (None, 6)            0           tf.math.subtract_2[0][0]
                                                                 tf.math.exp[0][0]
__________________________________________________________________________________________________
tf.math.multiply_2 (TFOpLambda) (None,)              0           tf.math.reduce_mean[0][0]
                                                                 tf.math.truediv[0][0]
__________________________________________________________________________________________________
tf.math.multiply_3 (TFOpLambda) ()                   0           tf.math.truediv_1[0][0]
                                                                 tf.math.log[0][0]
__________________________________________________________________________________________________
tf.math.reduce_sum (TFOpLambda) (None,)              0           tf.math.subtract_3[0][0]
__________________________________________________________________________________________________
tf.keras.backend.categorical_cr (None,)              0           inputs_batch[0][0]
                                                                 domain_classifier[0][0]
__________________________________________________________________________________________________
tf.__operators__.add (TFOpLambd (None,)              0           tf.math.multiply_2[0][0]
                                                                 tf.math.multiply_3[0][0]
__________________________________________________________________________________________________
tf.math.multiply_4 (TFOpLambda) (None,)              0           tf.math.reduce_sum[0][0]
__________________________________________________________________________________________________
tf.math.multiply_5 (TFOpLambda) (None,)              0           tf.keras.backend.categorical_cros
__________________________________________________________________________________________________
tf.__operators__.add_2 (TFOpLam (None,)              0           tf.__operators__.add[0][0]
                                                                 tf.math.multiply_4[0][0]
__________________________________________________________________________________________________
tf.math.multiply_6 (TFOpLambda) (None,)              0           tf.math.multiply_5[0][0]
__________________________________________________________________________________________________
tf.__operators__.add_3 (TFOpLam (None,)              0           tf.__operators__.add_2[0][0]
                                                                 tf.math.multiply_6[0][0]
__________________________________________________________________________________________________
tf.math.reduce_mean_3 (TFOpLamb ()                   0           tf.__operators__.add_3[0][0]
__________________________________________________________________________________________________
add_loss (AddLoss)              ()                   0           tf.math.reduce_mean_3[0][0]
==================================================================================================
Total params: 263,951
Trainable params: 263,535
Non-trainable params: 416
__________________________________________________________________________________________________
Epoch 1/25
75/75 [==============================] - 3s 10ms/step - loss: 1101.9201
Epoch 2/25
75/75 [==============================] - 1s 9ms/step - loss: 203.6247
Epoch 3/25
75/75 [==============================] - 1s 10ms/step - loss: 130.9191
Epoch 4/25
75/75 [==============================] - 1s 9ms/step - loss: -35.7390
Epoch 5/25
75/75 [==============================] - 1s 9ms/step - loss: 386.6588
Epoch 6/25
75/75 [==============================] - 1s 9ms/step - loss: -184.9128
Epoch 7/25
75/75 [==============================] - 1s 9ms/step - loss: -161.0310
Epoch 8/25
75/75 [==============================] - 1s 9ms/step - loss: 24.7651
Epoch 9/25
75/75 [==============================] - 1s 10ms/step - loss: 9.3653
Epoch 10/25
75/75 [==============================] - 1s 10ms/step - loss: -378.8922
Epoch 11/25
75/75 [==============================] - 1s 9ms/step - loss: -208.0573
Epoch 12/25
75/75 [==============================] - 1s 9ms/step - loss: -341.3141
Epoch 13/25
75/75 [==============================] - 1s 10ms/step - loss: -302.1562
Epoch 14/25
75/75 [==============================] - 1s 9ms/step - loss: -294.0502
Epoch 15/25
75/75 [==============================] - 1s 10ms/step - loss: -192.3871
Epoch 16/25
75/75 [==============================] - 1s 9ms/step - loss: -190.0139
Epoch 17/25
75/75 [==============================] - 1s 10ms/step - loss: -264.8898
Epoch 18/25
75/75 [==============================] - 1s 9ms/step - loss: -178.6996
Epoch 19/25
75/75 [==============================] - 1s 10ms/step - loss: -209.5481
Epoch 20/25
75/75 [==============================] - 1s 10ms/step - loss: -222.0254
Epoch 21/25
75/75 [==============================] - 1s 10ms/step - loss: -158.8237
Epoch 22/25
75/75 [==============================] - 1s 10ms/step - loss: -395.8481
Epoch 23/25
75/75 [==============================] - 1s 9ms/step - loss: -261.4448
Epoch 24/25
75/75 [==============================] - 1s 10ms/step - loss: -252.9768
Epoch 25/25
75/75 [==============================] - 1s 10ms/step - loss: -279.0494

[5]:

adata_integrate

[5]:

AnnData object with n_obs × n_vars = 9530 × 2000
    obs: '_batch', 'n_genes', 'percent_mito', 'n_counts', 'size_factor', 'loss_weight', 'batch_label', 'batch'
    var: 'gene_ids', 'n_cells-0-0', 'highly_variable-0-0', 'means-0-0', 'dispersions-0-0', 'dispersions_norm-0-0', 'n_cells-1-0', 'highly_variable-1-0', 'means-1-0', 'dispersions-1-0', 'dispersions_norm-1-0', 'n_cells-1', 'highly_variable-1', 'means-1', 'dispersions-1', 'dispersions_norm-1'
    obsm: 'X_davae'

1.The meta.data of each cell has been saved in adata.obs

2.The embedding representation of davae for each cell have been saved in adata.obsm(‘X_davae’)

Loading result from h5ad file: You can also download and use the integrated results. The output.h5ad file of the DAVAE result can be downloaded here

[28]:

adata_integrate = sc.read_h5ad('./adata_integrate.h5ad')

[29]:

adata_integrate

[29]:

AnnData object with n_obs × n_vars = 9530 × 2000
    obs: '_batch', 'n_genes', 'percent_mito', 'n_counts', 'size_factor', 'loss_weight', 'batch_label', 'batch', 'leiden', 'celltype'
    var: 'gene_ids', 'n_cells-0-0', 'highly_variable-0-0', 'means-0-0', 'dispersions-0-0', 'dispersions_norm-0-0', 'n_cells-1-0', 'highly_variable-1-0', 'means-1-0', 'dispersions-1-0', 'dispersions_norm-1-0', 'n_cells-1', 'highly_variable-1', 'means-1', 'dispersions-1', 'dispersions_norm-1'
    uns: '_batch_colors', 'celltype_colors', 'leiden', 'leiden_colors', 'neighbors'
    obsm: 'X_davae', 'X_umap'
    obsp: 'connectivities', 'distances'

UMAP Visualization

We use UMAP to reduce the embedding feature output by DAVAE in 2 dimensions.

[30]:

import umap
adata_integrate.obsm['X_umap']=umap.UMAP().fit_transform(adata_integrate.obsm['X_davae'])
adata_integrate.uns['_batch_colors'] = ['#FF34FF', '#4FC601', '#3B5DFF']
sc.pl.umap(adata_integrate, color=['_batch', 'celltype'], s=3)

../_images/tutorials_davae_mcl_tutorial_16_0.png