scRNA-seq+scATAC-seq integration

This tutorial shows how to load, preprocess, integrate with VIPCCA, and visualize a scRNA-seq and a scATAC-seq dataset.

We obtained a scRNA-seq dataset consisting of gene expression measurements on 33,538 genes in 11,769 cells and a scATAC-seq dataset consisting of 89,796 open chromatin peaks in 8,728 nuclei. Both datasets were produced by the 10x Genomics Chromium system and profile PBMCs.

Preprocessing with R

For the scRNA-seq data: the Seurat authors have previously pre-processed and clustered this scRNA-seq dataset, annotated 13 cell types, and provided the object here. The 13 cell types comprise 460 B cell progenitors, 2,992 CD14+ Monocytes, 328 CD16+ Monocytes, 1,596 CD4 Memory T cells, 1,047 CD4 Naïve T cells, 383 CD8 effector T cells, 337 CD8 Naïve T cells, 74 Dendritic cells, 592 Double negative T cells, 544 NK cells, 68 pDCs, 52 Platelets, and 599 pre-B cells.

For the scATAC-seq data: first, we load the provided peak matrix and collapse it into a "gene activity matrix" using Signac. We then filter out cells with fewer than 5,000 total peak counts, keeping a final set of 7,866 cells for analysis. See the Signac website for tutorials and documentation on analysing scATAC-seq data.
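As an aside, if you prefer to apply the same count filter on the Python side (for example, after the h5ad conversion described below), a minimal sketch with scanpy looks like the following. The file name is hypothetical; the canonical workflow in this tutorial does the filtering in R with Signac.

import scanpy as sc

# Hypothetical file holding the unfiltered peak or gene activity matrix.
adata_geneact = sc.read_h5ad("geneact_raw.h5ad")

# Keep only cells with at least 5,000 total counts, mirroring the Signac filter above.
sc.pp.filter_cells(adata_geneact, min_counts=5000)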

After preprocessing, you can convert the Seurat object to an AnnData object via h5Seurat using the R package SeuratDisk. In this case, the atac.h5ad file will be generated in the corresponding path.

library(SeuratDisk)
SaveH5Seurat(atac, filename = "atac.h5Seurat")
Convert("atac.h5Seurat", dest = "h5ad")

Alternatively, you can directly download the processed scRNA-seq and scATAC-seq dataset files in h5ad format.

Importing the scbean package

[1]:
import scbean.model.vipcca as vip
import scbean.tools.utils as tl
import scbean.tools.plotting as pl
import scbean.tools.transferLabel as tfl
import scanpy as sc
from sklearn.preprocessing import LabelEncoder

import matplotlib
matplotlib.use('TkAgg')

# Command for Jupyter notebooks only
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')
from matplotlib.axes._axes import _log as matplotlib_axes_logger
matplotlib_axes_logger.setLevel('ERROR')
Using TensorFlow backend.

Loading data in Python

[2]:
adata_rna = tl.read_sc_data("/Users/zhongyuanke/data/vipcca/atac/rna.h5ad")
adata_atac = tl.read_sc_data("/Users/zhongyuanke/data/vipcca/atac/geneact.h5ad")
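Optionally, a quick shape check confirms that the loaded objects match the datasets described above (exact shapes depend on the files you downloaded):

print(adata_rna.shape)   # expected roughly 11,769 cells x 33,538 genes
print(adata_atac.shape)  # gene activity matrix for the ~7,866 filtered scATAC-seq cells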

Data preprocessing

Here, we filter and normalize each dataset separately and concatenate them into one AnnData object. For more details, please check the preprocessing API.

[3]:
adata_all = tl.preprocessing([adata_rna, adata_atac])
Trying to set attribute `.obs` of view, copying.
Trying to set attribute `.obs` of view, copying.
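To give a feel for what this step does, the sketch below is a rough hand-rolled equivalent using plain scanpy. It is only an illustration under assumed defaults (the choice of 2,000 highly variable genes matches the encoder input size in the model summary further down), not the actual implementation of tl.preprocessing.

import scanpy as sc

# Work on copies so the original objects used later in the tutorial are untouched.
rna_copy, atac_copy = adata_rna.copy(), adata_atac.copy()

for ad, batch in zip([rna_copy, atac_copy], ["0", "1"]):
    ad.obs["_batch"] = batch                   # batch label, later used by split_by="_batch"
    sc.pp.normalize_total(ad, target_sum=1e4)  # per-cell library-size normalization
    sc.pp.log1p(ad)                            # log-transform

# Concatenate on the shared genes and keep the top 2,000 highly variable genes.
combined = rna_copy.concatenate(atac_copy)
sc.pp.highly_variable_genes(combined, n_top_genes=2000, subset=True)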

VIPCCA Integration

[4]:
# Command for Jupyter notebooks only
import os
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"

# The four hyperparameters epochs, lambda_regulizer, batch_input_size, and batch_input_size2 are user-defined.
# For a dataset with ~10K cells, we suggest a value larger than 600 for epochs,
# a value in the range [2, 10] for lambda_regulizer,
# a value smaller than the number of input features for batch_input_size,
# and a value in the range [8, 16] for batch_input_size2.
handle = vip.VIPCCA(adata_all=adata_all,
                           res_path='/Users/zhongyuanke/data/vipcca/atac_result/',
                           mode='CVAE',
                           split_by="_batch",
                           epochs=20,
                           lambda_regulizer=2,
                           batch_input_size=64,
                           batch_input_size2=14,
                           )
adata_integrate = handle.fit_integrate()
WARNING:tensorflow:From /Users/zhongyuanke/anaconda3/envs/test-vipcca/lib/python3.7/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
Model: "vae_mlp"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
encoder_input (InputLayer)      (None, 2000)         0
__________________________________________________________________________________________________
batch_input1 (InputLayer)       (None, 64)           0
__________________________________________________________________________________________________
encoder_mlp (Model)             [(None, 16), (None,  285088      encoder_input[0][0]
                                                                 batch_input1[0][0]
__________________________________________________________________________________________________
batch_input2 (InputLayer)       (None, 14)           0
__________________________________________________________________________________________________
decoder_mlp (Model)             (None, 2000)         542748      encoder_mlp[1][2]
                                                                 batch_input2[0][0]
==================================================================================================
Total params: 827,836
Trainable params: 826,080
Non-trainable params: 1,756
__________________________________________________________________________________________________
WARNING:tensorflow:From /Users/zhongyuanke/anaconda3/envs/test-vipcca/lib/python3.7/site-packages/tensorflow_core/python/ops/math_grad.py:1424: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
WARNING:tensorflow:From /Users/zhongyuanke/anaconda3/envs/test-vipcca/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:422: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.

WARNING:tensorflow:From /Users/zhongyuanke/anaconda3/envs/test-vipcca/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:431: The name tf.is_variable_initialized is deprecated. Please use tf.compat.v1.is_variable_initialized instead.

WARNING:tensorflow:From /Users/zhongyuanke/anaconda3/envs/test-vipcca/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:438: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.

WARNING:tensorflow:From /Users/zhongyuanke/anaconda3/envs/test-vipcca/lib/python3.7/site-packages/keras/callbacks/tensorboard_v1.py:200: The name tf.summary.merge_all is deprecated. Please use tf.compat.v1.summary.merge_all instead.

WARNING:tensorflow:From /Users/zhongyuanke/anaconda3/envs/test-vipcca/lib/python3.7/site-packages/keras/callbacks/tensorboard_v1.py:203: The name tf.summary.FileWriter is deprecated. Please use tf.compat.v1.summary.FileWriter instead.

Epoch 1/20
17298/17298 [==============================] - 4s 231us/step - loss: 5511.6127

Epoch 00001: saving model to /Users/zhongyuanke/data/vipcca/atac_result/model_0001.h5
WARNING:tensorflow:From /Users/zhongyuanke/anaconda3/envs/test-vipcca/lib/python3.7/site-packages/keras/callbacks/tensorboard_v1.py:343: The name tf.Summary is deprecated. Please use tf.compat.v1.Summary instead.

Epoch 2/20
17298/17298 [==============================] - 2s 102us/step - loss: 5372.6625

Epoch 00002: saving model to /Users/zhongyuanke/data/vipcca/atac_result/model_0002.h5
Epoch 3/20
17298/17298 [==============================] - 2s 100us/step - loss: 5339.4333

Epoch 00003: saving model to /Users/zhongyuanke/data/vipcca/atac_result/model_0003.h5
Epoch 4/20
17298/17298 [==============================] - 2s 99us/step - loss: 5321.1605

Epoch 00004: saving model to /Users/zhongyuanke/data/vipcca/atac_result/model_0004.h5
Epoch 5/20
17298/17298 [==============================] - 2s 98us/step - loss: 5312.9137

Epoch 00005: saving model to /Users/zhongyuanke/data/vipcca/atac_result/model_0005.h5
Epoch 6/20
17298/17298 [==============================] - 2s 101us/step - loss: 5306.5446

Epoch 00006: saving model to /Users/zhongyuanke/data/vipcca/atac_result/model_0006.h5
Epoch 7/20
17298/17298 [==============================] - 2s 101us/step - loss: 5300.9233

Epoch 00007: saving model to /Users/zhongyuanke/data/vipcca/atac_result/model_0007.h5
Epoch 8/20
17298/17298 [==============================] - 2s 104us/step - loss: 5297.4187

Epoch 00008: saving model to /Users/zhongyuanke/data/vipcca/atac_result/model_0008.h5
Epoch 9/20
17298/17298 [==============================] - 2s 103us/step - loss: 5294.1055

Epoch 00009: saving model to /Users/zhongyuanke/data/vipcca/atac_result/model_0009.h5
Epoch 10/20
17298/17298 [==============================] - 2s 99us/step - loss: 5290.6049

Epoch 00010: saving model to /Users/zhongyuanke/data/vipcca/atac_result/model_0010.h5
Epoch 11/20
17298/17298 [==============================] - 2s 102us/step - loss: 5289.7595

Epoch 00011: saving model to /Users/zhongyuanke/data/vipcca/atac_result/model_0011.h5
Epoch 12/20
17298/17298 [==============================] - 2s 102us/step - loss: 5286.7163

Epoch 00012: saving model to /Users/zhongyuanke/data/vipcca/atac_result/model_0012.h5
Epoch 13/20
17298/17298 [==============================] - 2s 100us/step - loss: 5284.6173

Epoch 00013: saving model to /Users/zhongyuanke/data/vipcca/atac_result/model_0013.h5
Epoch 14/20
17298/17298 [==============================] - 2s 100us/step - loss: 5282.4393

Epoch 00014: saving model to /Users/zhongyuanke/data/vipcca/atac_result/model_0014.h5
Epoch 15/20
17298/17298 [==============================] - 2s 108us/step - loss: 5280.8482

Epoch 00015: saving model to /Users/zhongyuanke/data/vipcca/atac_result/model_0015.h5
Epoch 16/20
17298/17298 [==============================] - 2s 108us/step - loss: 5282.3294

Epoch 00016: saving model to /Users/zhongyuanke/data/vipcca/atac_result/model_0016.h5
Epoch 17/20
17298/17298 [==============================] - 2s 107us/step - loss: 5278.4776

Epoch 00017: saving model to /Users/zhongyuanke/data/vipcca/atac_result/model_0017.h5
Epoch 18/20
17298/17298 [==============================] - 2s 108us/step - loss: 5277.1156

Epoch 00018: saving model to /Users/zhongyuanke/data/vipcca/atac_result/model_0018.h5
Epoch 19/20
17298/17298 [==============================] - 2s 107us/step - loss: 5276.4124

Epoch 00019: saving model to /Users/zhongyuanke/data/vipcca/atac_result/model_0019.h5
Epoch 20/20
17298/17298 [==============================] - 2s 107us/step - loss: 5276.8685

Epoch 00020: saving model to /Users/zhongyuanke/data/vipcca/atac_result/model_0020.h5
... storing '_batch' as categorical
... storing 'celltype' as categorical
... storing 'tech' as categorical
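After training finishes, it can be useful to inspect the returned object. The check below assumes, consistent with the UMAP step later in this tutorial, that the joint latent coordinates are stored in adata_integrate.obsm['X_vipcca'].

# Summary of the integrated AnnData object and the shape of the latent representation.
print(adata_integrate)
print(adata_integrate.obsm['X_vipcca'].shape)  # number of cells x latent dimensions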

Cell type prediction

[5]:
atac = tfl.findNeighbors(adata_integrate)

Store the transferred cell type labels in adata_integrate.obs['celltype']:

[6]:
adata_atac.obs['celltype'] = atac['celltype']
adata = adata_rna.concatenate(adata_atac)
adata_integrate.obs['celltype'] = adata.obs['celltype']
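For intuition, nearest-neighbor label transfer of this kind can also be written out explicitly. The sketch below uses scikit-learn on the shared VIPCCA latent space; it is only an illustration of the general idea, not the implementation inside scbean.tools.transferLabel, and the '0'/'1' batch coding is an assumption.

from sklearn.neighbors import KNeighborsClassifier

# Assumed batch coding: '0' for the scRNA-seq cells, '1' for the scATAC-seq cells.
is_rna = (adata_integrate.obs['_batch'] == '0').values
latent = adata_integrate.obsm['X_vipcca']

# Fit a k-NN classifier on the annotated scRNA-seq cells in the shared latent space
# and predict cell types for the scATAC-seq cells.
knn = KNeighborsClassifier(n_neighbors=15)
knn.fit(latent[is_rna], adata_integrate.obs['celltype'][is_rna])
predicted_atac_labels = knn.predict(latent[~is_rna])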

Loading the result

Load the result from an h5ad file: the output.h5ad file containing the trained result can be downloaded here.

[7]:
adata_integrate = tl.read_sc_data('/Users/zhongyuanke/data/vipcca/atac_result/output_save.h5ad')

UMAP Visualization

[8]:
sc.pp.neighbors(adata_integrate, use_rep='X_vipcca')
sc.tl.umap(adata_integrate)

sc.set_figure_params(figsize=[5.5, 4.5])
sc.pl.umap(adata_integrate, color=['_batch', 'celltype'])

Figure: UMAP embeddings of the integrated data, colored by '_batch' and by 'celltype'.
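To write the UMAP figure to disk rather than only displaying it inline, scanpy's plotting functions accept a save argument, for example:

# Saves the plot under scanpy's figure directory with the given file-name suffix.
sc.pl.umap(adata_integrate, color=['_batch', 'celltype'], save='_vipcca_atac.png')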