API

Preprocessing

scbean.tools.utils.davae_preprocessing(datasets, min_cells=1, min_genes=1, n_top_genes=2000, mt_ratio=0.8, lognorm=True, hvg=True, index_unique=None)[source]

Preprocess and merge data sets from different batches

Parameters:
  • datasets (list) – The list of AnnData objects from different batches.

  • min_cells (int, optional (default: 1)) – Minimum number of cells in which a gene must be detected for the gene to pass filtering.

  • min_genes (int, optional (default: 1)) – Minimum number of genes that must be detected in a cell for the cell to pass filtering.

  • n_top_genes (int, optional (default: 2000)) – Number of highly-variable genes to keep.

  • mt_ratio (double, optional (default: 0.8)) – Maximum proportion of mito genes for a cell to pass filtering.

  • lognorm (bool, optional (default: True)) – If True, execute lognorm() function.

  • hvg (bool, optional (default: True)) – If True, choose hypervariable genes for AnnData object.

  • index_unique (string, optional (default: None)) – Make the index unique by joining the existing index names with the batch category, using index_unique=’-’, for instance. Provide None to keep existing indices.

Returns:

adata

Return type:

AnnData
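
The filtering and normalization parameters above can be illustrated with a small plain-Python sketch. It is not the scbean implementation: the function name filter_and_lognorm, the "MT-" prefix used to spot mitochondrial genes, the target_sum scaling factor, and the order of the steps are all assumptions made for illustration, and min_cells / min_genes are read in the scanpy sense (cells per gene, genes per cell).

```python
import math

def filter_and_lognorm(counts, gene_names, min_genes=1, min_cells=1,
                       mt_ratio=0.8, target_sum=1e4):
    """Hypothetical sketch. counts is a list of rows, one row of raw counts per cell."""
    # Assumption: mitochondrial genes are recognised by an "MT-" name prefix.
    is_mt = [g.upper().startswith("MT-") for g in gene_names]

    # Keep cells with enough detected genes and an acceptable mito fraction.
    kept_cells = []
    for row in counts:
        detected = sum(1 for c in row if c > 0)
        total = sum(row)
        mito = sum(c for c, m in zip(row, is_mt) if m)
        if detected >= min_genes and total > 0 and mito / total <= mt_ratio:
            kept_cells.append(row)

    # Keep genes that are detected in enough of the remaining cells.
    kept_genes = [j for j in range(len(gene_names))
                  if sum(1 for row in kept_cells if row[j] > 0) >= min_cells]

    # Assumption: "lognorm" means library-size scaling followed by log1p.
    normed = []
    for row in kept_cells:
        total = sum(row)
        normed.append([math.log1p(row[j] * target_sum / total) for j in kept_genes])
    return normed, [gene_names[j] for j in kept_genes]

demo_counts = [[5, 0, 5], [0, 0, 10], [0, 0, 0]]
demo_genes = ["GeneA", "GeneB", "MT-CO1"]
demo_matrix, demo_kept = filter_and_lognorm(demo_counts, demo_genes)
```

In this toy input the second cell is dropped because all of its counts are mitochondrial, the third because it has no detected genes, and GeneB because no remaining cell expresses it.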

scbean.tools.utils.preprocessing(datasets, min_cells=1, min_genes=1, n_top_genes=2000, mt_ratio=0.8, lognorm=True, hvg=True, index_unique=None)[source]

Preprocess and merge data sets from different batches

Parameters:
  • datasets (list) – The list of AnnData objects from different batches.

  • min_cells (int, optional (default: 1)) – Minimum number of cells in which a gene must be detected for the gene to pass filtering.

  • min_genes (int, optional (default: 1)) – Minimum number of genes that must be detected in a cell for the cell to pass filtering.

  • n_top_genes (int, optional (default: 2000)) – Number of highly-variable genes to keep.

  • mt_ratio (double, optional (default: 0.8)) – Maximum proportion of mito genes for a cell to pass filtering.

  • lognorm (bool, optional (default: True)) – If True, execute lognorm() function.

  • hvg (bool, optional (default: True)) – If True, choose hypervariable genes for AnnData object.

  • index_unique (string, optional (default: None)) – Make the index unique by joining the existing index names with the batch category, using index_unique=’-’, for instance. Provide None to keep existing indices.

Returns:

adata_norm

Return type:

AnnData

scbean.tools.utils.read_sc_data(input_file, fmt='h5ad', backed=None, transpose=False, sparse=False, delimiter=' ', unique_name=True, batch_name=None, var_names='gene_symbols')[source]

Read single cell dataset

Parameters:
  • input_file (string) – The path of the file to be read.

  • fmt (string, optional (default: 'h5ad')) – The file type of the file to be read.

  • backed (Union[Literal[‘r’, ‘r+’], bool, None] (default: None)) – If ‘r’, load AnnData in backed mode instead of fully loading it into memory (memory mode). If you want to modify backed attributes of the AnnData object, you need to choose ‘r+’.

  • transpose (bool, optional (default: False)) – Whether to transpose the read data.

  • sparse (bool, optional (default: False)) – Whether the data in the dataset is stored in sparse matrix format.

  • delimiter (str, optional (default: ' ')) – Delimiter that separates data within text file. If None, will split at arbitrary number of white spaces, which is different from enforcing splitting at single white space ‘ ‘.

  • unique_name (bool, optional (default: True)) – If True, the AnnData object executes the var_names_make_unique() and obs_names_make_unique() functions.

  • batch_name (string, optional (default: None)) – Batch name of current batch data

  • var_names (Literal[‘gene_symbols’, ‘gene_ids’] (default: 'gene_symbols')) – The variables index when the file type is ‘mtx’.

Returns:

adata

Return type:

AnnData
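
The difference between delimiter=' ' and delimiter=None described above matches the behaviour of Python's own str.split, which the following sketch uses as an analogy. It does not show how read_sc_data parses files internally; the sample line is made up.

```python
# A made-up text line with two spaces between the first two fields.
line = "1  2 3"

# Splitting at every single space keeps an empty field for the doubled space.
single_space = line.split(" ")

# Splitting with no separator collapses any run of whitespace.
any_whitespace = line.split()

print(single_space)    # ['1', '', '2', '3']
print(any_whitespace)  # ['1', '2', '3']
```

If a whitespace-separated file has irregular spacing, the single-space setting can therefore yield extra empty columns.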

scbean.tools.utils.spatial_preprocessing(datasets, min_cells=1, min_genes=1, n_top_genes=2000, lognorm=True, hvg=True)[source]

Preprocess and merge two visium datasets from different batches

Parameters:
  • datasets (list) – The list of AnnData objects from different batches.

  • min_cells (int, optional (default: 1)) – Minimum number of cells in which a gene must be detected for the gene to pass filtering.

  • min_genes (int, optional (default: 1)) – Minimum number of genes that must be detected in a cell for the cell to pass filtering.

  • n_top_genes (int, optional (default: 2000)) – Number of highly-variable genes to keep.

  • lognorm (bool, optional (default: True)) – If True, execute lognorm() function.

  • hvg (bool, optional (default: True)) – If True, choose hypervariable genes for AnnData object.

Returns:

adata

Return type:

AnnData

scbean.tools.utils.spatial_rna_preprocessing(adata_spatial, adata_rna, lognorm=True, hvg=True, n_top_genes=2000)[source]

Preprocess and merge visium dataset with scRNA-seq dataset.

Parameters:
  • adata_spatial (AnnData) – AnnData object of visium dataset.

  • adata_rna (AnnData) – AnnData object of scRNA-seq dataset.

  • lognorm (bool, optional (default: True)) – If True, execute lognorm() function.

  • hvg (bool, optional (default: True)) – If True, choose hypervariable genes for AnnData object.

  • n_top_genes (int, optional (default: 2000)) – Number of highly-variable genes to keep.

Returns:

adata

Return type:

AnnData

Plotting

scbean.tools.plotting.plotCorrelation(y, y_pred, save=True, result_path='./', show=True, rnum=10000.0, lim=20)[source]

Plot correlation between original data and corrected data

Parameters:
  • y (matrix or csr_matrix) – The original data matrix.

  • y_pred (matrix or csr_matrix) – The data matrix integrated by vipcca.

  • save (bool, optional (default: True)) – If True, save the figure into result_path.

  • result_path (string, optional (default: './')) – The path for saving the figure.

  • show (bool, optional (default: True)) – If True, show the figure.

  • rnum (double, optional (default: 1e4)) – The number of points you want to sample randomly in the matrix.

  • lim (int, optional (default: 20)) – the right parameter of matplotlib.pyplot.xlim(left, right)
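
As a rough illustration of what sampling rnum points and comparing original with corrected values involves, the sketch below draws random entries from two equally shaped matrices and computes their Pearson correlation. It is an assumption-laden stand-in, not the plotCorrelation code: sampled_pearson and its seed argument are hypothetical, and nested Python lists stand in for the dense or sparse matrices.

```python
import math
import random

def sampled_pearson(y, y_pred, rnum=10000, seed=0):
    """Hypothetical helper: Pearson r over at most rnum randomly sampled entries."""
    flat_y = [v for row in y for v in row]
    flat_p = [v for row in y_pred for v in row]
    rng = random.Random(seed)
    # Sample the same positions from both matrices.
    idx = rng.sample(range(len(flat_y)), min(int(rnum), len(flat_y)))
    a = [flat_y[i] for i in idx]
    b = [flat_p[i] for i in idx]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

r_demo = sampled_pearson([[1, 2], [3, 4]], [[2, 4], [6, 8]])
```

Because the second toy matrix is exactly twice the first, the correlation here is 1 up to floating-point error.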

DAVAE

scbean.model.davae.fit_integration(adata, batch_num=2, mode='DACVAE', split_by='batch_label', epochs=20, batch_size=128, domain_lambda=1.0, sparse=True, hidden_layers=[128, 64, 32, 5])[source]

Build DAVAE model and fit the data to the model for training.

Parameters:
  • adata (AnnData) – AnnData object to be integrated.

  • batch_num (int, optional (default: 2)) – Number of batches of datasets to be integrated.

  • mode (string, optional (default: 'DACVAE')) – If ‘DACVAE’, construct a DACVAE model; if ‘DAVAE’, construct a DAVAE model.

  • split_by (string, optional (default: 'batch_label')) – The obsm_name of obsm used to distinguish different batches.

  • epochs (int, optional (default: 20)) – Number of epochs to train the model. An epoch is an iteration over the entire x and y data provided.

  • batch_size (int or None, optional (default: 128)) – Number of samples per gradient update.

  • domain_lambda (double, optional (default: 1.0)) – The coefficient multiplied by the loss value of the domain classifier of the DAVAE model.

  • sparse (bool, optional (default: True)) – If True, Matrix X in the AnnData object is stored as a sparse matrix.

  • hidden_layers (list of integers, (default: [128,64,32,5])) – Number of hidden layer neurons in the model.

Returns:

out_adata

Return type:

AnnData
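
A call to fit_integration might be wrapped as in the sketch below. The keyword defaults are copied from the signature shown above; the wrapper run_davae and the dictionary DAVAE_DEFAULTS are hypothetical names introduced only for this example, and the import is done lazily so the snippet can be read and loaded without scbean installed.

```python
# Defaults copied from the documented signature of fit_integration.
DAVAE_DEFAULTS = dict(
    batch_num=2,
    mode="DACVAE",
    split_by="batch_label",
    epochs=20,
    batch_size=128,
    domain_lambda=1.0,
    sparse=True,
    hidden_layers=[128, 64, 32, 5],
)

def run_davae(adata, **overrides):
    """Hypothetical wrapper: call fit_integration with defaults plus overrides."""
    # Imported inside the function so that defining it does not require scbean.
    from scbean.model import davae
    kwargs = {**DAVAE_DEFAULTS, **overrides}
    return davae.fit_integration(adata, **kwargs)
```

With a preprocessed AnnData object at hand, a call such as run_davae(adata, epochs=25) would train for 25 epochs while leaving the other settings at their documented defaults.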

VIPCCA

class scbean.model.vipcca.VIPCCA(adata_all=None, patience_es=50, patience_lr=25, epochs=1000, res_path=None, split_by='_batch', method='lognorm', hvg=True, batch_input_size=128, batch_input_size2=16, activation='softplus', dropout_rate=0.01, hidden_layers=[128, 64, 32, 16], lambda_regulizer=5.0, initializer='glorot_uniform', l1_l2=(0.0, 0.0), mode='CVAE', model_file=None, save=True)[source]

Bases: object

Initialize VIPCCA object

Parameters


patience_es: int, optional (default: 50)

number of epochs with no improvement after which training will be stopped.

patience_lr: int, optional (default: 25)

number of epochs with no improvement after which learning rate will be reduced.

epochs: int, optional (default: 1000)

Number of epochs to train the model. An epoch is an iteration over the entire x and y data provided.

res_path: string, (default: None)

Folder path to save model training results model.h5 and output data adata.h5ad.

split_by: string, optional (default: ‘_batch’)

the obsm_name of obsm used to distinguish different batches.

method: string, optional (default: ‘lognorm’)

the normalization method for input data, one of {“qqnorm”, “count”, other}.

batch_input_size: int, optional (default: 128)

the length of the batch vector that concatenate with the input layer.

batch_input_size2: int, optional (default: 16)

the length of the batch vector that concatenate with the latent layer.

activation: string, optional (default: “softplus”)

the activation function of hidden layers.

dropout_rate: double, optional (default: 0.01)

the dropout rate of hidden layers.

hidden_layers: list, optional (default: [128,64,32,16])

Number of hidden layer neurons in the model

lambda_regulizer: double, optional (default: 5.0)

The coefficient multiplied by KL_loss

initializer: string, optional (default: “glorot_uniform”)

Initializer for the kernel weights matrix.

l1_l2: tuple, optional (default: (0.0, 0.0))

[L1 regularization factor, L2 regularization factor].

mode: string, optional (default: ‘CVAE’)

one of {“CVAE”, “CVAE2”, “CVAE3”}

model_file: string, optional (default: None)

The file name of the trained model, the default is None

save: bool, optional (default: True)

If true, save output adata file.

build()[source]

Build the VIPCCA model.

fit_integrate()[source]

Train the constructed VIPCCA model, integrate the data with the trained model, and return the integrated anndata object

Returns:

adata produced by function self.conf.net.integrate(self.conf.adata_all, save=self.conf.save)

Return type:

AnnData

vipcca_preprocessing()[source]

Generate the required random batch id for the VIPCCA model
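
The interplay of patience_es and patience_lr in the constructor can be pictured with the following simplified simulation. It is only a sketch of the general early-stopping and learning-rate-reduction idea: the function simulate_patience, the starting learning rate, the reduction factor, and the use of a validation loss as the monitored quantity are assumptions, not details taken from VIPCCA.

```python
def simulate_patience(val_losses, patience_es=50, patience_lr=25,
                      lr=0.001, factor=0.1):
    """Hypothetical helper: return (last epoch run, final learning rate)."""
    best = float("inf")
    wait_es = 0  # epochs without improvement, counted for early stopping
    wait_lr = 0  # epochs without improvement, counted for LR reduction
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait_es = wait_lr = 0
            continue
        wait_es += 1
        wait_lr += 1
        if wait_lr >= patience_lr:
            lr *= factor      # reduce the learning rate and start counting again
            wait_lr = 0
        if wait_es >= patience_es:
            return epoch, lr  # stop training early
    return len(val_losses), lr

stopped_at, final_lr = simulate_patience(
    [1.0, 0.9, 0.95, 0.96, 0.97, 0.98], patience_es=4, patience_lr=2)
```

In this toy run the loss last improves at epoch 2, so the learning rate is reduced at epochs 4 and 6 and training stops at epoch 6. Keeping patience_lr smaller than patience_es, as the defaults do, gives the optimizer a chance to recover at a lower learning rate before training is abandoned.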

VIMCCA

scbean.model.vimcca.fit_integration(adata_x, adata_y, hidden_layers=[128, 64, 32, 5], epochs=30, weight=5, sparse_x=False, sparse_y=False, batch_size=128)[source]

Build VIMCCA model and fit the data to the model for training.

Parameters:
  • adata_x (AnnData) – AnnData object for one of the modalities.

  • adata_y (AnnData) – AnnData object for the other modality.

  • epochs (int, optional (default: 30)) – Number of epochs to train the model. An epoch is an iteration over the entire x and y data provided.

  • batch_size (int or None, optional (default: 128)) – Number of samples per gradient update. If unspecified, batch_size will default to 128.

  • weight (double, optional (default: 5.0)) – The weight of the reconstruction loss for the second modality.

  • sparse_x (bool, optional (default: False)) – If True, Matrix X in the AnnData object is stored as a sparse matrix.

  • sparse_y (bool, optional (default: False)) – If True, Matrix Y in the AnnData object is stored as a sparse matrix.

  • hidden_layers (list of integers, (default: [128,64,32,5])) – Number of hidden layer neurons in the model.

Returns:

z

Return type:

Numpy array(s)

VISGP

class scbean.model.visgp.VISGP(adata=None, inducing_points=20, iters=1000, processes=1)[source]

Bases: object

Initialize VISGP object

Parameters


adata: anndata, (default: None)

The input data for the model, including gene expression levels (adata.X, genes*spots), spatial location coordinates (adata.var) and gene names (adata.obs).

inducing_points: int, optional (default: 20)

The number of inducing points.

iters: int, optional (default: 1000)

The number of iterations.

processes: int, optional (default: 1)

The number of concurrent processes.

build(k, y)[source]

Build and train the model.

Parameters:
  • k (int) – Used to mark a gene

  • y – A vector representing the expression value of a gene

Returns:

k and p-value

Return type:

int and float

covariance_matrix(length)[source]

Calculate the covariance matrix.

Parameters:

length (int) – kernel parameter

Returns:

Covariance matrix

Return type:

Numpy array(s)
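
To make the role of a kernel length parameter concrete, the sketch below builds a covariance matrix over spot coordinates with a squared-exponential (RBF) kernel. Which kernel VISGP actually uses is not stated here, so the kernel choice, the function name rbf_covariance, and the coordinate format are all assumptions.

```python
import math

def rbf_covariance(coords, length):
    """Hypothetical helper: squared-exponential covariance between spatial spots."""
    n = len(coords)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Squared Euclidean distance between spot i and spot j.
            d2 = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
            K[i][j] = math.exp(-d2 / (2.0 * length ** 2))
    return K

K_demo = rbf_covariance([(0.0, 0.0), (3.0, 4.0)], length=5.0)
```

A larger length makes distant spots more strongly correlated, while a small length makes the matrix approach the identity.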

qvalue(pv)[source]

Calculate Q values using BH adjustment.

Parameters:

pv – P values of all genes

Returns:

Q values of all genes

Return type:

Numpy array(s)
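
The Benjamini-Hochberg (BH) adjustment named above can be sketched in a few lines of plain Python. This is a generic version of the procedure, not the VISGP source: bh_qvalues is a hypothetical name, and it returns a list where the method is documented to return NumPy arrays.

```python
def bh_qvalues(pvalues):
    """Hypothetical helper: Benjamini-Hochberg adjusted p-values (q values)."""
    n = len(pvalues)
    # Indices of the p-values in ascending order.
    order = sorted(range(n), key=lambda i: pvalues[i])
    q = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so q values stay monotone in rank.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        q[i] = running_min
    return q

q_demo = bh_qvalues([0.01, 0.04, 0.03, 0.005])
```

Each p-value is scaled by n / rank, and the running minimum ensures that a gene with a smaller p-value never receives a larger q value than a gene ranked after it.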

run()[source]

Run VISGP.

Returns:

results, including [‘gene’, ‘p_value’, ‘q_value’]

Return type:

DataFrame

score_test(K, y)[source]

Score statistics test.

Parameters:
  • K – Covariance matrix

  • y – A vector representing the expression value of a gene

Returns:

P value of a gene

Return type:

float