phylovelo

Submodules

Package Contents

Classes

GeneExpr

Gene expression program

Cell

Cell class

Reaction

Cell division/differentiation type

Gillespie

Gillespie simulation

Gene

Gene class

scData

Data structure for PhyloVelo analysis

Functions

get_annotation(file)

Get simulation data annotation

sim_base_expr(tree, cell_states, Ngene, ...[, ...])

Simulate base expression

add_lineage_noise(tree, base_expr_mat[, scale])

Simulate lineage noise

get_count_from_base_expr(base_expr_mat[, alpha])

Draw gene expression count from base expression matrix

get_count(paras)

Draw a random sample from an NB distribution with paras = (r, p)

reconstruct(file[, output, seed, is_balance])

Reconstruct phylogenetic tree of simulation data

wirte_lineage_info(filepath, anc_cells, curr_cells, ...)

Record lineage information in simulation

loadtree(file)

Reformat tree file from simulation data

logNormalize(data[, scaling])

Log normalize data

plot_tree(tree, colors, ax[, colortab, stain])

Draw phylogenetic tree

get_weight(x, distance, scale, length)

Compute the weighted sum of velocities on the grid

generate_grid([xlim, ylim, density])

Generate grid to project velocities.

velocity_embedding_to_grid(pts, vel, nn[, radius, ...])

Project velocities to grid

velocity_plot(pts, vel, ax, figtype, grid[, point, ...])

Project velocities into embedding

mullerplot(data, label, color[, absolute, alpha, ax])

Draw mullerplot

label_name(loc, cell_types, ax[, fontsize, font])

Label cell type names on figures.

corr_plot(x, y, ax[, stats, r0_x, r0_y, r1_x, r1_y, ...])

Draw a scatter plot of the two sets of data and show their correlation coefficients

velocity_inference(sd[, time, cutoff, alpha, target, ...])

Infer phylogenetic velocity

velocity_embedding(sd[, target, n_neigh])

Project velocity into embedding

calc_phylo_pseudotime(sd[, n_neighbors, r_sample])

Calculate the PhyloVelo pseudotime

class GeneExpr(Ngene: int, r_variant_gene: float, diff_map: dict, state_time: dict, forward_map: dict = None)

Gene expression program

Args:
Ngene:

Gene number

r_variant_gene:

Fraction of genes that change with differentiation

diff_map:

Differentiation relationships between cell types. {a:[b,c]} means ‘a’ is differentiated from ‘b’ and ‘c’

state_time:

Pseudotime of each state

forward_map:

Only used in convergent model simulation. {a:b} means ‘a’ will differentiate to ‘b’
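
The three mapping arguments can be pictured with a small example. This is only an illustrative sketch: the integer state labels and the helper `ancestors` are hypothetical and not part of phylovelo; the dictionaries simply follow the conventions described in the argument list above.

```python
# Hypothetical example values; the state labels are made up for illustration.
diff_map = {1: [0], 2: [0], 3: [1, 2]}  # state 3 is differentiated from states 1 and 2
state_time = {0: 0, 1: 1, 2: 1, 3: 2}   # pseudotime assigned to each state
forward_map = {1: 3, 2: 3}              # convergent model: states 1 and 2 differentiate to 3


def ancestors(state, diff_map):
    """Return every state that `state` descends from, following diff_map upward."""
    seen = []
    stack = list(diff_map.get(state, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.append(parent)
            stack.extend(diff_map.get(parent, []))
    return sorted(seen)


print(ancestors(3, diff_map))
```

A state with no entry in `diff_map` (state 0 here) is treated as a root.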

generate_genes(mu0_loc: float = 20, mu0_scale: float = 3, drift_loc: float = 0, drift_scale: float = 1)

Generate genes

Args:
mu0_loc:

Mean of initial expression

mu0_scale:

Variation of initial expression

drift_loc:

Mean of gene drift

drift_scale:

Variation of drift

expr(state, time)

get_annotation(file)

Get simulation data annotation

Args:
file:

Tree file from simulation script

Return:
list:

Cell names

list:

Cell states

list:

Cell generations

sim_base_expr(tree: bio.phylo.tree, cell_states: pandas.DataFrame, Ngene: int, r_variant_gene: float, diff_map: dict, forward_map: dict = {}, mu0_loc=20, mu0_scale=3, drift_loc=0, drift_scale=1, pseudo_state_time: dict = None)

Simulate base expression

Args:
tree:

Phylogenetic tree

cell_states:

DataFrame of cell types with index of cell names

Ngene:

Gene number

r_variant_gene:

Fraction of genes that change with differentiation

diff_map:

Differentiation relationships between cell types. {a:[b,c]} means ‘a’ is differentiated from ‘b’ and ‘c’

pseudo_state_time:

Pseudotime of each state

forward_map:

Only used in convergent model simulation. {a:b} means ‘a’ will differentiate to ‘b’

mu0_loc:

Mean of initial expression

mu0_scale:

Variation of initial expression

drift_loc:

Mean of gene drift

drift_scale:

Variation of drift

Returns:
class:

Gene expression program

pd.DataFrame:

base expression matrix

add_lineage_noise(tree: bio.phylo.tree, base_expr_mat: pandas.DataFrame, scale=0.0001)

Simulate lineage noise

Args:
tree:

Phylogenetic tree

base_expr_mat:

Base expression matrix from sim_base_expr

scale:

Lineage noise scale

Return:
pd.DataFrame:

Base expression matrix with lineage noise

get_count_from_base_expr(base_expr_mat: pandas.DataFrame, alpha: int = 3)

Draw gene expression count from base expression matrix

Args:
base_expr_mat:

Base expression matrix

alpha:

Scale parameter of NB distribution

Return:

Gene count matrix

get_count(paras: list)

Draw a random sample from an NB distribution with paras = (r, p)

Args:
paras:

NB parameters, [(r,p)]

Return:
int:

Random sample
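
As a rough illustration of drawing from a negative binomial (NB) distribution with parameters (r, p), the sketch below counts failures before the r-th success. It is not the library's implementation: the function name `nb_sample` is hypothetical, r is assumed to be a positive integer, and the "failures before r successes" convention is an assumption about how (r, p) is interpreted.

```python
import random


def nb_sample(r, p, rng):
    """Count failures before the r-th success, with success probability p per trial."""
    failures = 0
    successes = 0
    while successes < r:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return failures


rng = random.Random(0)
draws = [nb_sample(5, 0.5, rng) for _ in range(2000)]
mean = sum(draws) / len(draws)
# Under this convention the theoretical mean is r * (1 - p) / p, i.e. 5 here.
print(mean)
```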

reconstruct(file: str, output: str = None, seed: int = None, is_balance: bool = False, **kwargs)

Reconstruct phylogenetic tree of simulation data

Args:
file:

Simulation file path

output:

Output newick file path

seed:

Random seed

is_balance:

Whether all cell types have an equal number of cells

ratio:

How many cells to reconstruct

Return:

Newick tree written to the output file

wirte_lineage_info(filepath, anc_cells, curr_cells, curr_time)

Record lineage information in simulation

class Cell(Ngene: int = None, state: int = 0, gen: int = None, cid: int = None, parent: int = None, tb: float = None, td: float = None)

Cell class

Args:
Ngene:

Gene number

state:

Cell type

gen:

Cell generation

cid:

Cell id

parent:

Cell’s parent

tb:

Birth time

td:

Death time

class Reaction(rate: callable = None, num_lefts: list = None, num_rights: list = None, index: int = None)

Cell division/differentiation type

Args:
rate:

reaction rate function

num_lefts:

Cell numbers before reaction

num_rights:

Cell numbers after reaction

index:

Reaction index

combine(n, s)

propensity(n, t)

class Gillespie(num_elements: int, inits: list = None, max_cell_num: int = 20000)

Gillespie simulation

Args:
num_elements:

Cell type number

inits:

Initial cell number

max_cell_num:

Maximum cell number

add_reaction(rate: callable = None, num_lefts: list = None, num_rights: list = None, index: int = None)

Add reactions to simulation

Args:
rate:

reaction rate function

num_lefts:

Cell numbers before reaction

num_rights:

Cell numbers after reaction

index:

Reaction index

evolute(steps: int)

Run simulation

Args:
steps:

Number of steps to evolve before stopping
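
For readers unfamiliar with the method, the following is a minimal, self-contained sketch of a Gillespie simulation over cell-type counts. It does not use or reproduce the phylovelo classes: the function `gillespie`, the tuple layout `(rate, num_lefts, num_rights)`, and the constant rates are assumptions chosen to mirror the argument names above, and each reaction is assumed to consume at most one cell of each type.

```python
import random


def gillespie(inits, reactions, steps, max_cell_num, seed=0):
    """Run up to `steps` reaction events; return the (time, counts) trajectory."""
    rng = random.Random(seed)
    n = list(inits)
    t = 0.0
    traj = [(t, tuple(n))]
    for _ in range(steps):
        # Propensity: rate constant times the count of each consumed cell type.
        props = []
        for rate, lefts, _rights in reactions:
            a = rate
            for count, k in zip(n, lefts):
                if k:
                    a *= count
            props.append(a)
        total = sum(props)
        if total == 0 or sum(n) >= max_cell_num:
            break
        t += rng.expovariate(total)  # waiting time to the next event
        # Pick a reaction with probability proportional to its propensity.
        pick = rng.random() * total
        acc = 0.0
        for (_rate, lefts, rights), a in zip(reactions, props):
            acc += a
            if pick < acc:
                n = [c - l + r for c, l, r in zip(n, lefts, rights)]
                break
        traj.append((t, tuple(n)))
    return traj


reactions = [
    (1.0, [1, 0], [2, 0]),  # division: one type-0 cell becomes two type-0 cells
    (0.5, [1, 0], [0, 1]),  # differentiation: one type-0 cell becomes a type-1 cell
]
traj = gillespie([10, 0], reactions, steps=500, max_cell_num=200)
print(traj[-1])
```

The run stops early when no reaction can fire or when the total cell number reaches `max_cell_num`.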

class Gene(mu0: float, drift: float, sigma: float = None, t0: int = 0)

Gene class

Args:
mu0:

Initial expression

drift:

Drift coefficient of DP

sigma:

Diffusion coefficient of DP

t0:

Gene initial time

diffusion()

Diffusion one step.

base_expr_calc(t: int)

Calculate base expression

Args:
t:

time

Return:

Base expression at time t
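
One simple reading of the parameters above is a linear drift from the initial expression. The sketch below is an assumption, not the documented behavior of `base_expr_calc`: the formula `mu0 + drift * (t - t0)` and the function name `base_expr_sketch` are hypothetical and ignore the diffusion term controlled by `sigma`.

```python
def base_expr_sketch(mu0, drift, t, t0=0):
    """Assumed deterministic part of the process: mu0 + drift * (t - t0)."""
    return mu0 + drift * (t - t0)


# A gene starting at expression 20 with drift 0.5, evaluated at several times.
values = [base_expr_sketch(20, 0.5, t) for t in range(5)]
print(values)
```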

loadtree(file)

Reformat tree file from simulation data

Args:
file(str):

File path generated by simulation code

Returns:
Bio.Phylo.Tree:

biopython’s phylo tree

list[str]:

Cell types of leaf nodes

logNormalize(data, scaling=1)

Log normalize data

Args:
data(pandas.DataFrame, numpy.array):

expression data

scaling(int):

Normalization scale

Return:

normalized data

plot_tree(tree, colors, ax: matplotlib.axes, colortab: list = ['gray', 'blue', 'green', 'orange', 'purple'], stain: str = 'all')

Draw phylogenetic tree

Args:
tree:

Tree loaded by loadtree

colors:

Cell types of the leaf nodes, as returned by loadtree

ax:

matplotlib axes to draw on

colortab:

A list of colors to paint different cell types

stain:

‘all’ to color all branches, ‘terminals’ to color only terminal branches

Return:

matplotlib.axes

get_weight(x: list, distance: list, scale, length: int)

Compute the weighted sum of velocities on the grid

Args:
x:

neighbors

distance:

List of distance to neighbors

scale:

Scale factor

length:

Length of neighbors

Return:

Weighted velocities
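
Distance-based weighting of neighbor velocities can be sketched as follows. The Gaussian kernel, the normalization by the total weight, and the name `gaussian_weighted_average` are all assumptions made for illustration; the source does not state which kernel `get_weight` uses.

```python
import math


def gaussian_weighted_average(velocities, distances, scale):
    """Average 2-D velocities with weights exp(-d^2 / (2 * scale^2))."""
    weights = [math.exp(-(d ** 2) / (2 * scale ** 2)) for d in distances]
    total = sum(weights)
    vx = sum(w * v[0] for w, v in zip(weights, velocities)) / total
    vy = sum(w * v[1] for w, v in zip(weights, velocities)) / total
    return vx, vy


# Two neighbors at equal distance contribute equally.
equal = gaussian_weighted_average([(1.0, 0.0), (0.0, 1.0)], [1.0, 1.0], scale=1.0)
# A far neighbor contributes almost nothing compared with a close one.
skewed = gaussian_weighted_average([(1.0, 0.0), (0.0, 1.0)], [0.0, 10.0], scale=1.0)
print(equal, skewed)
```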

generate_grid(xlim=(-1, 1), ylim=(-1, 1), density: int = 20)

Generate grid to project velocities.

Args:
xlim:

Grid bound on x axis

ylim:

Grid bound on y axis

density:

How finely to split the grid

Return:

grid_X, grid_Y, grid_XY
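
A regular grid over the given bounds can be built as in the sketch below. It assumes that `density` is the number of grid points per axis and that both endpoints are included; the helper `make_grid` is hypothetical and returns plain lists rather than the arrays the library function presumably returns.

```python
def make_grid(xlim=(-1, 1), ylim=(-1, 1), density=20):
    """Return x coordinates, y coordinates, and all (x, y) grid points."""
    def linspace(lo, hi, n):
        # n evenly spaced values from lo to hi inclusive (requires n >= 2).
        step = (hi - lo) / (n - 1)
        return [lo + i * step for i in range(n)]

    xs = linspace(xlim[0], xlim[1], density)
    ys = linspace(ylim[0], ylim[1], density)
    points = [(x, y) for y in ys for x in xs]
    return xs, ys, points


xs, ys, points = make_grid(density=20)
print(len(points))
```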

velocity_embedding_to_grid(pts: numpy.array, vel: numpy.array, nn: str = 'radius', grid_density: int = 20, n_neighbors: int = 4, radius: float = 2, xlim=(None, None), ylim=(None, None))

Project velocities to grid

Args:
pts:

UMAP/tSNE coordinates

vel:

Velocity vector

nn:

knn or radius neighbors to use

grid_density:

density of the grid

n_neighbors:

Number of neighbors; used when nn == ‘knn’

radius:

Radius of the neighborhood; used when nn == ‘radius’

xlim:

Grid bound on x axis

ylim:

Grid bound on y axis

Return:

velocity_plot(pts, vel, ax, figtype: str = 'grid', nn: str = 'radius', grid_density: int = 20, n_neighbors: int = 4, radius: float = 2, streamdensity: float = 1.5, xlim=(None, None), ylim=(None, None), **kwargs)

Project velocities into embedding

Args:
pts:

UMAP/tSNE coordinates

vel:

Velocity vector

ax:

matplotlib.axes

figtype:

‘stream’, ‘grid’ or ‘point’(single cell)

nn:

knn or radius neighbors to use

grid_density:

density of the grid

n_neighbors:

Number of neighbors; used when nn == ‘knn’

radius:

Radius of the neighborhood; used when nn == ‘radius’

streamdensity:

Density of the streamplot; used when figtype == ‘stream’

xlim:

Grid bound on x axis

ylim:

Grid bound on y axis

Return:

matplotlib.axes

mullerplot(data: numpy.ndarray, label: list, color: list, absolute: bool = 0, alpha: float = 0.8, ax: matplotlib.axes = None)

Draw mullerplot

Args:
data:

Population size array. rows for cell type, columns for time point

label:

Cell type names

color:

Colors list

absolute:

False: show frequency; True: show cell number

alpha:

Transparency, in [0, 1]

ax:

axes to draw mullerplot

Return:

matplotlib.axes
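
The difference between the two settings of `absolute` can be illustrated by converting a population-size array into per-time-point frequencies. This is a sketch only: `to_frequencies` is a hypothetical helper operating on nested lists, laid out as described above (rows for cell types, columns for time points), and it is not taken from the library.

```python
def to_frequencies(data):
    """Divide each column (time point) by its total so the column sums to 1."""
    n_times = len(data[0])
    totals = [sum(row[j] for row in data) for j in range(n_times)]
    return [
        [row[j] / totals[j] if totals[j] else 0.0 for j in range(n_times)]
        for row in data
    ]


sizes = [[10, 30], [30, 10]]  # two cell types, two time points
freqs = to_frequencies(sizes)
print(freqs)
```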

label_name(loc, cell_types, ax, fontsize=12, font='DejaVu Sans')

Label cell type names on figures.

Args:
loc:

x, y locations in embedding

cell_types:

Cell type names

ax:

axes to label cell type name

fontsize:

fontsize

font:

font

Return:

matplotlib.axes

corr_plot(x, y, ax, stats='pearson', r0_x=None, r0_y=None, r1_x=None, r1_y=None, fontsize=10)

Draw a scatter plot of the two sets of data and show their correlation coefficients

Args:
x:

data1

y:

data2

ax:

axes to draw scatter on

stats:

pearson or spearman

r0_x, r0_y, r1_x, r1_y:

locations to label the correlation coefficient and the p-value

fontsize:

fontsize

Return:

matplotlib.axes
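
The two options for `stats` differ in what they measure: Pearson captures linear association, while Spearman is Pearson computed on ranks and so captures any monotonic association. The sketch below computes both coefficients from scratch; the helper names are hypothetical, p-values are not computed, and the library itself may well rely on an existing statistics package instead.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


x = [1, 2, 3, 4]
y = [1, 4, 9, 16]  # monotonic but not linear in x
print(pearson(x, y), spearman(x, y))
```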

class scData(count: pandas DataFrame = None, x_normed: pandas DataFrame = None, latent_z: pandas DataFrame = None, Xdr: pandas DataFrame = None, phylo_tree: phylo.tree = None, cell_states: list = None, cell_names: list = None, cell_generation: list = None, megs: list = None, velocity: list = None, velocity_embeded: list = None, phylo_pseudotime: list = None, pvals: list = None, qvals: list = None)

Data structure for PhyloVelo analysis

Args:

count:

Read/UMI count. Index: cell names, columns: gene names

x_normed:

Normalized count. Index: cell names, columns: gene names

latent_z:

Inferred latent expression

Xdr:

PCA/UMAP or tSNE coordinates, n cells * 2

phylo_tree:

Phylogenetic tree

cell_states:

Cell types

cell_names:

Same as the index of count

cell_generation:

Generation time of cells

megs:

MEGs

velocity:

PhyloVelo velocity

velocity_embeded:

PhyloVelo velocity projected into the embedding

phylo_pseudotime:

Pseudotime inferred by PhyloVelo

drop_duplicate_genes(target='count')

Remove duplicated genes

Args:

target:

count or x_normed

normalize_filter(is_normalize=True, is_log=True, min_count=10, target_sum=None)

Normalize read/UMI count and filter genes

Args:

is_normalize:

True to normalize total counts per cell, similar to normalize_total in scanpy

is_log:

True to apply log(1+X)

min_count:

Filter out genes whose total count is below min_count

target_sum:

Target total count per cell; if None, the median is used

Return:

self.x_normed
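
The steps described for normalize_filter can be sketched on a plain list-of-lists count matrix. This is not the method's actual code: the function name `normalize_filter_sketch`, the order of operations (filter genes first, then normalize, then log-transform), and the use of the median of per-cell totals are assumptions based on the argument descriptions above.

```python
import math
import statistics


def normalize_filter_sketch(count, min_count=10, target_sum=None):
    """count: one row of gene counts per cell. Returns (kept gene indices, matrix)."""
    n_genes = len(count[0])
    # Drop genes whose total count over all cells is below min_count.
    keep = [g for g in range(n_genes) if sum(row[g] for row in count) >= min_count]
    filtered = [[row[g] for g in keep] for row in count]
    # Scale each cell to the same total, defaulting to the median cell total.
    totals = [sum(row) for row in filtered]
    if target_sum is None:
        target_sum = statistics.median(totals)
    normed = [
        [math.log1p(v * target_sum / tot) if tot else 0.0 for v in row]
        for row, tot in zip(filtered, totals)
    ]
    return keep, normed


count = [[10, 0, 5], [20, 1, 15]]  # two cells, three genes
keep, x_normed = normalize_filter_sketch(count)
print(keep, x_normed)
```

Undoing the log transform with `math.expm1` gives rows that each sum to the target total, which is a quick way to check the normalization.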

dimensionality_reduction(target: str = 'count', method: str = 'tsne', n_components: int = 2, scale: float = 1, pc: bool = True, **kwags)

PCA, tSNE or UMAP

Args:

target:

count or x_normed

method:

Use pca, tsne or umap

n_components:

Reduce the dimension to ‘n_components’

scale:

Normalization scale

pc:

Whether to apply PCA before tSNE/UMAP

pc_components:

Number of PCs to use for tSNE/UMAP

perplexity:

tSNE perplexity

n_neighbors:

UMAP n_neighbors

min_dist:

UMAP min_dist

Return:

self.Xdr

velocity_inference(sd: scData, time: list = None, cutoff: float = 0.97, alpha: float = 0.05, target: str = 'x_normed', exact: bool = False)

Infer phylogenetic velocity

Args:
sd:

scData

time:

If None, cell generation is calculated automatically from the phylogenetic tree

cutoff:

Only calculate genes with top ‘cutoff’ correlation

alpha:

Significance level

target:

Which data to use for inference: ‘count’ for the NB model or ‘x_normed’ for the normal model

exact:

True to use the ‘is_meg’ function; False to skip it

Return:

sd.velocity

velocity_embedding(sd: scData, target: str = 'count', n_neigh: int = None)

Project velocity into embedding

Args:
sd:

scData

target:

count or x_normed

n_neigh:

kNN pooling. Default: Ncells//3

calc_phylo_pseudotime(sd: scData, n_neighbors: int = 30, r_sample: float = 1)

Calculate the PhyloVelo pseudotime

Args:
sd:

scData

n_neighbors:

Number of nearest neighbors used to build the MST. Smaller values make the calculation faster but increase the chance of error

r_sample:

Fraction in [0, 1] of cells randomly sampled to calculate pseudotime

Return:

scData.phylo_pseudotime
