phylovelo
Submodules
Package Contents
Classes
GeneExpr | Gene expression program
Cell | Cell class
Reaction | Cell division/differentiation type
Gillespie | Gillespie simulation
Gene | Gene class
scData | Data structure for PhyloVelo analysis
Functions
get_annotation | Get simulation data annotation
sim_base_expr | Simulate base expression
add_lineage_noise | Simulate lineage noise
get_count_from_base_expr | Draw gene expression counts from the base expression matrix
get_count | Draw a random sample from an NB distribution with parameters (r, p)
reconstruct | Reconstruct the phylogenetic tree of simulation data
wirte_lineage_info | Record lineage information in simulation
loadtree | Reformat tree file from simulation data
logNormalize | Log-normalize data
plot_tree | Draw phylogenetic tree
get_weight | Weighted sum of velocities onto the grid
generate_grid | Generate grid to project velocities
velocity_embedding_to_grid | Project velocities to grid
velocity_plot | Project velocities into embedding
mullerplot | Draw a Muller plot
label_name | Label cell type names on figures
corr_plot | Draw a scatter plot of two sets of data and show their correlation coefficient
velocity_inference | Infer phylogenetic velocity
velocity_embedding | Project velocity into embedding
calc_phylo_pseudotime | Calculate the PhyloVelo pseudotime
- class GeneExpr(Ngene: int, r_variant_gene: float, diff_map: dict, state_time: dict, forward_map: dict = None)
Gene expression program
- Args:
- Ngene:
Number of genes
- r_variant_gene:
Fraction of genes whose expression changes with differentiation
- diff_map:
Differentiation relationships between cell types; {a: [b, c]} means ‘a’ is differentiated from ‘b’ and ‘c’
- state_time:
Pseudotime of each state
- forward_map:
Used only in convergent-model simulations; {a: b} means ‘a’ will differentiate into ‘b’
- generate_genes(mu0_loc: float = 20, mu0_scale: float = 3, drift_loc: float = 0, drift_scale: float = 1)
Generate genes
- Args:
- mu0_loc:
Mean of initial expression
- mu0_scale:
Variation of initial expression
- drift_loc:
Mean of gene drift
- drift_scale:
Variation of drift
- expr(state, time)
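The dictionary arguments above can be illustrated with plain Python. This is a minimal sketch of their shape only; it does not call phylovelo. The three-state lineage, the helper `inconsistent_pairs`, and the rule that a parent state should not have a later pseudotime than its child are all assumptions made for illustration.

```python
# Hypothetical three-state lineage 0 -> 1 -> 2 (state labels are made up).
diff_map = {1: [0], 2: [1]}        # state 1 derives from 0, state 2 from 1
state_time = {0: 0, 1: 5, 2: 10}   # pseudotime assigned to each state


def inconsistent_pairs(diff_map, state_time):
    """Return (parent, child) pairs where the parent's pseudotime is later
    than the child's. Assumed sanity rule, not a phylovelo check."""
    bad = []
    for child, parents in diff_map.items():
        for parent in parents:
            if state_time[parent] > state_time[child]:
                bad.append((parent, child))
    return bad


problems = inconsistent_pairs(diff_map, state_time)
```

An empty `problems` list means every parent state precedes its child in pseudotime.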
- get_annotation(file)
Get simulation data annotation
- Args:
- file:
Tree file from simulation script
- Return:
- list:
Cell names
- list:
Cell states
- list:
Cell generations
- sim_base_expr(tree: bio.phylo.tree, cell_states: pandas.DataFrame, Ngene: int, r_variant_gene: float, diff_map: dict, forward_map: dict = {}, mu0_loc=20, mu0_scale=3, drift_loc=0, drift_scale=1, pseudo_state_time: dict = None)
Simulate base expression
- Args:
- tree:
Phylogenetic tree
- cell_states:
DataFrame of cell types with index of cell names
- Ngene:
Number of genes
- r_variant_gene:
Fraction of genes whose expression changes with differentiation
- diff_map:
Differentiation relationships between cell types; {a: [b, c]} means ‘a’ is differentiated from ‘b’ and ‘c’
- pseudo_state_time:
Pseudotime of each state
- forward_map:
Used only in convergent-model simulations; {a: b} means ‘a’ will differentiate into ‘b’
- mu0_loc:
Mean of initial expression
- mu0_scale:
Variation of initial expression
- drift_loc:
Mean of gene drift
- drift_scale:
Variation of drift
- Returns:
- class:
Gene expression program
- pd.DataFrame:
Base expression matrix
- add_lineage_noise(tree: bio.phylo.tree, base_expr_mat: pandas.DataFrame, scale=0.0001)
Simulate lineage noise
- Args:
- tree:
Phylogenetic tree
- base_expr_mat:
Base expression matrix from sim_base_expr
- scale:
Lineage noise scale
- Return:
- pd.DataFrame:
Base expression matrix with lineage noise
- get_count_from_base_expr(base_expr_mat: pandas.DataFrame, alpha: int = 3)
Draw gene expression count from base expression matrix
- Args:
- base_expr_mat:
Base expression matrix
- alpha:
Scale parameter of NB distribution
- Return:
Gene count matrix
- get_count(paras: list)
Draw a random sample from an NB distribution with parameters (r, p)
- Args:
- paras:
NB parameters, [(r,p)]
- Return:
- int:
Random sample
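For reference, a negative binomial draw with parameters (r, p) can be sketched with the standard library alone. This is not the phylovelo implementation. It assumes the common convention in which the sample is the number of failures before the r-th success, with success probability p and integer r; the name `draw_nb` is hypothetical.

```python
import random


def draw_nb(r, p, rng=random):
    """Count failures before the r-th success (integer r, 0 < p <= 1)."""
    failures = 0
    successes = 0
    while successes < r:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return failures


# Under this convention the mean is r * (1 - p) / p.
rng = random.Random(1)
samples = [draw_nb(3, 0.5, rng) for _ in range(20000)]
sample_mean = sum(samples) / len(samples)
```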
- reconstruct(file: str, output: str = None, seed: int = None, is_balance: bool = False, **kwargs)
Reconstruct phylogenetic tree of simulation data
- Args:
- file:
Simulation file path
- output:
Output newick file path
- seed:
Random seed
- is_balance:
Whether all cell types have the same number of cells
- ratio:
Proportion of cells to reconstruct
- Return:
Newick tree written to the output file
- wirte_lineage_info(filepath, anc_cells, curr_cells, curr_time)
Record lineage information in simulation
- class Cell(Ngene: int = None, state: int = 0, gen: int = None, cid: int = None, parent: int = None, tb: float = None, td: float = None)
Cell class
- Args:
- Ngene:
Gene number
- state:
Cell type
- gen:
Cell generation
- cid:
Cell id
- parent:
Cell’s parent
- tb:
Birth time
- td:
Death time
- class Reaction(rate: callable = None, num_lefts: list = None, num_rights: list = None, index: int = None)
Cell division/differentiation type
- Args:
- rate:
reaction rate function
- num_lefts:
Cell numbers before reaction
- num_rights:
Cell numbers after reaction
- index:
Reaction index
- combine(n, s)
- propensity(n, t)
- class Gillespie(num_elements: int, inits: list = None, max_cell_num: int = 20000)
Gillespie simulation
- Args:
- num_elements:
Cell type number
- inits:
Initial cell number
- max_cell_num:
Maximum cell number
- add_reaction(rate: callable = None, num_lefts: list = None, num_rights: list = None, index: int = None)
Add reactions to simulation
- Args:
- rate:
reaction rate function
- num_lefts:
Cell numbers before reaction
- num_rights:
Cell numbers after reaction
- index:
Reaction index
- evolute(steps: int)
Run simulation
- Args:
- steps:
Number of steps to run before stopping
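The Reaction and Gillespie entries follow the usual stochastic simulation algorithm. The sketch below is a generic standard-library version, not the phylovelo code: the function names, the combinatorial propensity, and the stopping rule on the total cell number are assumptions that only mirror the documented parameter names (`rate`, `num_lefts`, `num_rights`, `max_cell_num`).

```python
import math
import random


def propensity(rate, num_lefts, counts, t):
    """rate(t) times the number of ways to choose the reactant cells."""
    ways = 1
    for need, have in zip(num_lefts, counts):
        ways *= math.comb(have, need)
    return rate(t) * ways


def simulate(inits, reactions, steps, max_cell_num=20000, seed=0):
    """Run up to `steps` Gillespie events and return (time, counts) records."""
    rng = random.Random(seed)
    counts = list(inits)
    t = 0.0
    history = [(t, tuple(counts))]
    for _ in range(steps):
        props = [propensity(rate, lefts, counts, t) for rate, lefts, _ in reactions]
        total = sum(props)
        if total == 0 or sum(counts) >= max_cell_num:
            break
        t += rng.expovariate(total)  # exponential waiting time to next event
        # Pick one reaction with probability proportional to its propensity.
        _, lefts, rights = rng.choices(reactions, weights=props, k=1)[0]
        counts = [c - l + r for c, l, r in zip(counts, lefts, rights)]
        history.append((t, tuple(counts)))
    return history


# Two cell types: type 0 divides (0 -> 0 + 0) or differentiates (0 -> 1).
reactions = [
    (lambda t: 1.0, [1, 0], [2, 0]),
    (lambda t: 0.5, [1, 0], [0, 1]),
]
history = simulate([1, 0], reactions, steps=200)
```

Each record in `history` pairs an event time with the cell counts per type after that event.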
- class Gene(mu0: float, drift: float, sigma: float = None, t0: int = 0)
Gene class
- Args:
- mu0:
Initial expression
- drift:
Drift coefficient of the diffusion process (DP)
- sigma:
Diffusion coefficient of the diffusion process (DP)
- t0:
Gene initial time
- diffusion()
Diffuse one step.
- base_expr_calc(t: int)
Calculate base expression
- Args:
- t:
time
- Return:
Base expression at time t
- loadtree(file)
Reformat tree file from simulation data
- Args:
- file(str):
File path generated by simulation code
- Returns:
- Bio.Phylo.Tree:
biopython’s phylo tree
- list[str]:
Cell types of leaf nodes
- logNormalize(data, scaling=1)
Log normalize data
- Args:
- data(pandas.DataFrame, numpy.array):
expression data
- scaling(int):
Normalization scale
- Return:
normalized data
- plot_tree(tree, colors, ax: matplotlib.axes, colortab: list = ['gray', 'blue', 'green', 'orange', 'purple'], stain: str = 'all')
Draw phylogenetic tree
- Args:
- tree:
Tree returned by loadtree
- colors:
Cell types returned by loadtree
- ax:
matplotlib axes to draw on
- colortab:
A list of colors to paint different cell types
- stain:
‘all’ to color all branches, ‘terminals’ to color only terminal branches
- Return:
matplotlib.axes
- get_weight(x: list, distance: list, scale, length: int)
Weighted sum of velocities onto the grid
- Args:
- x:
Neighbors
- distance:
List of distances to neighbors
- scale:
Scale factor
- length:
Number of neighbors
- Return:
Weighted velocities
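The idea of a distance-weighted sum can be sketched as follows. The Gaussian kernel, the normalization by the total weight, and the name `weighted_velocity` are assumptions for illustration; the source does not state which weighting get_weight actually uses.

```python
import math


def weighted_velocity(velocities, distances, scale):
    """Average 2-D neighbor velocities with weights exp(-d^2 / (2 * scale^2))."""
    weights = [math.exp(-(d * d) / (2.0 * scale * scale)) for d in distances]
    total = sum(weights)
    if total == 0:
        return (0.0, 0.0)  # no usable neighbors
    vx = sum(w * v[0] for w, v in zip(weights, velocities)) / total
    vy = sum(w * v[1] for w, v in zip(weights, velocities)) / total
    return (vx, vy)


# A near neighbor (distance 0.1) dominates a far one (distance 3.0).
near_far = weighted_velocity([(1.0, 0.0), (0.0, 1.0)], [0.1, 3.0], scale=1.0)
```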
- generate_grid(xlim=(-1, 1), ylim=(-1, 1), density: int = 20)
Generate grid to project velocities.
- Args:
- xlim:
Grid bound on x axis
- ylim:
Grid bound on y axis
- density:
Number of grid divisions along each axis
- Return:
grid_X, grid_Y, grid_XY
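A regular grid of this kind can be built without any dependencies. The sketch below assumes that `density` is the number of points per axis (at least 2) and that the three return values are the x mesh, the y mesh, and a flat list of (x, y) points; `make_grid` is a hypothetical name, not the library function.

```python
def make_grid(xlim=(-1, 1), ylim=(-1, 1), density=20):
    """Return (grid_x, grid_y, grid_xy) for density x density evenly spaced points."""
    xs = [xlim[0] + (xlim[1] - xlim[0]) * i / (density - 1) for i in range(density)]
    ys = [ylim[0] + (ylim[1] - ylim[0]) * i / (density - 1) for i in range(density)]
    grid_x = [list(xs) for _ in ys]       # each row repeats the x coordinates
    grid_y = [[y] * density for y in ys]  # each row holds one y coordinate
    grid_xy = [(x, y) for y in ys for x in xs]
    return grid_x, grid_y, grid_xy


grid_x, grid_y, grid_xy = make_grid(density=5)
```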
- velocity_embedding_to_grid(pts: numpy.array, vel: numpy.array, nn: str = 'radius', grid_density: int = 20, n_neighbors: int = 4, radius: float = 2, xlim=(None, None), ylim=(None, None))
Project velocities to grid
- Args:
- pts:
UMAP/tSNE coordinates
- vel:
Velocity vector
- nn:
Neighbor search to use: ‘knn’ or ‘radius’
- grid_density:
Density of the grid
- n_neighbors:
Number of neighbors; used when nn == ‘knn’
- radius:
Search radius; used when nn == ‘radius’
- xlim:
Grid bound on x axis
- ylim:
Grid bound on y axis
Return:
- velocity_plot(pts, vel, ax, figtype: str = 'grid', nn: str = 'radius', grid_density: int = 20, n_neighbors: int = 4, radius: float = 2, streamdensity: float = 1.5, xlim=(None, None), ylim=(None, None), **kwargs)
Project velocities into embedding
- Args:
- pts:
UMAP/tSNE coordinates
- vel:
Velocity vector
- ax:
matplotlib.axes
- figtype:
‘stream’, ‘grid’ or ‘point’ (single cell)
- nn:
Neighbor search to use: ‘knn’ or ‘radius’
- grid_density:
Density of the grid
- n_neighbors:
Number of neighbors; used when nn == ‘knn’
- radius:
Search radius; used when nn == ‘radius’
- streamdensity:
Density of the stream plot; used when figtype == ‘stream’
- xlim:
Grid bound on x axis
- ylim:
Grid bound on y axis
- Return:
matplotlib.axes
- mullerplot(data: numpy.ndarray, label: list, color: list, absolute: bool = 0, alpha: float = 0.8, ax: matplotlib.axes = None)
Draw a Muller plot
- Args:
- data:
Population size array; rows are cell types, columns are time points
- label:
Cell type names
- color:
List of colors
- absolute:
False: show frequency; True: show cell number
- alpha:
Transparency, in [0, 1]
- ax:
Axes to draw the Muller plot on
- Return:
matplotlib.axes
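The difference between the two `absolute` modes comes down to whether each time point is normalized. A minimal sketch of the frequency conversion, assuming the documented layout (rows are cell types, columns are time points); `to_frequencies` is a hypothetical helper, not part of the package.

```python
def to_frequencies(data):
    """Divide each column (time point) by its total so the column sums to 1."""
    n_time = len(data[0])
    totals = [sum(row[t] for row in data) for t in range(n_time)]
    return [
        [row[t] / totals[t] if totals[t] else 0.0 for t in range(n_time)]
        for row in data
    ]


# Two cell types observed at two time points.
freqs = to_frequencies([[1, 3], [1, 1]])
```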
- label_name(loc, cell_types, ax, fontsize=12, font='DejaVu Sans')
Label cell type names on figures.
- Args:
- loc:
x, y locations in embedding
- cell_types:
Cell type names
- ax:
axes to label cell type name
- fontsize:
fontsize
- font:
font
- Return:
matplotlib.axes
- corr_plot(x, y, ax, stats='pearson', r0_x=None, r0_y=None, r1_x=None, r1_y=None, fontsize=10)
Draw a scatter plot of the two sets of data and show their correlation coefficients
- Args:
- x:
data1
- y:
data2
- ax:
axes to draw scatter on
- stats:
pearson or spearman
- r0_x, r0_y, r1_x, r1_y:
locations to label the correlation coefficient and the p-value
- fontsize:
fontsize
- Return:
matplotlib.axes
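The two `stats` options differ only in whether the data are ranked first. The following standard-library sketch computes both coefficients, with ties given average ranks. It omits the p-value and is not the code corr_plot uses internally; all function names are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


x = [1, 2, 3, 4]
y = [1, 4, 9, 16]  # monotonic but not linear in x
r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
```

For monotonic but nonlinear data such as this, the Spearman coefficient reaches 1 while the Pearson coefficient stays below it.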
- class scData(count: pandas.DataFrame = None, x_normed: pandas.DataFrame = None, latent_z: pandas.DataFrame = None, Xdr: pandas.DataFrame = None, phylo_tree: phylo.tree = None, cell_states: list = None, cell_names: list = None, cell_generation: list = None, megs: list = None, velocity: list = None, velocity_embeded: list = None, phylo_pseudotime: list = None, pvals: list = None, qvals: list = None)
Data structure for PhyloVelo analysis
- Args:
- count:
Read/UMI count. Index: cell names, columns: gene names
- x_normed:
Normalized count. Index: cell names, columns: gene names
- latent_z:
Inferred latent expression
- Xdr:
PCA, UMAP or tSNE coordinates, n_cells × 2
- phylo_tree:
Phylogenetic tree
- cell_states:
Cell types
- cell_names:
Same as the index of count
- cell_generation:
Generation time of cells
- megs:
MEGs (monotonically expressed genes)
- velocity:
PhyloVelo velocity
- velocity_embeded:
PhyloVelo velocity projected into the embedding
- phylo_pseudotime:
Pseudotime inferred by PhyloVelo
- drop_duplicate_genes(target='count')
Remove duplicated genes
- Args:
- target:
count or x_normed
- normalize_filter(is_normalize=True, is_log=True, min_count=10, target_sum=None)
Normalize read/UMI count and filter genes
- Args:
- is_normalize:
Similar to normalize_total in scanpy. True to normalize
- is_log:
Apply log(1+X)
- min_count:
Filter out genes whose total count < min_count
- target_sum:
If None, use the median
- Return:
self.x_normed
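The steps named above (gene filtering, total-count normalization, log(1+X)) can be sketched on a plain list-of-lists matrix. The order of the steps, the use of the median of per-cell totals as the default `target_sum`, and the function name are assumptions; the method itself operates on the scData object and may differ in detail.

```python
import math
import statistics


def normalize_filter_sketch(counts, min_count=10, target_sum=None):
    """counts: rows are cells, columns are genes. Returns (kept gene indices, matrix)."""
    n_genes = len(counts[0])
    # Keep genes whose total count across cells reaches min_count.
    keep = [g for g in range(n_genes) if sum(row[g] for row in counts) >= min_count]
    filtered = [[row[g] for g in keep] for row in counts]
    totals = [sum(row) for row in filtered]
    if target_sum is None:
        target_sum = statistics.median(totals)  # assumed default
    # Scale each cell to target_sum, then apply log(1 + x).
    normed = [
        [math.log1p(v * target_sum / tot) if tot else 0.0 for v in row]
        for row, tot in zip(filtered, totals)
    ]
    return keep, normed


keep, normed = normalize_filter_sketch([[10, 0, 5], [20, 1, 5]], min_count=10)
```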
- dimensionality_reduction(target: str = 'count', method: str = 'tsne', n_components: int = 2, scale: float = 1, pc: bool = True, **kwags)
PCA, tSNE or UMAP
- Args:
- target:
count or x_normed
- method:
pca, tsne or umap
- n_components:
Reduce the dimension to ‘n_components’
- scale:
Normalization scale
- pc:
Whether to run PCA before tSNE/UMAP
- pc_components:
Number of PCs to use for tSNE/UMAP
- perplexity:
tSNE perplexity
- n_neighbors:
UMAP n_neighbors
- min_dist:
UMAP min_dist
- Return:
self.Xdr
- velocity_inference(sd: scData, time: list = None, cutoff: float = 0.97, alpha: float = 0.05, target: str = 'x_normed', exact: bool = False)
Infer phylogenetic velocity
- Args:
- sd:
scData
- time:
if None, cell generation will be automatically calculated from phylo tree
- cutoff:
Only calculate genes with top ‘cutoff’ correlation
- alpha:
Significance level
- target:
Data to run inference on: ‘count’ for the NB model or ‘x_normed’ for the normal model
- exact:
If True, use the ‘is_meg’ function; if False, do not
- Return:
sd.velocity
- velocity_embedding(sd: scData, target: str = 'count', n_neigh: int = None)
Project velocity into embedding
- Args:
- sd:
scData
- target:
count or x_normed
- n_neigh:
kNN pooling. Default: Ncells//3
- calc_phylo_pseudotime(sd: scData, n_neighbors: int = 30, r_sample: float = 1)
Calculate the phyloVelo pseudotime
- Args:
- sd:
scData
- n_neighbors:
Number of nearest neighbors used to build the MST. Smaller values are faster but may introduce errors
- r_sample:
Fraction in [0, 1] of cells randomly sampled to calculate pseudotime
- Return:
scData.phylo_pseudotime