Pseudotime
PhyloVelo provides two ways to calculate pseudotime. Both methods write the
result to sd.phylo_pseudotime and scale it to the interval [0, 1].
Velocity-graph pseudotime
The default method estimates pseudotime from the projected velocity field in a
two-dimensional embedding. It builds a k-nearest-neighbor graph over
sd.Xdr, orients local time intervals using sd.velocity_embeded, and then
propagates time along a minimum-spanning structure.
pv.velocity_embedding(sd, target="count", n_neigh=100)
pv.calc_phylo_pseudotime(sd, n_neighbors=100)
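The steps described above can be illustrated with a toy sketch. This is not PhyloVelo's implementation: the function name `graph_pseudotime_sketch`, the brute-force neighbor search, and the signed-step rule (displacement projected onto the mean velocity direction of the two connected cells) are assumptions made for illustration. The sketch also assumes a connected graph and non-zero velocities.

```python
import heapq
import numpy as np

def graph_pseudotime_sketch(X, V, k=2):
    """Toy velocity-oriented pseudotime on a kNN graph (illustrative only)."""
    n = len(X)
    # Brute-force pairwise distances; acceptable only for tiny examples.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    knn = np.argsort(D, axis=1)[:, 1:k + 1]  # column 0 is the cell itself
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in knn[i]:
            adj[i].add(int(j))
            adj[int(j)].add(i)

    # Prim-style growth of a minimum spanning tree from cell 0 using a heap.
    t = np.full(n, np.nan)
    t[0] = 0.0
    heap = [(float(D[0, j]), 0, j) for j in adj[0]]
    heapq.heapify(heap)
    while heap:
        _, i, j = heapq.heappop(heap)
        if not np.isnan(t[j]):
            continue  # cell j already has a time
        # Signed step: displacement projected onto the local velocity direction.
        v = 0.5 * (V[i] + V[j])
        direction = v / np.linalg.norm(v)
        t[j] = t[i] + float((X[j] - X[i]) @ direction)
        for m in adj[j]:
            if np.isnan(t[m]):
                heapq.heappush(heap, (float(D[j, m]), j, m))
    # Scale to [0, 1]; cells "behind" the root get negative raw times first.
    return (t - t.min()) / (t.max() - t.min())

# Five cells on a line, all flowing in +x; the root (cell 0) sits in the middle.
X = np.array([[2.0, 0.0], [0.0, 0.0], [4.0, 0.0], [1.0, 0.0], [3.0, 0.0]])
V = np.tile([1.0, 0.0], (5, 1))
pseudotime = graph_pseudotime_sketch(X, V)
```

In this toy case the ordering follows the x coordinate even though the traversal starts from a cell in the middle, because each step is signed by the velocity direction.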
For larger datasets, the graph pseudotime can be estimated on a random subset of cells and then interpolated back to all cells:
pv.calc_phylo_pseudotime(
    sd,
    n_neighbors=100,
    r_sample=0.8,
    random_state=0,
)
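As a rough illustration of the subsample-then-interpolate idea, the sketch below samples a fraction of cells, takes a stand-in pseudotime on that subset, and fills in every cell from its nearest sampled neighbors. The helper `interpolate_from_sample`, the neighbor-averaging rule, and the stand-in pseudotime are all assumptions for illustration; the source does not state how PhyloVelo interpolates.

```python
import numpy as np

def interpolate_from_sample(X, sample_idx, sample_time, k=3):
    """Give each cell the mean pseudotime of its k nearest sampled cells."""
    Xs = X[sample_idx]
    # Distances from every cell to every sampled cell (brute force, toy scale).
    D = np.linalg.norm(X[:, None, :] - Xs[None, :, :], axis=-1)
    nearest = np.argsort(D, axis=1)[:, :k]
    full = sample_time[nearest].mean(axis=1)
    full[sample_idx] = sample_time  # sampled cells keep their own estimate
    return full

rng = np.random.default_rng(0)          # plays the role of random_state
X = rng.normal(size=(200, 2))           # hypothetical 2-D embedding
n_sample = int(0.8 * len(X))            # plays the role of r_sample=0.8
sample_idx = rng.choice(len(X), size=n_sample, replace=False)

# Stand-in for the graph pseudotime on the subset: scaled x coordinate.
x = X[sample_idx, 0]
sample_time = (x - x.min()) / (x.max() - x.min())

full_time = interpolate_from_sample(X, sample_idx, sample_time)
```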
The current implementation performs one batched nearest-neighbor search and uses a heap-based minimum-spanning-tree routine. If the kNN graph contains isolated points or separated cell populations, PhyloVelo connects the components with nearest bridging edges instead of failing.
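One way to picture the bridging step is a Kruskal-style pass with a union-find structure: existing graph edges define the components, and the shortest edges that join two different components are added until everything is connected. The function `bridge_components` below is a hypothetical sketch under that assumption, not PhyloVelo's routine.

```python
import numpy as np

def bridge_components(X, edges):
    """Return the shortest extra edges needed to connect all components."""
    n = len(X)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Existing kNN edges define the starting components.
    for i, j in edges:
        parent[find(i)] = find(j)

    # Visit all cell pairs from shortest to longest distance.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    iu, ju = np.triu_indices(n, k=1)
    bridges = []
    for idx in np.argsort(D[iu, ju]):
        i, j = int(iu[idx]), int(ju[idx])
        ri, rj = find(i), find(j)
        if ri != rj:  # this pair joins two separate components
            parent[ri] = rj
            bridges.append((i, j))
    return bridges

# Two small clusters plus one isolated cell, with edges only inside clusters.
X = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [11.0, 0.0], [30.0, 0.0]])
bridges = bridge_components(X, edges=[(0, 1), (2, 3)])
```

Scanning all pairs is quadratic in the number of cells, so this form is only meant to show the idea on small inputs.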
Recommended use
Use velocity-graph pseudotime when the embedding and projected velocity field are reliable and you want pseudotime to reflect the inferred flow direction in the cell-state manifold.
Important parameters
- n_neighbors: Number of embedding neighbors used to build the graph. Larger values make disconnected components less likely, but can smooth over local structure.
- r_sample: Fraction of cells sampled for graph construction. Use values below 1 for large datasets.
- random_state: Seed used when r_sample < 1.
MEG expression pseudotime
PhyloVelo can also estimate pseudotime directly from monotonically expressed
genes (MEGs). This method orients each MEG by the sign of its inferred velocity,
clips expression values by per-gene quantiles to reduce outlier influence,
aggregates the oriented gene-wise scores, and scales the result to [0, 1].
pv.velocity_inference(sd, time=sd.cell_generation, target="x_normed")
pv.calc_meg_pseudotime(sd, target="x_normed")
The same estimator can be selected through calc_phylo_pseudotime:
pv.calc_phylo_pseudotime(sd, method="meg", target="x_normed")
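The scoring steps listed above (orient, clip, aggregate, scale) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the library's code: the function name `meg_pseudotime_sketch`, the per-gene min-max scaling after clipping, and the `1 - x` flip for genes with negative velocity are choices made here for the example.

```python
import numpy as np

def meg_pseudotime_sketch(expr, velocity, quantiles=(0.05, 0.95)):
    """expr: cells x genes array; velocity: one inferred velocity per gene."""
    # Per-gene clipping bounds to limit the influence of outliers.
    lo, hi = np.quantile(expr, quantiles, axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant genes
    scaled = (np.clip(expr, lo, hi) - lo) / span
    # Orient each gene so that larger values always mean "later".
    oriented = np.where(velocity >= 0, scaled, 1.0 - scaled)
    # Median across genes, then rescale the score to [0, 1].
    score = np.median(oriented, axis=1)
    return (score - score.min()) / (score.max() - score.min())

# Synthetic data: two genes increase with the true time, one decreases.
t = np.linspace(0.0, 1.0, 50)
expr = np.column_stack([2.0 * t, 5.0 - 3.0 * t, t ** 2])
velocity = np.array([1.0, -1.0, 1.0])
pseudotime = meg_pseudotime_sketch(expr, velocity)
```

Because of the quantile clipping, the earliest and latest few cells share the same score, so the recovered ordering is non-decreasing rather than strictly increasing.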
The MEG expression method can also transfer a pseudotime clock from a reference
dataset to an independent dataset that only has transcriptome data. In this mode,
the reference sd provides the MEGs, velocity directions, per-gene robust
scaling, and final score calibration. The query dataset only needs an expression
matrix with matching gene names.
query_pseudotime = pv.calc_meg_pseudotime(
    sd,
    target="x_normed",
    query_data=query_x_normed,  # cells x genes DataFrame
)
If the independent data are stored in another scData object, pass
query_sd. PhyloVelo will use query_target or, by default, the same
target as the reference dataset.
pv.calc_meg_pseudotime(
    sd,
    target="x_normed",
    query_sd=query_sd,
    query_target="x_normed",
)
query_time = query_sd.phylo_pseudotime
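To make the reference-to-query transfer concrete, the sketch below fits per-gene clipping bounds, orientations, and the score range on a reference table, then applies them to a query table whose columns are in a different order. The helpers `raw_score`, `fit_clock`, and `score_query`, the gene names, and the final clipping of query scores to [0, 1] are assumptions for illustration; the actual calibration in PhyloVelo may differ.

```python
import numpy as np
import pandas as pd

def raw_score(expr, clock):
    # Select columns by gene name, so column order in `expr` does not matter.
    x = expr.loc[:, clock["genes"]].to_numpy(dtype=float)
    scaled = (np.clip(x, clock["lo"], clock["hi"]) - clock["lo"]) / clock["span"]
    oriented = np.where(clock["sign"] >= 0, scaled, 1.0 - scaled)
    return np.median(oriented, axis=1)

def fit_clock(ref, velocity, quantiles=(0.05, 0.95)):
    """Learn scaling and orientation from the reference dataset only."""
    genes = list(velocity.index)
    x = ref.loc[:, genes].to_numpy(dtype=float)
    lo, hi = np.quantile(x, quantiles, axis=0)
    clock = {
        "genes": genes,
        "lo": lo,
        "hi": hi,
        "span": np.where(hi > lo, hi - lo, 1.0),
        "sign": np.sign(velocity.to_numpy(dtype=float)),
    }
    raw = raw_score(ref, clock)
    clock["raw_min"], clock["raw_max"] = raw.min(), raw.max()
    return clock

def score_query(query, clock):
    """Score query cells with the reference calibration (clipped to [0, 1])."""
    raw = raw_score(query, clock)
    scaled = (raw - clock["raw_min"]) / (clock["raw_max"] - clock["raw_min"])
    return np.clip(scaled, 0.0, 1.0)

ref = pd.DataFrame({"geneA": [0.0, 1.0, 2.0, 3.0, 4.0],
                    "geneB": [8.0, 6.0, 4.0, 2.0, 0.0]})
velocity = pd.Series({"geneA": 1.0, "geneB": -1.0})  # geneB decreases over time
clock = fit_clock(ref, velocity)

# Query columns are deliberately in a different order than the reference.
query = pd.DataFrame({"geneB": [7.0, 1.0], "geneA": [0.5, 3.5]})
query_time = score_query(query, clock)
```

Reusing the reference bounds and score range keeps query scores on the same clock as the reference, instead of rescaling each query dataset to its own minimum and maximum.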
Recommended use
Use MEG expression pseudotime when you want a direct transcriptomic-clock score, when the embedding has separated clusters, or when the velocity graph is too slow or too sensitive to the manifold geometry.
Important parameters
- target: Expression matrix used for scoring, usually "x_normed" or "count".
- genes: Optional list of MEGs. By default, PhyloVelo uses sd.megs from velocity_inference.
- robust_quantiles: Lower and upper quantiles used to clip each gene before scaling. The default is (0.05, 0.95).
- aggregation: "median" is the default and is robust to noisy genes. "weighted_mean" uses the absolute inferred velocity as a gene weight.
- query_data / query_sd: Optional independent transcriptome data to score with the reference clock. Query genes are matched by column name, so column order does not need to be the same as the reference dataset.
Choosing a method
| Situation | Suggested method |
|---|---|
| Reliable continuous embedding and velocity field | Velocity-graph pseudotime |
| Large datasets where graph construction is expensive | Graph method with r_sample < 1 |
| Separated clusters or isolated cells | MEG expression pseudotime, or graph method with enough neighbors |
| Direct clock-like ordering from MEGs | MEG expression pseudotime |
Notes
- sd.phylo_pseudotime is always normalized to [0, 1].
- The graph method requires sd.Xdr and sd.velocity_embeded.
- The MEG expression method requires MEGs and velocity directions, typically produced by velocity_inference.