Pseudotime

PhyloVelo provides two ways to calculate pseudotime. Both methods scale the result to the interval [0, 1] and write it to sd.phylo_pseudotime.

Velocity-graph pseudotime

The default method estimates pseudotime from the projected velocity field in a two-dimensional embedding. It builds a k-nearest-neighbor (kNN) graph over sd.Xdr, orients local time intervals using sd.velocity_embeded, and then propagates time along a minimum spanning tree of that graph.

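import phylovelo as pv  # sd is an existing scData object

pv.velocity_embedding(sd, target="count", n_neigh=100)  # fills sd.velocity_embeded
pv.calc_phylo_pseudotime(sd, n_neighbors=100)           # fills sd.phylo_pseudotime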

For larger datasets, PhyloVelo can estimate the graph pseudotime on a random subset of cells and then interpolate the values back to all cells:

pv.calc_phylo_pseudotime(
    sd,
    n_neighbors=100,
    r_sample=0.8,
    random_state=0,
)
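The subsample-then-interpolate idea can be sketched outside PhyloVelo as follows. This is a minimal illustration, not the library's code: the helper interpolate_from_subset, the choice of k, and the nearest-neighbor averaging rule are all assumptions, and a scaled x-coordinate stands in for the expensive graph pseudotime.

```python
import numpy as np
from scipy.spatial import cKDTree

def interpolate_from_subset(embedding, subset_idx, subset_time, k=5):
    """Give every cell the mean pseudotime of its k nearest sampled cells.

    Hypothetical helper: the averaging rule is an assumption, not PhyloVelo's.
    """
    tree = cKDTree(embedding[subset_idx])
    k = min(k, len(subset_idx))
    _, nn = tree.query(embedding, k=k)
    nn = nn.reshape(len(embedding), -1)  # k=1 returns a 1-D index array
    return subset_time[nn].mean(axis=1)

rng = np.random.default_rng(0)           # plays the role of random_state
embedding = rng.normal(size=(500, 2))    # stand-in for sd.Xdr
r_sample = 0.8
n_sub = int(r_sample * len(embedding))
subset_idx = rng.choice(len(embedding), size=n_sub, replace=False)

# Placeholder for the expensive step: a scaled x-coordinate on the subset.
x = embedding[subset_idx, 0]
subset_time = (x - x.min()) / (x.max() - x.min())

all_time = interpolate_from_subset(embedding, subset_idx, subset_time)
```

Because each interpolated value is an average of subset values, the result stays inside the range of the subset pseudotime.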

The current implementation performs one batched nearest-neighbor search and uses a heap-based minimum-spanning-tree routine. If the kNN graph contains isolated points or separated cell populations, PhyloVelo connects the components with nearest bridging edges instead of failing.
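The steps described above (kNN graph, bridging of disconnected components, minimum spanning tree, time propagation) can be illustrated with SciPy. This is a rough sketch under stated assumptions rather than PhyloVelo's implementation: the function names are hypothetical, and the time increment along each tree edge (the displacement projected onto the mean local velocity) is one plausible choice, not necessarily the rule the library uses.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components, minimum_spanning_tree
from scipy.spatial import cKDTree

def knn_graph(points, k):
    """Symmetric kNN graph weighted by Euclidean distance (one batched query)."""
    n = len(points)
    dist, idx = cKDTree(points).query(points, k=k + 1)  # first hit is the point itself
    rows = np.repeat(np.arange(n), k)
    g = coo_matrix((dist[:, 1:].ravel(), (rows, idx[:, 1:].ravel())), shape=(n, n)).tocsr()
    return g.maximum(g.T)

def bridge_components(points, graph):
    """Join components one at a time with the shortest available bridging edge."""
    graph = graph.tolil()
    while True:
        n_comp, labels = connected_components(graph.tocsr(), directed=False)
        if n_comp == 1:
            return graph.tocsr()
        inside = np.flatnonzero(labels == 0)
        outside = np.flatnonzero(labels != 0)
        dist, nearest = cKDTree(points[outside]).query(points[inside], k=1)
        i = int(np.argmin(dist))
        a, b = inside[i], outside[nearest[i]]
        graph[a, b] = graph[b, a] = dist[i]

def tree_time(points, velocity, graph, root=0):
    """Propagate signed time increments along the minimum spanning tree."""
    mst = minimum_spanning_tree(graph)
    mst = mst.maximum(mst.T).tocsr()  # make the tree traversable in both directions
    time = np.full(len(points), np.nan)
    time[root] = 0.0
    stack = [root]
    while stack:
        u = stack.pop()
        for v in mst.indices[mst.indptr[u]:mst.indptr[u + 1]]:
            if np.isnan(time[v]):
                step = points[v] - points[u]
                mean_vel = 0.5 * (velocity[u] + velocity[v])
                # Assumed rule: displacement projected onto the local velocity.
                time[v] = time[u] + step @ mean_vel / (mean_vel @ mean_vel + 1e-12)
                stack.append(v)
    return (time - time.min()) / (time.max() - time.min())

# Two well-separated populations with a constant velocity along the x-axis.
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.3, size=(60, 2)),
    rng.normal(loc=(10.0, 0.0), scale=0.3, size=(60, 2)),
])
velocity = np.tile([1.0, 0.0], (len(points), 1))

graph = knn_graph(points, k=5)
n_before = connected_components(graph, directed=False)[0]
graph = bridge_components(points, graph)
n_after = connected_components(graph, directed=False)[0]
pseudotime = tree_time(points, velocity, graph)
```

In this toy case the kNN graph starts with at least two components, the bridging step leaves exactly one, and the propagated time follows the x-coordinate because the velocity field is constant.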

Important parameters

n_neighbors

Number of embedding neighbors used to build the graph. Larger values make disconnected components less likely, but can smooth over local structure.

r_sample

Fraction of cells sampled for graph construction. Use values below 1 for large datasets.

random_state

Seed used when r_sample < 1.

MEG expression pseudotime

PhyloVelo can also estimate pseudotime directly from monotonically expressed genes (MEGs). This method orients each MEG by the sign of its inferred velocity, clips expression values by per-gene quantiles to reduce outlier influence, aggregates the oriented gene-wise scores, and scales the result to [0, 1].

pv.velocity_inference(sd, time=sd.cell_generation, target="x_normed")
pv.calc_meg_pseudotime(sd, target="x_normed")

The same estimator can be selected through calc_phylo_pseudotime:

pv.calc_phylo_pseudotime(sd, method="meg", target="x_normed")
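To make the four steps of the MEG estimator concrete (orient, clip, aggregate, scale), here is a small NumPy sketch on simulated data. It is an illustration only: the function meg_pseudotime is hypothetical, and details such as flipping decreasing genes with 1 - x and rescaling each gene to [0, 1] before aggregation are assumptions about how the scores are combined.

```python
import numpy as np

def meg_pseudotime(expr, velocity, quantiles=(0.05, 0.95), aggregation="median"):
    """Sketch of a MEG clock. expr: cells x genes array; velocity: one value per gene."""
    lo, hi = np.quantile(expr, quantiles, axis=0)          # per-gene clipping bounds
    span = np.where(hi > lo, hi - lo, 1.0)                 # guard against constant genes
    scaled = (np.clip(expr, lo, hi) - lo) / span           # each gene now lies in [0, 1]
    oriented = np.where(np.sign(velocity) >= 0, scaled, 1.0 - scaled)  # flip decreasing genes
    if aggregation == "median":
        score = np.median(oriented, axis=1)
    elif aggregation == "weighted_mean":
        score = np.average(oriented, axis=1, weights=np.abs(velocity))
    else:
        raise ValueError(f"unknown aggregation: {aggregation!r}")
    return (score - score.min()) / (score.max() - score.min())

# Simulated cells: three increasing and three decreasing genes plus noise.
rng = np.random.default_rng(0)
true_time = np.linspace(0.0, 1.0, 200)
slopes = np.array([2.0, 1.5, 1.0, -1.0, -2.0, -1.5])  # stand-in for inferred velocities
expr = true_time[:, None] * slopes + rng.normal(scale=0.1, size=(200, 6))

pt_median = meg_pseudotime(expr, slopes)
pt_weighted = meg_pseudotime(expr, slopes, aggregation="weighted_mean")
```

Both aggregations recover the simulated ordering here; the median ignores the size of each gene's velocity, while the weighted mean lets fast-changing genes dominate.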

The MEG expression method can also transfer a pseudotime clock from a reference dataset to an independent dataset that only has transcriptome data. In this mode, the reference sd provides the MEGs, velocity directions, per-gene robust scaling, and final score calibration. The query dataset only needs an expression matrix with matching gene names.

query_pseudotime = pv.calc_meg_pseudotime(
    sd,
    target="x_normed",
    query_data=query_x_normed,  # cells x genes DataFrame
)

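If the independent data are stored in another scData object, pass query_sd instead. PhyloVelo reads the query expression matrix from query_target, which defaults to the same target as the reference dataset, and the result is then available as query_sd.phylo_pseudotime.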

pv.calc_meg_pseudotime(
    sd,
    target="x_normed",
    query_sd=query_sd,
    query_target="x_normed",
)
query_time = query_sd.phylo_pseudotime
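The transfer mode can be pictured as "fit on the reference, apply to the query". The pandas sketch below shows that pattern under explicit assumptions: fit_reference_clock and score_query are hypothetical names, the clock is stored as a plain dictionary, and clipping query scores to [0, 1] is an assumption about how out-of-range query cells are handled.

```python
import numpy as np
import pandas as pd

def _raw_score(expr, clock):
    """Median of oriented, reference-scaled gene values for each cell."""
    x = expr[clock["genes"]].to_numpy(dtype=float)  # select and reorder columns by gene name
    lo, hi = clock["lo"], clock["hi"]
    span = np.where(hi > lo, hi - lo, 1.0)
    scaled = (np.clip(x, lo, hi) - lo) / span
    oriented = np.where(clock["sign"] >= 0, scaled, 1.0 - scaled)
    return np.median(oriented, axis=1)

def fit_reference_clock(ref_expr, velocity, quantiles=(0.05, 0.95)):
    """Store genes, directions, per-gene bounds, and score calibration from the reference."""
    genes = list(velocity.index)
    lo, hi = np.quantile(ref_expr[genes].to_numpy(dtype=float), quantiles, axis=0)
    clock = {"genes": genes, "lo": lo, "hi": hi, "sign": np.sign(velocity.to_numpy())}
    raw = _raw_score(ref_expr, clock)
    clock["score_min"], clock["score_max"] = raw.min(), raw.max()
    return clock

def score_query(query_expr, clock):
    """Score query cells with reference parameters only (clipping is an assumption)."""
    raw = _raw_score(query_expr, clock)
    scaled = (raw - clock["score_min"]) / (clock["score_max"] - clock["score_min"])
    return pd.Series(np.clip(scaled, 0.0, 1.0), index=query_expr.index)

rng = np.random.default_rng(0)
genes = ["g1", "g2", "g3", "g4"]
slopes = pd.Series([2.0, 1.0, -1.0, -2.0], index=genes)  # stand-in for inferred velocities

t_ref = rng.uniform(size=300)
ref = pd.DataFrame(np.outer(t_ref, slopes.to_numpy())
                   + rng.normal(scale=0.1, size=(300, 4)), columns=genes)

t_query = rng.uniform(size=100)
query = pd.DataFrame(np.outer(t_query, slopes.to_numpy())
                     + rng.normal(scale=0.1, size=(100, 4)), columns=genes)
query = query[["g3", "g1", "g4", "g2"]]  # different column order
query["extra"] = 0.0                     # gene absent from the reference clock

clock = fit_reference_clock(ref, slopes)
query_time = score_query(query, clock)
```

Selecting columns by gene name is what makes the query column order irrelevant, and extra query genes are simply ignored. A query that lacks a reference gene would raise a KeyError in this sketch.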

Recommended use

Use MEG expression pseudotime when you want a direct transcriptomic-clock score, when the embedding has separated clusters, or when the velocity graph is too slow or too sensitive to the manifold geometry.

Important parameters

target

Expression matrix used for scoring, usually "x_normed" or "count".

genes

Optional list of MEGs. By default, PhyloVelo uses sd.megs from velocity_inference.

robust_quantiles

Lower and upper quantiles used to clip each gene before scaling. The default is (0.05, 0.95).

aggregation

"median" is the default and is robust to noisy genes. "weighted_mean" uses the absolute inferred velocity as a gene weight.

query_data / query_sd

Optional independent transcriptome data to score with the reference clock. Query genes are matched by column name, so column order does not need to be the same as the reference dataset.

Choosing a method

Situation                                              Suggested method
-----------------------------------------------------  ----------------------------------------------------------------
Reliable continuous embedding and velocity field       calc_phylo_pseudotime(sd, method="graph")
Large datasets where graph construction is expensive   Graph method with r_sample < 1 or MEG expression pseudotime
Separated clusters or isolated cells                   MEG expression pseudotime, or graph method with enough neighbors
Direct clock-like ordering from MEGs                   calc_meg_pseudotime(sd)

Notes

  • sd.phylo_pseudotime is always normalized to [0, 1].

  • The graph method requires sd.Xdr and sd.velocity_embeded.

  • The MEG expression method requires MEGs and velocity directions, typically produced by velocity_inference.