Pseudotime
==========

PhyloVelo provides two ways to calculate pseudotime. Both methods write the
result to ``sd.phylo_pseudotime`` and scale it to the interval ``[0, 1]``.

Velocity-graph pseudotime
-------------------------

The default method estimates pseudotime from the projected velocity field in a
two-dimensional embedding. It builds a k-nearest-neighbor graph over
``sd.Xdr``, orients local time intervals using ``sd.velocity_embeded``, and
then propagates time along a minimum-spanning structure.

.. code-block:: python

   pv.velocity_embedding(sd, target="count", n_neigh=100)
   pv.calc_phylo_pseudotime(sd, n_neighbors=100)

For larger datasets, a random subset can be used to estimate the graph
pseudotime and then interpolate values back to all cells:

.. code-block:: python

   pv.calc_phylo_pseudotime(
       sd,
       n_neighbors=100,
       r_sample=0.8,
       random_state=0,
   )

The current implementation performs one batched nearest-neighbor search and
uses a heap-based minimum-spanning-tree routine. If the kNN graph contains
isolated points or separated cell populations, PhyloVelo connects the
components with nearest bridging edges instead of failing.

Recommended use
~~~~~~~~~~~~~~~

Use velocity-graph pseudotime when the embedding and projected velocity field
are reliable and you want pseudotime to reflect the inferred flow direction in
the cell-state manifold.

Important parameters
~~~~~~~~~~~~~~~~~~~~

``n_neighbors``
   Number of embedding neighbors used to build the graph. Larger values make
   disconnected components less likely, but can smooth over local structure.

``r_sample``
   Fraction of cells sampled for graph construction. Use values below ``1``
   for large datasets.

``random_state``
   Seed used when ``r_sample < 1``.

MEG expression pseudotime
-------------------------

PhyloVelo can also estimate pseudotime directly from monotonically expressed
genes (MEGs).
This method orients each MEG by the sign of its inferred velocity, clips
expression values by per-gene quantiles to reduce outlier influence,
aggregates the oriented gene-wise scores, and scales the result to ``[0, 1]``.

.. code-block:: python

   pv.velocity_inference(sd, time=sd.cell_generation, target="x_normed")
   pv.calc_meg_pseudotime(sd, target="x_normed")

The same estimator can be selected through ``calc_phylo_pseudotime``:

.. code-block:: python

   pv.calc_phylo_pseudotime(sd, method="meg", target="x_normed")

The MEG expression method can also transfer a pseudotime clock from a
reference dataset to an independent dataset that only has transcriptome data.
In this mode, the reference ``sd`` provides the MEGs, velocity directions,
per-gene robust scaling, and final score calibration. The query dataset only
needs an expression matrix with matching gene names.

.. code-block:: python

   query_pseudotime = pv.calc_meg_pseudotime(
       sd,
       target="x_normed",
       query_data=query_x_normed,  # cells x genes DataFrame
   )

If the independent data are stored in another ``scData`` object, pass
``query_sd``. PhyloVelo will use ``query_target`` or, by default, the same
``target`` as the reference dataset.

.. code-block:: python

   pv.calc_meg_pseudotime(
       sd,
       target="x_normed",
       query_sd=query_sd,
       query_target="x_normed",
   )
   query_time = query_sd.phylo_pseudotime

Recommended use
~~~~~~~~~~~~~~~

Use MEG expression pseudotime when you want a direct transcriptomic-clock
score, when the embedding has separated clusters, or when the velocity graph
is too slow or too sensitive to the manifold geometry.

Important parameters
~~~~~~~~~~~~~~~~~~~~

``target``
   Expression matrix used for scoring, usually ``"x_normed"`` or ``"count"``.

``genes``
   Optional list of MEGs. By default, PhyloVelo uses ``sd.megs`` from
   ``velocity_inference``.

``robust_quantiles``
   Lower and upper quantiles used to clip each gene before scaling. The
   default is ``(0.05, 0.95)``.

``aggregation``
   ``"median"`` is the default and is robust to noisy genes.
   ``"weighted_mean"`` uses the absolute inferred velocity as a gene weight.

``query_data`` / ``query_sd``
   Optional independent transcriptome data to score with the reference clock.
   Query genes are matched by column name, so column order does not need to be
   the same as the reference dataset.

Choosing a method
-----------------

.. list-table::
   :header-rows: 1

   * - Situation
     - Suggested method
   * - Reliable continuous embedding and velocity field
     - ``calc_phylo_pseudotime(sd, method="graph")``
   * - Large datasets where graph construction is expensive
     - Graph method with ``r_sample < 1`` or MEG expression pseudotime
   * - Separated clusters or isolated cells
     - MEG expression pseudotime, or graph method with enough neighbors
   * - Direct clock-like ordering from MEGs
     - ``calc_meg_pseudotime(sd)``

Notes
-----

- ``sd.phylo_pseudotime`` is always normalized to ``[0, 1]``.
- The graph method requires ``sd.Xdr`` and ``sd.velocity_embeded``.
- The MEG expression method requires MEGs and velocity directions, typically
  produced by ``velocity_inference``.
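
The MEG expression steps described on this page (orient each gene by velocity
sign, clip by per-gene quantiles, aggregate, rescale to ``[0, 1]``) can be
illustrated with a small NumPy sketch. This is not PhyloVelo's implementation:
the function names ``fit_meg_clock`` and ``score_meg_clock`` are hypothetical,
the data are synthetic, only median aggregation is shown, and the query matrix
is assumed to have its columns already aligned to the reference genes.

.. code-block:: python

   import numpy as np


   def _raw_score(x, clock):
       """Median of oriented, quantile-clipped gene scores (one value per cell)."""
       x = np.asarray(x, dtype=float)
       clipped = np.clip(x, clock["lo"], clock["hi"])
       scaled = (clipped - clock["lo"]) / clock["span"]
       # Flip genes with negative velocity so every gene increases with time.
       oriented = np.where(clock["sign"] < 0, 1.0 - scaled, scaled)
       return np.median(oriented, axis=1)


   def fit_meg_clock(x_ref, velocity, quantiles=(0.05, 0.95)):
       """Learn per-gene clipping bounds and the final calibration (hypothetical helper)."""
       x_ref = np.asarray(x_ref, dtype=float)
       lo, hi = np.quantile(x_ref, quantiles, axis=0)
       clock = {
           "sign": np.sign(np.asarray(velocity, dtype=float)),
           "lo": lo,
           "hi": hi,
           # Guard against constant genes, where hi == lo.
           "span": np.where(hi > lo, hi - lo, 1.0),
       }
       raw = _raw_score(x_ref, clock)
       clock["raw_min"], clock["raw_max"] = raw.min(), raw.max()
       return clock


   def score_meg_clock(x, clock):
       """Score cells with a fitted clock and rescale to [0, 1] (hypothetical helper)."""
       raw = _raw_score(x, clock)
       width = clock["raw_max"] - clock["raw_min"]
       if width == 0:
           return np.zeros_like(raw)
       # Query cells outside the reference range are clipped to the interval.
       return np.clip((raw - clock["raw_min"]) / width, 0.0, 1.0)


   # Synthetic reference: 50 cells ordered in time, 3 genes (columns).
   t = np.linspace(0.0, 1.0, 50)
   x_ref = np.column_stack([10.0 * t, 5.0 * (1.0 - t), 3.0 * t])
   velocity = [1.0, -1.0, 1.0]  # the second gene decreases over time

   clock = fit_meg_clock(x_ref, velocity)
   ref_time = score_meg_clock(x_ref, clock)

   # Transfer the reference clock to a query matrix with the same gene order.
   x_query = x_ref[::5]
   query_time = score_meg_clock(x_query, clock)

Fitting the clipping bounds and the final rescaling on the reference only, and
then reusing them for the query, is what lets the query scores be read on the
same scale as the reference pseudotime.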