zea.models.flow_matching
Flow matching generative model for ultrasound image generation and posterior sampling.
Replaces the cosine diffusion schedule and noise-prediction objective of
DiffusionModel with a linear flow-matching
schedule and a velocity-field prediction objective.
See also
DiffusionModel: DDIM-based counterpart.
Liu et al., Flow Straight and Fast, 2022. https://arxiv.org/abs/2209.03003
Lipman et al., Flow Matching for Generative Modeling, 2022. https://arxiv.org/abs/2210.02747
Esser et al., Scaling Rectified Flow Transformers for High-Resolution Image Synthesis, 2024. https://arxiv.org/abs/2403.03206
Classes
FlowMatchingModel(*args, **kwargs) | Flow matching generative model.
- class zea.models.flow_matching.FlowMatchingModel(*args, **kwargs)
Bases: DiffusionModel
Flow matching generative model.
Implements conditional flow matching (CFM) with straight-line (linear) interpolation paths between data and noise. The forward process is:
\[x_t = (1 - t)\, x_0 + t\, \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, I)\]
The network is trained to predict the velocity field
\[v_\theta(x_t, t) \approx v = \varepsilon - x_0\]
from which the clean-image estimate follows as
\[\hat{x}_0 = x_t - t\, v_\theta(x_t, t)\]
At inference, images are generated by integrating the probability flow ODE
\[\frac{dx}{dt} = v_\theta(x_t, t)\]
backwards from \(t = 1\) (pure noise) to \(t = 0\) (clean data). A first-order Euler discretisation of this ODE is identical to the DDIM update rule under the linear schedule; the solver parameter selects between Euler and second-order Heun integration.
Noise samples are drawn independently from \(\mathcal{N}(0, I)\) and paired with data samples via independent (random) coupling, i.e. vanilla CFM / Rectified Flow (Liu et al. 2022). Minibatch Optimal Transport coupling (OT-CFM, Tong et al. 2023) is not currently implemented.
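As a numerical illustration of the relations above, the following NumPy sketch builds \(x_t\) from a clean sample and an independently drawn noise sample, forms the target velocity, and recovers both endpoints from \(x_t\). It does not use zea; the array shapes and variable names are assumptions chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical batch of 4 "images" of shape (8, 8, 1) with values in [0, 1].
x0 = rng.uniform(0.0, 1.0, size=(4, 8, 8, 1))
eps = rng.standard_normal(x0.shape)            # epsilon ~ N(0, I), independent of x0
t = rng.uniform(0.0, 1.0, size=(4, 1, 1, 1))   # one flow time per sample

# Forward process: straight line between data (t = 0) and noise (t = 1).
x_t = (1.0 - t) * x0 + t * eps

# Target velocity along that line.
v = eps - x0

# With the exact velocity, both endpoints are recovered from x_t.
x0_hat = x_t - t * v            # clean-image estimate
eps_hat = x_t + (1.0 - t) * v   # noise estimate
```

With a trained network, \(v\) would be replaced by \(v_\theta(x_t, t)\), and the two estimates would only be approximate.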
All sampling, guidance (DPS/DDS), and posterior-sampling machinery from DiffusionModel is inherited unchanged.
Initialize a flow matching model.
- Parameters:
input_shape – Shape of the input data, typically (height, width, channels) for images.
input_range – Range of the input data. Default (0, 1).
network_name – Network architecture. One of "unet_time_conditional", "dense_time_conditional", or "dit_time_conditional" (Diffusion Transformer).
network_kwargs – Extra keyword arguments forwarded to the network constructor.
name – Model name. Default "flow_matching_model".
guidance – Guidance method. Can be a string (e.g. "dps"), a dict with "name" and optional "params" keys, or a DiffusionGuidance instance.
operator – Forward operator. Same format as guidance.
ema_val – Exponential moving average coefficient for the inference network weights. Default 0.999.
min_t – Lower bound of the flow time interval. Default 0.0.
max_t – Upper bound of the flow time interval. Default 1.0.
solver – ODE solver used for (unconditional) sampling. One of "heun" (second-order Euler–Heun, the default) or "euler" (first-order). Heun evaluates the velocity field twice per step for higher accuracy and is purely an inference-time choice (no retraining needed). See solver_step().
**kwargs – Additional arguments forwarded to DiffusionModel.
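The parameters above could be collected into a keyword dictionary along the following lines. This is only a sketch: the keyword names and the allowed string choices come from the parameter list, while the concrete input shape is a hypothetical value, and the constructor call is left as a comment because it requires zea to be installed.

```python
# Hypothetical configuration; only the keyword names, defaults, and string
# choices are taken from the parameter list above.
config = {
    "input_shape": (128, 128, 1),             # assumed (height, width, channels)
    "input_range": (0, 1),                    # documented default
    "network_name": "unet_time_conditional",  # one of the documented choices
    "ema_val": 0.999,                         # documented default
    "min_t": 0.0,                             # documented default
    "max_t": 1.0,                             # documented default
    "solver": "heun",                         # or "euler"
}

# Assumed usage (not executed here):
# from zea.models.flow_matching import FlowMatchingModel
# model = FlowMatchingModel(**config)
```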
- denoise(noisy_images, noise_rates, signal_rates, training, network=None)
Predict the velocity field and derive the clean-image estimate.
The network predicts the velocity \(v_\theta(x_t, t)\). The clean-image estimate follows as
\[\hat{x}_0 = x_t - t\, v_\theta(x_t, t)\]
To keep full compatibility with the parent’s sampling and guidance machinery (which expects a (pred_noises, pred_images) return value), the method also returns the corresponding noise estimate
\[\hat{\varepsilon} = \hat{x}_0 + v_\theta = x_t + (1 - t)\, v_\theta\]
under the name pred_noises. With signal rate \(\alpha_t = 1 - t\) and noise rate \(\sigma_t = t\), the parent’s reverse_diffusion_step() formula
\[x_{t - \Delta t} = \alpha_{t-\Delta t}\,\hat{x}_0 + \sigma_{t-\Delta t}\,\hat{\varepsilon}\]
is algebraically equivalent to the Euler step \(x_{t-\Delta t} = x_t - \Delta t\, v_\theta\) under the linear schedule, so no changes to the sampling loop are needed.
- Parameters:
noisy_images – Noisy images x_t of shape (n_images, *input_shape).
noise_rates – Flow times t, broadcastable to noisy_images.
signal_rates – 1 - t, broadcastable to noisy_images.
training (bool) – Whether to call the network in training mode.
network – Explicit network to use. If None, chosen based on training (see call()).
- Returns:
A (pred_noises_est, pred_images) tuple where pred_noises_est is \(\hat{\varepsilon}\) and pred_images is \(\hat{x}_0\).
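The equivalence between the parent-style update and the Euler step can be checked numerically. The sketch below is a NumPy stand-in, not the zea implementation: `denoise_from_velocity` and `toy_velocity` are hypothetical helpers, and the velocity field is an arbitrary untrained function, since the identity holds for any velocity prediction.

```python
import numpy as np

def toy_velocity(x, t):
    """Arbitrary stand-in for the network output v_theta(x_t, t)."""
    return np.tanh(x) - 0.3 * t

def denoise_from_velocity(noisy_images, noise_rates, signal_rates):
    """Map a velocity prediction to a (pred_noises, pred_images) pair."""
    v = toy_velocity(noisy_images, noise_rates)
    pred_images = noisy_images - noise_rates * v    # x0_hat  = x_t - t * v
    pred_noises = noisy_images + signal_rates * v   # eps_hat = x_t + (1 - t) * v
    return pred_noises, pred_images

rng = np.random.default_rng(1)
x_t = rng.standard_normal((2, 4, 4, 1))
t, dt = 0.7, 0.1

pred_noises, pred_images = denoise_from_velocity(x_t, t, 1.0 - t)

# Parent-style update: recombine the two estimates at the next time.
next_signal, next_noise = 1.0 - (t - dt), t - dt
parent_step = next_signal * pred_images + next_noise * pred_noises

# Direct Euler step on the probability-flow ODE.
euler_step = x_t - dt * toy_velocity(x_t, t)
```

Both `parent_step` and `euler_step` agree up to floating-point error, which is why the inherited sampling loop can be reused.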
- diffusion_schedule(diffusion_times)
Linear flow-matching schedule.
\[\text{noise\_rates} = t, \qquad \text{signal\_rates} = 1 - t\]
- Parameters:
diffusion_times – Tensor of flow times in [min_t, max_t].
- Returns:
A (noise_rates, signal_rates) tuple with the same shape as diffusion_times.
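A minimal NumPy sketch of such a linear schedule is shown below. The function name `linear_schedule` is hypothetical, and the real method operates on backend tensors rather than NumPy arrays.

```python
import numpy as np

def linear_schedule(diffusion_times):
    """Return (noise_rates, signal_rates) = (t, 1 - t) for flow times t."""
    diffusion_times = np.asarray(diffusion_times, dtype=float)
    noise_rates = diffusion_times
    signal_rates = 1.0 - diffusion_times
    return noise_rates, signal_rates

# Five flow times from pure noise (t = 1) down to clean data (t = 0).
times = np.linspace(1.0, 0.0, 5)
noise_rates, signal_rates = linear_schedule(times)
```

Under this schedule the two rates sum to one at every flow time.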
- get_config()
Returns the config of the object.
An object config is a Python dictionary (serializable) containing the information needed to re-instantiate it.
- property metrics
Metrics for training.
- reverse_diffusion_step(shape, pred_images, pred_noises, signal_rates, next_signal_rates, next_noise_rates, seed=None, stochastic_sampling=False)
A single reverse flow-matching step.
The deterministic (ODE) step is inherited unchanged from the parent. The stochastic step adds isotropic Langevin noise on top of the Euler update, turning the probability-flow ODE into a Langevin SDE:
\[x_{t - \Delta t} = x_t - \Delta t\, v_\theta(x_t, t) + \sqrt{2\,\Delta t}\; \mathbf{z}, \qquad \mathbf{z} \sim \mathcal{N}(0, I)\]
Under the linear schedule \(\alpha_t = 1 - t\), the time step is recovered as \(\Delta t = \alpha_{t-\Delta t} - \alpha_t\) (i.e. next_signal_rates - signal_rates), which is always positive during reverse sampling.
- Parameters:
shape – Shape of the output tensor.
pred_images – Clean-image estimate \(\hat{x}_0\).
pred_noises – Noise estimate \(\hat{\varepsilon}\) (equal to \(\hat{x}_0 + v_\theta\)).
signal_rates – Current signal rates \(\alpha_t = 1 - t\).
next_signal_rates – Next signal rates \(\alpha_{t - \Delta t} = 1 - (t - \Delta t)\).
next_noise_rates – Next noise rates \(t - \Delta t\).
seed – Random seed generator.
stochastic_sampling – Whether to add Langevin noise. Default False (deterministic Euler step).
- Returns:
Updated noisy images \(x_{t - \Delta t}\).
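The step described above might look roughly as follows in NumPy. This is a sketch under assumptions, not the library code: the hypothetical `reverse_step` takes a NumPy random generator instead of the `shape` and `seed` arguments of the real method, and the inputs are toy arrays.

```python
import numpy as np

def reverse_step(pred_images, pred_noises, signal_rates, next_signal_rates,
                 next_noise_rates, rng=None, stochastic_sampling=False):
    """One reverse step: deterministic recombination plus optional Langevin noise."""
    # Deterministic part, equal to the Euler update under the linear schedule.
    next_images = next_signal_rates * pred_images + next_noise_rates * pred_noises
    if stochastic_sampling:
        # Recover the (positive) step size from the signal rates.
        dt = next_signal_rates - signal_rates
        z = rng.standard_normal(np.shape(next_images))
        next_images = next_images + np.sqrt(2.0 * dt) * z
    return next_images

rng = np.random.default_rng(0)
t, dt = 0.5, 0.1
pred_images = rng.uniform(size=(10000,))     # toy clean-image estimate
pred_noises = rng.standard_normal(10000)     # toy noise estimate

deterministic = reverse_step(pred_images, pred_noises, 1 - t, 1 - (t - dt), t - dt)
stochastic = reverse_step(pred_images, pred_noises, 1 - t, 1 - (t - dt), t - dt,
                          rng=rng, stochastic_sampling=True)

# Empirical standard deviation of the injected noise, expected near sqrt(2 * dt).
injected_std = float(np.std(stochastic - deterministic))
```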
- solver_step(noisy_images, noise_rates, signal_rates, next_noise_rates, next_signal_rates, shape, network=None, training=False, seed=None, stochastic_sampling=False)
Single ODE solver step for the probability-flow ODE.
Integrates \(\frac{dx}{dt} = v_\theta(x_t, t)\) backwards by one step using either a first-order Euler update (solver="euler") or a second-order Euler–Heun (improved-Euler) update (solver="heun", the default).
The Heun update evaluates the velocity field twice per step:
\[\begin{split}\tilde{x}_{t-\Delta t} &= x_t - \Delta t\, v_\theta(x_t, t) \qquad\text{(Euler predictor)} \\ x_{t-\Delta t} &= x_t - \tfrac{\Delta t}{2}\big( v_\theta(x_t, t) + v_\theta(\tilde{x}_{t-\Delta t},\, t-\Delta t) \big) \qquad\text{(trapezoidal corrector)}\end{split}\]
where \(\Delta t = t - (t-\Delta t)\) equals noise_rates - next_noise_rates (positive during reverse sampling). Heun’s method only changes inference; it reuses the same trained velocity network and requires no retraining.
Stochastic sampling falls back to the first-order Euler–Maruyama update inherited from DiffusionModel, since the deterministic Heun corrector does not apply to the Langevin SDE.
- Parameters:
noisy_images – Current noisy images x_t.
noise_rates – Flow times t at the current step.
signal_rates – 1 - t at the current step.
next_noise_rates – Flow times t - Δt at the next step.
next_signal_rates – 1 - (t - Δt) at the next step.
shape – Shape of the image tensor.
network – Explicit network to use (None selects based on training).
training (bool) – Whether to call the network in training mode.
seed – Random seed generator (for stochastic sampling).
stochastic_sampling (bool) – Whether to use stochastic (Langevin) sampling.
- Returns:
A (next_noisy_images, pred_images) tuple where next_noisy_images is \(x_{t-\Delta t}\) and pred_images is the clean-image estimate \(\hat{x}_0\) at the current step.
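To illustrate the difference between the two solvers, the sketch below integrates a toy problem where the exact velocity field is known in closed form: one-dimensional Gaussian data with mean `M` and standard deviation `S`, for which the flow maps a noise sample \(x_1\) to \(M + S\,x_1\). Everything here (the toy data distribution, the simplified `solver_step` signature, the `sample` helper) is an assumption for illustration and stands in for the trained network and the real method.

```python
import numpy as np

M, S = 0.3, 0.5  # mean and std of a toy 1-D Gaussian "data" distribution

def velocity(x, t):
    """Exact velocity E[eps - x0 | x_t = x] for Gaussian data, linear schedule."""
    var_t = (1.0 - t) ** 2 * S**2 + t**2
    gain = (t - (1.0 - t) * S**2) / var_t
    return -M + gain * (x - (1.0 - t) * M)

def solver_step(x, t, t_next, solver="heun"):
    """One backward step from flow time t to t_next (< t)."""
    dt = t - t_next                      # noise_rates - next_noise_rates
    v = velocity(x, t)
    x_pred = x - dt * v                  # Euler predictor
    if solver == "euler":
        return x_pred
    v_next = velocity(x_pred, t_next)    # second velocity evaluation
    return x - 0.5 * dt * (v + v_next)   # trapezoidal corrector

def sample(x1, n_steps, solver):
    """Integrate from t = 1 (noise) to t = 0 (data) with a uniform time grid."""
    times = np.linspace(1.0, 0.0, n_steps + 1)
    x = x1
    for t, t_next in zip(times[:-1], times[1:]):
        x = solver_step(x, t, t_next, solver)
    return x

x1 = np.array([1.0, -0.5, 2.0])          # fixed "noise" starting points
exact = M + S * x1                        # closed-form endpoint of the flow
euler_err = np.max(np.abs(sample(x1, 10, "euler") - exact))
heun_err = np.max(np.abs(sample(x1, 10, "heun") - exact))
```

For the same ten steps, Heun uses twice as many velocity evaluations as Euler and ends closer to the exact endpoint on this toy problem.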
- train_step(data)
Custom train step for Rectified Flow (independent coupling).
Trains the network to predict the velocity field \(v = \varepsilon - x_0\) from noisy observations \(x_t = (1 - t)\,x_0 + t\,\varepsilon\), where \(\varepsilon \sim \mathcal{N}(0, I)\) is sampled independently of \(x_0\).
Note
Only implemented for the TensorFlow backend.
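The construction of one training batch can be sketched in NumPy as follows. This is not the TensorFlow implementation: the helper names are hypothetical, and both the uniform sampling of \(t\) on [min_t, max_t] and the mean-squared-error loss are assumptions, since neither is stated above.

```python
import numpy as np

def flow_matching_batch(x0, rng, min_t=0.0, max_t=1.0):
    """Build (x_t, t, v_target) for a batch of clean samples x0."""
    # One flow time per sample, shaped to broadcast over the image axes.
    t_shape = (x0.shape[0],) + (1,) * (x0.ndim - 1)
    t = rng.uniform(min_t, max_t, size=t_shape)   # assumed uniform sampling
    eps = rng.standard_normal(x0.shape)           # drawn independently of x0
    x_t = (1.0 - t) * x0 + t * eps                # noisy observation
    v_target = eps - x0                           # regression target
    return x_t, t, v_target

def mse_loss(v_pred, v_target):
    """Assumed loss: mean squared error between predicted and target velocity."""
    return float(np.mean((v_pred - v_target) ** 2))

rng = np.random.default_rng(0)
x0 = rng.uniform(size=(8, 16, 16, 1))             # hypothetical image batch
x_t, t, v_target = flow_matching_batch(x0, rng)

loss_zero_pred = mse_loss(np.zeros_like(v_target), v_target)  # untrained baseline
loss_perfect = mse_loss(v_target, v_target)                   # ideal predictor
```

In an actual training step, the prediction would come from the network evaluated at `(x_t, t)`, and the loss would be backpropagated through it.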