Hello

meta
A first essay to validate the setup
Author

Dell Zhang

Published

2026-05-11

This is a test post. Equations work: $\nabla_\theta J(\theta) = \mathbb{E}_\pi\left[\nabla_\theta \log \pi_\theta(a \mid s) \cdot Q^\pi(s,a)\right]$.

Code chunks execute:

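import numpy as np

# Fix the seed so the printed statistics are reproducible
np.random.seed(42)
# Draw 1000 samples from the standard normal distribution
samples = np.random.randn(1000)
# Sample mean and (population) standard deviation, 4 decimal places
print(f"mean={samples.mean():.4f}, std={samples.std():.4f}")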
mean=0.0193, std=0.9787

And display math too:

$$
J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[\sum_{t=0}^{T} r(s_t, a_t)\right]
$$
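
As a quick numerical sanity check of the two formulas in this post, here is a minimal sketch under simplifying assumptions that are not part of the post itself: a single-step problem ($T = 0$) with one state, a softmax policy over three actions, and fixed rewards, so that $Q^\pi(s,a) = r(a)$. The function names, the reward values, and the sample count are all hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(theta):
    """Numerically stable softmax over a 1-D array of logits."""
    z = np.exp(theta - theta.max())
    return z / z.sum()


def exact_gradient(theta, rewards):
    """Closed-form gradient of J(theta) = sum_a pi(a) * r(a) for a softmax policy."""
    pi = softmax(theta)
    expected_reward = pi @ rewards
    return pi * (rewards - expected_reward)


def score_function_gradient(theta, rewards, n_samples, rng):
    """Monte Carlo estimate of E_pi[grad log pi(a) * Q(a)], with Q(a) = r(a) here."""
    pi = softmax(theta)
    n_actions = len(theta)
    actions = rng.choice(n_actions, size=n_samples, p=pi)
    # For a softmax policy, grad_theta log pi(a) = one_hot(a) - pi
    grad_log_pi = np.eye(n_actions)[actions] - pi
    return (grad_log_pi * rewards[actions, None]).mean(axis=0)


rng = np.random.default_rng(42)
theta = np.zeros(3)                    # uniform policy (assumed for illustration)
rewards = np.array([1.0, 0.0, 2.0])    # hypothetical per-action rewards

exact = exact_gradient(theta, rewards)  # equals [0, -1/3, 1/3] for these inputs
estimate = score_function_gradient(theta, rewards, 200_000, rng)

print("exact:   ", exact)
print("estimate:", estimate)
```

With enough samples, the estimate should land close to the exact gradient. The remaining gap is Monte Carlo noise, which shrinks roughly as one over the square root of the sample count.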