Q1. Variational inference — 10 pts
Let $p(x,z)$ be a joint distribution and let $q(z \mid x)$ be any distribution that is positive wherever $p(x,z)$ is positive.
1. Starting from
   $$ p(x)=\int p(x,z)\,\mathrm{d}z, $$
   derive the inequality
   $$ \mathcal{L}(q)=\mathbb{E}_{q(z \mid x)}[\log p(x,z)-\log q(z \mid x)]\le \log p(x). $$
2. State the exact condition under which equality holds.
3. In one sentence, explain why replacing $q(z \mid x)$ by a distribution $q(z)$ that does not depend on $x$ is generally not an adequate posterior approximation.
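A quick numerical sanity check of parts 1 and 2 (not part of the graded answer): on a toy discrete model with a binary latent variable, the bound and its equality condition can be verified directly. The joint values below are arbitrary illustrative numbers.

```python
import math

# Toy discrete model: binary latent z, one fixed observation x.
# The joint values p(x, z) are arbitrary illustrative numbers.
p_joint = {0: 0.12, 1: 0.28}              # p(x, z) at the observed x
px = sum(p_joint.values())                # p(x) = 0.40
log_px = math.log(px)

def elbo(q):
    """L(q) = E_q[log p(x,z) - log q(z)] for a distribution q over z."""
    return sum(q[z] * (math.log(p_joint[z]) - math.log(q[z])) for z in q)

q_arbitrary = {0: 0.5, 1: 0.5}
posterior = {z: p_joint[z] / px for z in p_joint}   # p(z | x)

assert elbo(q_arbitrary) < log_px                   # the bound of part 1
assert abs(elbo(posterior) - log_px) < 1e-12        # equality at q = p(z|x), part 2
```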
Q2. ELBO algebra and KL direction — 10 pts
Assume the latent-variable model
$$ p(x,z)=p(x \mid z)p(z). $$
1. Show that
   $$ \mathcal{L}(q)=\mathbb{E}_q[\log p(x \mid z)]-D_{\mathrm{KL}}(q(z \mid x)\Vert p(z)). $$
2. Starting from Bayes’ rule, prove that
   $$ \log p(x)=\mathcal{L}(q)+D_{\mathrm{KL}}(q(z \mid x)\Vert p(z \mid x)). $$
3. Which KL direction is minimized by standard variational inference:
   $$ D_{\mathrm{KL}}(q\Vert p)\quad \text{or}\quad D_{\mathrm{KL}}(p\Vert q)? $$
   Give one consequence of that choice in terms of support-covering or mode-seeking behavior.
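The identities in parts 1 and 2 can be checked numerically on a small discrete model (a sketch; the prior and likelihood values are assumed for illustration):

```python
import math

# Discrete model p(x,z) = p(x|z) p(z): prior over binary z and the
# likelihood values at one fixed observation x (illustrative numbers).
prior = {0: 0.3, 1: 0.7}                  # p(z)
lik   = {0: 0.4, 1: 0.2}                  # p(x | z)
joint = {z: prior[z] * lik[z] for z in prior}
px = sum(joint.values())
post = {z: joint[z] / px for z in prior}  # p(z | x)

def kl(q, p):
    return sum(q[z] * math.log(q[z] / p[z]) for z in q)

q = {0: 0.6, 1: 0.4}                      # an arbitrary variational distribution

# Part 1: E_q[log p(x|z)] - KL(q || p(z)) equals the direct ELBO.
elbo = sum(q[z] * math.log(lik[z]) for z in q) - kl(q, prior)
elbo_direct = sum(q[z] * (math.log(joint[z]) - math.log(q[z])) for z in q)
assert abs(elbo - elbo_direct) < 1e-12

# Part 2: log p(x) = L(q) + KL(q || p(z|x)).
assert abs(math.log(px) - (elbo + kl(q, post))) < 1e-12
```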
Q3. Mean-field update — 10 pts
Let $z=(z_1,z_2,z_3)$, and suppose we restrict the variational family to
$$ q(z)=q_1(z_1)q_2(z_2)q_3(z_3). $$
1. Derive the optimal coordinate update for $q_2^*(z_2)$ up to proportionality.
2. Your final answer must have the form
   $$ q_2^*(z_2)\propto \exp\Big(\mathbb{E}_{q_1 q_3}[\log p(x,z)]\Big). $$
3. State clearly what is treated as a constant when deriving this update.
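A numerical check of the update (a sketch; the joint table over binary $z_i$ is randomly generated for illustration): applying the $q_2$ update of the stated form can never decrease the ELBO, since it is the exact maximizer over $q_2$ with $q_1, q_3$ held fixed.

```python
import itertools
import math
import random

random.seed(0)
# Fixed x; an arbitrary positive joint table p(x, z1, z2, z3), z_i binary.
states = list(itertools.product([0, 1], repeat=3))
p = {s: random.uniform(0.01, 1.0) for s in states}

q = [{0: 0.5, 1: 0.5} for _ in range(3)]        # factorized q(z) = q1 q2 q3

def elbo():
    val = 0.0
    for s in states:
        qs = q[0][s[0]] * q[1][s[1]] * q[2][s[2]]
        val += qs * (math.log(p[s]) - math.log(qs))
    return val

def update(i):
    """q_i*(z_i) proportional to exp(E_{q_-i}[log p(x, z)])."""
    logits = {}
    for zi in (0, 1):
        e = 0.0
        for s in (s for s in states if s[i] == zi):
            w = math.prod(q[j][s[j]] for j in range(3) if j != i)
            e += w * math.log(p[s])
        logits[zi] = e
    m = max(logits.values())
    un = {z: math.exp(v - m) for z, v in logits.items()}
    z_norm = sum(un.values())
    q[i] = {z: v / z_norm for z, v in un.items()}

before = elbo()
update(1)                                        # coordinate update for q_2
assert elbo() >= before - 1e-12                  # ELBO cannot decrease
```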
Q4. EM for a two-component Gaussian mixture — 14 pts
Let $x_1,\dots,x_N$ be observed. Introduce latent variables $z_n\in\{0,1\}$ with
$$ P(z_n=1)=\pi,\qquad P(z_n=0)=1-\pi. $$
Conditionally,
$$ x_n\mid z_n=0 \sim \mathcal{N}(\mu_0,\sigma^2),\qquad x_n\mid z_n=1 \sim \mathcal{N}(\mu_1,\sigma^2), $$
where $\sigma^2$ is known.
1. Write the complete-data joint distribution $p(X,Z \mid \theta)$, where $\theta=(\pi,\mu_0,\mu_1)$.
2. Derive the E-step responsibilities
   $$ r_n=P(z_n=1 \mid x_n,\theta^{\text{old}}). $$
3. Write the $Q$-function
   $$ Q(\theta,\theta^{\text{old}})=\mathbb{E}_{p(Z \mid X,\theta^{\text{old}})}[\log p(X,Z \mid \theta)] $$
   up to additive constants independent of $\theta$.
4. Derive the M-step updates for $\pi$, $\mu_0$, and $\mu_1$.
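Once derived, the E- and M-step formulas can be checked on synthetic data (a sketch; the true parameters, initialization, and iteration count are assumptions for illustration, with $\sigma^2 = 1$ known):

```python
import math
import random

random.seed(1)
# Synthetic data from the Q4 model with known sigma = 1 (assumed values).
true_pi, mu0, mu1, sigma = 0.4, -2.0, 2.0, 1.0
X = [random.gauss(mu1, sigma) if random.random() < true_pi
     else random.gauss(mu0, sigma) for _ in range(500)]

def npdf(x, mu):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

pi, m0, m1 = 0.5, -1.0, 1.0               # initial guesses (assumed)
for _ in range(50):
    # E-step: responsibilities r_n = P(z_n = 1 | x_n, theta_old).
    r = [pi * npdf(x, m1) / (pi * npdf(x, m1) + (1 - pi) * npdf(x, m0)) for x in X]
    # M-step: closed-form maximizers of Q(theta, theta_old).
    pi = sum(r) / len(X)
    m1 = sum(rn * x for rn, x in zip(r, X)) / sum(r)
    m0 = sum((1 - rn) * x for rn, x in zip(r, X)) / sum(1 - rn for rn in r)

# With well-separated components, EM should land near the true parameters.
assert abs(pi - true_pi) < 0.1 and abs(m0 - mu0) < 0.5 and abs(m1 - mu1) < 0.5
```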
Q5. EM as variational inference — 10 pts
For the same model as in Q4, define a variational distribution
$$ q(Z)=\prod_{n=1}^N q_n(z_n),\qquad q_n(z_n=1)=r_n. $$
1. Show that
   $$ \log p(X \mid \theta)=\mathcal{L}(q,\theta)+D_{\mathrm{KL}}(q(Z)\Vert p(Z \mid X,\theta)). $$
2. Show that for fixed $\theta$, the maximizer over all $q$ is
   $$ q^*(Z)=p(Z \mid X,\theta). $$
3. Explain why the E-step and M-step can be viewed as coordinate ascent on $\mathcal{L}(q,\theta)$.
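The tightness of the bound after an exact E-step can be verified directly (a sketch; the two-point dataset and parameter values are toy assumptions, with $\sigma = 1$):

```python
import math

# Q4 model with sigma = 1; toy dataset and parameters (assumed values).
X = [-1.5, 2.0]
pi, m0, m1 = 0.5, -1.0, 1.0

def npdf(x, mu):
    return math.exp(-(x - mu) ** 2 / 2) / math.sqrt(2 * math.pi)

def loglik():
    return sum(math.log((1 - pi) * npdf(x, m0) + pi * npdf(x, m1)) for x in X)

# E-step: set q_n(z_n = 1) to the exact posterior responsibility r_n.
r = [pi * npdf(x, m1) / (pi * npdf(x, m1) + (1 - pi) * npdf(x, m0)) for x in X]

def elbo():
    val = 0.0
    for x, rn in zip(X, r):
        for qz, mu, pz in ((rn, m1, pi), (1 - rn, m0, 1 - pi)):
            if qz > 0:
                val += qz * (math.log(pz * npdf(x, mu)) - math.log(qz))
    return val

# With q = p(Z | X, theta) the KL term vanishes and L(q, theta) = log p(X | theta),
# which is why alternating the E-step (ascent in q) and the M-step (ascent in
# theta) is coordinate ascent on L.
assert abs(elbo() - loglik()) < 1e-12
```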
Q6. Short diagnostic — 6 pts
For each statement, write True or False, and give a one-line justification.
1. In the ELBO derivation,
   $$ p(x)=\int p(x,z)\,\mathrm{d}x. $$
2. Since $\log$ is concave,
   $$ \log \mathbb{E}_q[f(z)]\le \mathbb{E}_q[\log f(z)]. $$
3. If $q(z \mid x)=p(z \mid x)$, then
   $$ \mathcal{L}(q)=\log p(x). $$
4. The identity
   $$ \begin{aligned} &\mathbb{E}_{p(x \mid \theta)}[\log p(x \mid \theta)]-\mathbb{E}_{p(x \mid \theta)}[\log p(x \mid \hat{\theta})] \\ &\quad = D_{\mathrm{KL}}(p(x \mid \theta)\Vert p(x \mid \hat{\theta})) \end{aligned} $$
   is valid.
5. In the E-step of EM, the distribution over latent variables depends on $\theta^{\text{old}}$.
6. In the M-step of EM, the responsibilities $r_n$ are treated as fixed.
Q7. One mixed long question — 10 pts
Consider again the model in Q4.
1. Starting from
   $$ \log p(X \mid \theta)=\log \sum_Z p(X,Z \mid \theta), $$
   insert an arbitrary $q(Z)$, derive an ELBO, and identify the KL remainder term.
2. Specialize $q(Z)$ to
   $$ q(Z)=\prod_{n=1}^N \mathrm{Bernoulli}(z_n;\,r_n). $$
3. Show that optimizing the ELBO with respect to $r_n$ recovers the EM responsibility formula.
4. State in one sentence what prevents this optimization from being a closed-form exact posterior update in a generic latent-variable model.
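For part 3, a brute-force check that the ELBO maximizer over $r_n$ coincides with the EM responsibility (a sketch for a single observation; all numbers are assumed, with $\sigma = 1$):

```python
import math

# Single observation from the Q4 model with sigma = 1 (toy assumed values).
x, pi, m0, m1 = 0.3, 0.5, -1.0, 1.0

def npdf(x, mu):
    return math.exp(-(x - mu) ** 2 / 2) / math.sqrt(2 * math.pi)

def elbo(r):
    """ELBO for q(z = 1) = r; the entropy terms need r strictly in (0, 1)."""
    val = 0.0
    for qz, mu, pz in ((r, m1, pi), (1 - r, m0, 1 - pi)):
        val += qz * (math.log(pz * npdf(x, mu)) - math.log(qz))
    return val

# Closed-form EM responsibility from Q4:
r_star = pi * npdf(x, m1) / (pi * npdf(x, m1) + (1 - pi) * npdf(x, m0))

# Grid search over r in (0, 1): the ELBO maximizer matches r_star.
grid = [i / 10000 for i in range(1, 10000)]
r_hat = max(grid, key=elbo)
assert abs(r_hat - r_star) < 1e-3
```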