Supplementary Materials for: The k-tied Normal Distribution: A Compact Parameterization of Gaussian Mean Field Posteriors in Bayesian Neural Networks

Jakub Swiatkowski, Kevin Roth, Bastiaan S. Veeling, Linh Tran, Joshua V. Dillon, Jasper Snoek, Stephan Mandt, Tim Salimans, Rodolphe Jenatton, Sebastian Nowozin

Semantic Scholar (2020)


Abstract

This document contains additional details for the main ICML 2020 paper.

Work done while at Google. University of Warsaw; ETH Zurich; University of Amsterdam; Google Research; Imperial College London; University of California, Irvine; Microsoft Research. Correspondence to: Jakub Swiatkowski. Proceedings of the 37th International Conference on Machine Learning, Online, PMLR 119, 2020. Copyright 2020 by the author(s).

A. Proof of the Matrix Variate Normal Parameterization

In this section of the appendix, we formally explain the connections between the k-tied Normal distribution and the matrix variate Gaussian distribution (Gupta & Nagar, 2018), referred to as $\mathcal{MN}$. Consider positive definite matrices $Q \in \mathbb{R}^{r \times r}$ and $P \in \mathbb{R}^{c \times c}$ and some arbitrary matrix $M \in \mathbb{R}^{r \times c}$. We have by definition that $W \in \mathbb{R}^{r \times c} \sim \mathcal{MN}(M, Q, P)$ if and only if $\mathrm{vec}(W) \sim \mathcal{N}(\mathrm{vec}(M), P \otimes Q)$, where $\mathrm{vec}(\cdot)$ stacks the columns of a matrix and $\otimes$ is the Kronecker product.

The $\mathcal{MN}$ has already been used for variational inference by Louizos & Welling (2016) and Sun et al. (2017). In particular, Louizos & Welling (2016) consider the case where both $P$ and $Q$ are restricted to be diagonal matrices. In that case, the resulting distribution corresponds to our k-tied Normal distribution with $k = 1$, since $P \otimes Q = \mathrm{diag}(p) \otimes \mathrm{diag}(q) = \mathrm{diag}(\mathrm{vec}(q p^\top))$. Importantly, we prove below that, in the case where $k \geq 2$, the k-tied Normal distribution cannot be represented as a matrix variate Gaussian distribution.

Lemma (Rank-2 matrix and Kronecker product). Let $B$ be a rank-2 matrix in $\mathbb{R}_+^{r \times c}$. There do not exist matrices $Q \in \mathbb{R}^{r \times r}$ and $P \in \mathbb{R}^{c \times c}$ such that $\mathrm{diag}(\mathrm{vec}(B)) = P \otimes Q$.

Proof. Let us introduce the shorthand $D = \mathrm{diag}(\mathrm{vec}(B))$. By construction, $D$ is diagonal and has its diagonal terms strictly positive (it is assumed that $B \in \mathbb{R}_+^{r \times c}$, i.e., $b_{ij} > 0$ for all $i, j$). We proceed by contradiction. Assume there exist $Q \in \mathbb{R}^{r \times r}$ and $P \in \mathbb{R}^{c \times c}$ such that $D = P \otimes Q$.
This implies that all diagonal blocks of $P \otimes Q$ are themselves diagonal with strictly positive diagonal terms. Thus, $p_{jj} Q$ is diagonal for all $j \in \{1, \ldots, c\}$, which implies in turn that $Q$ is diagonal, with non-zero diagonal terms and $p_{jj} \neq 0$. Moreover, since the off-diagonal blocks $p_{ij} Q$ for $i \neq j$ must be zero and $Q \neq 0$, we have $p_{ij} = 0$ and $P$ is also diagonal. To summarize, if there exist $Q \in \mathbb{R}^{r \times r}$ and $P \in \mathbb{R}^{c \times c}$ such that $D = P \otimes Q$, then it holds that $D = \mathrm{diag}(p) \otimes \mathrm{diag}(q)$ with $p \in \mathbb{R}^{c}$ and $q \in \mathbb{R}^{r}$. This last equality can be rewritten as $b_{ij} = p_j q_i$ for all $i \in \{1, \ldots, r\}$ and $j \in \{1, \ldots, c\}$, or equivalently $B = q p^\top$. Such a matrix has rank one, which contradicts the assumption that $B$ has rank 2.
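
As a numerical sanity check of the argument, the following Python sketch verifies the identity diag(p) ⊗ diag(q) = diag(vec(q pᵀ)) on a small example, and confirms that a strictly positive rank-2 matrix cannot be written entrywise as b_ij = p_j q_i (the condition the proof reduces to). The helper names (`kron`, `diag`, `vec`, `outer`, `has_rank_at_most_one`) and the example values of `q`, `p` and `B` are illustrative assumptions, not taken from the paper.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [a_ij * b_kl for a_ij in a_row for b_kl in b_row]
        for a_row in a
        for b_row in b
    ]


def diag(v):
    """Square matrix with the entries of v on its diagonal."""
    n = len(v)
    return [[v[i] if i == j else 0 for j in range(n)] for i in range(n)]


def vec(m):
    """Stack the columns of a matrix into one flat list."""
    rows, cols = len(m), len(m[0])
    return [m[i][j] for j in range(cols) for i in range(rows)]


def outer(q, p):
    """Outer product q p^T, with entry (i, j) equal to q[i] * p[j]."""
    return [[q_i * p_j for p_j in p] for q_i in q]


def has_rank_at_most_one(b):
    """True if every 2x2 minor of b vanishes, i.e. rank(b) <= 1."""
    rows, cols = len(b), len(b[0])
    return all(
        b[i][j] * b[k][l] == b[i][l] * b[k][j]
        for i in range(rows) for k in range(i + 1, rows)
        for j in range(cols) for l in range(j + 1, cols)
    )


# Hypothetical example with r = 3 rows and c = 2 columns.
q = [1, 2, 3]
p = [5, 7]

# The k = 1 case: diag(p) (x) diag(q) equals diag(vec(q p^T)).
lhs = kron(diag(p), diag(q))
rhs = diag(vec(outer(q, p)))
assert lhs == rhs

# Any matrix of the form q p^T has rank at most one ...
assert has_rank_at_most_one(outer(q, p))

# ... whereas this strictly positive matrix has a non-zero 2x2 minor,
# so no vectors p, q satisfy b_ij = p_j * q_i for all i, j.
B = [[1, 2], [3, 4], [5, 7]]
assert not has_rank_at_most_one(B)
```

Integer entries are used so that the equality checks are exact. With floating-point data, the comparisons would need a tolerance.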
