GPy.likelihoods package¶

Introduction¶

The likelihood $p(y|f,X)$ measures how well the latent function $f$ ($y$ without noise) explains the target values $y$ at inputs $X$. The marginal likelihood $p(y|X)$ is the same quantity with the latent function $f$ marginalized out. In Gaussian processes, likelihoods matter because they determine the ‘best’ values of the kernel and noise hyperparameters, which relate the observed data to unobserved data. The purpose of optimizing a model (e.g. GPy.models.GPRegression) is to find these ‘best’ hyperparameters, i.e. those that minimize the negative log marginal likelihood.
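To make the marginalization concrete, here is a minimal sketch for a single observation. It assumes a zero-mean Gaussian prior f ~ N(0, k) and Gaussian noise with variance s2; the names `k`, `s2` and both helper functions are illustrative and are not part of GPy. It integrates p(y|f) p(f) over f numerically and compares the result with the closed form N(y; 0, k + s2).

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def marginal_likelihood_numeric(y, k, s2, half_width=10.0, n=4001):
    """Approximate p(y) = integral of p(y|f) p(f) df with the trapezoid rule."""
    lo = -half_width * math.sqrt(k)
    hi = half_width * math.sqrt(k)
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        f = lo + i * h
        w = 0.5 if i in (0, n - 1) else 1.0
        total += w * normal_pdf(y, f, s2) * normal_pdf(f, 0.0, k)
    return total * h

y, k, s2 = 0.7, 2.0, 0.3  # illustrative values, not taken from GPy
numeric = marginal_likelihood_numeric(y, k, s2)
closed_form = normal_pdf(y, 0.0, k + s2)
neg_log_marginal = -math.log(closed_form)  # the quantity an optimizer would minimize
```

In a real model the same objective is evaluated over all data points at once, and the kernel and noise hyperparameters (here `k` and `s2`) are the variables being optimized.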

Inheritance diagram of GPy.likelihoods.likelihood, GPy.likelihoods.mixed_noise.MixedNoise

Most likelihood classes inherit directly from GPy.likelihoods.likelihood, although an intermediary class GPy.likelihoods.mixed_noise.MixedNoise is used by GPy.likelihoods.multioutput_likelihood.

Submodules¶

GPy.likelihoods.bernoulli module¶

class Bernoulli(gp_link=None)[source]¶

Bases: GPy.likelihoods.likelihood.Likelihood

Bernoulli likelihood

\[p(y_{i}|\lambda(f_{i})) = \lambda(f_{i})^{y_{i}}(1-\lambda(f_{i}))^{1-y_{i}}\]

Note

Y takes values in either {-1, 1} or {0, 1}. The link function should map into [0, 1], e.g. probit (default) or Heaviside.
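As an illustration of the likelihood above, the following sketch evaluates the Bernoulli pmf for labels in {0, 1} after passing a latent value through a probit transformation. The function names are hypothetical stand-ins and do not call GPy.

```python
import math

def probit(f):
    """Standard normal CDF, mapping a latent value f to a probability in [0, 1]."""
    return 0.5 * (1.0 + math.erf(f / math.sqrt(2.0)))

def bernoulli_pdf(lam, y):
    """p(y | lam) for y in {0, 1}, where lam = lambda(f) is already a probability."""
    return lam ** y * (1.0 - lam) ** (1 - y)

lam = probit(0.5)              # probability of the positive class at f = 0.5
p_one = bernoulli_pdf(lam, 1)
p_zero = bernoulli_pdf(lam, 0)
```

The two probabilities sum to one for any latent value, which is what restricting the transformed value to [0, 1] guarantees.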

d2logpdf_dlink2(inv_link_f, y, Y_metadata=None)[source]¶

Hessian at y, given inv_link_f, w.r.t. inv_link_f, i.e. the second derivative of the logpdf at y, given inverse link of f_i and inverse link of f_j, w.r.t. inverse link of f_i and inverse link of f_j. The Hessian will be 0 unless i == j.

\[\frac{d^{2}\ln p(y_{i}|\lambda(f_{i}))}{d\lambda(f)^{2}} = \frac{-y_{i}}{\lambda(f)^{2}} - \frac{(1-y_{i})}{(1-\lambda(f))^{2}}\]

Parameters:	inv_link_f (Nx1 array) – latent variables inverse link of f. y (Nx1 array) – data Y_metadata – Y_metadata not used in bernoulli
Returns:	Diagonal of log hessian matrix (second derivative of log likelihood evaluated at points inverse link of f).
Return type:	Nx1 array

Note

Will return diagonal of hessian, since everywhere else it is 0, as the likelihood factorizes over cases (the distribution for y_i depends only on inverse link of f_i, not on inverse link of f_(j!=i)).

d3logpdf_dlink3(inv_link_f, y, Y_metadata=None)[source]¶

Third order derivative log-likelihood function at y given inverse link of f w.r.t inverse link of f

\[\frac{d^{3} \ln p(y_{i}|\lambda(f_{i}))}{d\lambda(f)^{3}} = \frac{2y_{i}}{\lambda(f)^{3}} - \frac{2(1-y_{i})}{(1-\lambda(f))^{3}}\]

Parameters:	inv_link_f (Nx1 array) – latent variables passed through inverse link of f. y (Nx1 array) – data Y_metadata – Y_metadata not used in bernoulli
Returns:	third derivative of log likelihood evaluated at points inverse_link(f)
Return type:	Nx1 array

dlogpdf_dlink(inv_link_f, y, Y_metadata=None)[source]¶

Gradient of the log pdf at y, given inverse link of f, w.r.t. inverse link of f.

\[\frac{d\ln p(y_{i}|\lambda(f_{i}))}{d\lambda(f)} = \frac{y_{i}}{\lambda(f_{i})} - \frac{(1 - y_{i})}{(1 - \lambda(f_{i}))}\]

Parameters:	inv_link_f (Nx1 array) – latent variables inverse link of f. y (Nx1 array) – data Y_metadata – Y_metadata not used in bernoulli
Returns:	gradient of log likelihood evaluated at points inverse link of f.
Return type:	Nx1 array
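The first, second and third derivatives of the Bernoulli log pdf with respect to the inverse-link value can be checked against central finite differences. This is a standalone sketch with hypothetical helper names, written for scalars rather than Nx1 arrays; it does not use GPy.

```python
import math

def logpdf(lam, y):
    return y * math.log(lam) + (1 - y) * math.log(1.0 - lam)

def dlogpdf(lam, y):
    return y / lam - (1 - y) / (1.0 - lam)

def d2logpdf(lam, y):
    return -y / lam ** 2 - (1 - y) / (1.0 - lam) ** 2

def d3logpdf(lam, y):
    return 2.0 * y / lam ** 3 - 2.0 * (1 - y) / (1.0 - lam) ** 3

def central_diff(fn, x, h=1e-5):
    """Central finite-difference approximation of fn'(x)."""
    return (fn(x + h) - fn(x - h)) / (2.0 * h)

lam = 0.3
errors = []
for y in (0, 1):
    # Each analytic derivative should match the numerical slope of the one before it.
    errors.append(abs(central_diff(lambda t: logpdf(t, y), lam) - dlogpdf(lam, y)))
    errors.append(abs(central_diff(lambda t: dlogpdf(t, y), lam) - d2logpdf(lam, y)))
    errors.append(abs(central_diff(lambda t: d2logpdf(t, y), lam) - d3logpdf(lam, y)))
max_error = max(errors)
```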

exact_inference_gradients(dL_dKdiag, Y_metadata=None)[source]¶

logpdf_link(inv_link_f, y, Y_metadata=None)[source]¶

Log Likelihood function given inverse link of f.

\[\ln p(y_{i}|\lambda(f_{i})) = y_{i}\log\lambda(f_{i}) + (1-y_{i})\log (1-\lambda(f_{i}))\]

Parameters:	inv_link_f (Nx1 array) – latent variables inverse link of f. y (Nx1 array) – data Y_metadata – Y_metadata not used in bernoulli
Returns:	log likelihood evaluated at points inverse link of f.
Return type:	float

moments_match_ep(Y_i, tau_i, v_i, Y_metadata_i=None)[source]¶

Moments match of the marginal approximation in EP algorithm

Parameters:	i – number of observation (int) tau_i – precision of the cavity distribution (float) v_i – mean/variance of the cavity distribution (float)

pdf_link(inv_link_f, y, Y_metadata=None)[source]¶

Likelihood function given inverse link of f.

\[p(y_{i}|\lambda(f_{i})) = \lambda(f_{i})^{y_{i}}(1-\lambda(f_{i}))^{1-y_{i}}\]

Parameters:	inv_link_f (Nx1 array) – latent variables inverse link of f. y (Nx1 array) – data Y_metadata – Y_metadata not used in bernoulli
Returns:	likelihood evaluated for this point
Return type:	float

predictive_mean(mu, variance, Y_metadata=None)[source]¶

Quadrature calculation of the predictive mean: E(Y_star|Y) = E( E(Y_star|f_star, Y) )

Parameters:	mu – mean of posterior sigma – standard deviation of posterior

predictive_quantiles(mu, var, quantiles, Y_metadata=None)[source]¶: Get the “quantiles” of the binary labels (Bernoulli draws). all the quantiles must be either 0 or 1, since those are the only values the draw can take!

predictive_variance(mu, variance, pred_mean, Y_metadata=None)[source]¶

Approximation to the predictive variance: V(Y_star)

The following variance decomposition (the law of total variance) is used: V(Y_star) = E( V(Y_star|f_star) ) + V( E(Y_star|f_star) )

Parameters:	mu – mean of posterior sigma – standard deviation of posterior pred_mean – output’s predictive mean, if None _predictive_mean function will be called.
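For a Bernoulli likelihood the law of total variance, V(Y) = E[V(Y|f)] + V[E(Y|f)], can be verified numerically, because the predictive variance of a binary variable must equal p(1 - p) with p the predictive mean. The sketch below assumes a probit transformation and a Gaussian posterior over f_star, and uses a plain trapezoid rule instead of GPy's quadrature; all names are illustrative.

```python
import math

def probit(f):
    return 0.5 * (1.0 + math.erf(f / math.sqrt(2.0)))

def gaussian_expectation(g, mu, var, half_width=8.0, n=4001):
    """E[g(f)] for f ~ N(mu, var), by the trapezoid rule on a wide grid."""
    sd = math.sqrt(var)
    lo, hi = mu - half_width * sd, mu + half_width * sd
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        f = lo + i * h
        w = 0.5 if i in (0, n - 1) else 1.0
        dens = math.exp(-0.5 * (f - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)
        total += w * g(f) * dens
    return total * h

mu, var = 0.4, 1.5  # illustrative posterior mean and variance of f_star
mean_y = gaussian_expectation(probit, mu, var)                                    # E[ E(Y|f) ]
exp_var = gaussian_expectation(lambda f: probit(f) * (1.0 - probit(f)), mu, var)  # E[ V(Y|f) ]
var_exp = gaussian_expectation(lambda f: probit(f) ** 2, mu, var) - mean_y ** 2   # V[ E(Y|f) ]
pred_var = exp_var + var_exp
```

The predictive mean also has a known closed form for the probit case, probit(mu / sqrt(1 + var)), which gives a second check on the quadrature.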

samples(gp, Y_metadata=None)[source]¶

Returns a set of samples of observations based on a given value of the latent variable.

Parameters:	gp – latent variable

to_dict()[source]¶

Convert the object into a json serializable dictionary.

Note: It uses the private method _save_to_input_dict of the parent.

Return dict:	json serializable dictionary containing the needed information to instantiate the object

variational_expectations(Y, m, v, gh_points=None, Y_metadata=None)[source]¶

Use Gauss-Hermite Quadrature to compute

E_p(f) [ log p(y|f) ], d/dm E_p(f) [ log p(y|f) ] and d/dv E_p(f) [ log p(y|f) ]

where p(f) is a Gaussian with mean m and variance v. The shapes of Y, m and v should match.

If no gh_points are passed, we construct them using default options.

GPy.likelihoods.binomial module¶

class Binomial(gp_link=None)[source]¶

Bases: GPy.likelihoods.likelihood.Likelihood

Binomial likelihood

\[p(y_{i}|\lambda(f_{i})) = \binom{N}{y_{i}}\lambda(f_{i})^{y_{i}}(1-\lambda(f_{i}))^{N-y_{i}}\]

Note

Y takes integer values in {0, …, N}, where N is the number of trials supplied through Y_metadata (‘trials’). The link function should map into [0, 1], e.g. probit (default).

d2logpdf_dlink2(inv_link_f, y, Y_metadata=None)[source]¶

Hessian at y, given inv_link_f, w.r.t. inv_link_f, i.e. the second derivative of the logpdf at y, given inverse link of f_i and inverse link of f_j, w.r.t. inverse link of f_i and inverse link of f_j. The Hessian will be 0 unless i == j.

\[\frac{d^{2}\ln p(y_{i}|\lambda(f_{i}))}{d\lambda(f)^{2}} = \frac{-y_{i}}{\lambda(f)^{2}} - \frac{(N-y_{i})}{(1-\lambda(f))^{2}}\]

Parameters:	inv_link_f (Nx1 array) – latent variables inverse link of f. y (Nx1 array) – data Y_metadata – Y_metadata must contain ‘trials’
Returns:	Diagonal of log hessian matrix (second derivative of log likelihood evaluated at points inverse link of f).
Return type:	Nx1 array

Note

Will return diagonal of hessian, since everywhere else it is 0, as the likelihood factorizes over cases (the distribution for y_i depends only on inverse link of f_i, not on inverse link of f_(j!=i)).

d3logpdf_dlink3(inv_link_f, y, Y_metadata=None)[source]¶

Third order derivative log-likelihood function at y given inverse link of f w.r.t inverse link of f

\[\frac{d^{3}\ln p(y_{i}|\lambda(f_{i}))}{d\lambda(f)^{3}} = \frac{2y_{i}}{\lambda(f)^{3}} - \frac{2(N-y_{i})}{(1-\lambda(f))^{3}}\]

Parameters:	inv_link_f (Nx1 array) – latent variables inverse link of f. y (Nx1 array) – data Y_metadata – Y_metadata must contain ‘trials’
Returns:	third derivative of log likelihood evaluated at points inverse link of f.
Return type:	Nx1 array

Note

Will return only the diagonal terms, since everywhere else the derivative is 0, as the likelihood factorizes over cases (the distribution for y_i depends only on inverse link of f_i, not on inverse link of f_(j!=i)).

dlogpdf_dlink(inv_link_f, y, Y_metadata=None)[source]¶

Gradient of the log pdf at y, given inverse link of f, w.r.t. inverse link of f.

\[\frac{d\ln p(y_{i}|\lambda(f_{i}))}{d\lambda(f)} = \frac{y_{i}}{\lambda(f)} - \frac{(N-y_{i})}{(1-\lambda(f))}\]

Parameters:	inv_link_f (Nx1 array) – latent variables inverse link of f. y (Nx1 array) – data Y_metadata – Y_metadata must contain ‘trials’
Returns:	gradient of log likelihood evaluated at points inverse link of f.
Return type:	Nx1 array

exact_inference_gradients(dL_dKdiag, Y_metadata=None)[source]¶

logpdf_link(inv_link_f, y, Y_metadata=None)[source]¶

Log Likelihood function given inverse link of f.

\[\ln p(y_{i}|\lambda(f_{i})) = \ln\binom{N}{y_{i}} + y_{i}\log\lambda(f_{i}) + (N-y_{i})\log (1-\lambda(f_{i}))\]

Parameters:	inv_link_f (Nx1 array) – latent variables inverse link of f. y (Nx1 array) – data Y_metadata – Y_metadata must contain ‘trials’
Returns:	log likelihood evaluated at points inverse link of f.
Return type:	float

moments_match_ep(obs, tau, v, Y_metadata_i=None)[source]¶

Calculation of moments using quadrature

Parameters:	obs – observed output tau – cavity distribution 1st natural parameter (precision) v – cavity distribution 2nd natural parameter (mu*precision)

pdf_link(inv_link_f, y, Y_metadata)[source]¶

Likelihood function given inverse link of f.

\[p(y_{i}|\lambda(f_{i})) = \binom{N}{y_{i}}\lambda(f_{i})^{y_{i}}(1-\lambda(f_{i}))^{N-y_{i}}\]

Parameters:	inv_link_f (Nx1 array) – latent variables inverse link of f. y (Nx1 array) – data Y_metadata – Y_metadata must contain ‘trials’
Returns:	likelihood evaluated for this point
Return type:	float
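A short sketch of a binomial log pmf with N trials, consistent with the (N - y_i) terms in the derivative formulas of this module. The argument `n_trials` plays the role of the ‘trials’ entry of Y_metadata; the functions are hypothetical scalar stand-ins and do not call GPy.

```python
import math

def binomial_logpdf(lam, y, n_trials):
    """log p(y | lam) for y successes out of n_trials, with success probability lam."""
    return (math.log(math.comb(n_trials, y))
            + y * math.log(lam) + (n_trials - y) * math.log(1.0 - lam))

def binomial_dlogpdf(lam, y, n_trials):
    """Derivative of the log pmf with respect to lam."""
    return y / lam - (n_trials - y) / (1.0 - lam)

n_trials, lam = 10, 0.3
# The pmf should sum to one over all possible success counts.
total = sum(math.exp(binomial_logpdf(lam, y, n_trials)) for y in range(n_trials + 1))
h = 1e-6
numeric_grad = (binomial_logpdf(lam + h, 4, n_trials)
                - binomial_logpdf(lam - h, 4, n_trials)) / (2.0 * h)
```

The binomial coefficient does not depend on lam, so it drops out of every derivative; this is why it appears in the pmf but not in the gradient.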

samples(gp, Y_metadata=None, **kw)[source]¶

Returns a set of samples of observations based on a given value of the latent variable.

Parameters:	gp – latent variable

variational_expectations(Y, m, v, gh_points=None, Y_metadata=None)[source]¶

Use Gauss-Hermite Quadrature to compute

E_p(f) [ log p(y|f) ], d/dm E_p(f) [ log p(y|f) ] and d/dv E_p(f) [ log p(y|f) ]

where p(f) is a Gaussian with mean m and variance v. The shapes of Y, m and v should match.

If no gh_points are passed, we construct them using default options.

GPy.likelihoods.exponential module¶

class Exponential(gp_link=None)[source]¶

Bases: GPy.likelihoods.likelihood.Likelihood

Exponential likelihood. Y is expected to take non-negative real values.

\[p(y_{i}|\lambda(f_{i})) = \lambda(f_{i})\exp (-y_{i}\lambda(f_{i}))\]

d2logpdf_dlink2(link_f, y, Y_metadata=None)[source]¶

Hessian at y, given link(f), w.r.t link(f) i.e. second derivative logpdf at y given link(f_i) and link(f_j) w.r.t link(f_i) and link(f_j) The hessian will be 0 unless i == j

\[\frac{d^{2} \ln p(y_{i}|\lambda(f_{i}))}{d^{2}\lambda(f)} = -\frac{1}{\lambda(f_{i})^{2}}\]

Parameters:	link_f (Nx1 array) – latent variables link(f) y (Nx1 array) – data Y_metadata – Y_metadata which is not used in exponential distribution
Returns:	Diagonal of hessian matrix (second derivative of likelihood evaluated at points f)
Return type:	Nx1 array

Note

Will return diagonal of hessian, since everywhere else it is 0, as the likelihood factorizes over cases (the distribution for y_i depends only on link(f_i), not on link(f_(j!=i))).

d3logpdf_dlink3(link_f, y, Y_metadata=None)[source]¶

Third order derivative log-likelihood function at y given link(f) w.r.t link(f)

\[\frac{d^{3} \ln p(y_{i}|\lambda(f_{i}))}{d^{3}\lambda(f)} = \frac{2}{\lambda(f_{i})^{3}}\]

Parameters:	link_f (Nx1 array) – latent variables link(f) y (Nx1 array) – data Y_metadata – Y_metadata which is not used in exponential distribution
Returns:	third derivative of likelihood evaluated at points f
Return type:	Nx1 array

dlogpdf_dlink(link_f, y, Y_metadata=None)[source]¶

Gradient of the log likelihood function at y, given link(f) w.r.t link(f)

\[\frac{d \ln p(y_{i}|\lambda(f_{i}))}{d\lambda(f)} = \frac{1}{\lambda(f)} - y_{i}\]

Parameters:	link_f (Nx1 array) – latent variables (f) y (Nx1 array) – data Y_metadata – Y_metadata which is not used in exponential distribution
Returns:	gradient of likelihood evaluated at points
Return type:	Nx1 array

logpdf_link(link_f, y, Y_metadata=None)[source]¶

Log Likelihood Function given link(f)

\[\ln p(y_{i}|\lambda(f_{i})) = \ln \lambda(f_{i}) - y_{i}\lambda(f_{i})\]

Parameters:	link_f (Nx1 array) – latent variables (link(f)) y (Nx1 array) – data Y_metadata – Y_metadata which is not used in exponential distribution
Returns:	likelihood evaluated for this point
Return type:	float

pdf_link(link_f, y, Y_metadata=None)[source]¶

Likelihood function given link(f)

\[p(y_{i}|\lambda(f_{i})) = \lambda(f_{i})\exp (-y_{i}\lambda(f_{i}))\]

Parameters:	link_f (Nx1 array) – latent variables link(f) y (Nx1 array) – data Y_metadata – Y_metadata which is not used in exponential distribution
Returns:	likelihood evaluated for this point
Return type:	float
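The exponential log pdf and its three derivatives with respect to link(f) can be checked numerically. The sketch below uses hypothetical scalar helpers rather than the GPy class.

```python
import math

def exponential_logpdf(lam, y):
    return math.log(lam) - y * lam

def exponential_dlogpdf(lam, y):
    return 1.0 / lam - y

def exponential_d2logpdf(lam, y):
    return -1.0 / lam ** 2

def exponential_d3logpdf(lam, y):
    return 2.0 / lam ** 3

lam, y, h = 1.7, 0.4, 1e-5
# Central differences of each function approximate the next derivative up.
fd1 = (exponential_logpdf(lam + h, y) - exponential_logpdf(lam - h, y)) / (2.0 * h)
fd2 = (exponential_dlogpdf(lam + h, y) - exponential_dlogpdf(lam - h, y)) / (2.0 * h)
fd3 = (exponential_d2logpdf(lam + h, y) - exponential_d2logpdf(lam - h, y)) / (2.0 * h)
```

Only the first derivative depends on the data y; the second and third depend on link(f) alone.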

samples(gp, Y_metadata=None)[source]¶

Returns a set of samples of observations based on a given value of the latent variable.

Parameters:	gp – latent variable

GPy.likelihoods.gamma module¶

class Gamma(gp_link=None, beta=1.0)[source]¶

Bases: GPy.likelihoods.likelihood.Likelihood

Gamma likelihood

\[\begin{split}p(y_{i}|\lambda(f_{i})) = \frac{\beta^{\alpha_{i}}}{\Gamma(\alpha_{i})}y_{i}^{\alpha_{i}-1}e^{-\beta y_{i}}\\ \alpha_{i} = \beta \lambda(f_{i})\end{split}\]

d2logpdf_dlink2(link_f, y, Y_metadata=None)[source]¶

Hessian at y, given link(f), w.r.t link(f) i.e. second derivative logpdf at y given link(f_i) and link(f_j) w.r.t link(f_i) and link(f_j) The hessian will be 0 unless i == j

\[\begin{split}\frac{d^{2} \ln p(y_{i}|\lambda(f_{i}))}{d\lambda(f)^{2}} = -\beta^{2}\frac{d\Psi(\alpha_{i})}{d\alpha_{i}}\\ \alpha_{i} = \beta \lambda(f_{i})\end{split}\]

Parameters:	link_f (Nx1 array) – latent variables link(f) y (Nx1 array) – data Y_metadata – Y_metadata which is not used in gamma distribution
Returns:	Diagonal of hessian matrix (second derivative of likelihood evaluated at points f)
Return type:	Nx1 array

Note

Will return diagonal of hessian, since everywhere else it is 0, as the likelihood factorizes over cases (the distribution for y_i depends only on link(f_i), not on link(f_(j!=i))).

d3logpdf_dlink3(link_f, y, Y_metadata=None)[source]¶

Third order derivative log-likelihood function at y given link(f) w.r.t link(f)

\[\begin{split}\frac{d^{3} \ln p(y_{i}|\lambda(f_{i}))}{d\lambda(f)^{3}} = -\beta^{3}\frac{d^{2}\Psi(\alpha_{i})}{d\alpha_{i}^{2}}\\ \alpha_{i} = \beta \lambda(f_{i})\end{split}\]

Parameters:	link_f (Nx1 array) – latent variables link(f) y (Nx1 array) – data Y_metadata – Y_metadata which is not used in gamma distribution
Returns:	third derivative of likelihood evaluated at points f
Return type:	Nx1 array

dlogpdf_dlink(link_f, y, Y_metadata=None)[source]¶

Gradient of the log likelihood function at y, given link(f) w.r.t link(f)

\[\begin{split}\frac{d \ln p(y_{i}|\lambda(f_{i}))}{d\lambda(f)} = \beta \log (\beta y_{i}) - \Psi(\alpha_{i})\beta\\ \alpha_{i} = \beta \lambda(f_{i})\end{split}\]

Parameters:	link_f (Nx1 array) – latent variables (f) y (Nx1 array) – data Y_metadata – Y_metadata which is not used in gamma distribution
Returns:	gradient of likelihood evaluated at points
Return type:	Nx1 array

logpdf_link(link_f, y, Y_metadata=None)[source]¶

Log Likelihood Function given link(f)

\[\begin{split}\ln p(y_{i}|\lambda(f_{i})) = \alpha_{i}\log \beta - \log \Gamma(\alpha_{i}) + (\alpha_{i} - 1)\log y_{i} - \beta y_{i}\\ \alpha_{i} = \beta \lambda(f_{i})\end{split}\]

Parameters:	link_f (Nx1 array) – latent variables (link(f)) y (Nx1 array) – data Y_metadata – Y_metadata which is not used in gamma distribution
Returns:	likelihood evaluated for this point
Return type:	float

pdf_link(link_f, y, Y_metadata=None)[source]¶

Likelihood function given link(f)

\[\begin{split}p(y_{i}|\lambda(f_{i})) = \frac{\beta^{\alpha_{i}}}{\Gamma(\alpha_{i})}y_{i}^{\alpha_{i}-1}e^{-\beta y_{i}}\\ \alpha_{i} = \beta \lambda(f_{i})\end{split}\]

Parameters:	link_f (Nx1 array) – latent variables link(f) y (Nx1 array) – data Y_metadata – Y_metadata which is not used in gamma distribution
Returns:	likelihood evaluated for this point
Return type:	float
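The sketch below evaluates a Gamma log pdf with rate beta and checks its gradient with respect to link(f) numerically. It assumes the shape is alpha = beta * link(f), which is the reading under which the density depends on f and the mean of y equals link(f). The digamma function is approximated by a finite difference of `math.lgamma` to stay within the standard library; all names are illustrative, not GPy API.

```python
import math

def gamma_logpdf(lam, y, beta):
    """Gamma log density with rate beta and shape alpha = beta * lam (assumed)."""
    alpha = beta * lam
    return (alpha * math.log(beta) - math.lgamma(alpha)
            + (alpha - 1.0) * math.log(y) - beta * y)

def digamma_fd(x, h=1e-5):
    """Digamma approximated by a central difference of log-gamma."""
    return (math.lgamma(x + h) - math.lgamma(x - h)) / (2.0 * h)

def gamma_dlogpdf(lam, y, beta):
    """Gradient w.r.t. lam: beta * log(beta * y) - beta * digamma(alpha)."""
    alpha = beta * lam
    return beta * math.log(beta * y) - beta * digamma_fd(alpha)

lam, y, beta, h = 2.0, 1.3, 1.5, 1e-5
numeric_grad = (gamma_logpdf(lam + h, y, beta) - gamma_logpdf(lam - h, y, beta)) / (2.0 * h)
analytic_grad = gamma_dlogpdf(lam, y, beta)
```

With SciPy available, `scipy.special.digamma` would be the natural replacement for `digamma_fd`.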

GPy.likelihoods.gaussian module¶

A lot of this code assumes that the link function is the identity.

I think laplace code is okay, but I’m quite sure that the EP moments will only work if the link is identity.

Furthermore, exact Gaussian inference can only be done for the identity link, so we should be asserting so for all calls which relate to that.

James 11/12/13

class Gaussian(gp_link=None, variance=1.0, name='Gaussian_noise')[source]¶

Bases: GPy.likelihoods.likelihood.Likelihood

Gaussian likelihood

\[\ln p(y_{i}|\lambda(f_{i})) = -\frac{N \ln 2\pi}{2} - \frac{\ln |K|}{2} - \frac{(y_{i} - \lambda(f_{i}))^{T}\sigma^{-2}(y_{i} - \lambda(f_{i}))}{2}\]

Parameters:	variance – variance value of the Gaussian distribution N (int) – Number of data points

betaY(Y, Y_metadata=None)[source]¶

d2logpdf_dlink2(link_f, y, Y_metadata=None)[source]¶

Hessian at y, given link_f, w.r.t link_f. i.e. second derivative logpdf at y given link(f_i) link(f_j) w.r.t link(f_i) and link(f_j)

The hessian will be 0 unless i == j

\[\frac{d^{2} \ln p(y_{i}|\lambda(f_{i}))}{d^{2}f} = -\frac{1}{\sigma^{2}}\]

Parameters:	link_f (Nx1 array) – latent variables link(f) y (Nx1 array) – data Y_metadata – Y_metadata not used in gaussian
Returns:	Diagonal of log hessian matrix (second derivative of log likelihood evaluated at points link(f))
Return type:	Nx1 array

Note

Will return diagonal of hessian, since everywhere else it is 0, as the likelihood factorizes over cases (the distribution for y_i depends only on link(f_i), not on link(f_(j!=i))).

d2logpdf_dlink2_dtheta(f, y, Y_metadata=None)[source]¶

d2logpdf_dlink2_dvar(link_f, y, Y_metadata=None)[source]¶

Gradient of the hessian (d2logpdf_dlink2) w.r.t variance parameter (noise_variance)

\[\frac{d}{d\sigma^{2}}(\frac{d^{2} \ln p(y_{i}|\lambda(f_{i}))}{d^{2}\lambda(f)}) = \frac{1}{\sigma^{4}}\]

Parameters:	link_f (Nx1 array) – latent variables link(f) y (Nx1 array) – data Y_metadata – Y_metadata not used in gaussian
Returns:	derivative of log hessian evaluated at points link(f_i) and link(f_j) w.r.t variance parameter
Return type:	Nx1 array

d3logpdf_dlink3(link_f, y, Y_metadata=None)[source]¶

Third order derivative log-likelihood function at y given link(f) w.r.t link(f)

\[\frac{d^{3} \ln p(y_{i}|\lambda(f_{i}))}{d^{3}\lambda(f)} = 0\]

Parameters:	link_f (Nx1 array) – latent variables link(f) y (Nx1 array) – data Y_metadata – Y_metadata not used in gaussian
Returns:	third derivative of log likelihood evaluated at points link(f)
Return type:	Nx1 array

dlogpdf_dlink(link_f, y, Y_metadata=None)[source]¶

Gradient of the log pdf at y, given link(f), w.r.t. link(f)

\[\frac{d \ln p(y_{i}|\lambda(f_{i}))}{d\lambda(f)} = \frac{1}{\sigma^{2}}(y_{i} - \lambda(f_{i}))\]

Parameters:	link_f (Nx1 array) – latent variables link(f) y (Nx1 array) – data Y_metadata – Y_metadata not used in gaussian
Returns:	gradient of log likelihood evaluated at points link(f)
Return type:	Nx1 array

dlogpdf_dlink_dtheta(f, y, Y_metadata=None)[source]¶

dlogpdf_dlink_dvar(link_f, y, Y_metadata=None)[source]¶

Derivative of the dlogpdf_dlink w.r.t variance parameter (noise_variance)

\[\frac{d}{d\sigma^{2}}(\frac{d \ln p(y_{i}|\lambda(f_{i}))}{d\lambda(f)}) = \frac{1}{\sigma^{4}}(-y_{i} + \lambda(f_{i}))\]

Parameters:	link_f (Nx1 array) – latent variables link(f) y (Nx1 array) – data Y_metadata – Y_metadata not used in gaussian
Returns:	derivative of log likelihood evaluated at points link(f) w.r.t variance parameter
Return type:	Nx1 array

dlogpdf_link_dtheta(f, y, Y_metadata=None)[source]¶

dlogpdf_link_dvar(link_f, y, Y_metadata=None)[source]¶

Gradient of the log-likelihood function at y given link(f), w.r.t variance parameter (noise_variance)

\[\frac{d \ln p(y_{i}|\lambda(f_{i}))}{d\sigma^{2}} = -\frac{N}{2\sigma^{2}} + \frac{(y_{i} - \lambda(f_{i}))^{2}}{2\sigma^{4}}\]

Parameters:	link_f (Nx1 array) – latent variables link(f) y (Nx1 array) – data Y_metadata – Y_metadata not used in gaussian
Returns:	derivative of log likelihood evaluated at points link(f) w.r.t variance parameter
Return type:	float
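For a single observation (N = 1) and an identity link, the gradient of the Gaussian log density with respect to the noise variance can be checked against a finite difference. This is a standalone sketch with hypothetical helper names, not the GPy implementation.

```python
import math

def gaussian_logpdf(link_f, y, variance):
    """Log density of one observation y under N(link_f, variance)."""
    return -0.5 * math.log(2.0 * math.pi * variance) - 0.5 * (y - link_f) ** 2 / variance

def gaussian_dlogpdf_dvar(link_f, y, variance):
    """Derivative of the single-point log density with respect to the variance."""
    return -0.5 / variance + 0.5 * (y - link_f) ** 2 / variance ** 2

link_f, y, variance, h = 0.2, 1.1, 0.8, 1e-6
numeric = (gaussian_logpdf(link_f, y, variance + h)
           - gaussian_logpdf(link_f, y, variance - h)) / (2.0 * h)
analytic = gaussian_dlogpdf_dvar(link_f, y, variance)
```

The gradient is zero exactly when the squared residual equals the variance, which is the maximum-likelihood noise estimate for one point.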

ep_gradients(Y, cav_tau, cav_v, dL_dKdiag, Y_metadata=None, quad_mode='gk', boost_grad=1.0)[source]¶

exact_inference_gradients(dL_dKdiag, Y_metadata=None)[source]¶

gaussian_variance(Y_metadata=None)[source]¶

log_predictive_density(y_test, mu_star, var_star, Y_metadata=None)[source]¶: assumes independence

logpdf_link(link_f, y, Y_metadata=None)[source]¶

Log likelihood function given link(f)

\[\ln p(y_{i}|\lambda(f_{i})) = -\frac{N \ln 2\pi}{2} - \frac{\ln |K|}{2} - \frac{(y_{i} - \lambda(f_{i}))^{T}\sigma^{-2}(y_{i} - \lambda(f_{i}))}{2}\]

Parameters:	link_f (Nx1 array) – latent variables link(f) y (Nx1 array) – data Y_metadata – Y_metadata not used in gaussian
Returns:	log likelihood evaluated for this point
Return type:	float

moments_match_ep(data_i, tau_i, v_i, Y_metadata_i=None)[source]¶

Moments match of the marginal approximation in EP algorithm

Parameters:	i – number of observation (int) tau_i – precision of the cavity distribution (float) v_i – mean/variance of the cavity distribution (float)

pdf_link(link_f, y, Y_metadata=None)[source]¶

Likelihood function given link(f)

\[\ln p(y_{i}|\lambda(f_{i})) = -\frac{N \ln 2\pi}{2} - \frac{\ln |K|}{2} - \frac{(y_{i} - \lambda(f_{i}))^{T}\sigma^{-2}(y_{i} - \lambda(f_{i}))}{2}\]

Parameters:	link_f (Nx1 array) – latent variables link(f) y (Nx1 array) – data Y_metadata – Y_metadata not used in gaussian
Returns:	likelihood evaluated for this point
Return type:	float

predictive_mean(mu, sigma)[source]¶

Quadrature calculation of the predictive mean: E(Y_star|Y) = E( E(Y_star|f_star, Y) )

Parameters:	mu – mean of posterior sigma – standard deviation of posterior

predictive_quantiles(mu, var, quantiles, Y_metadata=None)[source]¶

predictive_values(mu, var, full_cov=False, Y_metadata=None)[source]¶

Compute mean, variance of the predictive distribution.

Parameters:	mu – mean of the latent variable, f, of posterior var – variance of the latent variable, f, of posterior full_cov (Boolean) – whether to use the full covariance or just the diagonal

predictive_variance(mu, sigma, predictive_mean=None)[source]¶

Approximation to the predictive variance: V(Y_star)

The following variance decomposition (the law of total variance) is used: V(Y_star) = E( V(Y_star|f_star) ) + V( E(Y_star|f_star) )

Parameters:	mu – mean of posterior sigma – standard deviation of posterior predictive_mean – output’s predictive mean, if None _predictive_mean function will be called.

samples(gp, Y_metadata=None)[source]¶

Returns a set of samples of observations based on a given value of the latent variable.

Parameters:	gp – latent variable

to_dict()[source]¶

Convert the object into a json serializable dictionary.

Note: It uses the private method _save_to_input_dict of the parent.

Return dict:	json serializable dictionary containing the needed information to instantiate the object

update_gradients(grad)[source]¶

variational_expectations(Y, m, v, gh_points=None, Y_metadata=None)[source]¶

Use Gauss-Hermite Quadrature to compute

E_p(f) [ log p(y|f) ], d/dm E_p(f) [ log p(y|f) ] and d/dv E_p(f) [ log p(y|f) ]

where p(f) is a Gaussian with mean m and variance v. The shapes of Y, m and v should match.

If no gh_points are passed, we construct them using default options.
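The quadrature idea can be sketched with a hard-coded three-point Gauss-Hermite rule. For a Gaussian log likelihood the integrand is quadratic in f, so three points already reproduce the closed-form expectation. GPy's own default quadrature points are not reproduced here, and the function names are hypothetical.

```python
import math

# Three-point Gauss-Hermite rule for the weight exp(-x^2);
# exact for polynomials up to degree five.
GH_NODES = (-math.sqrt(1.5), 0.0, math.sqrt(1.5))
GH_WEIGHTS = (math.sqrt(math.pi) / 6.0,
              2.0 * math.sqrt(math.pi) / 3.0,
              math.sqrt(math.pi) / 6.0)

def gh_expectation(g, m, v):
    """Approximate E[g(f)] for f ~ N(m, v) via the substitution f = m + sqrt(2 v) x."""
    scale = math.sqrt(2.0 * v)
    return sum(w * g(m + scale * x) for x, w in zip(GH_NODES, GH_WEIGHTS)) / math.sqrt(math.pi)

def gaussian_logpdf(f, y, noise_var):
    return -0.5 * math.log(2.0 * math.pi * noise_var) - 0.5 * (y - f) ** 2 / noise_var

y, m, v, noise_var = 0.9, 0.3, 0.5, 0.2
quad = gh_expectation(lambda f: gaussian_logpdf(f, y, noise_var), m, v)
# Closed form: E[(y - f)^2] = (y - m)^2 + v under f ~ N(m, v).
exact = -0.5 * math.log(2.0 * math.pi * noise_var) - 0.5 * ((y - m) ** 2 + v) / noise_var
```

For non-Gaussian likelihoods the integrand is no longer polynomial, so more points are needed and the result is an approximation rather than exact.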

class HeteroscedasticGaussian(Y_metadata, gp_link=None, variance=1.0, name='het_Gauss')[source]¶

Bases: GPy.likelihoods.gaussian.Gaussian

exact_inference_gradients(dL_dKdiag, Y_metadata=None)[source]¶

gaussian_variance(Y_metadata=None)[source]¶

predictive_quantiles(mu, var, quantiles, Y_metadata=None)[source]¶

predictive_values(mu, var, full_cov=False, Y_metadata=None)[source]¶

Compute mean, variance of the predictive distribution.

Parameters:	mu – mean of the latent variable, f, of posterior var – variance of the latent variable, f, of posterior full_cov (Boolean) – whether to use the full covariance or just the diagonal

GPy.likelihoods.likelihood module¶

class Likelihood(gp_link, name)[source]¶

Bases: GPy.core.parameterization.parameterized.Parameterized

Likelihood base class, used to define p(y|f).

All instances use _inverse_ link functions, which can be swapped out. It is expected that inheriting classes define a default inverse link function

To use this class, inherit and define missing functionality.

Inheriting classes must implement:

pdf_link – a bound method which turns the output of the link function into the pdf
logpdf_link – the logarithm of the above

To enable use with EP, inheriting classes must define:

TODO: a suitable derivative function for any parameters of the class

It is also desirable to define:

moments_match_ep – a function to compute the EP moments. If this isn’t defined, the moments will be computed using 1D quadrature.

To enable use with Laplace approximation, inheriting classes must define:

Some derivative functions AS TODO

For exact Gaussian inference, define JH TODO

MCMC_pdf_samples(fNew, num_samples=1000, starting_loc=None, stepsize=0.1, burn_in=1000, Y_metadata=None)[source]¶

Simple implementation of Metropolis sampling algorithm

Will run a parallel chain for each input dimension (treats each f independently). Thus assumes f*_1 is independent of f*_2, etc.

Parameters:

num_samples – Number of samples to take
fNew – f at which to sample around
starting_loc – Starting locations of the independent chains (usually will be conditional_mean of likelihood), often link_f
stepsize – Stepsize for the normal proposal distribution (will need modifying)
burn_in – number of samples to use for burn-in (will need modifying)
Y_metadata – Y_metadata for pdf
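The parameters above map onto a random-walk Metropolis sampler. The following is a minimal one-dimensional sketch of that algorithm, assuming a normal proposal and an unnormalised log density supplied by the caller; `metropolis_samples` and its `seed` argument are hypothetical and this is not the GPy implementation.

```python
import math
import random

def metropolis_samples(log_density, start, num_samples=1000, stepsize=0.1,
                       burn_in=1000, seed=0):
    """Random-walk Metropolis with a normal proposal for a 1-D log density."""
    rng = random.Random(seed)
    current = start
    current_lp = log_density(current)
    samples = []
    for step in range(burn_in + num_samples):
        proposal = current + rng.gauss(0.0, stepsize)
        proposal_lp = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(current)).
        if math.log(rng.random() + 1e-300) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        if step >= burn_in:
            samples.append(current)  # keep only draws after burn-in
    return samples

# Target: standard normal, given up to an additive constant in log space.
draws = metropolis_samples(lambda f: -0.5 * f * f, start=0.0,
                           num_samples=5000, stepsize=1.0)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
```

The step size trades off acceptance rate against how far each move travels, which is why it usually needs tuning per problem.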

conditional_mean(gp)[source]¶: The mean of the random variable conditioned on one value of the GP

conditional_variance(gp)[source]¶: The variance of the random variable conditioned on one value of the GP

d2logpdf_df2(*args, **kwargs)¶

d2logpdf_df2_dtheta(f, y, Y_metadata=None)[source]¶: TODO: Doc strings

d2logpdf_dlink2(inv_link_f, y, Y_metadata=None)[source]¶

d2logpdf_dlink2_dtheta(inv_link_f, y, Y_metadata=None)[source]¶

d3logpdf_df3(*args, **kwargs)¶

d3logpdf_dlink3(inv_link_f, y, Y_metadata=None)[source]¶

dlogpdf_df(f, y, Y_metadata=None)[source]¶

Evaluates the link function link(f), then computes the derivative of the log likelihood using it. Uses Faà di Bruno’s formula for the chain rule.

\[\frac{d\log p(y|\lambda(f))}{df} = \frac{d\log p(y|\lambda(f))}{d\lambda(f)}\frac{d\lambda(f)}{df}\]

Parameters:	f (Nx1 array) – latent variables f y (Nx1 array) – data Y_metadata – Y_metadata, passed on to the likelihood-specific method (not every likelihood uses it)
Returns:	derivative of log likelihood evaluated for this point
Return type:	1xN array
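The chain rule above can be illustrated with a Bernoulli likelihood and a probit transformation: the derivative with respect to f is the derivative with respect to the transformed value multiplied by the derivative of the transformation. The helpers are hypothetical scalar versions and do not call GPy.

```python
import math

def probit(f):
    return 0.5 * (1.0 + math.erf(f / math.sqrt(2.0)))

def dprobit_df(f):
    """Derivative of the probit transformation: the standard normal density."""
    return math.exp(-0.5 * f * f) / math.sqrt(2.0 * math.pi)

def bernoulli_logpdf_link(lam, y):
    return y * math.log(lam) + (1 - y) * math.log(1.0 - lam)

def bernoulli_dlogpdf_dlink(lam, y):
    return y / lam - (1 - y) / (1.0 - lam)

def dlogpdf_df(f, y):
    """Chain rule: d log p / df = (d log p / d lambda) * (d lambda / df)."""
    return bernoulli_dlogpdf_dlink(probit(f), y) * dprobit_df(f)

f, y, h = 0.4, 1, 1e-6
numeric = (bernoulli_logpdf_link(probit(f + h), y)
           - bernoulli_logpdf_link(probit(f - h), y)) / (2.0 * h)
analytic = dlogpdf_df(f, y)
```

Higher-order derivatives with respect to f combine several such terms, which is where Faà di Bruno's formula comes in.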

dlogpdf_df_dtheta(f, y, Y_metadata=None)[source]¶: TODO: Doc strings

dlogpdf_dlink(inv_link_f, y, Y_metadata=None)[source]¶

dlogpdf_dlink_dtheta(inv_link_f, y, Y_metadata=None)[source]¶

dlogpdf_dtheta(f, y, Y_metadata=None)[source]¶: TODO: Doc strings

dlogpdf_link_dtheta(inv_link_f, y, Y_metadata=None)[source]¶

ep_gradients(Y, cav_tau, cav_v, dL_dKdiag, Y_metadata=None, quad_mode='gk', boost_grad=1.0)[source]¶

exact_inference_gradients(dL_dKdiag, Y_metadata=None)[source]¶

static from_dict(input_dict)[source]¶

Instantiate an object of a derived class using the information in input_dict (built by the to_dict method of the derived class). More specifically, after reading the derived class from input_dict, it calls the method _build_from_input_dict of the derived class. Note: This method should not be overridden in the derived class. In case it is needed, please override _build_from_input_dict instead.

Parameters:	input_dict (dict) – Dictionary with all the information needed to instantiate the object.

integrate_gh(Y, mu, sigma, Y_metadata_i=None, gh_points=None)[source]¶

integrate_gk(Y, mu, sigma, Y_metadata_i=None)[source]¶

log_predictive_density(y_test, mu_star, var_star, Y_metadata=None)[source]¶

Calculation of the log predictive density

Parameters:	y_test ((Nx1) array) – test observations (y_{*}) mu_star ((Nx1) array) – predictive mean of gaussian p(f_{*}|mu_{*}, var_{*}) var_star ((Nx1) array) – predictive variance of gaussian p(f_{*}|mu_{*}, var_{*})

log_predictive_density_sampling(y_test, mu_star, var_star, Y_metadata=None, num_samples=1000)[source]¶

Calculation of the log predictive density via sampling

Parameters:	y_test ((Nx1) array) – test observations (y_{*}) mu_star ((Nx1) array) – predictive mean of gaussian p(f_{*}|mu_{*}, var_{*}) var_star ((Nx1) array) – predictive variance of gaussian p(f_{*}|mu_{*}, var_{*}) num_samples (int) – num samples of p(f_{*}|mu_{*}, var_{*}) to take
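A sampling-based log predictive density averages the likelihood of the test observation over draws of the latent value from its Gaussian predictive distribution. The sketch below assumes a Gaussian likelihood with noise variance `noise_var`, for which the exact answer is available for comparison; the function name and the `seed` and `noise_var` arguments are illustrative, not GPy API.

```python
import math
import random

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def log_predictive_density_mc(y_test, mu_star, var_star, noise_var,
                              num_samples=1000, seed=0):
    """Monte Carlo estimate of log E[p(y_test | f)] with f ~ N(mu_star, var_star)."""
    rng = random.Random(seed)
    sd = math.sqrt(var_star)
    total = 0.0
    for _ in range(num_samples):
        f_sample = rng.gauss(mu_star, sd)
        total += normal_pdf(y_test, f_sample, noise_var)
    return math.log(total / num_samples)

y_test, mu_star, var_star, noise_var = 0.3, 0.0, 1.0, 0.5
estimate = log_predictive_density_mc(y_test, mu_star, var_star, noise_var,
                                     num_samples=20000)
# Gaussian case: the latent and noise variances simply add.
exact = math.log(normal_pdf(y_test, mu_star, var_star + noise_var))
```

The average is taken over densities before the logarithm is applied; averaging log densities instead would estimate a different quantity.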

logpdf(f, y, Y_metadata=None)[source]¶

Evaluates the link function link(f) then computes the log likelihood (log pdf) using it

Parameters:	f (Nx1 array) – latent variables f y (Nx1 array) – data Y_metadata – Y_metadata, passed on to the likelihood-specific method (not every likelihood uses it)
Returns:	log likelihood evaluated for this point
Return type:	float

logpdf_link(inv_link_f, y, Y_metadata=None)[source]¶

logpdf_sum(f, y, Y_metadata=None)[source]¶: Convenience function that can be overridden for functions where this could be computed more efficiently

moments_match_ep(obs, tau, v, Y_metadata_i=None)[source]¶

Calculation of moments using quadrature

Parameters:	obs – observed output tau – cavity distribution 1st natural parameter (precision) v – cavity distribution 2nd natural parameter (mu*precision)

pdf(f, y, Y_metadata=None)[source]¶

Evaluates the link function link(f) then computes the likelihood (pdf) using it

Parameters:	f (Nx1 array) – latent variables f y (Nx1 array) – data Y_metadata – Y_metadata, passed on to the likelihood-specific method (not every likelihood uses it)
Returns:	likelihood evaluated for this point
Return type:	float

pdf_link(inv_link_f, y, Y_metadata=None)[source]¶

predictive_mean(mu, variance, Y_metadata=None)[source]¶

Quadrature calculation of the predictive mean: E(Y_star|Y) = E( E(Y_star|f_star, Y) )

Parameters:	mu – mean of posterior sigma – standard deviation of posterior

predictive_quantiles(mu, var, quantiles, Y_metadata=None)[source]¶

predictive_values(mu, var, full_cov=False, Y_metadata=None)[source]¶

Compute mean, variance of the predictive distribution.

Parameters:	mu – mean of the latent variable, f, of posterior var – variance of the latent variable, f, of posterior full_cov (Boolean) – whether to use the full covariance or just the diagonal

predictive_variance(mu, variance, predictive_mean=None, Y_metadata=None)[source]¶

Approximation to the predictive variance: V(Y_star)

The following variance decomposition (the law of total variance) is used: V(Y_star) = E( V(Y_star|f_star) ) + V( E(Y_star|f_star) )

Parameters:	mu – mean of posterior sigma – standard deviation of posterior predictive_mean – output’s predictive mean, if None _predictive_mean function will be called.

request_num_latent_functions(Y)[source]¶

The likelihood should infer how many latent functions are needed for the likelihood

Default is the number of outputs

samples(gp, Y_metadata=None, samples=1)[source]¶

Returns a set of samples of observations based on a given value of the latent variable.

Parameters:	gp – latent variable samples – number of samples to take for each f location

to_dict()[source]¶

update_gradients(partial)[source]¶

variational_expectations(Y, m, v, gh_points=None, Y_metadata=None)[source]¶

Use Gauss-Hermite Quadrature to compute

E_p(f) [ log p(y|f) ], d/dm E_p(f) [ log p(y|f) ] and d/dv E_p(f) [ log p(y|f) ]

where p(f) is a Gaussian with mean m and variance v. The shapes of Y, m and v should match.

If no gh_points are passed, we construct them using default options.

GPy.likelihoods.link_functions module¶

class Cloglog[source]¶

Bases: GPy.likelihoods.link_functions.GPTransformation

Complementary log-log link

\[p(f) = 1 - e^{-e^f}\]

or

\[f = \log (-\log(1-p))\]

d2transf_df2(f)[source]¶: second derivative of transf(f) w.r.t. f

d3transf_df3(f)[source]¶: third derivative of transf(f) w.r.t. f

dtransf_df(f)[source]¶: derivative of transf(f) w.r.t. f

transf(f)[source]¶: Gaussian process transformation function, latent space -> output space
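The two forms of the complementary log-log link are inverses of each other. A minimal standalone sketch (plain Python, not GPy code; the function names are illustrative):

```python
import math


def cloglog_transf(f):
    """Latent value -> probability: p = 1 - exp(-exp(f))."""
    return -math.expm1(-math.exp(f))


def cloglog_link(p):
    """Probability -> latent value: f = log(-log(1 - p))."""
    return math.log(-math.log1p(-p))


def cloglog_dtransf_df(f):
    """First derivative of the transformation: exp(f - exp(f))."""
    return math.exp(f - math.exp(f))


p = cloglog_transf(0.3)
f_back = cloglog_link(p)
```

expm1 and log1p are used only to keep precision when exp(f) is small; they compute the same expressions as the formulas above.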

class GPTransformation[source]¶

Bases: object

Link function class for non-Gaussian likelihood approximations

Parameters:	Y – observed output (Nx1 numpy.ndarray)

Note

Y values allowed depend on the likelihood_function used

d2transf_df2(f)[source]¶: second derivative of transf(f) w.r.t. f

d3transf_df3(f)[source]¶: third derivative of transf(f) w.r.t. f

dtransf_df(f)[source]¶: derivative of transf(f) w.r.t. f

static from_dict(input_dict)[source]¶

Instantiate an object of a derived class using the information in input_dict (built by the to_dict method of the derived class). More specifically, after reading the derived class from input_dict, it calls the method _build_from_input_dict of the derived class. Note: This method should not be overridden in the derived class. If needed, override _build_from_input_dict instead.

Parameters:	input_dict (dict) – Dictionary with all the information needed to instantiate the object.

to_dict()[source]¶

transf(f)[source]¶: Gaussian process transformation function, latent space -> output space
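The to_dict / from_dict round trip described for this class follows a common serialization pattern. The sketch below imitates it with toy classes; the registry, the "class" key and every Toy* name are hypothetical and are not taken from GPy's source.

```python
import json


class ToyTransformation:
    _registry = {}  # hypothetical lookup table: class name -> class

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        ToyTransformation._registry[cls.__name__] = cls

    def to_dict(self):
        return {"class": type(self).__name__}

    @staticmethod
    def from_dict(input_dict):
        data = dict(input_dict)  # copy so the caller's dict is untouched
        derived = ToyTransformation._registry[data.pop("class")]
        return derived._build_from_input_dict(data)

    @classmethod
    def _build_from_input_dict(cls, input_dict):
        # Derived classes override this hook, not from_dict itself.
        return cls(**input_dict)


class ToyScaledProbit(ToyTransformation):
    def __init__(self, nu=1.0):
        self.nu = float(nu)

    def to_dict(self):
        out = super().to_dict()
        out["nu"] = self.nu
        return out


payload = json.dumps(ToyScaledProbit(nu=2.5).to_dict())
restored = ToyTransformation.from_dict(json.loads(payload))
```

Keeping from_dict fixed in the base class and customizing only _build_from_input_dict means the dispatch on the stored class name lives in one place.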

class Heaviside[source]¶

Bases: GPy.likelihoods.link_functions.GPTransformation

\[g(f) = I_{f \geq 0}\]

d2transf_df2(f)[source]¶: second derivative of transf(f) w.r.t. f

dtransf_df(f)[source]¶: derivative of transf(f) w.r.t. f

transf(f)[source]¶: Gaussian process transformation function, latent space -> output space

class Identity[source]¶

Bases: GPy.likelihoods.link_functions.GPTransformation

\[g(f) = f\]

d2transf_df2(f)[source]¶: second derivative of transf(f) w.r.t. f

d3transf_df3(f)[source]¶: third derivative of transf(f) w.r.t. f

dtransf_df(f)[source]¶: derivative of transf(f) w.r.t. f

to_dict()[source]¶

Convert the object into a json serializable dictionary.

Note: It uses the private method _save_to_input_dict of the parent.

Return dict:	json serializable dictionary containing the needed information to instantiate the object

transf(f)[source]¶: Gaussian process transformation function, latent space -> output space

class Log[source]¶

Bases: GPy.likelihoods.link_functions.GPTransformation

\[g(f) = \log(\mu)\]

d2transf_df2(f)[source]¶: second derivative of transf(f) w.r.t. f

d3transf_df3(f)[source]¶: third derivative of transf(f) w.r.t. f

dtransf_df(f)[source]¶: derivative of transf(f) w.r.t. f

transf(f)[source]¶: Gaussian process transformation function, latent space -> output space

class Log_ex_1[source]¶

Bases: GPy.likelihoods.link_functions.GPTransformation

\[g(f) = \log(\exp(\mu) - 1)\]

d2transf_df2(f)[source]¶: second derivative of transf(f) w.r.t. f

d3transf_df3(f)[source]¶: third derivative of transf(f) w.r.t. f

dtransf_df(f)[source]¶: derivative of transf(f) w.r.t. f

transf(f)[source]¶: Gaussian process transformation function, latent space -> output space
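If g(mu) = log(exp(mu) - 1) is read as the map from the output mean to the latent value, its inverse is the softplus function log(1 + exp(f)). The sketch below assumes that reading; it is standalone Python and the function names are illustrative rather than GPy's.

```python
import math


def softplus(f):
    """log(1 + exp(f)), written to avoid overflow for large |f|."""
    return max(f, 0.0) + math.log1p(math.exp(-abs(f)))


def log_ex_1_link(mu):
    """g(mu) = log(exp(mu) - 1), defined for mu > 0."""
    return math.log(math.expm1(mu))


mu = softplus(0.7)
f_back = log_ex_1_link(mu)
```

The softplus output is always positive, which is why this link suits likelihoods whose mean or rate must stay above zero.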

class Probit[source]¶

Bases: GPy.likelihoods.link_functions.GPTransformation

\[g(f) = \Phi^{-1}(\mu)\]

d2transf_df2(f)[source]¶: second derivative of transf(f) w.r.t. f

d3transf_df3(f)[source]¶: third derivative of transf(f) w.r.t. f

dtransf_df(f)[source]¶: derivative of transf(f) w.r.t. f

to_dict()[source]¶

Convert the object into a json serializable dictionary.

Note: It uses the private method _save_to_input_dict of the parent.

Return dict:	json serializable dictionary containing the needed information to instantiate the object

transf(f)[source]¶: Gaussian process transformation function, latent space -> output space
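Assuming the probit transformation maps a latent value to the standard normal CDF (the inverse of the link written above), the transformation and its first three derivatives have simple closed forms. A standalone sketch with illustrative names, not the GPy implementation:

```python
import math


def std_norm_pdf(f):
    return math.exp(-0.5 * f * f) / math.sqrt(2.0 * math.pi)


def probit_transf(f):
    """Standard normal CDF, Phi(f)."""
    return 0.5 * (1.0 + math.erf(f / math.sqrt(2.0)))


def probit_dtransf_df(f):
    return std_norm_pdf(f)


def probit_d2transf_df2(f):
    return -f * std_norm_pdf(f)


def probit_d3transf_df3(f):
    return (f * f - 1.0) * std_norm_pdf(f)


p_half = probit_transf(0.0)
```

Each derivative follows from differentiating the normal density, whose derivative is -f times itself.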

class Reciprocal[source]¶

Bases: GPy.likelihoods.link_functions.GPTransformation

d2transf_df2(f)[source]¶: second derivative of transf(f) w.r.t. f

d3transf_df3(f)[source]¶: third derivative of transf(f) w.r.t. f

dtransf_df(f)[source]¶: derivative of transf(f) w.r.t. f

transf(f)[source]¶: Gaussian process transformation function, latent space -> output space

class ScaledProbit(nu=1.0)[source]¶

Bases: GPy.likelihoods.link_functions.Probit

\[g(f) = \Phi^{-1}(\nu \mu)\]

d2transf_df2(f)[source]¶: second derivative of transf(f) w.r.t. f

d3transf_df3(f)[source]¶: third derivative of transf(f) w.r.t. f

dtransf_df(f)[source]¶: derivative of transf(f) w.r.t. f

to_dict()[source]¶

Convert the object into a json serializable dictionary.

Note: It uses the private method _save_to_input_dict of the parent.

Return dict:	json serializable dictionary containing the needed information to instantiate the object

transf(f)[source]¶: Gaussian process transformation function, latent space -> output space

GPy.likelihoods.loggaussian module¶

class LogGaussian(gp_link=None, sigma=1.0)[source]¶

Bases: GPy.likelihoods.likelihood.Likelihood

\[p(y_{i}|f_{i}, z_{i}) = \prod_{i=1}^{n} (\frac{ry^{r-1}}{\exp{f(x_{i})}})^{1-z_i} (1 + (\frac{y}{\exp(f(x_{i}))})^{r})^{z_i-2}\]

d2logpdf_dlink2(link_f, y, Y_metadata=None)[source]¶

Hessian at y, given link(f), w.r.t link(f) i.e. second derivative logpdf at y given link(f_i) and link(f_j) w.r.t link(f_i) and link(f_j) The hessian will be 0 unless i == j

Parameters:	link_f (Nx1 array) – latent variables link(f) y (Nx1 array) – data Y_metadata – includes censoring information in dictionary key ‘censored’
Returns:	Diagonal of hessian matrix (second derivative of likelihood evaluated at points f)
Return type:	Nx1 array

Note

Will return diagonal of hessian, since everywhere else it is 0, as the likelihood factorizes over cases (the distribution for y_i depends only on link(f_i), not on link(f_(j!=i)))

d2logpdf_dlink2_dtheta(f, y, Y_metadata=None)[source]¶

Parameters:	link_f (Nx1 array) – latent variables link(f) y (Nx1 array) – data Y_metadata – Y_metadata not used in gaussian
Returns:	derivative of log likelihood evaluated at points link(f) w.r.t variance parameter
Return type:	Nx1 array

d2logpdf_dlink2_dvar(link_f, y, Y_metadata=None)[source]¶

Parameters:	link_f (Nx1 array) – latent variables link(f) y (Nx1 array) – data Y_metadata – Y_metadata not used in gaussian
Returns:	derivative of log likelihood evaluated at points link(f) w.r.t variance parameter
Return type:	Nx1 array

d3logpdf_dlink3(link_f, y, Y_metadata=None)[source]¶

Gradient of the log-likelihood function at y given f, w.r.t shape parameter

Parameters:	inv_link_f (Nx1 array) – latent variables link(f) y (Nx1 array) – data Y_metadata – includes censoring information in dictionary key ‘censored’
Returns:	derivative of likelihood evaluated at points f w.r.t variance parameter
Return type:	float

dlogpdf_dlink(link_f, y, Y_metadata=None)[source]¶

Derivative of logpdf w.r.t. link_f

Parameters:	link_f (Nx1 array) – latent variables link(f) y (Nx1 array) – data Y_metadata – includes censoring information in dictionary key ‘censored’
Returns:	likelihood evaluated for this point
Return type:	float

dlogpdf_dlink_dtheta(f, y, Y_metadata=None)[source]¶

Parameters:	link_f (Nx1 array) – latent variables link(f) y (Nx1 array) – data Y_metadata – Y_metadata not used in gaussian
Returns:	derivative of log likelihood evaluated at points link(f) w.r.t variance parameter
Return type:	Nx1 array

dlogpdf_dlink_dvar(link_f, y, Y_metadata=None)[source]¶

Parameters:	link_f (Nx1 array) – latent variables link(f) y (Nx1 array) – data Y_metadata – Y_metadata not used in gaussian
Returns:	derivative of log likelihood evaluated at points link(f) w.r.t variance parameter
Return type:	Nx1 array

dlogpdf_link_dtheta(f, y, Y_metadata=None)[source]¶

Parameters:	link_f (Nx1 array) – latent variables link(f) y (Nx1 array) – data Y_metadata – Y_metadata not used in gaussian
Returns:	derivative of log likelihood evaluated at points link(f) w.r.t variance parameter
Return type:	Nx1 array

dlogpdf_link_dvar(link_f, y, Y_metadata=None)[source]¶

Gradient of the log-likelihood function at y given f, w.r.t variance parameter

Parameters:	inv_link_f (Nx1 array) – latent variables link(f) y (Nx1 array) – data Y_metadata – includes censoring information in dictionary key ‘censored’
Returns:	derivative of likelihood evaluated at points f w.r.t variance parameter
Return type:	float

logpdf_link(link_f, y, Y_metadata=None)[source]¶

Parameters:	link_f (Nx1 array) – latent variables (link(f)) y (Nx1 array) – data Y_metadata – includes censoring information in dictionary key ‘censored’
Returns:	likelihood evaluated for this point
Return type:	float

pdf_link(link_f, y, Y_metadata=None)[source]¶

Parameters:	link_f (Nx1 array) – latent variables link(f) y (Nx1 array) – data Y_metadata – includes censoring information in dictionary key ‘censored’
Returns:	likelihood evaluated for this point
Return type:	float

samples(gp, Y_metadata=None)[source]¶

Returns a set of samples of observations based on a given value of the latent variable.

Parameters:	gp – latent variable

update_gradients(grads)[source]¶: Pull out the gradients, be careful as the order must match the order in which the parameters are added

GPy.likelihoods.loglogistic module¶

class LogLogistic(gp_link=None, r=1.0)[source]¶

Bases: GPy.likelihoods.likelihood.Likelihood

\[p(y_{i}|f_{i}, z_{i}) = \prod_{i=1}^{n} (\frac{ry^{r-1}}{\exp{f(x_{i})}})^{1-z_i} (1 + (\frac{y}{\exp(f(x_{i}))})^{r})^{z_i-2}\]

d2logpdf_dlink2(link_f, y, Y_metadata=None)[source]¶

Hessian at y, given link(f), w.r.t link(f) i.e. second derivative logpdf at y given link(f_i) and link(f_j) w.r.t link(f_i) and link(f_j) The hessian will be 0 unless i == j

Parameters:	link_f (Nx1 array) – latent variables link(f) y (Nx1 array) – data Y_metadata – includes censoring information in dictionary key ‘censored’
Returns:	Diagonal of hessian matrix (second derivative of likelihood evaluated at points f)
Return type:	Nx1 array

Note

Will return diagonal of hessian, since everywhere else it is 0, as the likelihood factorizes over cases (the distribution for y_i depends only on link(f_i), not on link(f_(j!=i)))

d2logpdf_dlink2_dr(inv_link_f, y, Y_metadata=None)[source]¶

Gradient of the hessian (d2logpdf_dlink2) w.r.t shape parameter

Parameters:	inv_link_f (Nx1 array) – latent variables link(f) y (Nx1 array) – data Y_metadata – includes censoring information in dictionary key ‘censored’
Returns:	derivative of hessian evaluated at points f and f_j w.r.t variance parameter
Return type:	Nx1 array

d2logpdf_dlink2_dtheta(f, y, Y_metadata=None)[source]¶

d3logpdf_dlink3(link_f, y, Y_metadata=None)[source]¶

Third order derivative log-likelihood function at y given link(f) w.r.t link(f)

Parameters:	link_f (Nx1 array) – latent variables link(f) y (Nx1 array) – data Y_metadata – includes censoring information in dictionary key ‘censored’
Returns:	third derivative of likelihood evaluated at points f
Return type:	Nx1 array

dlogpdf_dlink(link_f, y, Y_metadata=None)[source]¶

Gradient of the log likelihood function at y, given link(f) w.r.t link(f)

Parameters:	link_f (Nx1 array) – latent variables (f) y (Nx1 array) – data Y_metadata – includes censoring information in dictionary key ‘censored’
Returns:	gradient of likelihood evaluated at points
Return type:	Nx1 array

dlogpdf_dlink_dr(inv_link_f, y, Y_metadata=None)[source]¶

Derivative of the dlogpdf_dlink w.r.t shape parameter

Parameters:	inv_link_f (Nx1 array) – latent variables inv_link_f y (Nx1 array) – data Y_metadata – includes censoring information in dictionary key ‘censored’
Returns:	derivative of likelihood evaluated at points f w.r.t variance parameter
Return type:	Nx1 array

dlogpdf_dlink_dtheta(f, y, Y_metadata=None)[source]¶

dlogpdf_link_dr(inv_link_f, y, Y_metadata=None)[source]¶

Gradient of the log-likelihood function at y given f, w.r.t shape parameter

Parameters:	inv_link_f (Nx1 array) – latent variables link(f) y (Nx1 array) – data Y_metadata – includes censoring information in dictionary key ‘censored’
Returns:	derivative of likelihood evaluated at points f w.r.t variance parameter
Return type:	float

dlogpdf_link_dtheta(f, y, Y_metadata=None)[source]¶

logpdf_link(link_f, y, Y_metadata=None)[source]¶

Log Likelihood Function given link(f)

Parameters:	link_f (Nx1 array) – latent variables (link(f)) y (Nx1 array) – data Y_metadata – includes censoring information in dictionary key ‘censored’
Returns:	likelihood evaluated for this point
Return type:	float

pdf_link(link_f, y, Y_metadata=None)[source]¶

Likelihood function given link(f)

Parameters:	link_f (Nx1 array) – latent variables link(f) y (Nx1 array) – data Y_metadata – includes censoring information in dictionary key ‘censored’
Returns:	likelihood evaluated for this point
Return type:	float

samples(gp, Y_metadata=None)[source]¶

Returns a set of samples of observations based on a given value of the latent variable.

Parameters:	gp – latent variable

update_gradients(grads)[source]¶: Pull out the gradients, be careful as the order must match the order in which the parameters are added

GPy.likelihoods.mixed_noise module¶

class MixedNoise(likelihoods_list, name='mixed_noise')[source]¶

Bases: GPy.likelihoods.likelihood.Likelihood

betaY(Y, Y_metadata)[source]¶

exact_inference_gradients(dL_dKdiag, Y_metadata)[source]¶

gaussian_variance(Y_metadata)[source]¶

predictive_quantiles(mu, var, quantiles, Y_metadata)[source]¶

predictive_values(mu, var, full_cov=False, Y_metadata=None)[source]¶

Compute mean, variance of the predictive distribution.

Parameters:	mu – mean of the latent variable, f, of posterior var – variance of the latent variable, f, of posterior full_cov (Boolean) – whether to use the full covariance or just the diagonal

predictive_variance(mu, sigma, Y_metadata)[source]¶

Approximation to the predictive variance: V(Y_star)

The following variance decomposition (the law of total variance) is used: V(Y_star) = E( V(Y_star|f_star) ) + V( E(Y_star|f_star) )

Parameters:	mu – mean of posterior sigma – standard deviation of posterior predictive_mean – output’s predictive mean; if None, the _predictive_mean function will be called.

samples(gp, Y_metadata)[source]¶

Returns a set of samples of observations based on a given value of the latent variable.

Parameters:	gp – latent variable

to_dict()[source]¶

Convert the object into a json serializable dictionary.

Note: It uses the private method _save_to_input_dict of the parent.

Return dict:	json serializable dictionary containing the needed information to instantiate the object

update_gradients(gradients)[source]¶

GPy.likelihoods.multioutput_likelihood module¶

class MultioutputLikelihood(likelihoods_list, name='multioutput_likelihood')[source]¶

Bases: GPy.likelihoods.mixed_noise.MixedNoise

CombinedLikelihood is used to combine different likelihoods for multioutput models, where different outputs have different observation models.

As input the likelihood takes a list of likelihoods used. The likelihood uses “output_index” in Y_metadata to connect observations to likelihoods.
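The routing by “output_index” can be pictured with a small standalone sketch. Everything here is illustrative: the two per-output log densities, the flat Python lists and the helper combined_logpdf are assumptions for the example, and GPy itself works with arrays rather than lists.

```python
import math


def gaussian_logpdf(f, y, variance=0.1):
    return -0.5 * math.log(2.0 * math.pi * variance) - 0.5 * (y - f) ** 2 / variance


def poisson_logpdf(f, y):
    # Poisson observation model with rate exp(f).
    return y * f - math.exp(f) - math.lgamma(y + 1.0)


def combined_logpdf(likelihoods, f, y, Y_metadata):
    """Sum log densities, picking the likelihood by each row's output index."""
    total = 0.0
    for f_i, y_i, index in zip(f, y, Y_metadata["output_index"]):
        total += likelihoods[index](f_i, y_i)
    return total


likelihoods = [gaussian_logpdf, poisson_logpdf]
f = [0.2, 1.0, -0.5]
y = [0.25, 3, 0.0]
Y_metadata = {"output_index": [0, 1, 0]}  # rows 0 and 2 are Gaussian, row 1 is Poisson
total = combined_logpdf(likelihoods, f, y, Y_metadata)
```

The position of each likelihood in the list is what the index refers to, so the order of likelihoods_list matters.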

d2logpdf_df2(f, y, Y_metadata)[source]¶

d2logpdf_df2_dtheta(f, y, Y_metadata=None)[source]¶: TODO: Doc strings

d2logpdf_dlink2(inv_link_f, y, Y_metadata=None)[source]¶

d3logpdf_df3(f, y, Y_metadata=None)[source]¶

d3logpdf_dlink3(inv_link_f, y, Y_metadata=None)[source]¶

dlogpdf_df(f, y, Y_metadata)[source]¶

Evaluates the link function link(f), then computes the derivative of the log likelihood using it. Uses Faà di Bruno’s formula for the chain rule.

\[\frac{d\log p(y|\lambda(f))}{df} = \frac{d\log p(y|\lambda(f))}{d\lambda(f)}\frac{d\lambda(f)}{df}\]

Parameters:	f (Nx1 array) – latent variables f y (Nx1 array) – data Y_metadata – Y_metadata, which is not used in student t distribution
Returns:	derivative of log likelihood evaluated for this point
Return type:	1xN array
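The chain rule above can be checked numerically. The sketch below uses a Bernoulli likelihood with a probit transformation as a stand-in example (an assumption for illustration; it is plain Python and does not call GPy) and compares the product of the two factors with a finite difference of the log likelihood in f.

```python
import math


def std_norm_pdf(f):
    return math.exp(-0.5 * f * f) / math.sqrt(2.0 * math.pi)


def std_norm_cdf(f):
    return 0.5 * (1.0 + math.erf(f / math.sqrt(2.0)))


def logpdf(f, y):
    """log p(y | lambda(f)) for y in {0, 1} with lambda = Phi."""
    lam = std_norm_cdf(f)
    return y * math.log(lam) + (1 - y) * math.log(1.0 - lam)


def dlogpdf_dlink(lam, y):
    return y / lam - (1 - y) / (1.0 - lam)


def dlogpdf_df(f, y):
    """Chain rule: d log p / d lambda times d lambda / d f."""
    return dlogpdf_dlink(std_norm_cdf(f), y) * std_norm_pdf(f)


grad = dlogpdf_df(0.3, 1)
h = 1e-6
finite_difference = (logpdf(0.3 + h, 1) - logpdf(0.3 - h, 1)) / (2.0 * h)
```

The same two-factor structure applies to any likelihood and link pair; only dlogpdf_dlink and the derivative of the transformation change.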

dlogpdf_df_dtheta(f, y, Y_metadata=None)[source]¶: TODO: Doc strings

dlogpdf_dlink(inv_link_f, y, Y_metadata=None)[source]¶

dlogpdf_dtheta(f, y, Y_metadata=None)[source]¶: TODO: Doc strings

ep_gradients(Y, cav_tau, cav_v, dL_dKdiag, Y_metadata=None, quad_mode='gk', boost_grad=1.0)[source]¶

exact_inference_gradients(dL_dKdiag, Y_metadata)[source]¶

log_predictive_density(y_test, mu_star, var_star, Y_metadata=None)[source]¶

Calculation of the log predictive density

Parameters:	y_test ((Nx1) array) – test observations (y_{*}) mu_star ((Nx1) array) – predictive mean of gaussian p(f_{*}|mu_{*}, var_{*}) var_star ((Nx1) array) – predictive variance of gaussian p(f_{*}|mu_{*}, var_{*})
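The log predictive density is the log of the likelihood averaged over the Gaussian predictive distribution of the latent value, log ∫ p(y_{*}|f_{*}) N(f_{*}|mu_{*}, var_{*}) df_{*}. The following standalone sketch approximates that integral with the trapezoid rule; the grid size, the integration width and the Gaussian observation model with a hypothetical NOISE variance are assumptions for illustration, not GPy's method.

```python
import math

NOISE = 0.2  # hypothetical observation noise variance


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gaussian_likelihood(f, y):
    return normal_pdf(y, f, NOISE)


def log_predictive_density(pdf, y_test, mu_star, var_star, n=2001, width=10.0):
    """log of the integral of pdf(f, y) * N(f | mu_star, var_star) over f."""
    sd = math.sqrt(var_star)
    lo = mu_star - width * sd
    h = 2.0 * width * sd / (n - 1)
    total = 0.0
    for k in range(n):
        f = lo + k * h
        weight = 0.5 if k in (0, n - 1) else 1.0  # trapezoid end points
        total += weight * pdf(f, y_test) * normal_pdf(f, mu_star, var_star)
    return math.log(total * h)


lpd = log_predictive_density(gaussian_likelihood, y_test=0.4, mu_star=0.1, var_star=0.5)
```

For a Gaussian observation model the integral has the closed form log N(y_{*} | mu_{*}, var_{*} + NOISE), which makes a convenient check for the numerical version.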

logpdf(f, y, Y_metadata=None)[source]¶

Evaluates the link function link(f) then computes the log likelihood (log pdf) using it

Parameters:	f (Nx1 array) – latent variables f y (Nx1 array) – data Y_metadata – Y_metadata, which is not used in student t distribution
Returns:	log likelihood evaluated for this point
Return type:	float

logpdf_link(inv_link_f, y, Y_metadata=None)[source]¶

moments_match_ep(data_i, tau_i, v_i, Y_metadata_i)[source]¶

Calculation of moments using quadrature

Parameters:	obs – observed output tau – cavity distribution 1st natural parameter (precision) v – cavity distribution 2nd natural parameter (mu*precision)
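With natural parameters tau (precision) and v (mu*precision), the cavity mean is v / tau and the cavity variance is 1 / tau. The sketch below computes the zeroth moment, mean and variance of the tilted distribution by numerical integration for a probit likelihood with labels in {-1, 1}. The likelihood choice, the trapezoid rule and all function names are assumptions for this example; it does not reproduce GPy's quadrature.

```python
import math


def std_norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def std_norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def moments_match(y, tau, v, n=4001, width=10.0):
    """Moments of Phi(y * f) * N(f | v / tau, 1 / tau)."""
    mean_cav, var_cav = v / tau, 1.0 / tau
    sd = math.sqrt(var_cav)
    lo = mean_cav - width * sd
    h = 2.0 * width * sd / (n - 1)
    m0 = m1 = m2 = 0.0
    for k in range(n):
        f = lo + k * h
        w = h * (0.5 if k in (0, n - 1) else 1.0)  # trapezoid weights
        tilted = std_norm_cdf(y * f) * std_norm_pdf((f - mean_cav) / sd) / sd
        m0 += w * tilted
        m1 += w * f * tilted
        m2 += w * f * f * tilted
    mean = m1 / m0
    return m0, mean, m2 / m0 - mean ** 2


Z_hat, mu_hat, sigma2_hat = moments_match(y=1, tau=2.0, v=0.6)
```

For this particular likelihood the moments are also available in closed form, with Z_hat = Phi(y * mean / sqrt(1 + variance)) evaluated at the cavity mean and variance, so the numerical result can be validated directly.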

pdf(f, y, Y_metadata=None)[source]¶

Evaluates the link function link(f) then computes the likelihood (pdf) using it

Parameters:	f (Nx1 array) – latent variables f y (Nx1 array) – data Y_metadata – Y_metadata, which is not used in student t distribution
Returns:	likelihood evaluated for this point
Return type:	float

pdf_link(inv_link_f, y, Y_metadata=None)[source]¶

predictive_values(mu, var, full_cov=False, Y_metadata=None)[source]¶

Compute mean, variance of the predictive distribution.

Parameters:	mu – mean of the latent variable, f, of posterior var – variance of the latent variable, f, of posterior full_cov (Boolean) – whether to use the full covariance or just the diagonal

predictive_variance(mu, sigma, Y_metadata)[source]¶

Approximation to the predictive variance: V(Y_star)

The following variance decomposition (the law of total variance) is used: V(Y_star) = E( V(Y_star|f_star) ) + V( E(Y_star|f_star) )

Parameters:	mu – mean of posterior sigma – standard deviation of posterior predictive_mean – output’s predictive mean; if None, the _predictive_mean function will be called.

GPy.likelihoods.poisson module¶

class Poisson(gp_link=None)[source]¶

Bases: GPy.likelihoods.likelihood.Likelihood

Poisson likelihood

\[p(y_{i}|\lambda(f_{i})) = \frac{\lambda(f_{i})^{y_{i}}}{y_{i}!}e^{-\lambda(f_{i})}\]

Note

Y is expected to take values in {0,1,2,…}

conditional_mean(gp)[source]¶: The mean of the random variable conditioned on one value of the GP

conditional_variance(gp)[source]¶: The variance of the random variable conditioned on one value of the GP

d2logpdf_dlink2(link_f, y, Y_metadata=None)[source]¶

Hessian at y, given link(f), w.r.t link(f) i.e. second derivative logpdf at y given link(f_i) and link(f_j) w.r.t link(f_i) and link(f_j) The hessian will be 0 unless i == j

\[\frac{d^{2} \ln p(y_{i}|\lambda(f_{i}))}{d^{2}\lambda(f)} = \frac{-y_{i}}{\lambda(f_{i})^{2}}\]

Parameters:	link_f (Nx1 array) – latent variables link(f) y (Nx1 array) – data Y_metadata – Y_metadata which is not used in poisson distribution
Returns:	Diagonal of hessian matrix (second derivative of likelihood evaluated at points f)
Return type:	Nx1 array

Note

Will return diagonal of hessian, since everywhere else it is 0, as the likelihood factorizes over cases (the distribution for y_i depends only on link(f_i), not on link(f_(j!=i)))

d3logpdf_dlink3(link_f, y, Y_metadata=None)[source]¶

Third order derivative log-likelihood function at y given link(f) w.r.t link(f)

\[\frac{d^{3} \ln p(y_{i}|\lambda(f_{i}))}{d^{3}\lambda(f)} = \frac{2y_{i}}{\lambda(f_{i})^{3}}\]

Parameters:	link_f (Nx1 array) – latent variables link(f) y (Nx1 array) – data Y_metadata – Y_metadata which is not used in poisson distribution
Returns:	third derivative of likelihood evaluated at points f
Return type:	Nx1 array

dlogpdf_dlink(link_f, y, Y_metadata=None)[source]¶

Gradient of the log likelihood function at y, given link(f) w.r.t link(f)

\[\frac{d \ln p(y_{i}|\lambda(f_{i}))}{d\lambda(f)} = \frac{y_{i}}{\lambda(f_{i})} - 1\]

Parameters:	link_f (Nx1 array) – latent variables (f) y (Nx1 array) – data Y_metadata – Y_metadata which is not used in poisson distribution
Returns:	gradient of likelihood evaluated at points
Return type:	Nx1 array

logpdf_link(link_f, y, Y_metadata=None)[source]¶

Log Likelihood Function given link(f)

\[\ln p(y_{i}|\lambda(f_{i})) = -\lambda(f_{i}) + y_{i}\log \lambda(f_{i}) - \log y_{i}!\]

Parameters:	link_f (Nx1 array) – latent variables (link(f)) y (Nx1 array) – data Y_metadata – Y_metadata which is not used in poisson distribution
Returns:	likelihood evaluated for this point
Return type:	float

pdf_link(link_f, y, Y_metadata=None)[source]¶

Likelihood function given link(f)

\[p(y_{i}|\lambda(f_{i})) = \frac{\lambda(f_{i})^{y_{i}}}{y_{i}!}e^{-\lambda(f_{i})}\]

Parameters:	link_f (Nx1 array) – latent variables link(f) y (Nx1 array) – data Y_metadata – Y_metadata which is not used in poisson distribution
Returns:	likelihood evaluated for this point
Return type:	float

samples(gp, Y_metadata=None)[source]¶

Returns a set of samples of observations based on a given value of the latent variable.

Parameters:	gp – latent variable
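The Poisson formulas above can be transcribed directly and checked against finite differences. This is a standalone scalar sketch in plain Python (GPy operates on Nx1 arrays); the function names mirror the methods but the code is not taken from the library.

```python
import math


def logpdf_link(lam, y):
    """-lambda + y log(lambda) - log(y!)"""
    return -lam + y * math.log(lam) - math.lgamma(y + 1.0)


def dlogpdf_dlink(lam, y):
    return y / lam - 1.0


def d2logpdf_dlink2(lam, y):
    return -y / lam ** 2


def d3logpdf_dlink3(lam, y):
    return 2.0 * y / lam ** 3


lam, y, h = 2.5, 4, 1e-5
fd_first = (logpdf_link(lam + h, y) - logpdf_link(lam - h, y)) / (2.0 * h)
fd_second = (dlogpdf_dlink(lam + h, y) - dlogpdf_dlink(lam - h, y)) / (2.0 * h)
fd_third = (d2logpdf_dlink2(lam + h, y) - d2logpdf_dlink2(lam - h, y)) / (2.0 * h)
```

lgamma(y + 1) is used for log(y!) so that large counts do not overflow.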

GPy.likelihoods.student_t module¶

class StudentT(gp_link=None, deg_free=5, sigma2=2)[source]¶

Bases: GPy.likelihoods.likelihood.Likelihood

Student T likelihood

For nomenclature see Bayesian Data Analysis 2003 p576

\[p(y_{i}|\lambda(f_{i})) = \frac{\Gamma\left(\frac{v+1}{2}\right)}{\Gamma\left(\frac{v}{2}\right)\sqrt{v\pi\sigma^{2}}}\left(1 + \frac{1}{v}\left(\frac{(y_{i} - \lambda(f_{i}))^{2}}{\sigma^{2}}\right)\right)^{-\frac{v+1}{2}}\]

conditional_mean(gp)[source]¶: The mean of the random variable conditioned on one value of the GP

conditional_variance(gp)[source]¶: The variance of the random variable conditioned on one value of the GP

d2logpdf_dlink2(inv_link_f, y, Y_metadata=None)[source]¶

Hessian at y, given link(f), w.r.t link(f) i.e. second derivative logpdf at y given link(f_i) and link(f_j) w.r.t link(f_i) and link(f_j) The hessian will be 0 unless i == j

\[\frac{d^{2} \ln p(y_{i}|\lambda(f_{i}))}{d^{2}\lambda(f)} = \frac{(v+1)((y_{i}-\lambda(f_{i}))^{2} - \sigma^{2}v)}{((y_{i}-\lambda(f_{i}))^{2} + \sigma^{2}v)^{2}}\]

Parameters:	inv_link_f (Nx1 array) – latent variables inv_link(f) y (Nx1 array) – data Y_metadata – Y_metadata which is not used in student t distribution
Returns:	Diagonal of hessian matrix (second derivative of likelihood evaluated at points f)
Return type:	Nx1 array

Note

Will return diagonal of hessian, since everywhere else it is 0, as the likelihood factorizes over cases (the distribution for y_i depends only on link(f_i), not on link(f_(j!=i)))

d2logpdf_dlink2_dtheta(f, y, Y_metadata=None)[source]¶

d2logpdf_dlink2_dv(inv_link_f, y, Y_metadata=None)[source]¶

d2logpdf_dlink2_dvar(inv_link_f, y, Y_metadata=None)[source]¶

Gradient of the hessian (d2logpdf_dlink2) w.r.t variance parameter (t_noise)

\[\frac{d}{d\sigma^{2}}(\frac{d^{2} \ln p(y_{i}|\lambda(f_{i}))}{d^{2}f}) = \frac{v(v+1)(\sigma^{2}v - 3(y_{i} - \lambda(f_{i}))^{2})}{(\sigma^{2}v + (y_{i} - \lambda(f_{i}))^{2})^{3}}\]

Parameters:	inv_link_f (Nx1 array) – latent variables link(f) y (Nx1 array) – data Y_metadata – Y_metadata which is not used in student t distribution
Returns:	derivative of hessian evaluated at points f and f_j w.r.t variance parameter
Return type:	Nx1 array

d3logpdf_dlink3(inv_link_f, y, Y_metadata=None)[source]¶

Third order derivative log-likelihood function at y given link(f) w.r.t link(f)

\[\frac{d^{3} \ln p(y_{i}|\lambda(f_{i}))}{d^{3}\lambda(f)} = \frac{2(v+1)((y_{i} - \lambda(f_{i}))^3 - 3(y_{i} - \lambda(f_{i})) \sigma^{2} v)}{((y_{i} - \lambda(f_{i}))^{2} + \sigma^{2} v)^3}\]

Parameters:	inv_link_f (Nx1 array) – latent variables link(f) y (Nx1 array) – data Y_metadata – Y_metadata which is not used in student t distribution
Returns:	third derivative of likelihood evaluated at points f
Return type:	Nx1 array

dlogpdf_dlink(inv_link_f, y, Y_metadata=None)[source]¶

Gradient of the log likelihood function at y, given link(f) w.r.t link(f)

\[\frac{d \ln p(y_{i}|\lambda(f_{i}))}{d\lambda(f)} = \frac{(v+1)(y_{i}-\lambda(f_{i}))}{(y_{i}-\lambda(f_{i}))^{2} + \sigma^{2}v}\]

Parameters:	inv_link_f (Nx1 array) – latent variables (f) y (Nx1 array) – data Y_metadata – Y_metadata which is not used in student t distribution
Returns:	gradient of likelihood evaluated at points
Return type:	Nx1 array

dlogpdf_dlink_dtheta(f, y, Y_metadata=None)[source]¶

dlogpdf_dlink_dv(inv_link_f, y, Y_metadata=None)[source]¶

dlogpdf_dlink_dvar(inv_link_f, y, Y_metadata=None)[source]¶

Derivative of the dlogpdf_dlink w.r.t variance parameter (t_noise)

\[\frac{d}{d\sigma^{2}}(\frac{d \ln p(y_{i}|\lambda(f_{i}))}{df}) = \frac{-v(v + 1)(y_{i}-\lambda(f_{i}))}{((y_{i}-\lambda(f_{i}))^2 + \sigma^2 v)^2}\]

Parameters:	inv_link_f (Nx1 array) – latent variables inv_link_f y (Nx1 array) – data Y_metadata – Y_metadata which is not used in student t distribution
Returns:	derivative of likelihood evaluated at points f w.r.t variance parameter
Return type:	Nx1 array

dlogpdf_link_dtheta(f, y, Y_metadata=None)[source]¶

dlogpdf_link_dv(inv_link_f, y, Y_metadata=None)[source]¶

dlogpdf_link_dvar(inv_link_f, y, Y_metadata=None)[source]¶

Gradient of the log-likelihood function at y given f, w.r.t variance parameter (t_noise)

\[\frac{d \ln p(y_{i}|\lambda(f_{i}))}{d\sigma^{2}} = \frac{v((y_{i} - \lambda(f_{i}))^{2} - \sigma^{2})}{2\sigma^{2}(\sigma^{2}v + (y_{i} - \lambda(f_{i}))^{2})}\]

Parameters:	inv_link_f (Nx1 array) – latent variables link(f) y (Nx1 array) – data Y_metadata – Y_metadata which is not used in student t distribution
Returns:	derivative of likelihood evaluated at points f w.r.t variance parameter
Return type:	float

logpdf_link(inv_link_f, y, Y_metadata=None)[source]¶

Log Likelihood Function given link(f)

\[\ln p(y_{i}|\lambda(f_{i})) = \ln \Gamma\left(\frac{v+1}{2}\right) - \ln \Gamma\left(\frac{v}{2}\right) - \ln \sqrt{v \pi\sigma^{2}} - \frac{v+1}{2}\ln \left(1 + \frac{1}{v}\left(\frac{(y_{i} - \lambda(f_{i}))^{2}}{\sigma^{2}}\right)\right)\]

Parameters:	inv_link_f (Nx1 array) – latent variables (link(f)) y (Nx1 array) – data Y_metadata – Y_metadata which is not used in student t distribution
Returns:	likelihood evaluated for this point
Return type:	float

pdf_link(inv_link_f, y, Y_metadata=None)[source]¶

Likelihood function given link(f)

\[p(y_{i}|\lambda(f_{i})) = \frac{\Gamma\left(\frac{v+1}{2}\right)}{\Gamma\left(\frac{v}{2}\right)\sqrt{v\pi\sigma^{2}}}\left(1 + \frac{1}{v}\left(\frac{(y_{i} - \lambda(f_{i}))^{2}}{\sigma^{2}}\right)\right)^{-\frac{v+1}{2}}\]

Parameters:	inv_link_f (Nx1 array) – latent variables link(f) y (Nx1 array) – data Y_metadata – Y_metadata which is not used in student t distribution
Returns:	likelihood evaluated for this point
Return type:	float

predictive_mean(mu, sigma, Y_metadata=None)[source]¶

Quadrature calculation of the predictive mean: E(Y_star|Y) = E( E(Y_star|f_star, Y) )

Parameters:	mu – mean of posterior sigma – standard deviation of posterior

predictive_variance(mu, variance, predictive_mean=None, Y_metadata=None)[source]¶

Approximation to the predictive variance: V(Y_star)

The following variance decomposition (the law of total variance) is used: V(Y_star) = E( V(Y_star|f_star) ) + V( E(Y_star|f_star) )

Parameters:	mu – mean of posterior sigma – standard deviation of posterior predictive_mean – output’s predictive mean; if None, the _predictive_mean function will be called.

samples(gp, Y_metadata=None)[source]¶

Returns a set of samples of observations based on a given value of the latent variable.

Parameters:	gp – latent variable

update_gradients(grads)[source]¶: Pull out the gradients, be careful as the order must match the order in which the parameters are added
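A scalar transcription of the Student T log density, its first two derivatives w.r.t. link(f), and its gradient w.r.t. the variance parameter can be checked with finite differences. This is a standalone sketch, not GPy code; the constants V and SIGMA2 simply reuse the defaults shown in the class signature (deg_free=5, sigma2=2).

```python
import math

V, SIGMA2 = 5.0, 2.0  # defaults from the class signature


def logpdf_link(lam, y, v=V, s2=SIGMA2):
    e = y - lam
    return (math.lgamma((v + 1.0) / 2.0) - math.lgamma(v / 2.0)
            - 0.5 * math.log(v * math.pi * s2)
            - 0.5 * (v + 1.0) * math.log1p(e * e / (v * s2)))


def dlogpdf_dlink(lam, y, v=V, s2=SIGMA2):
    e = y - lam
    return (v + 1.0) * e / (e * e + s2 * v)


def d2logpdf_dlink2(lam, y, v=V, s2=SIGMA2):
    e = y - lam
    return (v + 1.0) * (e * e - s2 * v) / (e * e + s2 * v) ** 2


def dlogpdf_link_dvar(lam, y, v=V, s2=SIGMA2):
    e = y - lam
    return v * (e * e - s2) / (2.0 * s2 * (s2 * v + e * e))


lam, y, h = 0.4, 1.7, 1e-5
fd_first = (logpdf_link(lam + h, y) - logpdf_link(lam - h, y)) / (2.0 * h)
fd_second = (dlogpdf_dlink(lam + h, y) - dlogpdf_dlink(lam - h, y)) / (2.0 * h)
fd_var = (logpdf_link(lam, y, s2=SIGMA2 + h) - logpdf_link(lam, y, s2=SIGMA2 - h)) / (2.0 * h)
```

Unlike the Gaussian case, the second derivative changes sign when the squared residual exceeds sigma^2 v, so the log likelihood is not concave in link(f); this is worth keeping in mind when using Laplace-type approximations.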

GPy.likelihoods.weibull module¶

class Weibull(gp_link=None, beta=1.0)[source]¶

Bases: GPy.likelihoods.likelihood.Likelihood

Implementing Weibull likelihood function …

d2logpdf_dlink2(link_f, y, Y_metadata=None)[source]¶

Hessian at y, given link(f), w.r.t link(f) i.e. second derivative logpdf at y given link(f_i) and link(f_j) w.r.t link(f_i) and link(f_j) The hessian will be 0 unless i == j

\[\begin{split}\frac{d^{2} \ln p(y_{i}|\lambda(f_{i}))}{d^{2}\lambda(f)} = -\beta^{2}\frac{d\Psi(\alpha_{i})}{d\alpha_{i}}\\ \alpha_{i} = \beta y_{i}\end{split}\]

Parameters:	link_f (Nx1 array) – latent variables link(f) y (Nx1 array) – data Y_metadata – Y_metadata which is not used in gamma distribution
Returns:	Diagonal of hessian matrix (second derivative of likelihood evaluated at points f)
Return type:	Nx1 array

Note

Will return diagonal of hessian, since everywhere else it is 0, as the likelihood factorizes over cases (the distribution for y_i depends only on link(f_i), not on link(f_(j!=i)))

d2logpdf_dlink2_dr(link_f, y, Y_metadata=None)[source]¶: Derivative of hessian of loglikelihood wrt r-shape parameter.

Parameters:	link_f – y – Y_metadata –
Returns:

d2logpdf_dlink2_dtheta(f, y, Y_metadata=None)[source]¶

Parameters:	f – y – Y_metadata –
Returns:

d3logpdf_dlink3(link_f, y, Y_metadata=None)[source]¶

Third order derivative log-likelihood function at y given link(f) w.r.t link(f)

\[\begin{split}\frac{d^{3} \ln p(y_{i}|\lambda(f_{i}))}{d^{3}\lambda(f)} = -\beta^{3}\frac{d^{2}\Psi(\alpha_{i})}{d\alpha_{i}}\\ \alpha_{i} = \beta y_{i}\end{split}\]

Parameters:	link_f (Nx1 array) – latent variables link(f) y (Nx1 array) – data Y_metadata – Y_metadata which is not used in gamma distribution
Returns:	third derivative of likelihood evaluated at points f
Return type:	Nx1 array

d3logpdf_dlink3_dr(link_f, y, Y_metadata=None)[source]¶

Parameters:	link_f – y – Y_metadata –
Returns:

dlogpdf_dlink(link_f, y, Y_metadata=None)[source]¶

Gradient of the log likelihood function at y, given link(f) w.r.t link(f)

\[\begin{split}\frac{d \ln p(y_{i}|\lambda(f_{i}))}{d\lambda(f)} = \beta (\log \beta y_{i}) - \Psi(\alpha_{i})\beta\\ \alpha_{i} = \beta y_{i}\end{split}\]

Parameters:	link_f (Nx1 array) – latent variables (f) y (Nx1 array) – data Y_metadata – Y_metadata which is not used in gamma distribution
Returns:	gradient of likelihood evaluated at points
Return type:	Nx1 array

dlogpdf_dlink_dr(inv_link_f, y, Y_metadata=None)[source]¶

First order derivative of loglikelihood wrt r (shape parameter)

Parameters:	link_f (Nx1 array) – latent variables link(f) y (Nx1 array) – data Y_metadata – Y_metadata which is not used in gamma distribution
Returns:	third derivative of likelihood evaluated at points f
Return type:	Nx1 array

dlogpdf_dlink_dtheta(f, y, Y_metadata=None)[source]¶

Parameters:	f – y – Y_metadata –
Returns:

dlogpdf_link_dr(inv_link_f, y, Y_metadata=None)[source]¶

Gradient of the log-likelihood function at y given f, w.r.t shape parameter

Parameters:	inv_link_f (Nx1 array) – latent variables link(f) y (Nx1 array) – data Y_metadata – includes censoring information in dictionary key ‘censored’
Returns:	derivative of likelihood evaluated at points f w.r.t variance parameter
Return type:	float

dlogpdf_link_dtheta(f, y, Y_metadata=None)[source]¶

Parameters:	f – y – Y_metadata –
Returns:

exact_inference_gradients(dL_dKdiag, Y_metadata=None)[source]¶

logpdf_link(link_f, y, Y_metadata=None)[source]¶

Log Likelihood Function given link(f)

\[\begin{split}\ln p(y_{i}|\lambda(f_{i})) = \alpha_{i}\log \beta - \log \Gamma(\alpha_{i}) + (\alpha_{i} - 1)\log y_{i} - \beta y_{i}\\ \alpha_{i} = \beta y_{i}\end{split}\]

Parameters:	link_f (Nx1 array) – latent variables (link(f)) y (Nx1 array) – data Y_metadata – Y_metadata which is not used in poisson distribution
Returns:	likelihood evaluated for this point
Return type:	float

pdf_link(link_f, y, Y_metadata=None)[source]¶

Likelihood function given link(f)

Parameters:	link_f (Nx1 array) – latent variables link(f) y (Nx1 array) – data Y_metadata – Y_metadata which is not used in weibull distribution
Returns:	likelihood evaluated for this point
Return type:	float

samples(gp, Y_metadata=None)[source]¶

Returns a set of samples of observations conditioned on a given value of latent variable f.

Parameters:	gp – latent variable

update_gradients(grads)[source]¶: Pull out the gradients, be careful as the order must match the order in which the parameters are added