In statistics, Bayesian linear regression is an approach to linear regression in which the statistical analysis is undertaken within the context of Bayesian inference. When the regression model has errors that follow a normal distribution, and a particular (conjugate) form of prior distribution is assumed, explicit results are available for the posterior probability distributions of the model's parameters.
This example shows how to make Bayesian inferences for a logistic regression model using slicesample. Statistical inferences are usually based on maximum likelihood estimation (MLE). MLE chooses the parameters that maximize the likelihood of the data and is intuitively appealing. In MLE, parameters are assumed to be unknown but fixed, and are estimated with some confidence. In Bayesian statistics, the uncertainty about the unknown parameters is quantified using probability, so that the unknown parameters are regarded as random variables.

## Car Experiment Data

In some simple problems, such as the previous normal mean inference example, it is easy to figure out the posterior distribution in a closed form.
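For reference, the standard conjugate result in that setting: if the data $y_1,\dots,y_n$ are drawn from $N(\mu,\sigma^2)$ with $\sigma^2$ known, and the prior on the mean is $\mu \sim N(\mu_0,\sigma_0^2)$, then the posterior is again normal:

$$
\mu \mid y \;\sim\; N\!\left( \frac{\mu_0/\sigma_0^2 + n\bar{y}/\sigma^2}{1/\sigma_0^2 + n/\sigma^2},\ \left( \frac{1}{\sigma_0^2} + \frac{n}{\sigma^2} \right)^{-1} \right),
$$

where $\bar{y}$ is the sample mean.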
But in general problems that involve non-conjugate priors, the posterior distributions are difficult or impossible to compute analytically. We will consider logistic regression as an example. This example involves an experiment to help model the proportion of cars of various weights that fail a mileage test.
The data include observations of weight, number of cars tested, and number failed. We will work with a transformed version of the weights to reduce the correlation in our estimates of the regression parameters.

```matlab
% A set of car weights
weight = [2100 2300 2500 2700 2900 3100 3300 3500 3700 3900 4100 4300]';
weight = (weight-2800)/1000;   % recenter and rescale
% The number of cars tested at each weight
total = [48 42 31 34 31 21 23 23 21 16 17 21]';
% The number of cars that have poor mpg performances at each weight
poor = [1 2 0 3 8 8 14 17 19 15 17 21]';
```

## Logistic Regression Model

Logistic regression, a special case of a generalized linear model, is appropriate for these data since the response variable is binomial. The logistic regression model can be written as

$$
\Pr(\text{failure}) = \frac{e^{b_1 + b_2 x}}{1 + e^{b_1 + b_2 x}},
$$

where $x$ is the transformed weight and $b_1$ and $b_2$ are the intercept and slope parameters. Placing independent normal priors on $b_1$ and $b_2$ and combining them with the binomial likelihood gives an unnormalized posterior density. Plotted over a grid of parameter values, this posterior is elongated along a diagonal in the parameter space, indicating that, after we look at the data, we believe that the parameters are correlated. This is interesting, since before we collected any data we assumed they were independent.
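Here is a minimal sketch of how that unnormalized posterior can be coded, using the variables defined above; the choice of prior standard deviation (20, to keep the priors diffuse) is an assumption for illustration.

```matlab
% Probability of failure at transformed weight x, given parameters b
logitp = @(b,x) exp(b(1)+b(2).*x) ./ (1+exp(b(1)+b(2).*x));

% Independent normal priors for intercept and slope
% (sd = 20 is an assumed, deliberately diffuse choice)
prior1 = @(b1) normpdf(b1,0,20);
prior2 = @(b2) normpdf(b2,0,20);

% Unnormalized posterior: binomial likelihood times the priors
post = @(b) prod(binopdf(poor,total,logitp(b,weight))) ...
            * prior1(b(1)) * prior2(b(2));
```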
The correlation comes from combining our prior distribution with the likelihood function.

## Slice Sampling

Monte Carlo methods are often used in Bayesian data analysis to summarize the posterior distribution. The idea is that, even if you cannot compute the posterior distribution analytically, you can generate a random sample from it and use these random values to estimate the posterior distribution or derived statistics such as the posterior mean, median, and standard deviation. Slice sampling is an algorithm designed to sample from a distribution with an arbitrary density function, known only up to a constant of proportionality, which is exactly what is needed for sampling from a complicated posterior distribution whose normalization constant is unknown. The algorithm does not generate independent samples, but rather a Markovian sequence whose stationary distribution is the target distribution. Thus, the slice sampler is a Markov Chain Monte Carlo (MCMC) algorithm.
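To make the algorithm concrete, here is a minimal one-dimensional sketch of the stepping-out and shrinkage procedure; the function name slice1d and its interface are hypothetical, not the toolbox implementation.

```matlab
function x = slice1d(f, x0, w, n)
% Minimal 1-D slice sampler sketch (stepping-out and shrinkage).
%   f  - handle to an unnormalized density
%   x0 - starting point
%   w  - initial bracket width
%   n  - number of draws
x = zeros(n,1);
xcur = x0;
for i = 1:n
    y = rand * f(xcur);              % uniform level under the density at xcur
    L = xcur - rand*w;               % random bracket of width w around xcur
    R = L + w;
    while f(L) > y, L = L - w; end   % step out to the left
    while f(R) > y, R = R + w; end   % step out to the right
    while true                       % draw from the slice, shrinking on rejection
        xnew = L + rand*(R - L);
        if f(xnew) > y, break; end
        if xnew < xcur, L = xnew; else, R = xnew; end
    end
    xcur = xnew;
    x(i) = xcur;
end
end
```

In MATLAB this would live in its own file, slice1d.m; slicesample implements the multivariate version of this idea.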
However, it differs from other well-known MCMC algorithms because only the scaled posterior need be specified; no proposal or marginal distributions are needed. This example shows how to use the slice sampler as part of a Bayesian analysis of the mileage test logistic regression model, including generating a random sample from the posterior distribution for the model parameters, analyzing the output of the sampler, and making inferences about the model parameters. The first step is to generate a random sample from the posterior; a useful diagnostic is then to plot moving averages of the sampled values against iteration number. Because these are moving averages over a window of 50 iterations, the first 50 values are not comparable to the rest of the plot. However, the remainder of each plot seems to confirm that the parameter posterior means have converged to stationarity after 100 or so iterations. It is also apparent that the two parameters are correlated with each other, in agreement with the earlier plot of the posterior density. Since the settling-in period represents samples that cannot reasonably be treated as random realizations from the target distribution, it's probably advisable not to use the first 50 or so values at the beginning of the slice sampler's output.
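A sketch of this first step, using the post handle defined earlier; the starting point, sample size, and interval widths here are assumptions for illustration, not settings taken from the original analysis.

```matlab
initial  = [1 1];          % assumed starting point for [intercept slope]
nsamples = 1000;           % assumed number of draws
trace = slicesample(initial,nsamples,'pdf',post,'width',[20 2]);

% Moving averages over a trailing window of 50 iterations,
% one curve per parameter, as a convergence diagnostic
movavg = movmean(trace,[49 0]);
plot(movavg)
xlabel('Iteration')
ylabel('Moving average over 50 iterations')
legend('intercept','slope')
```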
You could just delete those rows of the output; however, it's also possible to specify a 'burn-in' period, so that the sampler discards those values itself. This is convenient when a suitable burn-in length is already known, perhaps from previous runs.
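With slicesample this is a name-value option; a sketch, reusing the assumed settings from above:

```matlab
% Discard the first 50 draws automatically instead of deleting rows
trace = slicesample(initial,nsamples,'pdf',post, ...
    'width',[20 2],'burnin',50);
```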
Trace plots of the values retained after the burn-in period do not seem to show any non-stationarity, indicating that the burn-in has done its job. However, there is a second aspect of the trace plots that should also be explored. While the trace for the intercept looks like high-frequency noise, the trace for the slope appears to have a lower-frequency component, indicating that there is autocorrelation between values at adjacent iterations. We could still compute the mean from this autocorrelated sample, but it is often convenient to reduce the storage requirements by removing redundancy in the sample. If doing so eliminated the autocorrelation, it would also allow us to treat the result as a sample of independent values.
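One way to check is to compute the sample autocorrelation of each trace directly; a minimal hand-rolled sketch, assuming the trace matrix from above:

```matlab
% Sample autocorrelation of the slope trace at lags 0..20
x   = trace(:,2) - mean(trace(:,2));
acf = arrayfun(@(k) sum(x(1+k:end).*x(1:end-k)), 0:20) / sum(x.^2);
stem(0:20, acf)
xlabel('Lag')
ylabel('Sample autocorrelation')
```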
For example, you can thin out the sample by keeping only every 10th value.
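With slicesample, thinning is also available as a name-value option; a sketch, again reusing the assumed settings from above:

```matlab
% Keep every 10th draw after the burn-in period
trace = slicesample(initial,nsamples,'pdf',post, ...
    'width',[20 2],'burnin',50,'thin',10);

% Posterior summaries from the (approximately independent) sample
postmean = mean(trace)   % posterior means of [intercept slope]
poststd  = std(trace)    % posterior standard deviations
```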