Bayesian Approach to Response Modeling

Can it compete with data mining regression?

Business Intelligence Solutions - March 2015



Objectives

  • Understand SAS Bayesian capabilities with regard to response modeling (binary, count, and continuous)

  • Compare Bayesian and nonparametric data mining methods

  • Form a list of the most appropriate Bayesian response models in SAS and discuss the pros and cons of selected models


SAS Provides Great Bayesian Regression Capabilities

  • Proc MCMC

    • to fit a wide range of regression models: linear, generalized linear, nonlinear, random-effect, and hierarchical models

  • Proc Genmod (Generalized Linear Models via the BAYES statement; see the sketch below)

  • Proc Mixed

    • to fit mixed-effect models (PRIOR statement to sample from the variance components distribution)

  • Proc FMM

    • to fit Finite Mixture Model regression

  • Survival Analysis procedures

    • PHREG (Cox proportional hazards models)

    • LIFEREG (accelerated failure time models)

  • Econometrics and Time Series procedures

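As a concrete illustration, the BAYES statement turns a standard generalized linear model into a Bayesian one fitted by MCMC. A minimal sketch for a Bayesian logistic response model follows; the data set CUSTOMERS and the variables RESPONSE, RECENCY, and FREQUENCY are hypothetical placeholders.

   /* data set and variable names are hypothetical placeholders */
   proc genmod data=customers descending;
      /* binary response with a logit link */
      model response = recency frequency / dist=binomial link=logit;
      /* MCMC sampling with a diffuse normal prior on the regression coefficients */
      bayes seed=1234 nbi=2000 nmc=10000 coeffprior=normal;
   run;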

Sample of Possible Bayesian Response Models

  • Multivariate Normal Random Effect Model

  • Mixed Effect Model with Random Intercept and Time as a Factor

  • Logistic Regression with a Diffuse Prior

  • Logistic Regression Random Effect Model

  • Finite Mixture Logistic/Poisson Regression

  • Zero-Inflated Poisson Regression (see the PROC MCMC sketch after this list)

  • Hierarchical Logistic/Normal Regression

  • Hierarchical Poisson Regression for Overdispersed Data

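As one example from this list, a zero-inflated Poisson response model can be coded directly in PROC MCMC through its GENERAL() log-likelihood function. This is only a sketch: the data set COUNTS, the count response NUM_ORDERS, the predictor RECENCY, and the prior settings are all hypothetical placeholders.

   /* data set, variables, and priors are hypothetical placeholders */
   proc mcmc data=counts nmc=20000 nbi=2000 seed=2015 outpost=zip_post;
      parms beta0 0 beta1 0 gamma0 0;
      prior beta: ~ normal(0, var=100);
      prior gamma0 ~ normal(0, var=100);

      p_zero = logistic(gamma0);             /* probability of a structural zero */
      lambda = exp(beta0 + beta1*recency);   /* Poisson mean of the count part   */

      /* zero-inflated Poisson log-likelihood */
      if num_orders = 0 then
         llike = log(p_zero + (1 - p_zero)*exp(-lambda));
      else
         llike = log(1 - p_zero) + num_orders*log(lambda) - lambda - lgamma(num_orders + 1);

      model num_orders ~ general(llike);
   run;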

Sample of Random Intercept Models

  • Model 1: Traditional Mixed Model

  • Model 2: Bayesian Mixed Model with a flat prior

  • Model 3: Bayesian Mixed Model with an informative prior regarding the promotion effect (sketched in the code below)

  • Model 4: Bayesian Mixed Model with an informative prior regarding customers' inter-subject variability

  • Model 5: Bayesian Mixed Model with informative priors regarding customers' inter-subject variability AND the promotion effect

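A minimal PROC MCMC sketch of Model 3 (a customer-level random intercept plus an informative prior on the promotion effect) is shown below. The data set PROMO, the variables SALES, PROMO_FLAG, and CUST_ID, and the prior mean/variance values are hypothetical placeholders, and the RANDOM statement assumes SAS/STAT 9.3 or later.

   /* data set, variables, and prior values are hypothetical placeholders */
   proc mcmc data=promo nmc=20000 nbi=2000 seed=2015 outpost=model3_post;
      parms beta0 0 beta_promo 0 s2_cust 1 s2_err 1;

      /* informative prior on the promotion effect: the mean 0.5 and variance 0.25
         stand in for whatever prior knowledge is actually available              */
      prior beta_promo ~ normal(0.5, var=0.25);
      prior beta0 ~ normal(0, var=10000);
      prior s2_cust s2_err ~ igamma(2, scale=2);

      /* customer-level random intercept */
      random b ~ normal(0, var=s2_cust) subject=cust_id;

      mu = beta0 + b + beta_promo*promo_flag;
      model sales ~ normal(mu, var=s2_err);
   run;

Model 2 corresponds to replacing the informative priors with flat or very diffuse ones; Models 4 and 5 place the informative prior on s2_cust instead of (or in addition to) beta_promo.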

Pros and Cons of Bayesian Regression

  • Bayesian regression relaxes the assumption that the error must have a normal distribution (the error must still be independent across observations)

  • For small samples, a Bayesian approach with well-specified priors is often the only way to go

  • For medium to large samples (unless there is strong prior information) a robust frequentist approach is very appealing and in the majority of cases is preferred

  • Bayesian simulation for large data sets is time-consuming

  • With large samples both approaches produce practically identical results

    • Frequentist confidence intervals and Bayesian credible intervals are essentially identical



Bayesian and Frequentist Regression Methods - Jon Wakefield



Bayesian Analysis and Priors

  • Bayesian analysis does not tell you how to select a prior

  • There is no single correct way to select a prior

  • Bayesian inference requires considerable skill to formulate priors

  • Bayesian regression results can vary greatly depending on which priors are chosen for the variance components (see the PROC MIXED sketch below)

  • Analysts can easily generate misleading results



SAS/STAT 9.2 User's Guide

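One simple way to see this sensitivity is to refit the same variance-component model under two different priors and compare the posterior samples. The sketch below uses the PRIOR statement of PROC MIXED; the data set PROMO and the variables SALES, PROMO_FLAG, and CUST_ID are the same hypothetical placeholders used earlier.

   /* Jeffreys prior on the variance components (data set and variables are hypothetical) */
   proc mixed data=promo;
      class cust_id;
      model sales = promo_flag / solution;
      random intercept / subject=cust_id;
      prior jeffreys / nsample=10000 seed=2015 out=post_jeffreys;
   run;

   /* flat prior on the variance components */
   proc mixed data=promo;
      class cust_id;
      model sales = promo_flag / solution;
      random intercept / subject=cust_id;
      prior flat / nsample=10000 seed=2015 out=post_flat;
   run;

If the posterior summaries from POST_JEFFREYS and POST_FLAT differ materially, the results are prior-sensitive and the prior choice needs to be justified.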


Bayesian Analysis and Convergence

  • Markov chains should eventually converge to the stationary distribution (which is also our target distribution)

  • There is no guarantee that our chain has converged, even after a large number of draws

  • We can only tell when something has not converged

  • How do we know whether our chain has actually converged?

    • there are several visual and statistical checks to see whether the chain appears to have converged

  • Convergence inspection is necessary for each parameter of the Bayesian model (requested in the sketch below)

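For example, the BAYES statement can request the standard visual and statistical checks for every parameter; the data set and variables below are the same hypothetical ones used in the earlier PROC GENMOD sketch.

   /* data set and variable names are hypothetical placeholders */
   proc genmod data=customers descending;
      model response = recency frequency / dist=binomial link=logit;
      bayes seed=1234 nbi=2000 nmc=20000 coeffprior=normal
            diagnostics=(geweke ess autocorr)   /* statistical convergence checks    */
            plots=(trace autocorr density);     /* visual inspection, per parameter  */
   run;

PROC MCMC accepts the same DIAGNOSTICS= and PLOTS= options on its PROC statement.
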
Classical Bayesian Hierarchical Response Model

  • If a classical Hierarchical Bayesian response model includes 4 predictors, each customer contributes roughly 5 customer-level parameters (an intercept plus 4 slopes), so the number of posterior distributions to estimate is approximately:

    • 5,000 for 1,000 customers

    • 50,000 for 10,000 customers

    • 500,000 for 100,000 customers (the structure behind these counts is sketched below)

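A sketch of such a classical hierarchical (random-coefficients) logistic response model in PROC MCMC is shown below: with 4 predictors, every customer gets its own intercept and 4 slopes, which is where the roughly 5 x (number of customers) posterior distributions come from. The data set, the variables X1-X4, RESPONSE, and CUST_ID, and the priors are hypothetical, and the use of multiple RANDOM statements assumes a recent SAS/STAT release.

   /* data set, variables, and priors are hypothetical placeholders */
   proc mcmc data=customers nmc=20000 nbi=2000 seed=2015 outpost=hier_post;
      /* population-level means and variances of the customer-level coefficients */
      parms mu0 0 mu1 0 mu2 0 mu3 0 mu4 0;
      parms s2_0 1 s2_1 1 s2_2 1 s2_3 1 s2_4 1;
      prior mu: ~ normal(0, var=100);
      prior s2_: ~ igamma(2, scale=2);

      /* one intercept and four slopes per customer */
      random b0 ~ normal(mu0, var=s2_0) subject=cust_id;
      random b1 ~ normal(mu1, var=s2_1) subject=cust_id;
      random b2 ~ normal(mu2, var=s2_2) subject=cust_id;
      random b3 ~ normal(mu3, var=s2_3) subject=cust_id;
      random b4 ~ normal(mu4, var=s2_4) subject=cust_id;

      p = logistic(b0 + b1*x1 + b2*x2 + b3*x3 + b4*x4);
      model response ~ binary(p);
   run;
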
Applicability of Bayesian Regression to Response Modeling

  • Stochastic Gradient Boosting models for observational data are among the most accurate models yet invented

    • Non-parametric and non-linear; handles non-characterizable nonlinearities

    • No restrictions on the number of observations or the number of predictors

    • Practically any commercial data mining software package now includes a Stochastic Gradient Boosting implementation; widely used in diverse applications of observational studies

    • No researcher intervention required

    • There are no lacunae in the model development settings

  • Bayesian regression comprises simple parametric, linear (or non-linear with characterizable nonlinearities) models

    • Data structure: small- and medium-sized data with a small number of predictors

    • Strong researcher intervention: model and prior selection

    • The method is not included in any commercial data mining software, and therefore there are practically NO data mining applications

    • There are lacunae in the model development settings: how to select priors and how to choose their parameters

  • Conclusion: Bayesian regression cannot compete with traditional data mining regression (in particular, with Stochastic Gradient Boosting regression) for large observational data (thousands of observations) and large numbers (hundreds or more) of predictors