Author: Stephanie Glen

**Logistic regression (LR)** models estimate the probability of a binary response, based on one or more predictor variables. Unlike in linear regression models, the dependent variable is categorical. LR has become very popular, perhaps because of the wide availability of the procedure in software. Although LR is a good choice for many situations, it doesn’t work well for *all* situations. For example:

- In propensity score analysis where there are many covariates, LR performs poorly.
- For classification, LR usually requires more variables than Support Vector Machines (SVM) to achieve the same (or better) misclassification rate for multivariate and mixture distributions.

In addition, LR is prone to issues like overfitting and multicollinearity.

A **wide range of alternatives** is available, from statistics-based procedures (e.g. log-binomial, ordinary or modified Poisson regression, and Cox regression) to those rooted more deeply in data science, such as machine learning and neural network theory. Which one you choose depends largely on what tools you have available, what theory (e.g. statistics vs. neural networks) you want to work with, and what you’re trying to achieve with your data. For example, tree-based methods are a good alternative for assessing risk factors, while Neural Networks (NN) and Support Vector Machines (SVM) work well for propensity score estimation and classification.

There are literally hundreds of viable alternatives to logistic regression, so it isn’t possible to discuss them all within the confines of a single blog post. What follows is an outline of some of the more popular choices.

- Tree-Based Methods
- Neural Networks and Support Vector Machines
- K-Nearest Neighbor
- Traditional Statistical Methods

## Tree-Based Methods

In machine learning, perhaps the best-known tree-based methods are AQ11 and ID3, which automatically generate trees from data. Classification And Regression Tree (CART) is perhaps the best known in the statistics community. All of these tree-based methods work by recursively partitioning the sample space, which, put simply, creates a structure that resembles a tree with branches and leaves.

For identifying risk factors, tree-based methods such as CART and conditional inference tree analysis may outperform logistic regression. The key difference between LR and tree-based methods is that while logistic regression makes assumptions about the underlying data structure, tree-based methods make no such assumptions. Another important difference is *how* the models identify risk factors: logistic regression derives odds ratios for significant factors, while tree-based methods use tree-splitting (“ramifications”) to represent the risk factors; a probability of occurrence is assigned to the end of each branch in the tree.
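To make the idea of recursive partitioning concrete, here is a minimal sketch of a single split on one numeric predictor. It assumes Gini impurity as the splitting criterion (commonly associated with CART; other tree methods use different criteria), and the function names and toy data are made up for illustration. A full tree would apply the same step again to each branch.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def best_split(xs, ys):
    """Find the threshold on one predictor that minimises weighted Gini impurity.

    Returns (score, threshold, p_left, p_right), where p_left and p_right are
    the observed probabilities of the outcome on each side of the split.
    """
    best = None
    values = sorted(set(xs))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if best is None or score < best[0]:
            best = (score, t, sum(left) / len(left), sum(right) / len(right))
    return best


# Hypothetical toy data: one predictor and a binary outcome (1 = event occurred).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 0, 1, 1]

score, threshold, p_left, p_right = best_split(xs, ys)
print(f"split at x <= {threshold}: P(event) = {p_left} on the left, {p_right} on the right")
```

On this toy data the split lands at 4.5, with an event probability of 0.0 on the left branch and 0.75 on the right. Those two numbers play the role of the probabilities attached to the ends of the branches.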

As for overall performance, there are some important differences. Nagy (2009) found that the trees outperformed logistic regression by identifying more risk factors and by correctly classifying items (which were horses in the author’s study). However, Nagy found “No difference…between the two tree-based methods regarding the structure and prediction accuracy of the trees.”

Tree-based methods may outperform LR when it comes to **classification**, but they are *more* prone to overfitting than LR. This can be combated by “pruning” the tree. Another option is to try both LR and a decision tree to see which gives you the most desirable results.

## Neural Networks and Support Vector Machines

Logistic regression is commonly used for **Propensity Score (PS) analysis**, but there are some cases where LR doesn’t work well. These circumstances include models that have many covariates and response surfaces that aren’t hyperplanes. Neural Networks (NN) and Support Vector Machines (SVM) are good alternatives, providing more stable estimates in most cases, although NNs tend to outperform SVMs. Keller et al. (2013) recommend estimating propensity scores with both LR *and* NNs. If a better balance is achieved with NNs, you then have an opportunity to re-specify the LR model or use the estimates provided by the NNs.
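“Balance” here means that, once the propensity scores are applied, the treated and untreated groups look alike on their covariates. The source doesn’t say which balance measure to use, so the sketch below is only one possible way to check it: it assumes inverse probability weighting and a standardized mean difference (SMD), neither of which is named above, and all function names and data are hypothetical. The same check could be run on scores from LR and from an NN to see which set balances the covariate better.

```python
import math
import statistics


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def inverse_probability_weights(treated, scores):
    """Weight treated units by 1/score and untreated units by 1/(1 - score)."""
    return [1 / s if t == 1 else 1 / (1 - s) for t, s in zip(treated, scores)]


def standardized_mean_difference(x, treated, weights):
    """Weighted difference in covariate means between groups, in pooled-SD units."""
    xt = [v for v, t in zip(x, treated) if t == 1]
    xc = [v for v, t in zip(x, treated) if t == 0]
    wt = [w for w, t in zip(weights, treated) if t == 1]
    wc = [w for w, t in zip(weights, treated) if t == 0]
    # Pool the unweighted group variances so the scale does not depend on the weights.
    pooled_sd = math.sqrt((statistics.variance(xt) + statistics.variance(xc)) / 2)
    return (weighted_mean(xt, wt) - weighted_mean(xc, wc)) / pooled_sd


# Hypothetical toy data: one binary covariate, a treatment flag, and
# propensity scores that some model (LR, NN, ...) is assumed to have produced.
x = [1, 1, 1, 0, 1, 0, 0, 0]
treated = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.75, 0.75, 0.75, 0.25, 0.75, 0.25, 0.25, 0.25]

smd_raw = standardized_mean_difference(x, treated, [1.0] * len(x))
smd_weighted = standardized_mean_difference(
    x, treated, inverse_probability_weights(treated, scores)
)
print(f"SMD before weighting: {smd_raw:.3f}, after weighting: {smd_weighted:.3f}")
```

In this toy case the unweighted SMD is 1.0 and the weighted SMD is essentially zero, because the made-up scores match the true treatment rates in each covariate group. With real data, the set of scores that brings the SMDs closer to zero is the one giving better balance.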

As mentioned above, tree-based methods tend to outperform LR when it comes to **classification**. However, SVMs are gaining popularity as an alternative. SVMs combine computer algorithms and theoretical results, which has earned them a good reputation for classification. Several authors (as cited in Salazar et al., 2012) found that SVMs outperformed LR in several key areas, including the fact that, for multivariate and mixture distributions, SVM requires fewer variables than LR to achieve the same (or better) misclassification rate (MCR). Neural networks also perform well, especially if you have sparse binary data.

It’s important to note, though, that SVMs and NNs aren’t a “miracle” alternative to LR: while some studies report the superiority of one method, other studies are often in direct contradiction. Several factors must be taken into account when deciding to switch methods, including your comfort level, your area of expertise, and specifics about your data. For example, if you are only using a single variable to classify new observations, SVM is a good alternative. However, the polynomial SVM is not recommended for this purpose because it produces a higher misclassification rate. SVM is also likely to perform better than LR if you have high correlation structures in your data.

## K-Nearest Neighbor

Widely available in statistics and data mining packages, **K-nearest neighbor (KNN)** is a simple, instance-based learning (IBL) algorithm. As it’s so simple to implement, it’s often a first choice for classification; as well as being easy, it usually gives results that are good enough for many applications. It was originally developed by Fix & Hodges (as cited in Kirk, 2014), whose work focused on classification with unknown distributions.

KNN performs well in many situations, and for classification it is often the “outright winner” (Bichler et al., 2004). In terms of ease of interpreting output, calculation time, and predictive power, Srivastava (2018) reports that LR and KNN are practically identical.

One of the major problems with KNN is choosing a value for “k”, which can seem quite arbitrary. Many methods exist for choosing k, including guess and check (which is exactly as it sounds…you guess, and then check) and a multitude of algorithms that optimize k for any given training set. Kirk (2014) provides a great overview of the “choosing k” problem (pp. 25-29); for more detail about algorithms, he recommends Nigsch et al.’s article *Melting Point Prediction Employing K-Nearest Neighbor Algorithms and Genetic Parameter Optimization*.
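As a rough illustration of “guess and check”, the sketch below tries a few values of k and scores each one by its leave-one-out misclassification rate. Leave-one-out is just one reasonable way of doing the checking, not something prescribed by the sources above, and the function names and toy data are made up.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def loo_error(data, k):
    """Leave-one-out misclassification rate for a given k."""
    errors = 0
    for i, (point, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]  # every point except the i-th
        if knn_predict(rest, point, k) != label:
            errors += 1
    return errors / len(data)


# Hypothetical toy data: two well-separated clusters of four points each.
data = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.8, 1.1), "a"), ((1.1, 1.3), "a"),
    ((3.0, 3.0), "b"), ((3.2, 2.9), "b"), ((2.8, 3.1), "b"), ((3.1, 3.3), "b"),
]

candidates = [1, 3, 5, 7]  # odd values avoid tied votes with two classes
errors = {k: loo_error(data, k) for k in candidates}
best_k = min(candidates, key=errors.get)  # ties go to the smallest k
print(errors, "best k:", best_k)
```

On this toy data k = 1, 3 and 5 all classify every held-out point correctly, while k = 7 misclassifies every one, because the vote is then dominated by the other cluster. Real data is rarely this tidy, but the pattern of scoring several candidate values and keeping the best is the same.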

## Traditional Statistical Methods

Traditional statistical methods are time-tested and shouldn’t be overlooked in favor of ML algorithms or neural networks just for the sake of appearing “up to date”. In some cases, **traditional methods outperform even the most tried and trusted modern algorithms**. For example, in *Comparing Classification Methods for Campaign Management*, Bichler et al. concluded that “…**stepwise logistic regression** performed best and dominated all other methods.”

**Discriminant analysis** is a very popular, longstanding tool for classification. In a practical sense, there are only very minor differences between discriminant analysis and logistic regression (Michie et al. 1994, as cited in Bichler et al., 2004). In fact, LR and linear discrimination are identical for normally distributed data that have equal covariances and for independent binary attributes (Bichler et al., 2004).

Other notable statistics-based alternatives:

- **Log-binomial regression**: The log-binomial model naturally approximates the binomial distribution (which is the underlying mechanism for LR), but it can run into convergence problems.
- **Poisson regression**: Good for large sample sizes, but may estimate probabilities greater than 1. It also tends to provide conservative estimates for confidence intervals.
- **Poisson with robust variance estimator (modified Poisson)**: Good for large sample sizes, but may estimate probabilities greater than 1.
- **Cox regression**: A good alternative, although it doesn’t estimate probabilities.
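Why can a Poisson-type model estimate a “probability” greater than 1? Models fitted with a log link predict `exp(linear predictor)`, which has no upper bound, whereas the logistic function always stays between 0 and 1. The small sketch below shows the difference using made-up coefficients (`b0` and `b1` are hypothetical, not fitted to any data).

```python
import math


def logistic_probability(eta):
    """Inverse logit: always strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-eta))


def log_link_probability(eta):
    """Inverse of the log link: positive, but not bounded above by 1."""
    return math.exp(eta)


# Hypothetical coefficients for a single predictor x.
b0, b1 = -1.0, 0.5

logistic_probs = []
log_link_probs = []
for x in range(5):
    eta = b0 + b1 * x  # linear predictor
    logistic_probs.append(logistic_probability(eta))
    log_link_probs.append(log_link_probability(eta))
    print(f"x={x}: logistic={logistic_probs[-1]:.3f}, log link={log_link_probs[-1]:.3f}")
```

For the larger values of x the log-link prediction climbs above 1 while the logistic prediction does not, which is the behaviour the list above warns about.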

## References

Bichler, M. & Kiss, C. (2004). A Comparison of Logistic Regression, k-Nearest Neighbor, and Decision Tree Induction for Campaign Management. AMCIS 2004 Proceedings.

Keller, B. S. B., Kim, J.-S. & Steiner, P. M. (2013). Abstract: Data Mining Alternatives to Logistic Regression for Propensity Score Estimation: Neural Networks and Support Vector Machines. Multivariate Behavioral Research, 48(1), 164. DOI: 10.1080/00273171.2013.752263

Kirk, M. (2014). Thoughtful Machine Learning: A Test-Driven Approach. O’Reilly Media.

Nagy, K. (2009). Chapter 3. Tree-based methods as an alternative to logistic regression in revealing risk factors of crib-biting in horses.

Nigsch, F. et al. (2006). Melting Point Prediction Employing K-Nearest Neighbor Algorithms and Genetic Parameter Optimization. Journal of Chemical Information and Modeling, 46(6), pp. 2412–2422.

Salazar, D. et al. (2012). Comparison between SVM and Logistic Regression: Which One is Better to Discriminate? Revista Colombiana de Estadística, Special Issue on Biostatistics, June 2012, vol. 35, no. 2, pp. 223–237.