Among the many textbooks and tutorials on logistic regression, the very introductory one at the above link explains how the logistic regression model comes about:
In the binary classification problem, it is intuitive to decide whether an instance x belongs to class 0 or class 1 by the ratio P(c=1|x) / P(c=0|x). Denoting P = P(c=1|x), so that 1-P = P(c=0|x), the ratio becomes the odds P/(1-P).
However, the odds have an undesirable property: they are asymmetric with respect to P. Swapping the values of P and 1-P does not negate P/(1-P); it turns it into its reciprocal (1-P)/P. The swap does negate the logit ln(P/(1-P)), because ln((1-P)/P) = -ln(P/(1-P)). So it is reasonable to make the logit, rather than the odds, our dependent variable.
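Here is a quick numerical check of the symmetry argument. It is a small Python sketch. The functions `odds` and `logit` are just illustrative helpers, and P = 0.8 is an arbitrary example value.

```python
import math

def odds(p: float) -> float:
    """Odds of class 1: P / (1 - P)."""
    return p / (1.0 - p)

def logit(p: float) -> float:
    """Log-odds (logit): ln(P / (1 - P))."""
    return math.log(odds(p))

p = 0.8
# Swapping P and 1-P turns the odds into their reciprocal (about 4 vs about 0.25) ...
print(odds(p), odds(1 - p))
# ... but only flips the sign of the logit.
print(logit(p), logit(1 - p))
# At P = 0.5 the odds are 1 and the logit is 0, the symmetric midpoint.
print(odds(0.5), logit(0.5))
```

The product of the two odds is 1, while the sum of the two logits is 0. This is why the logit is the natural symmetric quantity.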
Modeling the dependent variable as a linear function of x, we get:
$$\ln \frac{P}{1-P} = a + bx$$
which, after exponentiating both sides and solving for P, is equivalent to
$$P = \frac{e^{a+bx}}{1 + e^{a+bx}}$$
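The two formulas can be checked against each other in code. The sketch below evaluates the logistic form of P and then applies the logit to recover a + bx. The coefficients a = -1.0 and b = 0.5 are hypothetical values chosen only for illustration. The branch inside `logistic` is a common numerical-stability trick and is not part of the tutorial.

```python
import math

def logistic(z: float) -> float:
    """P = e^z / (1 + e^z), evaluated without overflow for large |z|."""
    if z >= 0:
        # Dividing numerator and denominator by e^z gives 1 / (1 + e^-z).
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def logit(p: float) -> float:
    """ln(P / (1 - P)), the inverse of logistic()."""
    return math.log(p / (1.0 - p))

# Hypothetical coefficients, for illustration only.
a, b = -1.0, 0.5
for x in (-4.0, 0.0, 2.0, 6.0):
    p = logistic(a + b * x)
    # P stays strictly inside (0, 1), and its logit gives back a + b*x.
    print(x, p, logit(p))
```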
The above tutorial also compares linear regression with logistic regression:
“If you use linear regression, the predicted values will become greater than one and less than zero if you move far enough on the X-axis. Such values are theoretically inadmissible.”
This shows that logistic regression does not model the relation between x and c directly. Instead, it models the relation between x and P(c|x), and then uses P(c|x) to decide whether x belongs to class c=1 or c=0. So logistic regression is not really regression; it is a classifier.
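The contrast can be made concrete on a small dataset. The sketch below fits both an ordinary least-squares line and a logistic model to the same 0/1 labels. It then evaluates both models far out on the x-axis and classifies each point by whether P(c=1|x) > 0.5.

The toy data is hypothetical. The tutorial does not say how a and b are estimated. Here the sketch uses maximum likelihood via Newton's method (also called iteratively reweighted least squares), which is one standard choice and not necessarily the tutorial's. The sketch also assumes the two classes overlap. If they are linearly separable, the maximum-likelihood estimate does not exist and the 2x2 system becomes singular.

```python
import math

def sigmoid(z: float) -> float:
    """Numerically stable e^z / (1 + e^z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_linear(xs, cs):
    """Ordinary least squares fit of the 0/1 labels: c ~ a + b*x."""
    n = len(xs)
    mx, mc = sum(xs) / n, sum(cs) / n
    b = sum((x - mx) * (c - mc) for x, c in zip(xs, cs)) / sum((x - mx) ** 2 for x in xs)
    return mc - b * mx, b

def fit_logistic(xs, cs, iterations=25):
    """Maximum-likelihood (a, b) for ln(P/(1-P)) = a + b*x via Newton's method."""
    a = b = 0.0
    for _ in range(iterations):
        # Gradient (ga, gb) of the log-likelihood and the 2x2 matrix X^T W X.
        ga = gb = h11 = h12 = h22 = 0.0
        for x, c in zip(xs, cs):
            p = sigmoid(a + b * x)
            w = p * (1.0 - p)
            ga += c - p
            gb += (c - p) * x
            h11 += w
            h12 += w * x
            h22 += w * x * x
        det = h11 * h22 - h12 * h12
        # Newton step: (a, b) += (X^T W X)^{-1} * gradient, 2x2 inverse written out.
        a += (h22 * ga - h12 * gb) / det
        b += (h11 * gb - h12 * ga) / det
    return a, b

def classify(a, b, x):
    """Pick class 1 when P(c=1|x) > 0.5, which is the same as a + b*x > 0."""
    return 1 if sigmoid(a + b * x) > 0.5 else 0

# Hypothetical toy data: the two classes overlap around x = 4.5.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
cs = [0, 0, 0, 1, 0, 1, 1, 1]

la, lb = fit_linear(xs, cs)
a, b = fit_logistic(xs, cs)
for x in (-10.0, 20.0):
    # The straight line leaves [0, 1]; the logistic estimate of P(c=1|x) cannot.
    print(x, la + lb * x, sigmoid(a + b * x))
print([classify(a, b, x) for x in xs])
```

Newton's method is used here because, on a 1-D problem like this one, it typically converges in a handful of iterations. Plain gradient ascent on the same log-likelihood would also work, but it needs a step size and many more iterations.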
- A C++ implementation of large-scale logistic regression (together with a tech-report) can be found at:
- Slides from Mahout show that the project received a proposal to implement logistic regression on Hadoop through Google Summer of Code, but I have not seen the result yet.
- Two papers on large-scale logistic regression were published in 2009:
1. Parallel Large-scale Feature Selection for Logistic Regression, and
2. Large-scale Sparse Logistic Regression