http://luna.cas.usf.edu/~mbrannic/files/regression/Logistic.html

Among the many textbooks and tutorials on logistic regression, the very elementary one at the link above explains *how the logistic regression model comes about*.

In a binary classification problem, it is intuitive to decide whether an instance x belongs to class 0 or class 1 by looking at the ratio P(c=1|x) / P(c=0|x). Writing P = P(c=1|x) and 1-P = P(c=0|x), this ratio becomes the **odds**, P/(1-P). However, the odds have an undesirable property: they are asymmetric with respect to P. For example, swapping the values of P and 1-P turns P/(1-P) into its reciprocal (1-P)/P, not its negation, so the odds lie in (0, 1) when P < 0.5 but in (1, ∞) when P > 0.5. The swapping does, however, negate the **logit**, ln(P/(1-P)), because ln((1-P)/P) = -ln(P/(1-P)). So it is reasonable to make the logit, rather than the odds, our dependent variable. Modeling this dependent variable as a linear function of x, we get:

$$\ln\frac{P}{1-P} = a + bx$$

which, after exponentiating both sides and solving for P, is equivalent to

$$P = \frac{e^{a+bx}}{1 + e^{a+bx}}$$

The tutorial above also *compares linear regression with logistic regression*: “If you use linear regression, the predicted values will become greater than one and less than zero if you move far enough on the X-axis. Such values are theoretically inadmissible.”
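To make the odds/logit argument and the bounded-output point concrete, here is a small sketch using only the Python standard library. Solving the logit equation for P gives the logistic function P = e^z / (1 + e^z) with z = a + bx. The coefficients `a`, `b` and the straight-line coefficients `c0`, `c1` are hypothetical numbers picked for illustration, not estimates from any data.

```python
import math

def odds(p: float) -> float:
    """Odds P/(1-P), for 0 < p < 1."""
    return p / (1.0 - p)

def logit(p: float) -> float:
    """Log-odds ln(P/(1-P)), for 0 < p < 1."""
    return math.log(odds(p))

def logistic(z: float) -> float:
    """Inverse of the logit: P = e^z / (1 + e^z), split by sign to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Swapping P and 1-P: the odds become their reciprocal, while the logit flips sign.
p = 0.75
print(odds(p), odds(1 - p))    # 3.0 and about 0.333
print(logit(p), logit(1 - p))  # about 1.0986 and -1.0986

# Hypothetical coefficients, chosen only for illustration.
a, b = -1.0, 0.5               # logistic model: ln P/(1-P) = a + b*x
c0, c1 = 0.5, 0.05             # a straight line fitted directly to P
for x in (-20.0, 0.0, 20.0):
    linear_p = c0 + c1 * x             # leaves [0, 1] once |x| > 10
    logistic_p = logistic(a + b * x)   # mathematically always in (0, 1)
    print(x, linear_p, logistic_p)
```

At x = 20 the straight line predicts 1.5 and at x = -20 it predicts -0.5, which are exactly the "theoretically inadmissible" values from the quote, whereas the logistic curve only approaches 0 and 1 asymptotically.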

This explains why logistic regression does not model the relation between x and c directly. Instead, it models the relation between x and P(c|x), and then uses P(c|x) to decide whether x belongs to class c=1 or c=0. So logistic regression **is not regression**; it is a **classifier**.

Additional information:

- A C++ implementation of large-scale logistic regression (together with a tech report) can be found at http://stat.rutgers.edu/~madigan/BBR
- A set of Mahout slides shows that the project received a Google Summer of Code proposal to implement logistic regression on Hadoop, but I have not seen the result yet.
- Two papers on large-scale logistic regression were published in 2009:
  1. Parallel Large-scale Feature Selection for Logistic Regression
  2. Large-scale Sparse Logistic Regression
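This post does not cover how a and b are estimated. A common choice is maximum likelihood, and the following is a minimal sketch of that, assuming plain batch gradient ascent on the average log-likelihood over a tiny made-up dataset, followed by using the fitted P(c=1|x) as a classifier by thresholding at 0.5. The names `fit`, `classify`, `lr` and `steps` are hypothetical, and this toy is nowhere near the large-scale implementations listed above.

```python
import math
from typing import List, Tuple

def logistic(z: float) -> float:
    """P = e^z / (1 + e^z), split by sign to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit(xs: List[float], cs: List[int],
        lr: float = 0.5, steps: int = 20000) -> Tuple[float, float]:
    """Estimate a, b in ln P/(1-P) = a + b*x by batch gradient ascent
    on the average log-likelihood (an assumed, deliberately simple optimizer)."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the average log-likelihood: mean of (c - P) and of (c - P) * x.
        ga = gb = 0.0
        for x, c in zip(xs, cs):
            err = c - logistic(a + b * x)
            ga += err
            gb += err * x
        a += lr * ga / n
        b += lr * gb / n
    return a, b

def classify(a: float, b: float, x: float) -> int:
    """Predict class 1 iff P(c=1|x) >= 0.5, which is the same as a + b*x >= 0."""
    return 1 if logistic(a + b * x) >= 0.5 else 0

# Made-up toy data; the classes overlap around x = 2, so the maximum-likelihood
# estimates are finite (perfectly separable data would send b to infinity).
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
cs = [0, 0, 0, 1, 0, 1, 1, 1]
a, b = fit(xs, cs)
print(a, b)
print([classify(a, b, x) for x in xs])
```

Plain gradient ascent is used only because it is short; centering x, or switching to Newton's method (iteratively reweighted least squares), converges in far fewer iterations on the same likelihood.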