- We observe a binomial event with two outcomes, 0 and 1, with *p* representing the probability of event 1 and *(1-p)* representing the probability of event 0
- When observing a binary variable, we’re often interested in estimating the ratio between events, *(1-p)/p*, in order to determine the bias toward the event that is more likely to occur
- In order to handle the scale of the probability ratio, we’re actually interested in the logarithmic ratio: *log((1-p)/p)*
- Additionally, this ratio can be interpreted as a measure of *growth* of the process (in the direction of events described by the given probability)
- Deriving *p* as a linearly dependent variable in the context above naturally introduces the *logistic distribution function*
- It seems that the derivation of the logit function in the literature goes in the reverse direction (from the logistic function to the logarithm of the probability ratio)
- Note that a cumulative distribution function can represent a “natural” link function in the case of a binary variable (which is the basis of the *probit* model)
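The step from the log ratio to the logistic function can be checked numerically. The sketch below is only an illustration, not a derivation taken from the literature: it assumes the log ratio is set equal to some value *z* (for instance a linear combination of explanatory variables), and the function names are made up for this example. With the convention used in these notes, *z = log((1-p)/p)* gives *p = 1/(1+exp(z))*. The more common convention, *z = log(p/(1-p))*, gives the standard logistic function *p = 1/(1+exp(-z))*, so the two differ only in the sign of *z*.

```python
import math


def log_ratio(p):
    """Log of the ratio (1 - p) / p, the convention used in the notes above."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log((1.0 - p) / p)


def p_from_log_ratio(z):
    """Invert z = log((1 - p) / p), which gives p = 1 / (1 + exp(z))."""
    return 1.0 / (1.0 + math.exp(z))


def logistic(z):
    """Standard logistic function, the inverse of log(p / (1 - p))."""
    return 1.0 / (1.0 + math.exp(-z))


for p in (0.1, 0.5, 0.9):
    z = log_ratio(p)
    # Both inversions should recover the original p (up to rounding).
    print(p, round(z, 4), round(p_from_log_ratio(z), 4), round(logistic(-z), 4))
```

Whatever real value *z* takes, the recovered *p* stays strictly between 0 and 1, which is what makes the log ratio a convenient scale for a linear model.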

## Archive for the 'statistics' Category

### Logistic Regression | take #1

August 1, 2010

### Logistic Regression | take #0

July 31, 2010

- Logistic regression (logit) is an example of a generalized linear model, created by introducing the logistic function as the dependency between explanatory and response variables
- Logit models fall into a broader class of qualitative response regression models (the dependent variable is qualitative in nature)
- A lot of real-world estimation problems actually fall into the “qualitative” (rather than quantitative) category, the most common example being event occurrence (or the corresponding probability)
- Additionally, this setting maps directly to the problem of Web analytics regarding user behavior (for example, predicting whether a user will ignore advertising)
- Essential reading on qualitative variables: G.S. Maddala, Limited-Dependent and Qualitative Variables in Econometrics, Cambridge University Press, 1983.
- Taxonomy of qualitative variables in Web Analytics? Starting point: http://en.wikipedia.org/wiki/Web_analytics
- A classical example is regression analysis of user sessions: we observe explanatory variables (anything related to the page that the user was presented with at the time) and a qualitative dependent variable, the user action (click, skip, stop browsing, etc.)
- The question is whether we can properly estimate qualitative variables using standard regression methods like OLS
- A first shot at this would be the Linear Probability Model (LPM), which treats the Bernoulli process probability as a quantitative variable. Issue: the residuals are not normal.
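To make the LPM issue concrete, here is a minimal sketch that fits OLS with a single regressor to a binary outcome. The data are invented for illustration (a hypothetical page feature `xs` and a click indicator `ys`), and `fit_ols` is a helper written for this example, not a library function. Since the outcome is either 0 or 1, the residual at any given `x` can take only two values, so it cannot be normally distributed. A related and well-known problem is that the fitted "probabilities" can fall outside the [0, 1] interval.

```python
def fit_ols(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one regressor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical toy data: x is some page feature, y is 1 if the user clicked.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

intercept, slope = fit_ols(xs, ys)
fitted = [intercept + slope * x for x in xs]

# For a given x the residual is either (0 - fitted) or (1 - fitted),
# so its distribution has only two points of support.
residuals = [y - f for y, f in zip(ys, fitted)]

for x, y, f, r in zip(xs, ys, fitted, residuals):
    flag = "" if 0.0 <= f <= 1.0 else "  <- outside [0, 1]"
    print(f"x={x} y={y} fitted={f:.3f} residual={r:.3f}{flag}")
```

On this toy data the fitted line dips below 0 at the low end of `xs` and rises above 1 at the high end, which is one motivation for replacing the linear link with the logistic function.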