## Archive for the 'statistics' Category

August 1, 2010
- We observe a binary (Bernoulli) event with two outcomes, 0 and 1, with
*p* the probability of event 1 and *(1-p)* the probability of event 0
- When observing a binary variable, we are often interested in estimating the ratio between the probabilities of the two events,
*(1-p)/p*, in order to determine the bias toward the event that is more likely to occur
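- To handle the scale of the probability ratio, which ranges over (0, ∞), we actually work with the logarithmic ratio, which ranges over the whole real line:
*log((1-p)/p)*, the negative of the conventional logit *log(p/(1-p))*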
- Additionally, this log ratio can be interpreted as a measure of the
*growth* of the process (in the direction of the event described by the given probability)
- Treating this log ratio as a linear function of an explanatory variable, say *log((1-p)/p) = a + bx*, and solving for *p* naturally introduces the *logistic distribution function*: *p = 1/(1 + exp(a + bx))*
- In the literature, the logit function seems usually to be derived in the reverse direction (from the logistic function to the logarithm of the probability ratio)
- Note that any cumulative distribution function can serve as a “natural” link function for a binary variable, since it maps the real line onto (0, 1); using the standard normal CDF is the basis of the
*probit* model
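
The relationships above can be checked numerically. The following is a minimal sketch in standard-library Python, not the author's code: it implements the post's log ratio log((1-p)/p), its inverse p = 1/(1 + e^z) (obtained by solving log((1-p)/p) = z for p), the conventional logit, and the standard normal CDF as the probit-style alternative link. The function names and the coefficients `a`, `b` are hypothetical, chosen only for illustration.

```python
import math


def log_ratio(p: float) -> float:
    """Log ratio from the post, log((1 - p) / p); this is minus the usual logit."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log((1.0 - p) / p)


def logistic_p(z: float) -> float:
    """Solve log((1 - p) / p) = z for p, i.e. p = 1 / (1 + e^z).

    The two branches are algebraically equal; each avoids overflow in math.exp.
    """
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def logit(p: float) -> float:
    """Conventional logit log(p / (1 - p)), equal to -log_ratio(p)."""
    return -log_ratio(p)


def normal_cdf(z: float) -> float:
    """Standard normal CDF, the link function of the probit model."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical coefficients of the linear predictor z = a + b * x.
a, b = 0.5, -1.2
for x in (-2.0, 0.0, 2.0):
    z = a + b * x
    p = logistic_p(z)
    # Round trip: the log ratio of the recovered p gives back the linear predictor.
    assert abs(log_ratio(p) - z) < 1e-9
    # Probit analogue with the same sign convention: p = Phi(-z).
    print(f"x={x:+.1f}  z={z:+.2f}  logistic p={p:.3f}  probit p={normal_cdf(-z):.3f}")
```

The two links differ in scale (the standard logistic distribution has standard deviation π/√3, about 1.81, versus 1 for the standard normal), so the logistic and probit columns are not expected to match, and logit and probit coefficients are not directly comparable.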

Posted in statistics

July 31, 2010
- Logistic regression (logit) is an example of a generalized linear model, obtained by using the logistic function as the link between the explanatory variables and the response variable
- Logit models fall into a broader class of qualitative response regression models (models in which the dependent variable is qualitative in nature)
- Many real-world estimation problems actually fall into the “qualitative” (rather than quantitative) category, the most common example being whether an event occurs (or the corresponding probability)
- Additionally, this setting maps directly onto Web analytics problems concerning user behavior (for example, predicting whether a user will ignore an advertisement)
- Essential reading on qualitative variables: G. S. Maddala, *Limited-Dependent and Qualitative Variables in Econometrics*, Cambridge University Press, 1983.
- Taxonomy of qualitative variables in Web analytics? Starting point: http://en.wikipedia.org/wiki/Web_analytics
- The classical example is regression analysis of user sessions: we observe explanatory variables (anything related to the page the user was shown at the time) and a qualitative dependent variable, the user action (click, skip, stop browsing, etc.)
- The question is whether qualitative variables can be estimated properly with standard regression methods such as OLS
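- A first shot at this is the Linear Probability Model (LPM), which treats the Bernoulli probability as a quantitative variable and fits it with OLS. Issues: the residuals are not normal (for a given x, a residual can take only two values, one for y = 0 and one for y = 1), and their variance depends on x.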
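
To see the LPM issues concretely, here is a minimal sketch in standard-library Python on a tiny made-up data set. The data, the function names, and the Newton-Raphson routine are illustrative assumptions, not taken from any real analytics system. It fits the LPM by OLS and, for contrast, a logistic regression by maximum likelihood. Note that `logit_fit` uses the conventional parametrization log(p/(1-p)) = a + bx, i.e. P(y = 1 | x) = 1/(1 + e^-(a + bx)), which is the sign-flipped version of the log((1-p)/p) ratio used in the August post.

```python
import math

# Toy data, made up purely for illustration (not real Web analytics data):
# x is some page-level explanatory variable, y = 1 if the user clicked, else 0.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
ys = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]


def ols_line(xs, ys):
    """Linear probability model: OLS fit of y = a + b * x with binary y."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b


def sigmoid(z):
    """Standard logistic function 1 / (1 + e^-z), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logit_fit(xs, ys, iters=25):
    """Maximum-likelihood logit P(y = 1 | x) = sigmoid(a + b * x) via Newton-Raphson."""
    a = b = 0.0
    for _ in range(iters):
        # Score vector (ga, gb) and information matrix [[haa, hab], [hab, hbb]].
        ga = gb = haa = hab = hbb = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(a + b * x)
            w = p * (1.0 - p)
            ga += y - p
            gb += (y - p) * x
            haa += w
            hab += w * x
            hbb += w * x * x
        det = haa * hbb - hab * hab
        # Newton step: solve the 2x2 system information * delta = score.
        a += (hbb * ga - hab * gb) / det
        b += (haa * gb - hab * ga) / det
    return a, b


a_lpm, b_lpm = ols_line(xs, ys)
a_log, b_log = logit_fit(xs, ys)

for x in (0.0, 9.0, 12.0):
    lpm = a_lpm + b_lpm * x
    # For a given x, an LPM residual can only be -lpm (y = 0) or 1 - lpm (y = 1).
    print(f"x={x:4.1f}  LPM fit={lpm:+.3f}  residual y=0: {-lpm:+.3f}  y=1: {1 - lpm:+.3f}"
          f"  logit p={sigmoid(a_log + b_log * x):.3f}")
```

On this toy data the LPM line dips slightly below 0 at x = 0 (about -0.018), rises above 1 at x = 9 (about 1.018), and reaches about 1.36 when extrapolated to x = 12, whereas the logit probabilities always stay strictly between 0 and 1. The residual columns show why normality fails: at each x only two residual values are possible.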
