Archive for the 'statistics' Category

Logistic Regression | take #1

August 1, 2010
  • We observe a binary (Bernoulli) event with two outcomes, 0 and 1, where p is the probability of outcome 1 and (1-p) is the probability of outcome 0
  • When observing a binary variable, we’re often interested in estimating the ratio between the two outcomes, (1-p)/p, in order to determine how strongly the variable is biased toward the outcome that is more likely to occur
  • To handle the scale of the probability ratio (it ranges over (0, ∞)), we’re actually interested in the logarithmic ratio log((1-p)/p), which covers the whole real line and is symmetric around 0
  • Additionally, this ratio can be interpreted as a measure of the growth of a process (in the direction of the events described by the given probability)
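  • Treating the logarithmic ratio as linearly dependent on the explanatory variable and solving for p naturally introduces the logistic distribution function : setting log((1-p)/p) = -z for a linear predictor z gives p = 1/(1 + e^(-z))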
  • It seems that the derivation of the logit function in the literature goes in the reverse direction (from the logistic function to the logarithm of the probability ratio)
  • Note that a cumulative distribution function can serve as a “natural” link function in the case of a binary variable (this is the basis of the probit model, which uses the normal CDF)
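
The step from the logarithmic ratio to the logistic function can be checked numerically. Below is a minimal Python sketch, assuming the log ratio is linear in a single explanatory variable x, i.e. log((1-p)/p) = -(a + b*x). The function names (log_ratio, logistic, p_from_linear) and the coefficients a, b are illustrative choices, not something taken from the notes above.

```python
import math


def log_ratio(p):
    """Log of the ratio (1 - p) / p used in the notes above."""
    return math.log((1.0 - p) / p)


def logistic(z):
    """Logistic distribution function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def p_from_linear(x, a, b):
    """Probability of outcome 1 under the assumed linear log ratio.

    Assumption: log((1 - p) / p) = -(a + b * x), so solving for p
    gives p = logistic(a + b * x).
    """
    return logistic(a + b * x)


# Round trip: going from x to p and back to the log ratio
# should recover the linear predictor (with the sign flipped).
a, b = -1.0, 2.0  # hypothetical coefficients
for x in (-2.0, 0.0, 0.5, 3.0):
    p = p_from_linear(x, a, b)
    assert 0.0 < p < 1.0
    assert abs(log_ratio(p) + (a + b * x)) < 1e-9
```

Whatever the values of a and b, p stays strictly between 0 and 1, which is what makes the logistic function usable as a model for a probability. With the more common convention log(p/(1-p)) = a + b*x the same p is obtained; only the sign of the log ratio changes.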

Logistic Regression | take #0

July 31, 2010
  • Logistic regression (logit) is an example of a generalized linear model, created by introducing the logistic function as the dependency between the explanatory and response variables
  • Logit models fall into a broader class of qualitative response regression models (the dependent variable is qualitative in nature)
  • Many real-world estimation problems actually fall into the “qualitative” (rather than quantitative) category, the most common example being event occurrence (or the corresponding probability)
  • Additionally, this setting maps directly to the Web analytics problem of modeling user behavior (for example, predicting whether a user will ignore advertising)
  • Essential reading on qualitative variables : G. S. Maddala, Limited-Dependent and Qualitative Variables in Econometrics, Cambridge University Press, 1983.
  • Taxonomy of qualitative variables in Web Analytics ? Starting point :
  • A classical example is regression analysis of user sessions : we observe explanatory variables (anything related to the page that the user was shown at the time) and a qualitative dependent variable, the user action (click, skip, stop browsing, etc.)
  • The question is whether we can properly estimate qualitative variables using standard regression methods such as OLS
  • A first shot at this would be the Linear Probability Model (LPM) : treating the Bernoulli success probability as a quantitative variable that is linear in the explanatory variables. Issues : the residuals are not normal (for given explanatory values each residual can take only two values).
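
To make the LPM issue concrete, here is a rough Python sketch that fits both an LPM (plain OLS on the 0/1 response) and a logit model to simulated data. Everything in it is an assumption made for illustration: the data are synthetic, there is a single explanatory variable x, the "true" coefficients 0.5 and 1.5 are arbitrary, and the helper names fit_lpm and fit_logit are hypothetical. The logit fit uses Newton-Raphson, which is one common way to maximize the likelihood, not necessarily what any particular library does.

```python
import math
import random

random.seed(0)


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


# Synthetic "sessions": x is one explanatory variable, y is a 0/1 action.
n = 2000
xs = [random.uniform(-4.0, 4.0) for _ in range(n)]
ys = [1.0 if random.random() < logistic(0.5 + 1.5 * x) else 0.0 for x in xs]


def fit_lpm(xs, ys):
    """OLS fit of y = a + b * x (closed form for one regressor)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b


def fit_logit(xs, ys, iters=25):
    """Maximum-likelihood fit of p = logistic(a + b * x) by Newton-Raphson."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = logistic(a + b * x)
            w = p * (1.0 - p)  # Bernoulli variance at this x
            g0 += y - p        # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w           # entries of the (negated) Hessian
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det  # solve the 2x2 Newton system
        b += (h00 * g1 - h01 * g0) / det
    return a, b


a_lpm, b_lpm = fit_lpm(xs, ys)
a_log, b_log = fit_logit(xs, ys)

lpm_fitted = [a_lpm + b_lpm * x for x in xs]
logit_fitted = [logistic(a_log + b_log * x) for x in xs]

# Each LPM residual is either (1 - fitted) or (-fitted): two values per x.
outside = sum(1 for p in lpm_fitted if p < 0.0 or p > 1.0)
print("LPM fitted values outside [0, 1]:", outside, "of", n)
print("logit fitted range:", min(logit_fitted), max(logit_fitted))
```

On data like this the straight line fitted by OLS produces "probabilities" below 0 and above 1 at the extremes of x, while the logit fit stays inside (0, 1) by construction. The residual variance of the LPM also depends on x, since a Bernoulli variable has variance p(1-p).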