  • Fixed Effects: ologit vs. feologit

    Hello community,

    I am using an ordered logit model to estimate the effect of a continuous variable on a survey response where respondents choose one of three answers. My dataset is a repeated cross section. I need to include both geographic and time fixed effects.

    I have been working with the command ologit using “i.” where needed for the fixed effects, but have also been experimenting with the user-written command feologit (see paper here). When I run both commands back-to-back, I get very different results (where coefficients differ in both magnitude and direction). I’ve included code below that illustrates this scenario.

    Code:
    * Create sample dataset
    clear
    set more off
    
    * Set seed for replicability
    set seed 123
    
    * Number of observations
    local N = 1000
    
    * Number of groups
    local G = 20
    
    * Generate id variable
    set obs `N'
    gen id = _n
    
    * Generate group variable
    gen group = ceil(`G' * _n / `N')
    
    * Generate random independent variables
    gen x1 = rnormal()
    gen time = floor(runiform()*5) // Generating a discrete variable ranging from 0 to 4
    
    * Generate random error term
    gen u = rnormal()
    
    * Generate discrete dependent variable
    gen y = .
    replace y = 1 if u > 0.5
    replace y = 2 if u <= 0.5 & u > -0.5
    replace y = 3 if u <= -0.5
    
    * Ordered Logit
    ologit y x1 i.time i.group
    
    * Fixed Effects Ordered Logit
    feologit y x1 i.time, group(group)
    The discrepancy leaves me with multiple questions as I am not sure which results to trust:

    How does ologit treat categorical variables? Is there a reason I shouldn’t use “i.”?

    Can feologit handle multiple fixed effects? If not, is using “i.” an effective way to introduce the second fixed effect variable?

    From my vantage point, the two models compared above should produce identical results. I’d like to understand why they don’t. Any help is much appreciated!

  • #2
    You appear to be confusing different types of "fixed effects". -feologit- fits models to longitudinal data; there, the "fixed effect" refers to a time-invariant individual-level 'effect'. You have (repeated) cross-sectional data.
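    One quick way to see which kind of data you have is to count how many rows each identifier contributes. The sketch below rebuilds only the id and group variables from post #1 (the variable names n_per_id and n_per_group are made up for illustration) and shows that each id appears once, while each group pools many different respondents:

    ```stata
    * Rebuild the identifiers from the example in post #1
    clear
    set obs 1000
    gen id = _n
    gen group = ceil(20 * _n / 1000)

    * Count rows per respondent and per group (hypothetical helper variables)
    bysort id: gen n_per_id = _N
    bysort group: gen n_per_group = _N
    summarize n_per_id n_per_group

    * isid exits with an error unless id uniquely identifies the rows
    isid id
    ```

    If n_per_id is always 1, no respondent is observed more than once, so there is no within-person variation to work with.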



    • #3
      Hi there!

      I am dealing with a similar issue: I want to estimate an ordered logit model with person fixed effects, but the estimation takes a very long time. I am therefore considering using -feologit- instead.

      Am I correct in my understanding that both commands should yield the same estimates when used properly?

      I tweaked Caleb's MWE to create a panel dataset, but the results from ologit and feologit still differ. Is there a mistake in my approach, or is there a fundamental difference between the two commands?

      Code:
      * Create sample dataset
      clear
      set more off
      
      * Set seed for replicability
      set seed 123
      
      * Number of observations
      local N = 1000
      
      * Generate id variable
      set obs `N'
      gen id = _n
      replace id = id-500 if _n>500
      
      * Generate time variable
      gen time = 1 if _n<=500
      replace time = 2 if _n>500
      
      * Generate random independent variable
      gen x1 = rnormal()
      
      * Generate random error term
      gen u = rnormal()
      
      * Generate discrete dependent variable
      gen y = .
      replace y = 1 if u > 0.5
      replace y = 2 if u <= 0.5 & u > -0.5
      replace y = 3 if u <= -0.5
      
      * Ordered Logit
      ologit y x1 i.id
      
      * Fixed Effects Ordered Logit
      feologit y x1, group(id)



      • #4
        Originally posted by Kaja Rupieper

        I want to estimate an ordered logit model with person fixed effects, but the estimation takes a very long time. I am therefore considering using -feologit- instead.

        Am I correct in my understanding that both commands should yield the same estimates when used properly?

        In nonlinear panel data models such as ordered logit, including a dummy variable for each cross-sectional unit means that the number of parameters to be estimated (the coefficients on the dummies) grows with the number of cross-sectional units \((N)\). If \(T\) (the number of time periods) is small, each fixed effect is estimated from only a handful of observations, so it is estimated imprecisely no matter how large \(N\) is, and that imprecision contaminates the estimates of the primary coefficients of interest. This is the well-known incidental parameters problem. If you have a large number of observations per fixed-effect group (e.g., \(T \geq 30\) or even more), the incidental parameters problem becomes less severe. In such cases, ologit with dummies might provide reasonable estimates, as the influence of the imprecisely estimated fixed effects diminishes. However, note that this is more of a rule of thumb than a hard-and-fast rule.
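        To make the incidental parameters problem concrete, here is a minimal simulation sketch. It uses a binary logit instead of an ordered logit to keep things simple, and all variable names, the sample size, and the true coefficient of 1 are my own assumptions for illustration. With \(T = 2\), the dummy-variable logit is known to overstate the slope badly (roughly double the conditional logit estimate), whereas the conditional logit conditions the unit effects out:

        ```stata
        * Hypothetical simulation: binary logit with unit effects and T = 2
        clear
        set seed 12345
        set obs 200
        gen id = _n
        gen a = rnormal()                          // unit-specific effect
        expand 2                                   // two periods per unit
        bysort id: gen t = _n
        gen x = rnormal() + 0.5*a                  // regressor correlated with the unit effect
        gen byte y = runiform() < invlogit(a + x)  // true coefficient on x is 1

        * Units whose outcome never changes carry no information about x
        * once unit effects are allowed for, so keep only the switchers
        bysort id: egen ysum = total(y)
        keep if ysum == 1

        * Dummy-variable approach: one intercept per unit
        logit y x i.id
        scalar b_dummy = _b[x]

        * Conditional (fixed-effects) logit: unit effects are conditioned out
        clogit y x, group(id)
        scalar b_cond = _b[x]

        display "dummy-variable logit: " b_dummy
        display "conditional logit:    " b_cond
        display "ratio:                " b_dummy / b_cond
        ```

        The same mechanism is at work when you compare ologit with i.id against a conditional estimator on a short panel, so there is no reason to expect the two sets of coefficients to agree.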

        If your ordered variable has many categories, and you're willing to make the assumption that the "distance" between categories is somewhat meaningful and constant, you could treat your ordered variable as approximately continuous and use a linear fixed effects estimator (e.g., xtreg, fe). This is analogous to using OLS (regress) instead of ologit in the cross-sectional case when you have many categories. The benefit here is that the linear FE estimator (LSDV) is consistent as \(N\rightarrow \infty\) for fixed \(T\). On the other hand, if you only have a few categories in your ordered variable, and you can theoretically justify combining them into two (e.g., "low" vs. "high" satisfaction), you can then use a conditional logit model (xtlogit, fe). Conditional logit is specifically designed to handle fixed effects in binary choice models by conditioning them out, thus avoiding the incidental parameters problem entirely. This is a very robust and widely accepted approach for binary outcomes with fixed effects.
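        As a rough sketch of both alternatives, the code below rebuilds a two-period panel similar to the example in #3 and fits a linear fixed-effects model and a fixed-effects logit on a dichotomized outcome. The cut point (y >= 2) and the variable name high are arbitrary choices for illustration; in real work the split needs a substantive justification:

        ```stata
        * Two-period panel similar to the example in #3
        clear
        set seed 123
        set obs 500
        gen id = _n
        expand 2
        bysort id: gen time = _n
        gen x1 = rnormal()
        gen u = rnormal()
        gen y = cond(u > 0.5, 1, cond(u > -0.5, 2, 3))

        xtset id time

        * Alternative 1: treat y as approximately continuous, linear fixed effects
        xtreg y x1, fe

        * Alternative 2: collapse y to two categories (hypothetical cut point)
        * and condition the person effects out
        gen byte high = (y >= 2)
        xtlogit high x1, fe
        ```

        Note that xtlogit, fe only uses persons whose dichotomized outcome changes over time, so the estimation sample can shrink considerably.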

        The procedure implemented by feologit (from SSC) is not as standard in econometrics as, say, conditional logit for binary outcomes. This is precisely why Stata (and other statistical software) might not have a built-in, highly optimized feologit command as part of its core regression suite. I thus tend to avoid such estimators.



        • #5
          Thanks for the explanation, Andrew!
