  • Using GEE (xtgee) as an alternative to OLS regression for non-panel data?

    Hello everyone,
    I hope you are doing well.

    I ran a regression model with the regress command (OLS, since my data are normally distributed) and would like to check an alternative estimation method to confirm that my results are not driven by my choice of regression type (e.g., for a binary outcome I used logit as a robustness check for probit).

    I was wondering whether I can run a GEE model, specifically an xtgee model, even though I don't have panel data.

    To describe my data structure: I am looking at failed startups, so my unit of observation is the point (date) at which a company filed for bankruptcy, together with everything the company accumulated up to that point (e.g., the total funding it received, its age, and the number of founders).

    I thought about using "xtset" with either the founding year or the industry and then running xtgee, but wanted to check first whether this is possible, and which alternatives to an OLS "reg" regression I could run.


    Thanks a lot for your time and support!

  • #2
    If I understand you correctly (and with no data example I am not at all sure I do), you could use -xtgee- as follows:

    1. -xtset- using a numeric variable that is distinct for every observation.
    2. Below is an example using the auto dataset supplied with Stata:
    Code:
    encode make, gen(make2)
    xtset make2
    
    xtgee price weight turn, fam(gaussian) link(identity) corr(ind)
    
    Iteration 1:  Tolerance = 4.075e-14
    
    GEE population-averaged model                     Number of obs    =        74
    Group variable: make2                             Number of groups =        74
    Family: Gaussian                                  Obs per group:  
    Link:   Identity                                               min =         1
    Correlation: independent                                       avg =       1.0
                                                                   max =         1
                                                      Wald chi2(2)     =     44.89
    Scale parameter = 5341429                         Prob > chi2      =    0.0000
    
    Pearson chi2(74)     = 3.953e+08                  Deviance         = 3.953e+08
    Dispersion (Pearson) =   5341429                  Dispersion       =   5341429
    
    ------------------------------------------------------------------------------
           price | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
          weight |   3.914597   .6763724     5.79   0.000     2.588932    5.240263
            turn |  -385.3904   119.4885    -3.23   0.001    -619.5836   -151.1972
           _cons |   9625.498   3177.312     3.03   0.002     3398.081    15852.92
    ------------------------------------------------------------------------------
    The coefficients match those from -regress- on the same data (the standard errors differ slightly):
    Code:
    . regress price weight turn
    
          Source |       SS           df       MS      Number of obs   =        74
    -------------+----------------------------------   F(2, 71)        =     21.54
           Model |   239799649         2   119899825   Prob > F        =    0.0000
        Residual |   395265747        71  5567123.19   R-squared       =    0.3776
    -------------+----------------------------------   Adj R-squared   =    0.3601
           Total |   635065396        73  8699525.97   Root MSE        =    2359.5
    
    ------------------------------------------------------------------------------
           price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
          weight |   3.914597   .6905142     5.67   0.000     2.537751    5.291444
            turn |  -385.3904   121.9868    -3.16   0.002    -628.6252   -142.1556
           _cons |   9625.498   3243.744     2.97   0.004     3157.656    16093.34
    ------------------------------------------------------------------------------
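The two tables share coefficients but not standard errors. The scale parameter reported by xtgee is the residual sum of squares divided by N (395265747/74 ≈ 5341429), whereas regress divides by N - k (395265747/71 ≈ 5567123.19), so each xtgee standard error is smaller by a factor of sqrt(71/74) ≈ 0.9795 (e.g., 0.6905142 × 0.9795 ≈ 0.6764). The sketch below, on the same auto data, assumes that xtgee's documented nmp option (divisor N - P instead of N for the scale) is enough to make the two variance matrices agree; xtgee would still report z statistics rather than t. The variable id and the matrix names are illustrative.

```stata
* Sketch on the auto data: make xtgee reproduce regress's standard errors
sysuse auto, clear
generate long id = _n          // one observation per group, as in the example above
xtset id

* OLS variance matrix (scale = RSS / (N - k))
quietly regress price weight turn
matrix V_ols = e(V)

* xtgee with nmp: divide by N - P instead of N when estimating the scale
quietly xtgee price weight turn, family(gaussian) link(identity) corr(independent) nmp
matrix V_gee = e(V)

* Largest relative difference between the two variance matrices
display mreldif(V_gee, V_ols)

* Without nmp, xtgee's standard errors shrink by sqrt((N - k)/N)
display sqrt(71/74)            // N = 74 observations, k = 3 coefficients
```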
    Forgot to include: the -xtgee- manual entry (p. 171) has a table headed "Some family(), link(), and corr() combinations result in models already fit by Stata".

    Added clarification: since -xtgee- set up as I did is the same estimator as -regress-, it cannot be used for what I think is your purpose in #1 above.
    Last edited by Rich Goldstein; 17 Jun 2025, 06:38.



    • #3
      With your data structure, I don't see what value GEE would add. If you have a particular cluster structure (for example, a policy implemented at the industry level, with cross-sectional firm-level data), then you might use xtgee (or just xtreg with the re option). But I don't really see that here.
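For the case where such a cluster structure does exist, here is a minimal sketch of the two commands mentioned. It uses Stata's shipped nlsw88 data (cross-sectional workers grouped by industry) purely as a stand-in for firms grouped by industry; the choice of outcome and regressors is illustrative. nlsw88 has only 12 industries, which is few for cluster-robust standard errors, so treat the inference cautiously.

```stata
* Sketch: cross-sectional observations grouped by industry (stand-in for firms)
sysuse nlsw88, clear
drop if missing(industry)      // keep only observations with a grouping value
xtset industry                 // industry is already a numeric, labeled variable

* GEE with an exchangeable working correlation within industry;
* vce(robust) gives standard errors clustered on the panel variable
xtgee wage grade ttl_exp, family(gaussian) link(identity) corr(exchangeable) vce(robust)

* Random-effects alternative, also with industry-clustered standard errors
xtreg wage grade ttl_exp, re vce(robust)
```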

      By the way, GEE is an estimation method, not a model. And there is nothing wrong with using OLS regression in your case with heteroskedasticity-robust standard errors.
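A minimal sketch of OLS with heteroskedasticity-robust standard errors, shown on the auto data from #2 rather than the startup data (the stored-estimate names ols and ols_robust are arbitrary):

```stata
* Sketch on the auto data: conventional vs heteroskedasticity-robust OLS
sysuse auto, clear
regress price weight turn
estimates store ols

* vce(robust): Huber/White sandwich standard errors
regress price weight turn, vce(robust)
estimates store ols_robust

* Coefficients are identical; only the standard errors change
estimates table ols ols_robust, b se
```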

      Another comment: I don't understand the question being asked. You say you have a single time period for each firm when the firm failed. What is the outcome you're modeling?



      • #4
        Sounds like a survival model: do you know both the start and finish dates?
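If both dates are known, a survival setup might look like the sketch below. It runs on Stata's shipped cancer data only as a stand-in; the commented lines show how startup data could be declared, assuming hypothetical variables founded and bankrupt stored as Stata daily dates. Note that if every firm in the sample eventually failed, each observation is a failure, so the model describes time to failure among failed startups only and says nothing about firms that survived.

```stata
* Sketch: Cox proportional-hazards model on Stata's shipped cancer data,
* used here only as a stand-in for startup lifetimes
sysuse cancer, clear
stset studytime, failure(died)     // analysis time and failure indicator
stcox age i.drug

* With startup data, the setup might look like this (hypothetical
* variable names; founded and bankrupt are Stata daily dates):
*   generate lifetime = (bankrupt - founded) / 365.25   // age in years
*   generate byte failed = 1                            // every firm failed
*   stset lifetime, failure(failed)
```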

