  • Question about LSDV1 and pooled OLS

    Apologies in advance that the questions probably seem elementary.

    Suppose I am dealing with panel data (year1-year10) and have a model that looks like:

    Independent variable(y)= α+λ+βx+μ+Ɛ

    (where α=constant, λ=a set of (time) dummy variables, βx are some dependent variables*coefficients, μ=individual effect, Ɛ=error term)

    Based on the above equation,

    1. What is the difference in commands between pooled OLS and LSDV1?

    Is it simply like:
    reg y x1 x2 (for pooled ols)
    reg y x1 x2 year2-10 (for LSDV1)

    2. Is it always better to add -robust- as well? (From what I read, adding -robust- is always a good idea because it gives the same results under homoskedasticity and correct results under heteroskedasticity. However, most of the regression results I see do not include -robust-.)

    3. What is the difference between
    xtreg y x1 x2 year2-10, fe
    and reg y x1 x2 year2-10

    From what I read, LSDV1 is already a fixed-effects model, so if the Breusch-Pagan and Hausman test results point me to a fixed-effects model, I could simply use LSDV1 from then on. Is that logically correct? Or should I use xtreg y x1 x2 year2-10, fe?

    4. I tried running -reg y x1 x2 year2-10- (having already dropped year1 to avoid multicollinearity), but the results still omitted year10. What should I do?

    Thank you very much for your time.





  • #2
    Am I violating any rules here, or are my questions too elementary? Please let me know if I'm missing anything, thanks.



    • #3
      Stephen:
      the issue with your post is that it is not clear what you're after, as you're comparing different models which are not interchangeable.
      Some comments about your query follow:
      - due to a keyboard mishap, you inadvertently confused independent with dependent variables;
      - 1. your code -reg y x1 x2 (for pooled ols)- is not really a pooled OLS, as it treats your observations as independent, ignoring the panel structure of your data (the -cluster- option for standard errors is needed);
      - 2. the -robust- option has different functions under -regress- (it corrects for heteroskedasticity only) and -xtreg- (it takes into account heteroskedasticity and/or autocorrelation, because both the -robust- and -cluster- options invoke cluster-robust standard errors);
      - 3. those codes are quite different, as the first one is actually a panel data regression with the -fe- specification, whereas the second is a cross-sectional OLS.
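
The point about -robust- under -xtreg- can be checked on simulated data. What follows is only a minimal sketch: the dataset and the names panelid, year, x1 and y are made up for illustration and are not the original poster's data.

```stata
* Simulated balanced panel: 60 units, 6 years (all names are hypothetical)
clear
set seed 4321
set obs 60
generate panelid = _n
expand 6
bysort panelid: generate year = 2010 + _n
generate x1 = rnormal()
generate y  = 1 + x1 + rnormal()
xtset panelid year

* Under -xtreg, fe-, vce(robust) and vce(cluster panelid) should coincide
xtreg y x1 i.year, fe vce(robust)
scalar se_robust = _se[x1]
xtreg y x1 i.year, fe vce(cluster panelid)
scalar se_cluster = _se[x1]

* Under -regress-, vce(robust) handles heteroskedasticity only
regress y x1 i.year, vce(robust)
scalar se_ols_robust = _se[x1]
regress y x1 i.year, vce(cluster panelid)
scalar se_ols_cluster = _se[x1]

display "xtreg:   robust = " se_robust "   cluster = " se_cluster
display "regress: robust = " se_ols_robust "   cluster = " se_ols_cluster
```

The two -xtreg- standard errors should agree, while the two -regress- ones will generally differ.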

      As an aside, I fail to get why you're so interested in LSDV when you can rely on -xtreg, fe-.
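
For readers who want to see how LSDV relates to -xtreg, fe-, here is a rough sketch on simulated data. It assumes that LSDV means OLS with one dummy per panel unit; the names panelid, year, x1, x2, mu and y are hypothetical.

```stata
* Simulated balanced panel: 50 units observed over 10 years
clear
set seed 12345
set obs 50
generate panelid = _n
generate mu = rnormal()                    // unit-specific effect
expand 10
bysort panelid: generate year = 2000 + _n
generate x1 = rnormal() + 0.5*mu           // regressor correlated with the unit effect
generate x2 = rnormal()
generate y  = 1 + 2*x1 - x2 + mu + 0.1*(year - 2000) + rnormal()
xtset panelid year

* LSDV: OLS with unit and year dummies built through factor-variable notation
regress y x1 x2 i.year i.panelid
scalar b_lsdv = _b[x1]

* Within (fixed-effects) estimator
xtreg y x1 x2 i.year, fe
scalar b_fe = _b[x1]

display "LSDV: " b_lsdv "   xtreg, fe: " b_fe
```

The slope on x1 should be the same in both fits up to numerical precision; -xtreg, fe- simply avoids estimating and displaying one coefficient per unit.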
      Kind regards,
      Carlo
      (Stata 19.0)



      • #4
        Thank you very much for answering.

        Maybe my question was too unclear, sorry. I meant: given the aforementioned equation at the top, what would be the code for
        1. Pooled OLS 2. Fixed effects

        My problem is mainly due to the time dummy in the equation which confuses me.

        So, upon further research I believe the following commands are what I should use

        For pooled OLS: -reg y x1 x2 i.year, robust- (or cluster robust as you suggested)
        For fixed effect: -xtreg y x1 x2 i.year, fe robust-

        I think this is the way I should go, but I am not entirely sure

        Unfortunately, after I tried the commands, the coefficients for most or all of the independent variables are insignificant under the pooled, FE, and RE methods alike, and I don't know where the problem starts. Is it possible that endogeneity is making the coefficients insignificant?

        I am sorry that I don't quite understand your reply to point 1.
        I think I get the rest of your answers. When the equation states that there are time dummies, it means that when I do OLS it becomes a cross-sectional OLS, right?



        • #5
          Stephen:
          1. -reg y x1 x2- is code for a cross-sectional (i.e., one wave of data only) regression, not for a panel dataset. Under the panel framework, observations are not independent (at least within the same panel, as the same panel unit is measured -n- times on the same variable); hence, residuals are, in all likelihood, correlated. This is an apparent violation of OLS requirements (residuals should not be correlated). That's why your code is OK for a cross-sectional but not for a pooled OLS, as you did not impose clustered standard errors.
          2. Time dummies are perfectly legal in both codes. At worst you might run into a multicollinearity issue. As general advice, please stay away from creating categorical variables/interactions by hand and exploit the wonderful capabilities of -fvvarlist- instead. You can test the joint statistical significance of -timevar- via -testparm- after you have performed -regress-/-xtreg-.
          The first of your new codes (pooled OLS) should be amended as follows:
          Code:
          reg y x1 x2 i.year , vce(cluster panelid)
          as under -regress- -robust- corrects for heteroskedasticity only (whereas -robust- and -cluster- do the very same job under -xtreg-).
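
The -testparm- suggestion can be tried out as follows. This is a sketch on simulated data only: the variable names and the 2009-2016 span are assumptions made for illustration.

```stata
* Simulated panel: 40 units, years 2009-2016 (hypothetical names)
clear
set seed 2024
set obs 40
generate panelid = _n
expand 8
bysort panelid: generate year = 2008 + _n
generate x1 = rnormal()
generate x2 = rnormal()
generate y  = 1 + 0.5*x1 + 0.2*(year - 2009) + rnormal()

* Pooled OLS with year dummies and cluster-robust standard errors
regress y x1 x2 i.year, vce(cluster panelid)

* Joint test that all year dummies are zero
testparm i.year
scalar p_year  = r(p)
scalar df_year = r(df)
display "year dummies tested: " df_year "   joint p-value: " p_year
```

With eight years and one base year, seven dummies enter the joint test.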

          Last but not least, your concern should not be collecting as many statistically significant coefficients as you can from a given model, but providing a fair and true view of the data-generating process (endogeneity could be caused by the omission of a predictor that the data-generating process requires).

          As a closing aside, please note that (as the FAQ reminds us) your chances of getting (more) helpful replies are conditional on posting what you typed (as you did in part) and what Stata gave you back (as you did not). Thanks.
          Kind regards,
          Carlo
          (Stata 19.0)



          • #6
            Very helpful. Thank you very much Carlo.
            I also wonder why the regression results automatically leave out two year dummies, without even saying "omitted due to collinearity" or anything at all.

            For your reference I typed: reg y x i.year, vce(cluster panelid)
            My data is from 2009-2016 and it only gives coefficients for 2011-2016 with no explanation.

            I thought these were more theoretical questions, so I did not post any regression results, as I was afraid it would cost you a lot of time.

            Thank you again



            • #7
              Stephen:
              - please check that you do not have missing data for the second excluded year dummy (the first one is excluded by default; in this way Stata shelters you from the so-called dummy variable trap - https://en.wikipedia.org/wiki/Dummy_...le_(statistics)).
              By default, Stata estimation commands apply listwise deletion: observations with a missing value in any of the variables are dropped.
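
One way to run that check is sketched below on simulated data. The assumption that x is entirely missing in 2009 is made up to mimic the symptom; it is not known to be the cause in the original poster's data, and all variable names are hypothetical.

```stata
* Simulated panel: 30 units, years 2009-2016
clear
set seed 99
set obs 30
generate panelid = _n
expand 8
bysort panelid: generate year = 2008 + _n
generate x = rnormal()
replace x = . if year == 2009            // assumption: x unavailable in the first year
generate y = 1 + x + rnormal()

regress y x i.year, vce(cluster panelid)

* Which years actually entered the estimation sample?
tabulate year if e(sample)

* How many values are missing in each variable?
misstable summarize y x year

count if e(sample)
scalar n_used = r(N)
display "observations used: " n_used " out of " _N
```

Here every 2009 observation is dropped by listwise deletion, so the year dummies should be built from 2010-2016 only, and no "omitted" note is expected because the missing year never enters the estimation sample.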
              The issue with not posting output is not the repliers' time (at a very first glance I have a rough idea of the time I should devote to giving the original poster a hopefully helpful reply), but how helpful a replier can be without having a look at the Stata output.
              Kind regards,
              Carlo
              (Stata 19.0)
