  • Linear Regression or Panel Regression

    Hello

    I just read a paper and would like to try to replicate it. However, I don't exactly understand whether the author of the paper is doing a linear regression or a panel regression. His regression equation looks like this

    [Image: regression equation (Bildschirmfoto 2019-07-13 um 22.31.05.png)]


    with y_it denoting subjective inequality indices for individual i in year t. The regressor of main interest is East_it, a dummy variable denoting current residency in East Germany. The regression also includes a series of control variables and survey-year fixed effects, denoted by x_it and λ_t, respectively.


    I know that he has a dataset covering 3 years (1987, 1992 and 1999) on which he runs the regression, but he does not say whether it is a linear regression or a panel regression. He only writes, and I quote: "I run a series of simple regression models".
    What do you think the author does in his paper: a panel regression or a linear regression? I'm confused because I'm not sure whether you can run a linear regression with a dummy on a dataset that spans 3 years.

    Thank you so much for your help.
    Last edited by Lucca Mancini; 13 Jul 2019, 14:49.

  • #2
    My guess is panel regression, e.g. xtreg. That is why there are all these t subscripts. I don't see why you couldn't have the dummy.

    When you say you are going to try to replicate, do you mean that you have the same data? If so you should be able to tell whether you are doing it the same way or not.
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    Stata Version: 16.0MP (2 processor)

    EMAIL: rwilliam@ND.Edu
    WWW: https://www3.nd.edu/~rwilliam



    • #3
      Thank you Mr Williams for your feedback.

      I am trying to estimate the same equation with the same data, but I only have data for 2 years (1992 and 1999), whereas the author has three years (1987, 1992 and 1999).

      I'm unsure whether to run a panel regression or a linear regression with a 2-year dataset. I don't know what the difference is.



      • #4
        "with y_it denoting subjective inequality indices for individual i in year t."
        This already answers your question. You have data for both i and t, so it's panel data.
        "I'm unsure whether to do panel regression or linear regression with a 2 year dataset."
        A linear regression is a regression where you estimate a linear relationship between your y and x variables. That is the case above. Thus, it's a linear regression with panel data. Panel data doesn't mean that you cannot do linear regression.



        • #5
          Originally posted by Wouter Wakker
          This already answers your question. You have data for both i and t, so it's panel data.

          A linear regression is a regression where you estimate a linear relationship between your y and x variables. That is the case above. Thus, it's a linear regression with panel data. Panel data doesn't mean that you cannot do linear regression.
          I agree with Wouter. Empirical researchers would do well to remember the difference between an estimation method and a model. The equation you posted is a model. It can be estimated in many different ways. By "regression" you presumably mean "pooled OLS." One could also use random effects, which is a particular GLS estimator. Or, one could use fixed effects, which removes time averages. Yes, these estimation methods are intended for different models, but any estimation method can be applied to the problem.

          I would guess that removing heterogeneity is important to infer causality, and so I would tend to use fixed effects. But I almost always check pooled OLS, and maybe even random effects, and also maybe first differencing. One can learn a lot by doing all four.

          And as was mentioned above, there is no issue in East(i,t) being a dummy variable. It gets transformed just like any other variable. Don't overthink. Pick up a panel data book and you'll notice that no special treatment is given to dummy variables.

          With two years of data, FE and FD will be the same. So there are not two separate ways to remove heterogeneity.

          JW
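The point about FE and FD coinciding with two years can be checked on simulated data. This is only a sketch: the data are made up, and every variable name (id, year, x, y, a) is hypothetical rather than taken from the paper.

```stata
* Simulated balanced two-period panel; all names are hypothetical.
clear
set seed 12345
set obs 500
generate id = _n
generate a = rnormal()                 // unobserved individual heterogeneity
expand 2                               // two rows per individual
bysort id: generate year = cond(_n == 1, 1992, 1999)
generate x = rnormal() + 0.5*a         // regressor correlated with heterogeneity
generate y = 1 + 2*x + 0.3*(year == 1999) + a + rnormal()
xtset id year, delta(7)                // the two waves are 7 years apart

* Fixed effects (within) estimator with a year dummy
xtreg y x i.year, fe

* First differences; the constant plays the role of the year dummy
regress D.y D.x
```

With a balanced two-period panel, the coefficient on x from `xtreg, fe` and the coefficient on `D.x` should agree, which is why there are not two separate ways to remove heterogeneity here.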



          • #6
            I will add that the fact that the error term, epsilon(i,t), in the equation is not separated into a heterogeneity term and an idiosyncratic error strongly suggests the original study did not use FE or FD. My guess is pooled OLS was used. It's likely that there is little variation in the variable East(i,t) across t for each individual, and so FE probably wipes out the effect.

            If pooled OLS is used, cluster the standard errors at a minimum. And I would use FE to see what happens.
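A rough illustration of that advice, assuming a true panel in which only a few people ever change region. The data are simulated and the names (id, t, east, east0, y) are hypothetical.

```stata
* Simulated three-wave panel where region rarely changes within a person.
clear
set seed 2019
set obs 400
generate id = _n
generate byte east0 = runiform() < 0.5     // region of residence at baseline
generate a = rnormal()                     // individual heterogeneity
expand 3
bysort id: generate t = _n
* roughly 5% of person-waves are recorded in the other region
generate byte east = cond(runiform() < 0.05, 1 - east0, east0)
generate y = 0.5*east + a + rnormal()
xtset id t

* Pooled OLS with standard errors clustered by individual
regress y i.east i.t, vce(cluster id)

* Fixed effects: identified only from the few people who switch region
xtreg y i.east i.t, fe vce(cluster id)
```

Comparing the two standard errors on `east` gives a feel for how much precision FE loses when the regressor barely varies within individuals.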



            • #7
              Thank you very much Mr Wooldridge

              I would like to show here the table of the paper to be replicated in which the regression equation shown above was applied over the years 1987, 1992 and 1999 for region east and west:
              [Image: results table from the paper (Bildschirmfoto 2019-07-14 um 17.36.21.png)]




              However, when I try to do a pooled OLS regression, I get the answer "repeated time values within panel" as soon as I type "xtset panel_id year". I don't know exactly why I get this error message.

              I have for region and panel_id 2 values (East and West) and for year as well (1992 and 1999). However, these are survey data from 2 years in which the same persons were not interviewed.

              - In 1992, 579 participants from East and 170 participants from West participated.

              - In 1999, 678 participants from East and 201 participants from West participated.

              Can one estimate a pooled OLS with these data? Or should I use a different approach?
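One way to see where the xtset message comes from is to rebuild the structure described above with placeholder data. Only the group sizes are taken from the post; the outcome is random noise, and the coding of panel_id (1 = West, 2 = East) is an assumption made for illustration.

```stata
* Placeholder data with the group sizes from the post.
clear
set seed 1
set obs 1628                               // 579 + 170 + 678 + 201 respondents
generate year = cond(_n <= 749, 1992, 1999)
generate byte east = (_n <= 579) | (_n > 749 & _n <= 1427)
generate panel_id = east + 1               // assumed coding: 1 = West, 2 = East
generate y = rnormal()                     // placeholder outcome

tabulate east year                         // cell counts match the post
duplicates report panel_id year            // many rows share each (panel_id, year) pair

* xtset needs at most one row per panel_id-year pair, so it stops with an error
capture noisily xtset panel_id year

* Pooled OLS does not require xtset at all
regress y i.east i.year
```

Because `panel_id` identifies the region rather than the respondent, each region-year pair occurs hundreds of times, which is what xtset is complaining about.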



              • #8
                If the same people aren’t being interviewed, it isn’t panel data, is it? It is successive cross-sections. Which to me seems to go counter to what I thought you were saying before.
                -------------------------------------------
                Richard Williams, Notre Dame Dept of Sociology
                Stata Version: 16.0MP (2 processor)

                EMAIL: rwilliam@ND.Edu
                WWW: https://www3.nd.edu/~rwilliam



                • #9
                  Absolutely correct, Mr Williams, these are successive cross-sections. Do you have any idea what kind of regression I should use here to create a table similar to the one above? Maybe a linear regression?



                  • #10
                    It looks like an OLS regression to me, e.g.

                    reg y i.year i.region i.region#i.year

                    If you really have the same data, I would think you could reproduce the results for the two years you have.
                    -------------------------------------------
                    Richard Williams, Notre Dame Dept of Sociology
                    Stata Version: 16.0MP (2 processor)

                    EMAIL: rwilliam@ND.Edu
                    WWW: https://www3.nd.edu/~rwilliam



                    • #11
                      Thank you very much for your help Mr Williams. I appreciate it very much.



                      • #12
                        What is the citation for the paper? The notation still seems weird to me if it isn’t panel data. But maybe it makes sense in context. Or, maybe it is just wrong.
                        -------------------------------------------
                        Richard Williams, Notre Dame Dept of Sociology
                        Stata Version: 16.0MP (2 processor)

                        EMAIL: rwilliam@ND.Edu
                        WWW: https://www3.nd.edu/~rwilliam



                        • #13
                          Dear Mr Williams

                          The following link contains the paper: http://www.econ.uzh.ch/static/wp/econwp009.pdf



                          • #14
                            The notation in the equation is misleading. This looks like repeated cross sections, not panel data. With East(i,t) in the equation one gets the impression that not only are the same individuals being followed over time, but that some individuals switch from East to West, or vice versa.

                            The analysis is like a difference in differences, where East acts like the treatment variable. Then it is interacted with the time dummies. In this case, the interest is in how perceptions have changed over time in the East versus the West.

                            Estimate using OLS, as suggested by Richard. I would make the standard errors robust to heteroskedasticity.
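A minimal sketch of that specification on simulated repeated cross-sections. The variable names (east, year, x, y) and all coefficient values are invented for illustration and say nothing about the paper's results.

```stata
* Simulated repeated cross-sections: different respondents in each year.
clear
set seed 42
set obs 1628
generate year = cond(_n <= 749, 1992, 1999)
generate byte east = runiform() < 0.75
generate x = rnormal()                     // stand-in for the control variables
generate y = 1 + 0.4*east + 0.2*(year == 1999) ///
    - 0.3*east*(year == 1999) + 0.5*x + rnormal()

* OLS with the East-by-year interaction and heteroskedasticity-robust SEs
regress y i.east##i.year x, vce(robust)

* East-West gap in each survey year
margins year, dydx(east)
```

The `i.east##i.year` term expands to both main effects plus their interaction, so it is the same model as listing `i.year`, `i.east` and `i.east#i.year` separately. The interaction coefficient is the change in the East-West gap between the two survey years.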
