
  • Quantile regression with clustered errors

    Good evening,

    I would like to ask a question about quantile regression with clustered standard errors. I have read the paper by Parente and Santos Silva, and I am using the command qreg2 in Stata to analyse a set of countries over a time span of 20 years. My understanding is that this methodology is the closest technique to a panel data estimation using quantile regression and that, because I cluster the standard errors by country, it is similar to a fixed effects estimation with panel data. Am I right? I would really appreciate it if anyone could explain the methodology to me in simple words. I have also been asked whether this kind of analysis implies a pooling of regressions that are time series in nature, and I do not know how to answer this question.

    Thanks in advance

    Kind regards

  • #2
    Dear Marcus,

    Thank you for your interest in our work. I am afraid that is not right: what you are estimating is the quantile regression equivalent of pooled OLS with clustered standard errors.

    Best regards,

    Joao

    • #3
      Thank you for your answer,

      So can you explain what it means that the standard errors are clustered by a group variable? What is the difference between general quantile regression using commands qreg or sqreg, and your command qreg2?

      The thing is that I would like to understand the methodology in order to explain it properly in my paper.

      Thanks in advance

      • #4
        Marcus,

        As you will see, the estimates obtained with the 3 commands are the same. The difference is that -qreg2- allows you to compute "clustered standard errors".
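
        A minimal sketch of that comparison, assuming the user-written -qreg2- is installed (for example via `ssc install qreg2`). Stata's `nlswork` example panel stands in for the country data discussed here, and the regressors and the subsample are purely illustrative choices.

        ```stata
        * Illustrative only: nlswork is a stand-in panel (idcode = individual).
        webuse nlswork, clear
        keep if idcode <= 500 & !missing(ln_wage, age, tenure)

        * Median regression with the default standard errors of -qreg-
        qreg ln_wage age tenure, quantile(.5)
        scalar b_qreg = _b[tenure]

        * Same point estimates, bootstrap standard errors
        sqreg ln_wage age tenure, quantiles(.5) reps(50)

        * Same point estimates, standard errors clustered by individual
        qreg2 ln_wage age tenure, q(.5) cluster(idcode)
        display "difference in tenure coefficient: " _b[tenure] - b_qreg
        ```

        Only the standard errors, t-statistics and confidence intervals should change across the three outputs.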

        In a panel, it is likely that the observations for each individual are correlated over time, although observations from different individuals are independent. Therefore, to compute a valid covariance matrix you need to take this structure into account, and that is what "clustered standard errors" do. In short, if you simply use -qreg- or -sqreg-, the t-tests reported are generally invalid when you estimate the model with panel data; -qreg2- allows you to bypass that problem.

        Hope this helps but you should read about "clustered standard errors" in a good textbook.

        Joao
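
        For readers who want the idea in symbols, the following is a sketch of the usual sandwich form in generic notation. It is not copied from the paper: the indices g (cluster, here country) and i (observation within cluster, here year), the residual \hat u_{gi} and the bandwidth c_n are introduced here for illustration, and the exact estimator and bandwidth rule used by -qreg2- should be taken from Parente and Santos Silva's paper.

        ```latex
        \begin{align*}
        \hat u_{gi} &= y_{gi} - x_{gi}'\hat\beta(\tau), \qquad
        \psi_\tau(u) = \tau - \mathbf{1}\{u < 0\} \\
        \widehat{\mathrm{Var}}\big(\hat\beta(\tau)\big) &= \hat H^{-1}\,\hat S\,\hat H^{-1} \\
        \hat H &= \sum_{g=1}^{G}\sum_{i=1}^{n_g} \hat f_{gi}(0)\, x_{gi} x_{gi}',
        \qquad \text{e.g. } \hat f_{gi}(0) = \frac{\mathbf{1}\{|\hat u_{gi}| \le c_n\}}{2 c_n} \\
        \hat S_{\text{cluster}} &= \sum_{g=1}^{G}
        \Big(\sum_{i=1}^{n_g} x_{gi}\,\psi_\tau(\hat u_{gi})\Big)
        \Big(\sum_{i=1}^{n_g} x_{gi}\,\psi_\tau(\hat u_{gi})\Big)' \\
        \hat S_{\text{indep}} &= \sum_{g=1}^{G}\sum_{i=1}^{n_g}
        \psi_\tau(\hat u_{gi})^2\, x_{gi} x_{gi}'
        \end{align*}
        ```

        The clustered version keeps the cross-products between different years of the same country, whereas the version that assumes independence drops them. If the errors are additionally i.i.d. with density f(0) at zero, the independent version reduces to the familiar \tau(1-\tau) f(0)^{-2} (X'X)^{-1}.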

        • #5
          Thank you very much again for your helpful answer,

          What I have understood is that the difference between regular quantile regression and the command qreg2 is the way they calculate standard errors.

          So, the thing is that I have data for a set of countries over a time span of 20 years. My data set therefore has a panel structure and, as you have said, if I use ordinary quantile regression with qreg or sqreg, the estimated covariance matrix is not valid. Am I right? So the sensible thing to do is to use qreg2, which solves this problem. This is what I have understood from your answer, and I hope I have understood it correctly.

          I am working on this issue in my paper and I have been asked this: “Does the analysis involve a pooling of regressions that are time series in nature?” I do not know what the answer should be. Does the qreg2 command estimate a pooled regression while taking into account the time series nature of the data, given that the data set has a panel structure?

          Could you recommend a paper or textbook that explains how clustered standard errors are calculated, so that I can understand them better?

          Sorry for so many questions but I think that nobody can answer these questions better than one of the authors of this command and this methodology.

          Kind regards


          • #6
            Yes, that is broadly right, and indeed your data does that pooling. I suggest you have a look at any of the textbooks by Wooldridge or by Cameron and Trivedi.

            Best of luck,

            Joao

            • #7
              Thank you very much for your answer again.

              I have understood more or less everything I need to go on with my paper. I may ask you some more questions about this issue in the future, if that does not bother you. Should I open a new post or can I continue in this thread?

              Thanks a lot

              Regards

              • #8
                Sure. If it is on the same topic it is fine to use this thread; otherwise it is better to open a new one.

                Best wishes,

                Joao

                • #9
                  In my 2010 MIT Press textbook, Econometric Analysis of Cross Section and Panel Data, 2e, Section 12.10.3, I discuss various approaches to quantile regression with panel data. As an approximation to what one might mean by "fixed effects," one can use the Mundlak-Chamberlain device. Or, for median estimation, one can difference or use the within deviations in a LAD estimation. Everything that we know how to do is an approximation. I tend to prefer quantile regression with the Mundlak device.

                  I might also (immodestly) point out that in the same Section 12.10.3, I suggested the use of the same clustered standard errors as Parente and Santos Silva. (It did not appear in the first edition, 2002.) The material actually dates back to my NBER lectures with Guido Imbens starting in 2007. Of course, I didn't do the hard work of verifying the regularity conditions. :-)

                  NBER 2007

                  • #10
                    Thank you for your answer Mr Wooldridge,

                    I have read the document you attached. Is it possible to apply any of these techniques (the Mundlak approach, for instance) in Stata? If it is not possible, I assume that the better choice is to use the Parente and Santos Silva command with clustered standard errors.

                    Regards

                    • #11
                      It's pretty easy to use the Mundlak device along with the Parente/Santos Silva software. You need to compute the time averages by country for the time-varying explanatory variables.

                      Code:
                      * Time averages of each time-varying regressor, by country
                      egen x1bar = mean(x1), by(countryid)
                      egen x2bar = mean(x2), by(countryid)
                      ...
                      egen xKbar = mean(xK), by(countryid)

                      * Quantile regression with the Mundlak terms, clustered by country
                      qreg2 y x1 ... xK x1bar ... xKbar z1 ... zJ d2 ... dT, q(.5) cluster(countryid)

                      Here z1 ... zJ are time-constant variables and d2 ... dT are the time dummies. Of course you can use any quantile you want.

                      JW

                      • #12
                        Thank you very much for your answer Mr Wooldridge,

                        I have another question. When you use the Mundlak device with the code you gave me, which coefficients do you interpret from the results? The coefficients on the x1 ... xK variables, the coefficients on the x1bar ... xKbar variables, or both? For instance, suppose that the coefficient on x1 is not significant but the coefficient on x1bar is highly significant. Can I infer something from that?

                        Regards

                        • #13
                          Marcos: That's not good news, in the sense that it's essentially the same as finding that the usual FE estimates are insignificant. If you were to use OLS rather than quantile regression, the coefficients on x1 ... xK would be identical to the FE estimates. Then, we would conclude that the heterogeneity is correlated with the covariates. You're finding that any effect you find of, say, x1 when not controlling for x1bar must be treated as spurious.

                          In the regression case, testing x1bar ... xKbar is the regression-based version of the Hausman test. So, you are rejecting pooled quantile regression in favor of the Mundlak approach. Sorry, but that's how it often works out: A variable is statistically significant using pooled OLS or RE, but not when you use FE. You are finding the analogous result for quantile regression using Mundlak.
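
                          The Mundlak regression and the joint test of the time averages described above can be sketched as follows. This assumes -qreg2- is installed and that it posts its clustered covariance matrix so that Stata's -test- can be used after it. The `nlswork` data and all variable names are illustrative stand-ins, not the country panel from this thread.

                          ```stata
                          * Illustrative only: nlswork stands in for the country panel.
                          webuse nlswork, clear
                          keep if idcode <= 500 & !missing(ln_wage, tenure, ttl_exp)

                          * Time averages of the time-varying regressors, by individual,
                          * computed on the rows that will enter the estimation
                          foreach v of varlist tenure ttl_exp {
                              egen double `v'_bar = mean(`v'), by(idcode)
                          }

                          * Time dummies; drop the first one as the base period
                          tabulate year, generate(yr)
                          drop yr1

                          * Median regression with the Mundlak terms, clustered by individual
                          qreg2 ln_wage tenure ttl_exp tenure_bar ttl_exp_bar yr*, q(.5) cluster(idcode)

                          * Joint Wald test that the time averages can be excluded
                          test tenure_bar ttl_exp_bar
                          ```

                          A small p-value from the last command would point towards keeping the Mundlak terms rather than relying on the pooled quantile regression.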

                          • #14
                            Thank you very much for your answer, it was very helpful

                            I have another question that maybe someone can answer. Is there a way to choose the quantiles? How can I justify the selection of the quantiles? I would like to analyze how my dependent variable relates to the independent variables along the whole distribution. Therefore I am using the quantiles 0.05, 0.25, 0.5, 0.75 and 0.95. Is this correct? Should I choose other quantiles?

                            Thanks in advance
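
                            One way to look along the distribution is simply to re-estimate the model at each of the quantiles listed above. A minimal sketch, assuming -qreg2- is installed; the `nlswork` data and the regressors are illustrative placeholders, and the sketch says nothing about which quantiles are the right ones to report.

                            ```stata
                            * Illustrative only: nlswork stands in for the country panel.
                            webuse nlswork, clear
                            keep if idcode <= 500 & !missing(ln_wage, age, tenure)

                            * One clustered quantile regression per quantile of interest
                            foreach q of numlist 0.05 0.25 0.5 0.75 0.95 {
                                display as text _newline "Quantile `q'"
                                qreg2 ln_wage age tenure, q(`q') cluster(idcode)
                            }
                            ```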

                            • #15
                              Good morning Mr Santos Silva and Mr Wooldridge,

                              I am sorry for bothering both of you again, but I have another question related to this topic. Can either of you tell me the mathematical expressions used to calculate the usual standard errors and the clustered standard errors in quantile regression? I would need a mathematical expression for each, if possible.

                              Thank you very much in advance
