
  • #16
    Thank you Carlo for your response.

    Final question:

    Would it be alright to say that non-normality of the residuals does not bias the coefficients per se, but that the p-values could be affected (although non-normality could indicate some misspecification or OVB, which would then lead to imprecise coefficients)?

    And 2: is there some way to see for which districts the effect of some variables on the outcome doesn’t hold? I think it would be just as interesting to see if there is no effect for some districts. The only thing I could come up with is stated in #14: basically splitting the sample and running the regression on each part (which is far from perfect), or looking at the difference in residuals with and without the policy variables. Sadly my advisor did not know but said that I definitely should…



    • #17
      Mimina:
      1) that's correct. In addition, your previous test does not show evidence of misspecification of the functional form of the regressand; hence I would avoid mentioning what you reported in brackets (lines 3 and 4 from above);
      2) not that I know of. Just to fulfil your supervisor's request, you may want to perform a scenario sensitivity analysis in which you run a set of regressions limiting your sample to the observations with the "weird" residuals in the overall model.
      Kind regards,
      Carlo
      (Stata 19.0)



      • #18
        Thank you Carlo. Greatly appreciated. Since I just tested for the functional form and not OVB per se, could I still say there might be some OVB based on the qnorm plot (of course there could always be some)?
        I will look into additionally running different regressions to see if the effect might be different, although I believe I should then be careful when comparing coefficients.

        Edit: Would it perhaps be more useful to include some sort of three-way interaction term with the policy variables? Then I can see if there is a difference between subgroups.
        When doing the sample split and running a separate regression based on the weird residuals, I am in effect interacting every variable, including the fixed effects. At most I could then see if there is no effect in that subsample and test if the coefficients differ significantly across the sample split, right? I have often run regressions with an interaction term because that allowed me to actually compare. But I don’t know if that is most appropriate when I expect there might be no effect at all for some group.

        My advisor said these questions are very difficult for her, but I should definitely do something.
        Last edited by Mimina John; 06 Jul 2022, 12:56.
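
The link between a sample split and a full interaction raised in #18 can be checked on simulated data. The following is only a minimal sketch under stated assumptions: the data are made up, the subgroup flag `g`, the single regressor `x`, and the helper `ols` are hypothetical names, and there are no fixed effects. It suggests that a pooled regression in which everything is interacted with the subgroup flag reproduces the split-sample coefficients, with the interaction coefficient equal to the difference in slopes.

```python
import random

def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Simulated data (assumption): the regressor has no effect in subgroup g == 1.
rng = random.Random(0)
data = []
for i in range(400):
    g = i % 2                       # hypothetical subgroup flag
    x = rng.gauss(0, 1)             # hypothetical policy variable
    effect = 0.0 if g == 1 else 1.5
    y = 2.0 + effect * x + 0.5 * g + rng.gauss(0, 1)
    data.append((g, x, y))

# Split-sample regressions: y on (1, x) within each subgroup.
split = {}
for grp in (0, 1):
    rows = [(x, y) for g, x, y in data if g == grp]
    split[grp] = ols([[1.0, x] for x, _ in rows], [y for _, y in rows])

# Fully interacted pooled regression: y on (1, x, g, g*x).
pooled = ols([[1.0, x, g, g * x] for g, x, _ in data],
             [y for _, _, y in data])

slope_g0 = pooled[1]                # slope in subgroup 0
slope_g1 = pooled[1] + pooled[3]    # slope in subgroup 1
print(split[0][1], slope_g0)
print(split[1][1], slope_g1)
```

In this setup, "no effect in subgroup 1" corresponds to the sum of the main and interaction coefficients being zero, whereas the interaction coefficient alone only measures how the two slopes differ. The standard errors of the two approaches need not coincide unless the error variance is allowed to differ by subgroup.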



        • #19
          Mimina:
          just yesterday, I came across a regression whose residuals followed a Student t distribution.
          The degrees of freedom were 60 (that is, below the >120 cutoff reported in statistical tables, beyond which a Student t distribution is treated as equivalent to a Normal one).
          However (as expected) there was no evidence of heteroskedasticity.
          Therefore, non-normality and homoskedasticity of the residual distribution can safely live together.
          Kind regards,
          Carlo
          (Stata 19.0)
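
The point in #19 can be illustrated with a small simulation. This is a sketch under assumptions, not a reproduction of the regression mentioned above: the data are simulated, and the degrees of freedom are set to 5 rather than 60 purely so that the heavy tails are easy to see (at 60 degrees of freedom the theoretical excess kurtosis is only 6/56, about 0.11). The errors are drawn from the same Student t distribution at every value of the regressor, so they are non-normal yet homoskedastic by construction.

```python
import math
import random

rng = random.Random(42)
n, df = 20000, 5   # assumed values; df = 5 makes the heavy tails visible

def student_t(df):
    """Draw from Student's t as Z / sqrt(chi2_df / df)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square with df degrees of freedom
    return z / math.sqrt(chi2 / df)

def mean(v):
    return sum(v) / len(v)

def corr(a, b):
    ma, mb = mean(a), mean(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / math.sqrt(va * vb)

# Simulated model (assumption): y = 1 + 2x + e, with e ~ t(df) at every x.
x = [rng.uniform(0.0, 10.0) for _ in range(n)]
y = [1.0 + 2.0 * xi + student_t(df) for xi in x]

# Simple OLS of y on x with an intercept.
xbar, ybar = mean(x), mean(y)
sxx = sum((xi - xbar) ** 2 for xi in x)
slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
intercept = ybar - slope * xbar
resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

# Non-normality: excess kurtosis is 0 for a Normal, 6 / (df - 4) for a t with df > 4.
m2 = mean([r ** 2 for r in resid])
m4 = mean([r ** 4 for r in resid])
excess_kurtosis = m4 / m2 ** 2 - 3.0

# Homoskedasticity: squared residuals should be unrelated to the regressor.
corr_r2_x = corr([r ** 2 for r in resid], x)

print("excess kurtosis of residuals:", excess_kurtosis)
print("corr(resid^2, x):", corr_r2_x)
```

A clearly positive excess kurtosis together with a near-zero correlation between the squared residuals and the regressor is the pattern described above: fat-tailed residuals whose spread does not depend on x.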



          • #20
            Thank you Carlo. I have only now opened the forum again. I was wondering one more thing. Today my advisor told me I should not cluster on ID level since I only have one cross section (district) per ID and thus per cluster. I have of course multiple time periods per ID, but my advisor did not know for sure. If I cluster on state level, I only get 14 clusters, but my advisor told me you should always have multiple IDs in each cluster. My advisor also told me that non-normality = heteroskedasticity, by the way. Many thanks.



            • #21
              Mimina:
              1) clustering with a limited number of panels (i.e., fewer than 30-50, depending on the approach) is not advisable, as the resulting standard errors may well be more misleading than their default counterparts;
              2) while it is true that oftentimes non-normality and heteroskedasticity go hand in hand, a Student t-shaped residual distribution is a counterexample.
              In addition, residual normality and homoskedasticity of the residual distribution are two different OLS requirements (see, among many others, https://www.wiley.com/en-us/Introduc...-9780470032701 page 66; admittedly, I judged the book by its cover).
              Kind regards,
              Carlo
              (Stata 19.0)



              • #22
                Thank you Carlo as always.

                I actually brought both points to the attention of my advisor. I am fully sold on your second point, but I am confused about the first one. I firmly agree you should not cluster with too few clusters, as it can make things even worse. I was just wondering if what my advisor says is true: that you should have not only multiple time periods within each cluster but also multiple IDs. Is this correct?


                I was always told that usually you cluster on ID level. I have district-level data (ID) and therefore I cluster on ID, correct? If I clustered on state, I would get multiple IDs in each cluster, as my advisor suggested, but only 14 clusters. Moreover, my treatment and variables are on ID level (district), and I believe errors to be more correlated within districts than within states.



                • #23
                  Mimina:
                  1) it depends on what you're clustering on. Usually, you cluster on -panelid- and you have multiple time periods for each unit.
                  2) if you cluster on a different variable (say, district), you will have both multiple panels and time periods per district (and this one seems to be your case, if I get you right).
                  Kind regards,
                  Carlo
                  (Stata 19.0)
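
The two clustering levels discussed in this exchange differ in what each cluster contains. As a rough illustration, assuming a made-up panel of (state, district, year) records and a hypothetical helper `describe_clusters`, the sketch below counts the distinct panel IDs and the observations inside each cluster for both choices.

```python
from collections import defaultdict

# Hypothetical panel records: (state, district_id, year).
records = [
    ("A", 1, 2019), ("A", 1, 2020), ("A", 2, 2019), ("A", 2, 2020),
    ("B", 3, 2019), ("B", 3, 2020), ("B", 4, 2019), ("B", 4, 2020),
]

def describe_clusters(records, cluster_of):
    """Return {cluster: (number of distinct panel IDs, number of observations)}."""
    ids = defaultdict(set)
    nobs = defaultdict(int)
    for state, district, year in records:
        c = cluster_of(state, district)
        ids[c].add(district)
        nobs[c] += 1
    return {c: (len(ids[c]), nobs[c]) for c in ids}

# Clustering on the panel ID: one ID per cluster, several periods each.
by_district = describe_clusters(records, lambda state, district: district)
# Clustering on a higher level: several IDs (and their periods) per cluster.
by_state = describe_clusters(records, lambda state, district: state)

print(by_district)  # {1: (1, 2), 2: (1, 2), 3: (1, 2), 4: (1, 2)}
print(by_state)     # {'A': (2, 4), 'B': (2, 4)}
```

Clustering on the panel ID gives many clusters with one ID and several time periods each; clustering on a higher level gives fewer clusters, each containing several IDs.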



                  • #24
                    1) yes, I have multiple time periods per unit and I clustered on ID/unit level. I was now told by my advisor that this is always wrong, and that I should cluster on a larger level (14 clusters then) so that there are multiple panels/IDs per cluster as well as multiple time periods.
                    2) I originally clustered on ID level since a) all my variables are on that level, b) my policy is on that level c) I believe errors to be correlated within each district/ID and d) I have multiple time periods for each ID.

                    I was wondering if what my advisor was saying is correct since it sounds very wrong to state that you have to have multiple ID’s per cluster and not just time periods and that you can run a robust regression with 14 clusters.



                    • #25
                      Mimina:
                      I've probably missed out on something, then. Let's give it another shot, trying to be clearer:
                      1) clustering on panelid, as you did, is correct.
                      2) clustering on any other variable that produces 14 clusters is not the way to go, as this may cause your cluster-robust standard errors to be biased and misleading (setting aside any issue that your advisor brought up).
                      Kind regards,
                      Carlo
                      (Stata 19.0)
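
To make the role of the number of clusters concrete, here is a minimal sketch of a cluster-robust standard error for the slope of a simple regression. It is an illustration under assumptions rather than what any particular package computes: it uses the plain sandwich form without the small-sample correction that statistical software typically applies, the panel is simulated, and `slope_and_cluster_se` and all variable names are hypothetical.

```python
import math
import random
from collections import defaultdict

def slope_and_cluster_se(x, y, cluster):
    """OLS slope of y on x (with intercept) and its uncorrected cluster-robust SE."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Sum the scores (x - xbar) * residual within each cluster, then square.
    score = defaultdict(float)
    for xi, ei, c in zip(x, resid, cluster):
        score[c] += (xi - xbar) * ei
    var = sum(s ** 2 for s in score.values()) / sxx ** 2
    return slope, math.sqrt(var)

# Simulated panel (assumption): 70 districts in 14 states, 10 periods each,
# with a district-level component in both the regressor and the error.
rng = random.Random(1)
x, y, district, state, obs = [], [], [], [], []
for d in range(70):
    s = d % 14
    u = rng.gauss(0, 1)           # district-level error component
    xd = rng.gauss(0, 1)          # district-level part of the regressor
    for t in range(10):
        xi = xd + rng.gauss(0, 0.5)
        x.append(xi)
        y.append(1.0 + 0.5 * xi + u + rng.gauss(0, 1))
        district.append(d)
        state.append(s)
        obs.append(len(obs))      # every observation as its own cluster

b, se_obs = slope_and_cluster_se(x, y, obs)
_, se_district = slope_and_cluster_se(x, y, district)
_, se_state = slope_and_cluster_se(x, y, state)
print("slope:", b)
print("SE, no clustering (heteroskedasticity-robust):", se_obs)
print("SE, clustered on district (70 clusters):", se_district)
print("SE, clustered on state (14 clusters):", se_state)
```

The variance estimate is a sum of one squared term per cluster, so with only 14 clusters it rests on 14 terms. In the limiting case of a single cluster the within-cluster scores sum to zero by construction of OLS and the estimated standard error collapses to zero, which hints at why very few clusters can give misleading results.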



                      • #26
                        Thank you Carlo for clarifying that clustering on the panel ID is not per se always incorrect just because you don’t have multiple IDs in your cluster. That sounded like a very strong statement to make. 14 clusters does sound incorrect indeed. I can however see the good points of clustering on a larger level if you have more clusters than 14 like I do.
