  • 1) Robustness checks after Hausman test? 2) When to do winsorizing?

    Hello all,

    I am writing my thesis (my first). I hope someone can help me.

    1) I have panel data with N = 62 panels, 1,302 observations, and T = 21 years. I first ran the Hausman test, which pointed to the fixed-effects (FE) model, and then started diagnostic checks such as studentized residuals, leverage, and a log transformation. Is this correct? After the other tests (heteroskedasticity, autocorrelation, and multicollinearity), I will decide on the final model using the transformed values (a sketch of this sequence is at the end of this post).

    2) After the log transformation I got rid of the influential case, but the residuals still show outliers. How can I be sure that I should winsorize?

    Thanks a lot for your help.

    belgin
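
    For clarity, here is a minimal sketch of the sequence described in question 1), with hypothetical variable names (y, x1-x3, id, year):

        * declare the panel structure (hypothetical variable names)
        xtset id year

        * fit FE and RE, store both, then run the Hausman test
        xtreg y x1 x2 x3, fe
        estimates store fe
        xtreg y x1 x2 x3, re
        estimates store re
        hausman fe re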

  • #2
    Belgin:
    1) With such N and T, default standard errors seldom hold. I'd cluster them.
    2) If you cluster your standard errors, you should switch from -hausman- to the community-contributed module -xtoverid- (see the sketch after this list);
    3) I'm not a fan of winsorizing. Most of the time, the so-called outliers are simply possible values that are perfectly consistent with the data-generating process.
    4) As far as residual nuisance is concerned, I assume you're talking about heteroskedasticity. Again, -vce(cluster panelid)- would take both heteroskedasticity and autocorrelation into account.
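    A minimal sketch of points 1), 2), and 4), assuming hypothetical variable names (y, x1-x3, panelid, year); -xtoverid- is community-contributed and installed from SSC:

        * declare the panel structure (hypothetical variable names)
        xtset panelid year

        * FE model with cluster-robust standard errors, which allow for
        * heteroskedasticity and within-panel autocorrelation
        xtreg y x1 x2 x3, fe vce(cluster panelid)

        * FE vs RE with clustered standard errors: estimate RE with the same
        * vce() and run -xtoverid- instead of -hausman-
        ssc install xtoverid
        xtreg y x1 x2 x3, re vce(cluster panelid)
        xtoverid

    Rejection of -xtoverid-'s test of overidentifying restrictions argues against random effects and for the FE specification, mirroring the usual reading of -hausman-.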
    Kind regards,
    Carlo
    (Stata 19.0)

    • #3
      Thank you, Carlo. My supervisor wants me to use the Hausman test, I think. It is a dissertation, and I am still learning panel-data regression.

      Question: How can I check whether the outliers are plausible values that I can keep?

      Thanks in advance

      • #4
        There is a massive literature on outliers, some of it helpful. Even online, a search of one valuable website, https://stats.stackexchange.com/ques...agged/outliers, reveals 1312 questions. I would recommend looking at some of the most upvoted threads, even if that underlines that statistical people don't always agree on outliers.

        How to summarize? Well, I can only give my own personal view, and other views may follow.

        An outlier should be dropped from the data if and only if one of the following holds:

        1. It is self-evidently impossible and therefore the result of some measurement or coding blunder: you have data on basketball players and one is reported as 12 feet tall.

        2. There are independent grounds for thinking it is a data point irrelevant to the research goals: you have data on basketball players, but somehow somebody who doesn't play basketball was entered too.

        Naturally the examples are silly, but we have no idea what kind of data you're dealing with beyond a guess that it is business or economic data of some kind.
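
        A hedged sketch of the practical first step, inspecting the actual observations behind suspected outliers before deciding anything (variable names are placeholders):

            * fit the model and recover the idiosyncratic residuals
            xtreg y x1 x2 x3, fe
            predict double ehat, e

            * look at the distribution, then at the cases behind the most
            * extreme residuals, to judge whether they are plausible values
            summarize ehat, detail
            generate double abs_ehat = abs(ehat)
            gsort -abs_ehat
            list id year y x1 x2 x3 ehat in 1/10

        Whether any of those cases then meet the two criteria above is a substantive call about your data, not a statistical one.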

        • #5
          Belgin:
          Your supervisor's advice should stand the test of evidence.
          If you need to invoke non-default standard errors, you cannot use -hausman-, because -hausman- allows default standard errors only (as your supervisor is surely aware).
          As far as your second question is concerned, I'd only add to Nick's helpful reply that knowing the data-generating process that creates your sample is very useful. For instance, we know that health care costs totalled by a sample of patients follow a positively skewed Gamma distribution, which allows very extreme values in its right tail (and legitimately so).
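          As an illustration of that last point only, a hedged sketch of modelling a skewed, strictly positive outcome directly (hypothetical variable names; not a recipe for your data):

              * a Gamma GLM with a log link accommodates a long right tail,
              * so legitimately extreme values need not be winsorized away
              glm cost x1 x2 x3, family(gamma) link(log) vce(cluster panelid)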
          Kind regards,
          Carlo
          (Stata 19.0)
