
  • Finding outliers for FE panel model

    Dear Statalist experts

    I am currently looking into how to detect outliers, but I am a bit confused about two things:
    1. Is finding outliers after xtreg the same as after reg? (I'm thinking of using the methods in https://www3.nd.edu/~rwilliam/stats2/l24.pdf)
    2. I want to check that I understand this correctly. Judging from my graph, I think there are outliers. (Oddly, although the coefficient is positive in all of my regressions, the scatter plot shows a negative relationship. Might this be driven by the outlier at the bottom of the graph?) So I am thinking of dropping the extreme values and then checking whether the results change much.

    Code:
    extremes govexp_gdp migrant_pop

      +----------------------------+
      | obs:   govexp~p   migran~p |
      |----------------------------|
      |  79.   9.454983   2.456626 |
      |  71.    10.2687   5.899138 |
      | 139.   10.86259   2.100434 |
      | 170.   11.15736   1.818957 |
      |  20.   11.57134   20.86153 |
      +----------------------------+

      +----------------------------+
      |  43.   25.86872   22.03337 |
      | 142.   26.48121   11.01826 |
      |  67.   27.08459   8.463334 |
      | 167.   27.36583   9.182918 |
      | 130.    29.9406          . |
      +----------------------------+



    Code:
    . xi: xtreg govexp_gdp migrant_pop i.year, fe
    i.year            _Iyear_1990-2010    (naturally coded; _Iyear_1990 omitted)

    Fixed-effects (within) regression               Number of obs     =        163
    Group variable: country                         Number of groups  =         35

    R-sq:                                           Obs per group:
         within  = 0.2436                                         min =          1
         between = 0.0196                                         avg =        4.7
         overall = 0.0019                                         max =          5

                                                    F(5,123)          =       7.92
    corr(u_i, Xb)  = -0.7451                        Prob > F          =     0.0000

    ------------------------------------------------------------------------------
      govexp_gdp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
     migrant_pop |   .4818744   .1003668     4.80   0.000     .2832044    .6805443
     _Iyear_1995 |   .1822943   .5228856     0.35   0.728    -.8527256    1.217314
     _Iyear_2000 |  -1.286386   .5284281    -2.43   0.016    -2.332377   -.2403948
     _Iyear_2005 |  -1.169919   .5584982    -2.09   0.038    -2.275432   -.0644057
     _Iyear_2010 |  -.4115732   .5954315    -0.69   0.491    -1.590193    .7670469
           _cons |   15.59851   .8436665    18.49   0.000     13.92853     17.2685





    Thank you
    Guest
    Last edited by sladmin; 02 May 2018, 08:13. Reason: anonymize poster

  • #2
    It seems you are unsure whether there is an outlier at all, yet you are ready to exclude it.

    Most probably (virtually always, I daresay), excluding an observation is not a good idea, unless the value is due to a data-entry error.

    You may wish to produce plots to check this.

    You didn't mention your Stata version, but if you are using anything from version 14 (I guess) to the current one (15.1), you won't need the xi prefix with xtreg.

    To end, there are several resources and models for dealing with extreme values; mutilating the data is the least appropriate of them.
    Last edited by Marcos Almeida; 05 Mar 2018, 04:42. Reason: Edited to update information on Stata version.
    Best regards,

    Marcos



    • #3
      Dear Marcos

      I am still confused. I thought the graph I showed was the one that indicates whether there is an outlier?

      Or do I have to use the command
      Code:
      lvr2plot, mlabel(id)
      ? I don't understand why I cannot use this command after xtreg. I can only use it after reg...
      (My main focus is on the FE)

      Thank you
      Guest
      Last edited by sladmin; 02 May 2018, 08:13. Reason: anonymize poster



      • #4
        With regard to the - xi - prefix: as I said, we don't need it anymore with - xtreg - in the latest Stata versions. Both commands below give the same estimates:

        Code:
        webuse nlswork
        xtset idcode
        egen mygroup = cut(hours), at(1, 25,50, 75, 100, 125, 150, 175)
        
        . xi: xtreg ln_w age tenure not_smsa south i.mygroup, fe
        i.mygroup         _Imygroup_1-150     (naturally coded; _Imygroup_1 omitted)
        
        Fixed-effects (within) regression               Number of obs     =     28,029
        Group variable: idcode                          Number of groups  =      4,698
        
        R-sq:                                           Obs per group:
             within  = 0.1483                                         min =          1
             between = 0.2739                                         avg =        6.0
             overall = 0.2080                                         max =         15
        
                                                        F(9,23322)        =     451.24
        corr(u_i, Xb)  = 0.1946                         Prob > F          =     0.0000
        
        -------------------------------------------------------------------------------
              ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        --------------+----------------------------------------------------------------
                  age |   .0128863   .0004112    31.34   0.000     .0120803    .0136923
               tenure |   .0202236   .0007978    25.35   0.000     .0186599    .0217874
             not_smsa |  -.0905847   .0096889    -9.35   0.000    -.1095755   -.0715939
                south |  -.0640467    .011105    -5.77   0.000    -.0858134   -.0422801
         _Imygroup_25 |   .0660298     .00675     9.78   0.000     .0527995    .0792602
         _Imygroup_50 |  -.0529665   .0125558    -4.22   0.000    -.0775767   -.0283563
         _Imygroup_75 |   -.508224   .0522299    -9.73   0.000     -.610598     -.40585
        _Imygroup_100 |  -.6305564   .1675232    -3.76   0.000    -.9589129   -.3021999
        _Imygroup_150 |  -2.207888   .3409224    -6.48   0.000    -2.876118   -1.539658
                _cons |   1.240113   .0135113    91.78   0.000      1.21363    1.266596
        --------------+----------------------------------------------------------------
              sigma_u |  .37512912
              sigma_e |  .29505844
                  rho |  .61779361   (fraction of variance due to u_i)
        -------------------------------------------------------------------------------
        F test that all u_i=0: F(4697, 23322) = 7.02                 Prob > F = 0.0000
        
        . xtreg ln_w age tenure not_smsa south i.mygroup, fe
        
        Fixed-effects (within) regression               Number of obs     =     28,029
        Group variable: idcode                          Number of groups  =      4,698
        
        R-sq:                                           Obs per group:
             within  = 0.1483                                         min =          1
             between = 0.2739                                         avg =        6.0
             overall = 0.2080                                         max =         15
        
                                                        F(9,23322)        =     451.24
        corr(u_i, Xb)  = 0.1946                         Prob > F          =     0.0000
        
        ------------------------------------------------------------------------------
             ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                 age |   .0128863   .0004112    31.34   0.000     .0120803    .0136923
              tenure |   .0202236   .0007978    25.35   0.000     .0186599    .0217874
            not_smsa |  -.0905847   .0096889    -9.35   0.000    -.1095755   -.0715939
               south |  -.0640467    .011105    -5.77   0.000    -.0858134   -.0422801
                     |
             mygroup |
                 25  |   .0660298     .00675     9.78   0.000     .0527995    .0792602
                 50  |  -.0529665   .0125558    -4.22   0.000    -.0775767   -.0283563
                 75  |   -.508224   .0522299    -9.73   0.000     -.610598     -.40585
                100  |  -.6305564   .1675232    -3.76   0.000    -.9589129   -.3021999
                150  |  -2.207888   .3409224    -6.48   0.000    -2.876118   -1.539658
                     |
               _cons |   1.240113   .0135113    91.78   0.000      1.21363    1.266596
        -------------+----------------------------------------------------------------
             sigma_u |  .37512912
             sigma_e |  .29505844
                 rho |  .61779361   (fraction of variance due to u_i)
        ------------------------------------------------------------------------------
        F test that all u_i=0: F(4697, 23322) = 7.02                 Prob > F = 0.0000

        With regard to the outlier(s), as I highlighted, perhaps you'd better think about the core issue instead of thinking about getting rid of them.

        That said, you may wish to plot predicted values against the dependent variable, predicted versus residuals, etc, and include the label marker for the id, as you guessed.
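
        On the question of why lvr2plot is refused after xtreg: as far as I know it is documented as a postestimation tool for regress. One possible workaround, offered as a sketch under assumptions and not as the recommended method, is to fit the same fixed-effects model as a least-squares dummy-variable (LSDV) regression with regress. That gives the same slope coefficients as xtreg, fe and makes the regress diagnostics available. The Grunfeld data from webuse stand in for the real panel here, and company plays the role of the id variable.

        ```stata
        * Illustrative data only: Grunfeld panel (10 companies, 20 years)
        webuse grunfeld, clear
        xtset company year

        * Fixed-effects (within) estimator
        xtreg invest mvalue kstock, fe
        scalar b_fe = _b[mvalue]

        * Same model as LSDV: one dummy per company via factor-variable notation
        regress invest mvalue kstock i.company
        scalar b_lsdv = _b[mvalue]
        display "xtreg, fe: " b_fe "   LSDV: " b_lsdv

        * regress-only diagnostics now work, with markers labelled by panel id
        lvr2plot, mlabel(company)
        predict double lev, leverage
        predict double rstu, rstudent
        ```

        With only a few dozen groups the extra dummies are cheap; with thousands of groups this route becomes unwieldy.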

        At the risk of repeating myself: if the outliers (whatever criteria we use) are "real", there is no reason to throw them out.

        To end, I suggest you take a look at the postestimation commands concerning - xtreg - models.
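
        As a minimal sketch of what those postestimation tools offer, predict after xtreg, fe can return the linear prediction, the estimated fixed effect, and the idiosyncratic residual separately. The Grunfeld data and the new variable names (xb_hat, u_hat, e_hat, xbu_hat) are placeholders chosen for illustration, not anything from the original data.

        ```stata
        * Illustrative data only: Grunfeld panel from webuse
        webuse grunfeld, clear
        xtset company year
        xtreg invest mvalue kstock, fe

        predict double xb_hat, xb     // linear prediction x*b (the default)
        predict double u_hat, u       // estimated fixed effect u_i
        predict double e_hat, e       // idiosyncratic residual e_it
        predict double xbu_hat, xbu   // prediction including the fixed effect

        * Residual-versus-fitted plot, markers labelled by panel id
        scatter e_hat xbu_hat, mlabel(company) yline(0)
        ```

        Points far from the zero line are the observations the within model fits worst, which makes them candidates for a closer look and not automatically for deletion.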

        Hopefully that helped.
        Last edited by Marcos Almeida; 05 Mar 2018, 08:53.
        Best regards,

        Marcos



        • #5
          I can't see a graph as you evidently can.

          You can manufacture analogues of fairly standard regression diagnostics using something like this after xtreg:


          Code:
          predict predicted
          gen residual = govexp_gdp - predicted
          sc residual predicted
          sc govexp_gdp predicted


          Different point: I'd be surprised if govexp_gdp and migrant_pop were not better examined on log scale.
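
          A quick way to try that, as a sketch only: the nine complete pairs below are copied from the extremes listing in #1, so they show the mechanics and say nothing about the full sample. The names ln_govexp and ln_migrant are made up for illustration.

          ```stata
          * The nine non-missing pairs from the extremes output in #1
          clear
          input govexp_gdp migrant_pop
          9.454983 2.456626
          10.2687 5.899138
          10.86259 2.100434
          11.15736 1.818957
          11.57134 20.86153
          25.86872 22.03337
          26.48121 11.01826
          27.08459 8.463334
          27.36583 9.182918
          end

          * Option 1: transform the variables (both must be strictly positive)
          gen ln_govexp  = ln(govexp_gdp)
          gen ln_migrant = ln(migrant_pop)
          scatter ln_govexp ln_migrant

          * Option 2: keep the original units and put both axes on a log scale
          scatter govexp_gdp migrant_pop, xscale(log) yscale(log)
          ```

          The logged variables could then go into xtreg in place of the originals to see whether the apparent outliers still stand out.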
