  • Outliers in logit regression: all positive outcomes

    Dear all, I have a dataset on civil wars in country-year format with a total of N = 2207 observations. I am running a logit model estimating the effect of female empowerment on civil war outcome (a binary measure of whether the challengers succeed in obtaining their aims). After this, I wanted to investigate possible outliers, defined as observations with standardized Pearson residuals larger than 3 or smaller than -3. My command looks like this:

    Code:
    logit war_outcome lagfpe lagpolyarchy lnlaggdp lnlagpop lnlagmilper i.region lagtimesv lagtimesv2 lagtimesv3 lagtimesnv lagtimesnv2 lagtimesnv3 coldwar, vce(cluster country_name)
    
    predict rsta, rsta
    I then create a variable to flag the observations identified as outliers.
    Code:
    * flag |rsta| > 3; Stata treats missing as larger than any number, so exclude it
    gen residualv = abs(rsta) > 3 if !missing(rsta)
    list location year war_outcome if residualv == 1
    This produces the following output:
    Code:
          +------------------------------------------------+
          |                     location   year   war_outcome |
          |------------------------------------------------|
       3. |                       Uganda   1986          1 |
      28. |                      Namibia   1988          1 |
      42. |                      Algeria   1962          1 |
     107. |                       Rwanda   1994          1 |
     108. |                     Ethiopia   1991          1 |
          |------------------------------------------------|
     130. |                      Liberia   1990          1 |
     192. |                      Burundi   1992          1 |
     286. |                       Angola   1974          1 |
     353. |                   Mozambique   1974          1 |
     355. |                      Liberia   2003          1 |
          |------------------------------------------------|
     357. |                     Ethiopia   1991          1 |
     384. |                        Congo   1997          1 |
     400. |                      Somalia   1994          1 |
     441. |                      Tunisia   1954          1 |
     504. |                Guinea-Bissau   1974          1 |
          |------------------------------------------------|
     508. |                       Rwanda   1994          1 |
     523. |                         Chad   1990          1 |
     533. | Democratic Republic of Congo   1997          1 |
     539. |     Central African Republic   2013          1 |
     583. |                      Somalia   1991          1 |
          |------------------------------------------------|
     593. |                Guinea-Bissau   1999          1 |
     624. |                       Rwanda   1961          1 |
     703. |                        Chile   1973          1 |
     733. |                         Cuba   1959          1 |
     850. |                    Nicaragua   1979          1 |
          |------------------------------------------------|
     858. |                    Argentina   1955          1 |
    1047. |                      Lebanon   1978          1 |
    1143. |                        Yemen   1967          1 |
    1150. |                     Cambodia   1979          1 |
    1263. |                  Afghanistan   1978          1 |
          |------------------------------------------------|
    1272. |                    Palestine   1948          1 |
    1299. |                         Laos   1975          1 |
    1305. |                         Laos   1949          1 |
    1376. |                     Cambodia   1975          1 |
    1620. |                    Indonesia   1949          1 |
          |------------------------------------------------|
    1636. |                    Hyderabad   1948          1 |
    1726. |                      Georgia   2008          1 |
    1746. |         Cambodia (Kampuchea)   1954          1 |
    1858. |                        China   1949          1 |
    1882. |                  Afghanistan   1996          1 |
          |------------------------------------------------|
    1884. |                  Afghanistan   1988          1 |
    1896. |                     Pakistan   1971          1 |
    1934. |                        Yemen   1948          1 |
    1966. |                       Cyprus   1959          1 |
    1974. |                      Vietnam   1954          1 |
          |------------------------------------------------|
    2148. |                     Slovenia   1991          1 |
    2195. |                      Romania   1989          1 |
    2197. |                         Fiji   1987          1 |
          +------------------------------------------------+
    However, as you can see, I have a lot of outliers. All of them have the value 1 on war_outcome, and together they make up almost all of the positive outcomes in my sample (48 out of 56 instances). I am unsure how to interpret this, and whether the problem lies in my data, in my interpretation of the logit model with a binary dependent variable, or in my limited Stata knowledge.

    Any help would be greatly appreciated!

    (Stata SE 17)

  • #2
    The way to answer this is to look more closely at the results, e.g. by using predict to get fitted probabilities too. For example, if most values are 0 and the fitted function is nearly flat then all the values of 1 will be associated with large residuals.

    This demonstration is extreme to make the point.

    Code:
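    clear
    set obs 100
    set seed 2803
    * rare outcome: each value is 1 with probability 0.1, unrelated to x
    gen y = runiform() < 0.1
    gen x = _n
    logit y x
    * standardized Pearson residuals and fitted probabilities
    predict rsta, rsta
    predict fitted
    su

    * plot residuals against fitted probabilities and against the outcome
    scatter rsta fitted
    scatter rsta y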
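
    The pattern in this demonstration follows from the arithmetic of the Pearson residual. With one observation per covariate pattern, the Pearson residual is (y - p)/sqrt(p(1 - p)), where p is the fitted probability. For y = 1 that is sqrt((1 - p)/p), which exceeds 3 whenever p < 0.1; for y = 0 it is -sqrt(p/(1 - p)), which falls below -3 only when p > 0.9. Standardizing divides by sqrt(1 - h), with h the leverage, so rsta is a little larger still in absolute value. In #1 only 56 of 2207 outcomes are 1 (about 2.5%), so fitted probabilities below 0.1 for most of them would not be surprising. What follows is only a sketch of the arithmetic and of a check on the data in #1, assuming the logit there has just been run and rsta already exists; phat is a hypothetical variable name.

    Code:
    * Pearson residual for y = 1 at several fitted probabilities
    foreach p of numlist 0.01 0.025 0.05 0.1 0.2 {
        display %5.3f `p' "   " %6.2f sqrt((1 - `p')/`p')
    }

    * fitted probabilities for the estimation sample (phat is a hypothetical name)
    predict phat if e(sample), pr
    summarize phat if war_outcome == 1, detail
    * how many of the successes have a fitted probability below 0.1?
    count if war_outcome == 1 & phat < 0.1
    scatter rsta phat if e(sample)

    If that count is close to 48, the flagged observations more likely reflect a rare outcome with low fitted probabilities than a fault in the data or the commands.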



    • #3
      That makes a lot of sense!

      I will look more closely at my results - thank you Nick!
