  • Why does ineqdeco produce different total Ginis depending on whether the bygroup option is used?

    Dear Stephen and the rest of the Statalist members,

    This question was raised in 2015, but it was not answered, perhaps because it did not mention ineqdeco in the title ( https://www.statalist.org/forums/for...ue-and-command)

    ineqdeco (from SSC, author S. Jenkins) produces different Gini coefficients depending on whether the bygroup() option is used.
    The number of values excluded from the calculations differs between the two cases, but why does ineqdeco drop more values when the option is not used? And is there any other reason for the difference?



    For example, using the LISSY interface of the Luxembourg Income Study database:

    use $ca00h
    qui ineqdeco dpi [w=hwgt], by(region_c)
    di r(gini)

    (ca00: version 7.0 13 Jun 2018 17:31)
    (Warning: dpi has 60 values < 0. Not used in calculations)
    .36032192


    use $ca00h
    qui ineqdeco dpi [w=hwgt]
    di r(gini)

    (ca00: version 7.0 13 Jun 2018 17:31)
    (Warning: dpi has 68 values < 0. Not used in calculations)
    .37266473
    Last edited by Juan Vicente; 30 Sep 2018, 18:40.

  • #2
    I do not have access to Stata right now. Meanwhile please show the results of typing

    Code:
    which ineqdeco
    which ineqdec0
    summarize dpi [aw = hwgt]
    tabulate region_c [aw = hwgt], summ(dpi)
    Please run these without -quietly- while also rerunning the commands and output you report above. Then repeat using -ineqdec0- rather than -ineqdeco-.
    Also please ensure that you use CODE delimiters for legibility's sake.
    Please also report some additional checks on the number of missing values separately on dpi, on region_c, as well as how many are missing on dpi by region. Repeat these diagnostic checks again but counting the number of zero and negative values instead.
    Last edited by Stephen Jenkins; 01 Oct 2018, 00:58.


    • #3
      I hope this is everything you asked for.

      Code:
      use $ca00h
      which ineqdeco
      which ineqdec0
      summarize dpi [aw = hwgt]
      tabulate region_c [aw = hwgt], summ(dpi)
      clear
       
      use $ca00h
      sort region_c
      by region_c: count if dpi==0
      by region_c: count if dpi<=0
      by region_c: count if dpi==.
      clear
       
      use $ca00h
      ineqdeco dpi [w=hwgt]
      di r(gini)
      clear
       
      use $ca00h
      ineqdeco dpi [w=hwgt], by(region_c)
      di r(gini)
      clear
       
      use $ca00h
      ineqdec0 dpi [w=hwgt]
      di r(gini)
      clear
       
      use $ca00h
      ineqdec0 dpi [w=hwgt], by(region_c)
      di r(gini)
      clear
      results:

      __________________________
      . which ineqdeco
      /media/inc/pack/ado/ineqdeco.ado
      *! 2.0.1 SPJ August 2006 (new vbles created as doubles)
      *! 2.0.0 SPJ August 2006 (port to Stata 8.2; additional saved results),
      *! with initial code rewriting contribution from Nick Cox (many thanks!)
      *! version 1.6 April 2001 (made compatible with Stata 7; SSC)
      *! version 1.0.1 Stephen P. Jenkins, April 1998 STB-48 sg104
      *! Inequality indices, with optional decomposition by population subgroups
      ___________________________
      . which ineqdec0
      /media/inc/pack/ado/ineqdec0.ado
      *! 2.0.2 SPJ May 2008 (fix bug arising if bygroup() and `touse' lead to no obs in a group)
      *! bug fix method provided by Austin Nichols (many thanks!)
      *! 2.0.1 SPJ August 2006 (new vbles created as doubles)
      *! 2.0.0 SPJ August 2006 (port to Stata 8.2; additional saved results),
      *! with initial code rewriting contribution from Nick Cox (many thanks!)
      *! version 1.6 April 2001 (made compatible with Stata 7; SSC)
      *! version 1.0.1 Stephen P. Jenkins, April 1998 STB-48 sg104
      *! Inequality indices, with optional decomposition by population subgroups
      __________________________

      . use $ca00h
      (ca00: version 7.0 13 Jun 2018 17:31)

      _______________________________

      . ineqdeco dpi [w=hwgt]
      (analytic weights assumed)
      Warning: dpi has 68 values < 0. Not used in calculations
      GE2 .35033 Gini .37266

      ineqdeco dpi [w=hwgt], by(region_c)
      (analytic weights assumed)
      Warning: dpi has 60 values < 0. Not used in calculations
      GE2 .24749 Gini .36032

      . ineqdec0 dpi [w=hwgt]
      (analytic weights assumed)
      Warning: dpi has 68 values < 0. Not used in calculations
      GE2 .35535 Gini .37602

      . ineqdec0 dpi [w=hwgt], by(region_c)
      (analytic weights assumed)
      Warning: dpi has 60 values < 0. Not used in calculations
      GE2 .25051 Gini .36285
      Last edited by Juan Vicente; 01 Oct 2018, 03:31.


      • #4
        Please put your output within CODE delimiters too, interspersed with the commands that produced the output. Also you appear to have edited the output. (Please follow FAQ requests to the letter, especially point 12.1.)

        You have the latest versions of ineqdeco and ineqdec0.

        To be honest, this output is not informative at all because you haven't reported all the information I asked for. I think the most important thing you need to investigate is the pattern of missing values on dpi and region_c (each separately) and also how many missing values there are for both. The missings command may be useful for generating this information.

        My guess is that the results that you think are inconsistent are arising because of different samples being used, and the different samples arise because of the patterns of missingness on the relevant variables. [I'm assuming there are no missing values on hwgt.] The inequality measure computed for (1) the sample with non-missing values on dpi will likely differ from the inequality measure computed from (2) the sample with non-missing values for both dpi and region. (In this case, the issue is with the data, not the program doing the inequality calculations.)

        Try running the following code (nb without weights!), and show us, within CODE delimiters, the code run and the output produced

        Code:
        use $ca00h
        summarize dpi, detail 
        tabulate region_c, missing
        tabulate region_c , summ(dpi)
        
        missings report dpi region_c
        
        sort region_c
        by region_c: count if dpi ==  0
        by region_c: count if dpi <= 0
        by region_c: count if missing(dpi) 
        
        count if !missing(dpi)
        count if !missing(region_c)
        
        count if !missing(dpi) & !missing(region_c)
        count if  missing(dpi) & !missing(region_c)
        count if !missing(dpi) &  missing(region_c) 
        count if  missing(dpi) &  missing(region_c)


        • #5
          Code #4
          Output:
          Code:
          . use $ca00h 
          (ca00: version 7.0 13 Jun 2018 17:31)
          
          . summarize dpi, detail  
          
                        cash disposable household income
          -------------------------------------------------------------
                Percentiles      Smallest
           1%         2550        -127660
           5%         9740        -112165
          10%        13230         -62180       Obs              28,970
          25%        21760         -38250       Sum of Wgt.      28,970
          
          50%        35555                      Mean            41698.3
                                  Largest       Std. Dev.      31587.43
          75%        54395         804415
          90%      76027.5         826905       Variance       9.98e+08
          95%        90920         900420       Skewness       5.527833
          99%       139560         901550       Kurtosis       99.69825
          
          . tabulate region_c, missing 
          
                          Province |      Freq.     Percent        Cum.
          -------------------------+-----------------------------------
                  [10]Newfoundland |      1,177        4.06        4.06
          [11]Prince Edward Island |        821        2.83        6.90
                   [12]Nova Scotia |      1,954        6.74       13.64
                 [13]New Brunswick |      1,728        5.96       19.61
                        [24]Quebec |      5,755       19.87       39.47
                       [35]Ontario |      8,384       28.94       68.41
                      [46]Manitoba |      2,172        7.50       75.91
                  [47]Saskatchewan |      2,034        7.02       82.93
                       [48]Alberta |      2,342        8.08       91.01
              [59]British Columbia |      2,514        8.68       99.69
                                 . |         89        0.31      100.00
          -------------------------+-----------------------------------
                             Total |     28,970      100.00
          
          . tabulate region_c , summ(dpi) 
          
                      |     Summary of cash disposable
                      |          household income
             Province |        Mean   Std. Dev.       Freq.
          ------------+------------------------------------
            [10]Newfo |   34174.618   21960.729       1,177
            [11]Princ |   36707.454   24328.074         821
            [12]Nova  |   36660.647   25109.995       1,954
            [13]New B |    38264.69   25192.126       1,728
            [24]Quebe |   36560.499   24352.485       5,755
            [35]Ontar |   47967.088   31284.977       8,384
            [46]Manit |   38090.808   24020.157       2,172
            [47]Saska |   38079.014   24713.589       2,034
            [48]Alber |   44918.576   29194.706       2,342
            [59]Briti |   41105.561   27236.329       2,514
          ------------+------------------------------------
                Total |   41182.889   27638.925      28,881
          
          .  
          . missings report dpi region_c 
          command missings is unrecognized
          r(199);
          
          .  
          . sort region_c 
          
          . by region_c: count if dpi ==  0 
          
          --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
          -> region_c = [10]Newfoundland
            0
          --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
          -> region_c = [11]Prince Edward Island
            0
          --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
          -> region_c = [12]Nova Scotia
            0
          --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
          -> region_c = [13]New Brunswick
            0
          --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
          -> region_c = [24]Quebec
            0
          --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
          -> region_c = [35]Ontario
            0
          --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
          -> region_c = [46]Manitoba
            0
          --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
          -> region_c = [47]Saskatchewan
            0
          --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
          -> region_c = [48]Alberta
            0
          --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
          -> region_c = [59]British Columbia
            0
          --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
          -> region_c = .
            0
          
          . by region_c: count if dpi <= 0 
          
          --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
          -> region_c = [10]Newfoundland
            0
          --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
          -> region_c = [11]Prince Edward Island
            2
          --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
          -> region_c = [12]Nova Scotia
            3
          --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
          -> region_c = [13]New Brunswick
            3
          --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
          -> region_c = [24]Quebec
            8
          --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
          -> region_c = [35]Ontario
            22
          --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
          -> region_c = [46]Manitoba
            5
          --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
          -> region_c = [47]Saskatchewan
            3
          --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
          -> region_c = [48]Alberta
            8
          --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
          -> region_c = [59]British Columbia
            6
          --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
          -> region_c = .
            8
          
          . by region_c: count if missing(dpi)  
          
          --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
          -> region_c = [10]Newfoundland
            0
          --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
          -> region_c = [11]Prince Edward Island
            0
          --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
          -> region_c = [12]Nova Scotia
            0
          --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
          -> region_c = [13]New Brunswick
            0
          --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
          -> region_c = [24]Quebec
            0
          --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
          -> region_c = [35]Ontario
            0
          --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
          -> region_c = [46]Manitoba
            0
          --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
          -> region_c = [47]Saskatchewan
            0
          --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
          -> region_c = [48]Alberta
            0
          --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
          -> region_c = [59]British Columbia
            0
          --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
          -> region_c = .
            0
          
          .  
          . count if !missing(dpi) 
            28,970
          
          . count if !missing(region_c) 
            28,881
          
          .  
          . count if !missing(dpi) & !missing(region_c) 
            28,881
          
          . count if  missing(dpi) & !missing(region_c) 
            0
          
          . count if !missing(dpi) &  missing(region_c)  
            89
          
          . count if  missing(dpi) &  missing(region_c)
            0


          • #6
            Thank you. (BTW, missings is downloadable from the Stata Journal; sorry, I thought it was built-in. That's why you got an error message. Type search missings to find it.)

            Whatever, my conjecture is correct: you have 89 cases with missing values for region_c and so, as I said,
            ... the results that you think are inconsistent are arising because of different samples being used, and the different samples arise because of the patterns of missingness on the relevant variables. ... The inequality measure computed for (1) the sample with non-missing values on dpi will likely differ from the inequality measure computed from (2) the sample with non-missing values for both dpi and region. (In this case, the issue is with the data, not the program doing the inequality calculations.)
            In sum, I'm confident that this explains why you get the results that you do. If subgroup decomposition (by region) is the focus of your project, then I guess you'll have to use the all-regions-combined inequality indices that arise from the by-group use of ineqdeco or ineqdec0, since that ensures consistency. Or go back and try to address the missing value problem in the region variable.

            The more general lesson is that detailed analysis of the "descriptives" associated with your variables of interest, including patterns of missingness, is always a valuable exercise to do.
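
            As a minimal sketch of the consistency point above (an illustration, not part of ineqdeco itself), both runs can be restricted to the same observations with an if qualifier, so that the overall Gini is computed on the sample with non-missing region_c in each case. The variable names are those used in this thread, and r(gini) is the saved result already displayed in #1; the scalar names are hypothetical.

```stata
use $ca00h, clear

* Overall Gini, restricted by hand to observations with a region
ineqdeco dpi [aw = hwgt] if !missing(region_c)
scalar gini_nogroup = r(gini)

* Overall Gini from the subgroup decomposition
* (observations with missing region_c are dropped by the by() option)
ineqdeco dpi [aw = hwgt], by(region_c)
scalar gini_bygroup = r(gini)

* With identical estimation samples the two values should agree
display gini_nogroup "  " gini_bygroup
```

            If the two displayed values still differ, the samples are not the same and the missing-value patterns deserve another look.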


            • #7
              Thanks Stephen, now I see it. There are 89 observations with region_c == . (8 of them with dpi < 0). So ineqdeco without by() uses 81 more observations than ineqdeco with by(region_c), and those observations have a dpi higher than the average. ineqdeco without by() therefore produces a larger Gini because it uses a more unequal sample.

              The error with missings arises because it is not available for remote execution in the LISSY interface of LIS. My join date on Statalist explains the rest of the failures.
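
              The arithmetic behind this can be checked from the output in #5. The following is a rough sketch using only figures reported there; it is unweighted, so it is only indicative for the weighted Gini. The two means in #5 imply an average dpi of roughly 209,000 for the 89 observations with missing region_c, about five times the overall mean.

```stata
* Observations with missing region_c that ineqdeco can still use (dpi > 0)
display 89 - 8

* Implied unweighted mean dpi of the 89 observations with missing region_c:
* (total over all 28,970 obs - total over the 28,881 obs with a region) / 89
display (41698.3*28970 - 41182.889*28881) / 89

* The same checks run on the data themselves
use $ca00h, clear
count if missing(region_c) & dpi > 0
summarize dpi if missing(region_c)
```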




              • #8
                A last point in relation to
                Code:
                you'll have to use the all-regions-combined inequality indices that arises from the by-group use of ineqdeco or ineqdec0 -- that ensures consistency
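                I think that the alternative, recoding the missing values of the group variable into an extra category (something like recode groupvar (. = 99), labeled [99]extraregio), also ensures consistency and keeps overall inequality at its actual level.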
                And of course I completely agree with you that
                Code:
                analysis of the "descriptives" associated with your variables of interest, including patterns of missingness, is always a valuable exercise to do.
                Thanks again
                Last edited by Juan Vicente; 02 Oct 2018, 01:09.


                • #9
                  Well, converting the missing values on region into their own special 'region' category mechanically 'solves' the problem, in the sense of giving the same set of observations as in the sample of obs with non-missing dpi. But if subgroup decomposition is a focus of your analysis (relating inequality within and between regions to total inequality), then you might want to ponder further on what the implications are of your new artificial region category.
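
                  A minimal sketch of the recoding discussed here, assuming 99 is not already used as a region code (it does not appear in the tabulation in #5). The variable name region_x is hypothetical, introduced only for illustration; the original region_c is left untouched.

```stata
use $ca00h, clear

* Copy the region variable (with its value labels) and give the
* missing cases their own artificial category, 99
clonevar region_x = region_c
replace region_x = 99 if missing(region_x)

* All observations with usable dpi now enter the decomposition,
* so the overall Gini should match the run without by()
ineqdeco dpi [aw = hwgt], by(region_x)
display r(gini)
```

                  The within-group and between-group terms then include the artificial group, which is exactly the caveat raised above.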


                  • #10
                    On a different level: please note that the button marked with a left or opening quotation mark can be used for entering quotations, while the button marked # is for entering Stata code.


                    • #11
                      Thanks Nick. I will do so. I have been learning a lot since I joined.
