  • Summing a group of observations from a variable based on two different variables

    Hello, I am trying to create an index of ethnic fractionalization using this formula:

    [Image: Capture.PNG]
    where s_ij is the share of group i (i = 1…N) in country j. The index gives the probability that two randomly selected individuals from a nation's population belong to different ethnic groups.
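
    Assuming the image showed the standard fractionalization index, which is what the description and the attempted code below suggest, the formula can be written as:

    ```latex
    \mathrm{FRACT}_j = 1 - \sum_{i=1}^{N} s_{ij}^{2}
    ```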

    The ethnic group is identified by groupid, and size is the size of that group in the given period. I tried:

    bysort statename from:gen test =1-sum((size^2))

    bysort statename from: egen test1=1-sum(size^2)

    But those didn't work.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str27 statename int(from to) long groupid float size
    "United States"       1946 1965  1000  .691
    "United States"       1946 1965  2000  .125
    "United States"       1946 1965  3000  .124
    "United States"       1946 1965  4000  .036
    "United States"       1946 1965  5000 .0078
    "United States"       1946 1965  6000 .0042
    "United States"       1966 2008  1000  .691
    "United States"       1966 2008  2000  .125
    "United States"       1966 2008  3000  .124
    "United States"       1966 2008  4000  .036
    "United States"       1966 2008  5000 .0078
    "United States"       1966 2008  6000 .0042
    "United States"       2009 2013  1000   .66
    "United States"       2009 2013  2000   .15
    "United States"       2009 2013  3000  .124
    "United States"       2009 2013  4000  .044
    "United States"       2009 2013  5000 .0078
    "United States"       2009 2013  6000  .005
    "Canada"              1946 1981  1000   .59
    "Canada"              1946 1981  2000   .28
    "Canada"              1946 1981  4000   .02
    "Canada"              1982 2013  1000   .59
    "Canada"              1982 2013  2000   .23
    "Canada"              1982 2013  4000  .028
    "Cuba"                1946 1959  1000  .641
    "Cuba"                1946 1959  2000  .359
    "Cuba"                1960 2013  1000  .641
    "Cuba"                1960 2013  2000  .359
    "Haiti"               1946 2013  1000     1
    "Dominican Republic"  1946 2013  1000     1
    "Jamaica"             1962 2013  1000     1
    "Trinidad and Tobago" 1962 1986  1000    .4
    "Trinidad and Tobago" 1962 1986  2000  .375
    "Trinidad and Tobago" 1987 1991  1000    .4
    "Trinidad and Tobago" 1987 1991  2000  .375
    "Trinidad and Tobago" 1992 1995  1000    .4
    "Trinidad and Tobago" 1992 1995  2000  .375
    "Trinidad and Tobago" 1996 2002  1000    .4
    "Trinidad and Tobago" 1996 2002  2000  .375
    "Trinidad and Tobago" 2003 2010  1000    .4
    "Trinidad and Tobago" 2003 2010  2000  .375
    "Trinidad and Tobago" 2011 2013  1000  .354
    "Trinidad and Tobago" 2011 2013  2000  .342
    "Mexico"              1946 1974  1000  .801
    "Mexico"              1946 1974  3000  .142
    "Mexico"              1946 1974  9000   .05
    "Mexico"              1946 1974  2000  .007
    "Mexico"              1975 1997  1000  .801
    "Mexico"              1975 1997  3000  .142
    "Mexico"              1975 1997  9000   .05
    "Mexico"              1975 1997  2000  .007
    "Mexico"              1998 2013  1000  .801
    "Mexico"              1998 2013  3000  .142
    "Mexico"              1998 2013  9000   .05
    "Mexico"              1998 2013  2000  .007
    "Guatemala"           1946 1954  2000  .515
    "Guatemala"           1946 1954  1000   .48
    "Guatemala"           1946 1954  4000  .003
    "Guatemala"           1946 1954  3000  .002
    "Guatemala"           1955 1995  2000  .515
    "Guatemala"           1955 1995  1000   .48
    "Guatemala"           1955 1995  4000  .003
    "Guatemala"           1955 1995  3000  .002
    "Guatemala"           1996 2013  2000  .515
    "Guatemala"           1996 2013  1000   .48
    "Guatemala"           1996 2013  4000  .003
    "Guatemala"           1996 2013  3000  .002
    "Honduras"            1946 1974  6000   .91
    "Honduras"            1946 1974  7000   .07
    "Honduras"            1946 1974 10000  .016
    "Honduras"            1975 1989  6000   .91
    "Honduras"            1975 1989  7000   .07
    "Honduras"            1975 1989 10000  .016
    "Honduras"            1990 2013  6000   .91
    "Honduras"            1990 2013  7000   .07
    "Honduras"            1990 2013 10000  .016
    "El Salvador"         1946 1994  1000    .9
    "El Salvador"         1946 1994  2000    .1
    "El Salvador"         1995 2013  1000    .9
    "El Salvador"         1995 2013  2000    .1
    "Nicaragua"           1946 1987  1000   .86
    "Nicaragua"           1946 1987  2000   .09
    "Nicaragua"           1946 1987  3000  .035
    "Nicaragua"           1946 1987  5000  .002
    "Nicaragua"           1988 2013  1000   .86
    "Nicaragua"           1988 2013  2000   .09
    "Nicaragua"           1988 2013  3000  .035
    "Nicaragua"           1988 2013  5000  .002
    "Costa Rica"          1946 1953  1000  .937
    "Costa Rica"          1946 1953  2000  .019
    "Costa Rica"          1946 1953  3000  .017
    "Costa Rica"          1954 1991  1000  .937
    "Costa Rica"          1954 1991  2000  .019
    "Costa Rica"          1954 1991  3000  .017
    "Costa Rica"          1992 2013  1000  .937
    "Costa Rica"          1992 2013  2000  .019
    "Costa Rica"          1992 2013  3000  .017
    "Panama"              1946 1959  1000    .8
    "Panama"              1946 1959  2000   .14
    "Panama"              1946 1959  3000   .04
    end

  • #2
    What you need here is:
    Code:
    bysort statename from: egen index = total(size^2)
    replace index = 1-index
    The -sum()- function is the "running" sum; see -help sum()-
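
    A minimal sketch of the difference, run on the -dataex- example from #1 (the variable names running, grouptotal, and fract are made up for illustration):

    ```stata
    * sum() in -generate- accumulates row by row, so only the last
    * observation of each statename-from group holds the full sum
    bysort statename from: gen double running = sum(size^2)

    * total() in -egen- puts the group total on every observation
    bysort statename from: egen double grouptotal = total(size^2)

    * -egen- takes a single function call after the equals sign, so the
    * subtraction is done in a second step
    gen double fract = 1 - grouptotal

    * Cuba 1946: 1 - (.641^2 + .359^2) = 1 - .539762, about .4602
    list statename from size running grouptotal fract if statename == "Cuba", sepby(from)
    ```

    This is probably also why the second attempt in #1 failed: the expression 1-sum(size^2) is not a single egen function call.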

    On a substantive note: there is an interesting ethnic polarization index defined in the following and related literature:
    Montalvo, José and Marta Reynal-Querol (2005a) "Ethnic Polarization, Potential Conflict, and Civil Wars," American Economic Review, 95(3), 796-816.

    The well-known index you chose measures ethnic diversity (fractionalization), but if you want something that represents conflict potential (polarization), the index in that literature might be more relevant, depending on the context of your study.





    • #3
      This one?

      [Image: Capture.PNG]

      I think I would like to include both, so if the RQ is indeed the index you are speaking of (which I'm pretty sure it is), then I can use it, applying the method you gave me for the original FRACT formula.

      Thank you



      • #4
        To calculate the RQ, I tried this:

        bysort NAMES_STD year: egen index = total(size^2)
        replace index = 4*index*(1-size)

        However, since I'm using size again, the result is no longer computed within the two grouping variables, and replace does not allow the by option.
        I cannot do the upper RQ formula either, as I cannot get it to work with gen or egen. I tried writing it as

        bysort NAMES_STD year: egen index = total((½-size)/½)^2

        but that didn't work.
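
        For what it's worth, a sketch of how both forms could be computed, assuming the formula in the image is the usual Montalvo and Reynal-Querol one, RQ = 1 - sum of ((0.5 - s)/0.5)^2 * s = 4 * sum of s^2 * (1 - s). The factor (1 - size) differs from group to group, so it has to go inside total() rather than into a later replace. The names rq, rq_upper, and sharesum are hypothetical.

        ```stata
        * Lower form: 4 * sum of s^2 * (1 - s) within each country-year
        bysort NAMES_STD year: egen double rq = total(4 * size^2 * (1 - size))

        * Upper form: 1 - sum of ((0.5 - s)/0.5)^2 * s, with 0.5 typed as a number
        * instead of the ½ character
        bysort NAMES_STD year: egen double rq_upper = total(((0.5 - size) / 0.5)^2 * size)
        replace rq_upper = 1 - rq_upper

        * The two forms agree only where the shares in a group sum to 1;
        * otherwise rq_upper exceeds rq by 1 - (sum of shares)
        bysort NAMES_STD year: egen double sharesum = total(size)
        list NAMES_STD year rq rq_upper sharesum if abs(sharesum - 1) > 1e-6, sepby(NAMES_STD)
        ```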



        • #5
          See also (e.g.) entropyetc from SSC. https://www.statalist.org/forums/for...lable-from-ssc

          Given your helpful data example, here is some code:

          Code:
          . entropyetc size, by(statename from)  
          
          -------------------------------------------------------------------------------------
                             Group |  Shannon H      exp(H)     Simpson   1/Simpson     dissim.
          -------------------------+-----------------------------------------------------------
                       Canada 1946 |      1.099       3.000       0.333       3.000       0.932
                       Canada 1982 |      1.099       3.000       0.333       3.000       0.932
                   Costa Rica 1946 |      1.099       3.000       0.333       3.000       0.932
                   Costa Rica 1954 |      1.099       3.000       0.333       3.000       0.932
                   Costa Rica 1992 |      1.099       3.000       0.333       3.000       0.932
                         Cuba 1946 |      0.693       2.000       0.500       2.000       0.955
                         Cuba 1960 |      0.693       2.000       0.500       2.000       0.955
           Dominican Republic 1946 |      0.000       1.000       1.000       1.000       0.977
                  El Salvador 1946 |      0.693       2.000       0.500       2.000       0.955
                  El Salvador 1995 |      0.693       2.000       0.500       2.000       0.955
                    Guatemala 1946 |      1.386       4.000       0.250       4.000       0.909
                    Guatemala 1955 |      1.386       4.000       0.250       4.000       0.909
                    Guatemala 1996 |      1.386       4.000       0.250       4.000       0.909
                        Haiti 1946 |      0.000       1.000       1.000       1.000       0.977
                     Honduras 1946 |      1.099       3.000       0.333       3.000       0.932
                     Honduras 1975 |      1.099       3.000       0.333       3.000       0.932
                     Honduras 1990 |      1.099       3.000       0.333       3.000       0.932
                      Jamaica 1962 |      0.000       1.000       1.000       1.000       0.977
                       Mexico 1946 |      1.386       4.000       0.250       4.000       0.909
                       Mexico 1975 |      1.386       4.000       0.250       4.000       0.909
                       Mexico 1998 |      1.386       4.000       0.250       4.000       0.909
                    Nicaragua 1946 |      1.386       4.000       0.250       4.000       0.909
                    Nicaragua 1988 |      1.386       4.000       0.250       4.000       0.909
                       Panama 1946 |      1.099       3.000       0.333       3.000       0.932
          Trinidad and Tobago 1962 |      0.693       2.000       0.500       2.000       0.955
          Trinidad and Tobago 1987 |      0.693       2.000       0.500       2.000       0.955
          Trinidad and Tobago 1992 |      0.693       2.000       0.500       2.000       0.955
          Trinidad and Tobago 1996 |      0.693       2.000       0.500       2.000       0.955
          Trinidad and Tobago 2003 |      0.693       2.000       0.500       2.000       0.955
          Trinidad and Tobago 2011 |      0.693       2.000       0.500       2.000       0.955
                United States 1946 |      1.792       6.000       0.167       6.000       0.864
                United States 1966 |      1.792       6.000       0.167       6.000       0.864
                United States 2009 |      1.792       6.000       0.167       6.000       0.864
          -------------------------------------------------------------------------------------
          
          . entropyetc size, by(statename from) gen(3=wanted)
          
          <output suppressed> 
          
          
          . replace wanted = 1 - wanted
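
          One thing worth checking in the table above: every Simpson value equals 1 divided by the number of rows in the group (0.500 for two rows, 0.333 for three, and so on), which is what would result if size were tabulated as a categorical variable rather than read as the group shares. For Cuba, the sum of squared shares from the data in #1 works out differently:

          ```latex
          \sum_i s_i^2 = 0.641^2 + 0.359^2 = 0.410881 + 0.128881 = 0.539762 \neq 0.500
          ```

          So it may be safer to cross-check wanted against the direct egen calculation in #2 before relying on it.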



          • #6
            Yes, I was thinking that the original poster might find the so-called RQ index useful. See -divcat- at SSC for a package to compute it.



            • #7
              Mike, I used divcat and made it work, but I found the results a little puzzling. As far as I am aware, the RQ index reaches its maximum value when there are two groups of equal size (0.5 and 0.5). However, when I used this command

              bysort NAMES_STD from: divcat size, gen_rq(ethnicpol)

              I get this output:

              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input str27 country_name int(from to) float(size ethnicpol)
              "Afghanistan" 1946 1978 .015 .3472222
              "Afghanistan" 1946 1978 .019 .3472222
              "Afghanistan" 1946 1978  .11 .3472222
              "Afghanistan" 1946 1978 .001 .3472222
              "Afghanistan" 1946 1978  .41 .3472222
              "Afghanistan" 1946 1978 .002 .3472222
              "Afghanistan" 1946 1978  .02 .3472222
              "Afghanistan" 1946 1978 .005 .3472222
              "Afghanistan" 1946 1978 .007 .3472222
              "Afghanistan" 1946 1978 .007 .3472222
              "Afghanistan" 1946 1978  .08 .3472222
              "Afghanistan" 1946 1978  .25 .3472222
              "Afghanistan" 1979 1992  .02 .3472222
              "Afghanistan" 1979 1992  .25 .3472222
              "Afghanistan" 1979 1992 .007 .3472222
              "Afghanistan" 1979 1992 .015 .3472222
              "Afghanistan" 1979 1992 .001 .3472222
              "Afghanistan" 1979 1992 .005 .3472222
              "Afghanistan" 1979 1992 .007 .3472222
              "Afghanistan" 1979 1992  .41 .3472222
              "Afghanistan" 1979 1992  .08 .3472222
              "Afghanistan" 1979 1992  .11 .3472222
              "Afghanistan" 1979 1992 .019 .3472222
              "Afghanistan" 1979 1992 .002 .3472222
              "Afghanistan" 1993 1996 .002 .3472222
              "Afghanistan" 1993 1996 .033 .3472222
              "Afghanistan" 1993 1996  .41 .3472222
              "Afghanistan" 1993 1996 .019 .3472222
              "Afghanistan" 1993 1996 .007 .3472222
              "Afghanistan" 1993 1996 .005 .3472222
              "Afghanistan" 1993 1996  .01 .3472222
              "Afghanistan" 1993 1996  .11 .3472222
              "Afghanistan" 1993 1996 .001 .3472222
              "Afghanistan" 1993 1996  .25 .3472222
              "Afghanistan" 1993 1996 .113 .3472222
              "Afghanistan" 1993 1996 .007 .3472222
              "Afghanistan" 1997 2001  .41 .3472222
              "Afghanistan" 1997 2001 .002 .3472222
              "Afghanistan" 1997 2001 .001 .3472222
              "Afghanistan" 1997 2001  .25 .3472222
              "Afghanistan" 1997 2001 .007 .3472222
              "Afghanistan" 1997 2001 .015 .3472222
              "Afghanistan" 1997 2001 .005 .3472222
              "Afghanistan" 1997 2001 .007 .3472222
              "Afghanistan" 1997 2001  .02 .3472222
              "Afghanistan" 1997 2001  .08 .3472222
              "Afghanistan" 1997 2001  .11 .3472222
              "Afghanistan" 1997 2001 .019 .3472222
              "Afghanistan" 2002 2005 .019 .3472222
              "Afghanistan" 2002 2005 .007 .3472222
              "Afghanistan" 2002 2005 .015 .3472222
              "Afghanistan" 2002 2005 .005 .3472222
              "Afghanistan" 2002 2005  .08 .3472222
              "Afghanistan" 2002 2005 .002 .3472222
              "Afghanistan" 2002 2005  .11 .3472222
              "Afghanistan" 2002 2005 .007 .3472222
              "Afghanistan" 2002 2005  .25 .3472222
              "Afghanistan" 2002 2005 .001 .3472222
              "Afghanistan" 2002 2005  .02 .3472222
              "Afghanistan" 2002 2005  .41 .3472222
              "Afghanistan" 2006 2013  .11 .3472222
              "Afghanistan" 2006 2013  .02 .3472222
              "Afghanistan" 2006 2013 .005 .3472222
              "Afghanistan" 2006 2013 .002 .3472222
              "Afghanistan" 2006 2013  .25 .3472222
              "Afghanistan" 2006 2013  .08 .3472222
              "Afghanistan" 2006 2013 .015 .3472222
              "Afghanistan" 2006 2013  .41 .3472222
              "Afghanistan" 2006 2013 .001 .3472222
              "Afghanistan" 2006 2013 .019 .3472222
              "Afghanistan" 2006 2013 .007 .3472222
              "Afghanistan" 2006 2013 .007 .3472222
              "Albania"     1946 1989  .95 .8888889
              "Albania"     1946 1989  .01 .8888889
              "Albania"     1946 1989  .03 .8888889
              "Albania"     1990 2013  .82 .8888889
              "Albania"     1990 2013  .08 .8888889
              "Albania"     1990 2013  .02 .8888889
              "Algeria"     1962 2013  .28        1
              "Algeria"     1962 2013  .72        1
              "Angola"      1975 2002  .38      .64
              "Angola"      1975 2002  .02      .64
              "Angola"      1975 2002  .13      .64
              "Angola"      1975 2002  .09      .64
              "Angola"      1975 2002  .26      .64
              "Angola"      2003 2005  .26      .64
              "Angola"      2003 2005  .02      .64
              "Angola"      2003 2005  .09      .64
              "Angola"      2003 2005  .38      .64
              "Angola"      2003 2005  .13      .64
              "Angola"      2006 2013  .02      .64
              "Angola"      2006 2013  .09      .64
              "Angola"      2006 2013  .13      .64
              "Angola"      2006 2013  .26      .64
              "Angola"      2006 2013  .38      .64
              "Argentina"   1946 1975 .015        1
              "Argentina"   1946 1975  .97        1
              "Argentina"   1976 1983 .015        1
              "Argentina"   1976 1983  .97        1
              "Argentina"   1984 2013  .97        1
              end
              As far as I can see, it is upside down: the higher the value, the lower the polarization. Look at Argentina, which has a value of 1 but one group of size .97 and another of size .015. I just want to make sure I haven't messed up the code, and whether I should interpret it as I see it, i.e. that lower values mean higher ethnic conflict potential.
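
              A possible explanation, offered as a sketch rather than a statement about how divcat works internally: the reported values match the RQ index of N equally sized categories, 4(N - 1)/N^2, where N is the number of distinct size values in the group (1 for two values, 8/9 = .8888889 for three, 16/25 = .64 for five). Afghanistan fits too: eleven distinct values among twelve rows, with .007 occurring twice, gives 600/1728 = .3472222. That pattern would arise if size were being tabulated as a categorical variable instead of read as shares. Computing RQ directly from the shares sidesteps the question; rq_direct is a hypothetical name, and country_name is taken from the -dataex- listing above:

              ```stata
              * RQ = 4 * sum of s^2 * (1 - s), summed over groups within country and period
              bysort country_name from: egen double rq_direct = total(4 * size^2 * (1 - size))

              * Argentina 1946-1975: 4 * (.97^2 * .03 + .015^2 * .985), about .114,
              * i.e. low polarization, as expected with one dominant group
              list country_name from to size ethnicpol rq_direct if country_name == "Argentina", sepby(from)
              ```

              With the direct calculation, higher values mean more polarization, and the maximum of 1 occurs at two groups of .5 each.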

