  • Duncan Segregation Index with Aggregate Data

    I'd like to calculate the Duncan index of segregation for occupational gender segregation. I'd like to do this separately for each country and year in my data, which are aggregated (see the data example below). I don't think the duncan command works, because the data are aggregated, and the dicseg command doesn't work because it doesn't take a by option (needed to group by country and year).

    HTML Code:
    . dataex country year occupation sex employment if (year==1995 | year==1996) & (occupation=="Managers"|occupation=="Service and sales workers")
    
    ----------------------- copy starting from the next line -----------------------
    [CODE]
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str14 country int year str104 occupation float sex double employment
    "Australia"      1995 "Managers"                  0   565.164
    "Australia"      1995 "Managers"                  1   278.202
    "Australia"      1995 "Service and sales workers" 0   372.313
    "Australia"      1995 "Service and sales workers" 1   863.474
    "Australia"      1996 "Managers"                  0    569.11
    "Australia"      1996 "Managers"                  1   267.965
    "Australia"      1996 "Service and sales workers" 0   371.909
    "Australia"      1996 "Service and sales workers" 1   915.227
    "Canada"         1995 "Managers"                  0   982.308
    "Canada"         1995 "Managers"                  1   512.696
    "Canada"         1995 "Service and sales workers" 0   660.893
    "Canada"         1995 "Service and sales workers" 1  1260.283
    "Canada"         1996 "Managers"                  0   973.775
    "Canada"         1996 "Managers"                  1   546.179
    "Canada"         1996 "Service and sales workers" 0   688.731
    "Canada"         1996 "Service and sales workers" 1  1300.599
    end
    [/CODE]

  • #2
    I think that's what Otis Dudley Duncan and Beverly Duncan also called the dissimilarity index, but the only safe way to focus discussion is to give a reference or an explicit definition.

    dicseg will be community-contributed, I guess, so by FAQ Advice #12 you are asked to tell us where it comes from.

    In your data example I see countries, years, just two sectors and two genders. Correct?
    Last edited by Nick Cox; 12 Nov 2019, 11:27.



    • #3
      As I think I know what you want, I went ahead anyway. Searching for an existing command seemed futile, as the calculation takes only a few lines. Nothing in my code assumes just two occupations.


      Code:
      . bysort country year sex : egen work = total(employment) 
      
      . replace work = employment / work 
      (16 real changes made)
      
      . bysort country year occupation : gen absdiff = cond(_n == 1, abs(work[1] - work[2]), 0) 
      
      . 
      . by country year: egen Duncan = total(absdiff) 
      
      . 
      . list , sepby(country year) 
      
           +------------------------------------------------------------------------------------------------+
           |   country   year                  occupation   sex   employ~t       work    absdiff     Duncan |
           |------------------------------------------------------------------------------------------------|
        1. | Australia   1995                    Managers     1    278.202   .2436786   .3591778   .7183556 |
        2. | Australia   1995                    Managers     0    565.164   .6028564          0   .7183556 |
        3. | Australia   1995   Service and sales workers     0    372.313   .3971436   .3591778   .7183556 |
        4. | Australia   1995   Service and sales workers     1    863.474   .7563214          0   .7183556 |
           |------------------------------------------------------------------------------------------------|
        5. | Australia   1996                    Managers     1    267.965   .2264763   .3783042   .7566084 |
        6. | Australia   1996                    Managers     0     569.11   .6047806          0   .7566084 |
        7. | Australia   1996   Service and sales workers     0    371.909   .3952194   .3783042   .7566084 |
        8. | Australia   1996   Service and sales workers     1    915.227   .7735236          0   .7566084 |
           |------------------------------------------------------------------------------------------------|
        9. |    Canada   1995                    Managers     1    512.696   .2891721   .3086294   .6172588 |
       10. |    Canada   1995                    Managers     0    982.308   .5978014          0   .6172588 |
       11. |    Canada   1995   Service and sales workers     0    660.893   .4021985   .3086295   .6172588 |
       12. |    Canada   1995   Service and sales workers     1   1260.283   .7108279          0   .6172588 |
           |------------------------------------------------------------------------------------------------|
       13. |    Canada   1996                    Managers     1    546.179    .295747   .2899802   .5799605 |
       14. |    Canada   1996                    Managers     0    973.775   .5857272          0   .5799605 |
       15. |    Canada   1996   Service and sales workers     0    688.731   .4142728   .2899802   .5799605 |
       16. |    Canada   1996   Service and sales workers     1   1300.599    .704253          0   .5799605 |
           +------------------------------------------------------------------------------------------------+
      
      . 
      . tabdisp country year, c(Duncan) 
      
      ------------------------------
                |        year       
        country |     1995      1996
      ----------+-------------------
      Australia | .7183556  .7566084
         Canada | .6172588  .5799605
      ------------------------------



      • #4
        That's exactly what I was looking for, thanks.



        • #5
          P.S. A small point, but I think the line
          Code:
          egen Duncan = total(absdiff)
          should be:
          Code:
          egen Duncan = total(absdiff/2)



          • #6
            You're correct. Sorry about that. The usual convention seems to be to report the index on [0,1], and what I gave counts double, so it runs over [0,2]. A test is that if all Xs are As and all Ys are Bs, segregation is at its maximum, so the index should be 1:

            Code:
            clear
            input str14 country int year str1 occupation float sex double employment
            "Freedonia" 2019 "X" 1 1
            "Freedonia" 2019 "X" 2 0
            "Freedonia" 2019 "Y" 1 0
            "Freedonia" 2019 "Y" 2 1
            end
            
            bysort country year sex : egen work = total(employment)
            replace work = employment / work
            bysort country year occupation : gen absdiff = cond(_n == 1, abs(work[1] - work[2]), 0)
            by country year: egen Duncan = total(absdiff/2)
            
            list
            
                 +------------------------------------------------------------------------+
                 |   country   year   occupa~n   sex   employ~t   work   absdiff   Duncan |
                 |------------------------------------------------------------------------|
              1. | Freedonia   2019          X     2          0      0         1        1 |
              2. | Freedonia   2019          X     1          1      1         0        1 |
              3. | Freedonia   2019          Y     1          0      0         1        1 |
              4. | Freedonia   2019          Y     2          1      1         0        1 |
                 +------------------------------------------------------------------------+
            
            tabdisp country year, c(Duncan) format(%4.3f)
            
            -----------------
                      | year
              country |  2019
            ----------+------
            Freedonia | 1.000
            -----------------
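
            For reference, the convention used in the corrected code can be written out explicitly. The symbols below are notation introduced here for illustration, not taken from the thread: m_i and f_i are employment of each sex in occupation i, M and F are the corresponding totals over all K occupations within one country and year.

            ```latex
            D = \frac{1}{2} \sum_{i=1}^{K} \left| \frac{m_i}{M} - \frac{f_i}{F} \right|
            ```

            In terms of the code, work holds the shares m_i/M and f_i/F, absdiff holds each absolute difference, and Duncan is half their sum. In the Freedonia test the two differences are |1 - 0| and |0 - 1|, so D = (1 + 1)/2 = 1.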



            • #7
              I know this is an old post, but I am attempting to replicate this analysis with a similar dataset. Does anyone know what "employment" in the example above is measuring? I have occupation, gender, year and country variables.



              • #8
                The employment variable is the number of people. As the thread title implies, the question is about aggregate data.
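
                If the data are instead at the individual level, one possible route is to count people first and then reuse the code from #6. The following is only a sketch under stated assumptions: one observation per employed person, variables named country, year, occupation and sex, and a made-up toy dataset. contract is an official Stata command that replaces the data with one observation per combination of the listed variables, with the count stored in the variable named in freq().

                ```stata
                * Toy individual-level data (made up for illustration): one row per person
                clear
                input str9 country int year str1 occupation byte sex
                "Freedonia" 2019 "X" 1
                "Freedonia" 2019 "X" 1
                "Freedonia" 2019 "X" 2
                "Freedonia" 2019 "Y" 1
                "Freedonia" 2019 "Y" 2
                "Freedonia" 2019 "Y" 2
                end

                * Collapse to counts: employment = number of people in each cell
                contract country year occupation sex, freq(employment)

                * Same steps as in #6, applied to the counts
                bysort country year sex : egen work = total(employment)
                replace work = employment / work
                bysort country year occupation : gen absdiff = cond(_n == 1, abs(work[1] - work[2]), 0)
                by country year : egen Duncan = total(absdiff/2)

                tabdisp country year, c(Duncan) format(%4.3f)
                ```

                In this toy example each sex splits 2:1 across the two occupations, in opposite directions, so each absolute difference is 1/3 and the index should come out as 0.333. One caveat: the absdiff line assumes that both sexes appear in every occupation within each country and year. If an occupation has only one sex present, work[2] is missing there and egen's total() treats that missing difference as zero, so the occupation's contribution is silently dropped.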

