Duncan Segregation Index with Aggregate Data

David Zentler-Munro

Join Date: Apr 2018
Posts: 6


12 Nov 2019, 10:05

I'd like to calculate the Duncan index of segregation for occupational gender segregation, by year and country. My observations are aggregated, as shown in the data example below. I don't think the duncan command works because the data are aggregated, and the dicseg command doesn't work because it doesn't take a by option (needed to group by country and year).

Code:

. dataex country year occupation sex employment if (year==1995 | year==1996) & (occupation=="Managers"|occupation=="Service and sales workers")

----------------------- copy starting from the next line -----------------------
* Example generated by -dataex-. To install: ssc install dataex
clear
input str14 country int year str104 occupation float sex double employment
"Australia"      1995 "Managers"                  0   565.164
"Australia"      1995 "Managers"                  1   278.202
"Australia"      1995 "Service and sales workers" 0   372.313
"Australia"      1995 "Service and sales workers" 1   863.474
"Australia"      1996 "Managers"                  0    569.11
"Australia"      1996 "Managers"                  1   267.965
"Australia"      1996 "Service and sales workers" 0   371.909
"Australia"      1996 "Service and sales workers" 1   915.227
"Canada"         1995 "Managers"                  0   982.308
"Canada"         1995 "Managers"                  1   512.696
"Canada"         1995 "Service and sales workers" 0   660.893
"Canada"         1995 "Service and sales workers" 1  1260.283
"Canada"         1996 "Managers"                  0   973.775
"Canada"         1996 "Managers"                  1   546.179
"Canada"         1996 "Service and sales workers" 0   688.731
"Canada"         1996 "Service and sales workers" 1  1300.599
end


Nick Cox

Join Date: Mar 2014

Posts: 35694
#2

12 Nov 2019, 10:48

I think that's what Otis Dudley Duncan and Beverly Duncan also called the dissimilarity index, but the only safe way to focus discussion is to give a reference or an explicit definition.
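For reference, here is the usual statement of the Duncan dissimilarity index, written for this setting. The symbols are introduced here for illustration and do not come from the thread: within one country and year, m_i and f_i are male and female employment in occupation i (of K occupations), and M and F are the corresponding totals. The index is 0 when men and women are distributed identically across occupations and 1 when every occupation is entirely male or entirely female.

```latex
D = \frac{1}{2} \sum_{i=1}^{K} \left| \frac{m_i}{M} - \frac{f_i}{F} \right|,
\qquad M = \sum_{i=1}^{K} m_i, \qquad F = \sum_{i=1}^{K} f_i
```

Read this way, D is the share of either group that would have to change occupation for the two distributions to match.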

dicseg will be community-contributed, I guess, so by FAQ Advice #12 you are asked to tell us where it comes from.

In your data example I see countries, years, just two sectors and two genders. Correct?

Last edited by Nick Cox; 12 Nov 2019, 11:27.

Nick Cox

Join Date: Mar 2014
Posts: 35694

12 Nov 2019, 11:37

As I think I know what you want, I went ahead anyway. Searching for an existing command seemed futile, as the calculation takes only a few lines. Nothing in my code assumes just two occupations.

Code:

. bysort country year sex : egen work = total(employment) 

. replace work = employment / work 
(16 real changes made)

. bysort country year occupation : gen absdiff = cond(_n == 1, abs(work[1] - work[2]), 0) 

. 
. by country year: egen Duncan = total(absdiff) 

. 
. list , sepby(country year) 

     +------------------------------------------------------------------------------------------------+
     |   country   year                  occupation   sex   employ~t       work    absdiff     Duncan |
     |------------------------------------------------------------------------------------------------|
  1. | Australia   1995                    Managers     1    278.202   .2436786   .3591778   .7183556 |
  2. | Australia   1995                    Managers     0    565.164   .6028564          0   .7183556 |
  3. | Australia   1995   Service and sales workers     0    372.313   .3971436   .3591778   .7183556 |
  4. | Australia   1995   Service and sales workers     1    863.474   .7563214          0   .7183556 |
     |------------------------------------------------------------------------------------------------|
  5. | Australia   1996                    Managers     1    267.965   .2264763   .3783042   .7566084 |
  6. | Australia   1996                    Managers     0     569.11   .6047806          0   .7566084 |
  7. | Australia   1996   Service and sales workers     0    371.909   .3952194   .3783042   .7566084 |
  8. | Australia   1996   Service and sales workers     1    915.227   .7735236          0   .7566084 |
     |------------------------------------------------------------------------------------------------|
  9. |    Canada   1995                    Managers     1    512.696   .2891721   .3086294   .6172588 |
 10. |    Canada   1995                    Managers     0    982.308   .5978014          0   .6172588 |
 11. |    Canada   1995   Service and sales workers     0    660.893   .4021985   .3086295   .6172588 |
 12. |    Canada   1995   Service and sales workers     1   1260.283   .7108279          0   .6172588 |
     |------------------------------------------------------------------------------------------------|
 13. |    Canada   1996                    Managers     1    546.179    .295747   .2899802   .5799605 |
 14. |    Canada   1996                    Managers     0    973.775   .5857272          0   .5799605 |
 15. |    Canada   1996   Service and sales workers     0    688.731   .4142728   .2899802   .5799605 |
 16. |    Canada   1996   Service and sales workers     1   1300.599    .704253          0   .5799605 |
     +------------------------------------------------------------------------------------------------+

. 
. tabdisp country year, c(Duncan) 

------------------------------
          |        year       
  country |     1995      1996
----------+-------------------
Australia | .7183556  .7566084
   Canada | .6172588  .5799605
------------------------------


David Zentler-Munro

Join Date: Apr 2018

Posts: 6
#4

12 Nov 2019, 13:30

That's exactly what I was looking for, thanks.
David Zentler-Munro

Join Date: Apr 2018

Posts: 6
#5

13 Nov 2019, 03:01

P.S. Small point, but I think the line

Code:

egen Duncan = total(absdiff)

should be:

Code:

egen Duncan = total(absdiff/2)
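A hand check on Australia 1995, using the employment figures from the data example and the shares (the work variable) in the listing above, shows why the halving is needed. With only two occupations the two absolute differences are necessarily equal, because each sex's shares sum to 1:

```latex
\begin{aligned}
M &= 565.164 + 372.313 = 937.477 \\
F &= 278.202 + 863.474 = 1141.676 \\
|d_{\text{Managers}}| &= \left| \frac{565.164}{937.477} - \frac{278.202}{1141.676} \right| = |0.6028564 - 0.2436786| = 0.3591778 \\
|d_{\text{Service}}| &= \left| \frac{372.313}{937.477} - \frac{863.474}{1141.676} \right| = |0.3971436 - 0.7563214| = 0.3591778 \\
D &= \tfrac{1}{2}\,(0.3591778 + 0.3591778) = 0.3591778
\end{aligned}
```

So on the usual [0, 1] scale the Australia 1995 index is about 0.359, half of the .7183556 in the tabdisp output above; the other cells of that table halve in the same way.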

Nick Cox

Join Date: Mar 2014
Posts: 35694

13 Nov 2019, 05:47

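You're correct. Sorry about that. The usual convention seems to be to report the index on [0,1], and what I gave counts every difference twice. The test is that if everyone in occupation X is of one sex and everyone in occupation Y is of the other, segregation is at its maximum and the index should be 1: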

Code:

clear
input str14 country int year str1 occupation float sex double employment
"Freedonia" 2019 "X" 1 1
"Freedonia" 2019 "X" 2 0
"Freedonia" 2019 "Y" 1 0
"Freedonia" 2019 "Y" 2 1
end

bysort country year sex : egen work = total(employment)
replace work = employment / work
bysort country year occupation : gen absdiff = cond(_n == 1, abs(work[1] - work[2]), 0)
by country year: egen Duncan = total(absdiff/2)

list

     +------------------------------------------------------------------------+
     |   country   year   occupa~n   sex   employ~t   work   absdiff   Duncan |
     |------------------------------------------------------------------------|
  1. | Freedonia   2019          X     2          0      0         1        1 |
  2. | Freedonia   2019          X     1          1      1         0        1 |
  3. | Freedonia   2019          Y     1          0      0         1        1 |
  4. | Freedonia   2019          Y     2          1      1         0        1 |
     +------------------------------------------------------------------------+

tabdisp country year, c(Duncan) format(%4.3f)

-----------------
          | year
  country |  2019
----------+------
Freedonia | 1.000
-----------------
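The same test written out with the formula: in Freedonia, sex 1 works only in X and sex 2 only in Y, so each sex has a share of 1 in one occupation and 0 in the other.

```latex
D = \frac{1}{2}\left( \left| \frac{1}{1} - \frac{0}{1} \right| + \left| \frac{0}{1} - \frac{1}{1} \right| \right)
  = \frac{1}{2}\,(1 + 1) = 1
```

Without the factor of 1/2 the same data would give 2, which is the sense in which the first version of the code counted double.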


Sherine Maui

Join Date: Apr 2018

Posts: 90
#7

24 Jul 2021, 07:04

I know this is an old post, but I am attempting to replicate this analysis with a similar dataset. Does anyone know what "employment" in the example above is measuring? I have occupation, gender, year and country variables.
Nick Cox

Join Date: Mar 2014

Posts: 35694
#8

24 Jul 2021, 10:37

The employment variable is the number of people employed in each combination of country, year, occupation and sex. As the thread title implies, the question is about aggregate data.
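For individual-level data like that described in #7, the counts are simply tallies of records, and the index is computed from them as before. In the notation below, introduced here for illustration, n_{ctig} is the number of records with country c, year t, occupation i and sex g (coded 0/1 as in the data example), and s_{ctig} is that cell's share of its own sex's total. One point to watch: every occupation contributes a term, so an occupation with no workers of one sex still counts, with a share of 0. With the code in this thread such a cell needs to be present as a row with employment 0; otherwise work[2] is missing for that occupation, absdiff is missing, and egen's total() skips it, which understates the index.

```latex
n_{ctig} = \#\{\text{records with country } c,\ \text{year } t,\ \text{occupation } i,\ \text{sex } g\},
\qquad
s_{ctig} = \frac{n_{ctig}}{\sum_{i'} n_{cti'g}},
\qquad
D_{ct} = \frac{1}{2} \sum_{i} \left| s_{cti0} - s_{cti1} \right|
```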