  • Calculating dissimilarity Index

    Hello Statalist,

    I am trying to calculate ethnic diversity of the labour force in a particular industry/occupation. I need to use a dissimilarity index.

    My industry/occupation variable is IN_OC and my ethnicity variable is ETHNIC.

    I tried using the “Duncan & Duncan dissimilarity index” as it permits weights.

    The command below gives the error “too many values”, I assume because of the many industry/occupation pairs.
    Code:
    duncan IN_OC ETHNIC [w=weight]

    I am also aware of Nick Cox's “ineq” command from SSC.

    Code:
    ineq ETHNIC, by( IN_OC)
    It gives me the same error, “too many values”, and I don’t think I can incorporate weights. Is there any way I can calculate the dissimilarity index with my data?

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long ID int weight float IN_OC byte ETHNIC
     101  718 41531  1
     202  501 66353  1
     205  929 52313  1
     401  739 86211  1
     501  730 70113  1
     502  635 86614  1
     601  806 71523  1
     602  677 55624  1
     603  841 85231  1
     801  673 47415  1
     802  632 47813  1
     901  952 49821 11
    1103 1321 41927  1
    1201  461 10913  3
    1202  449 10811  3
    1301  640 43532  1
    1302  562 43353  1
    1303  717 84323  1
    1401  708 93344  1
    1402  620 93344  1
    1601  692 30354  1
    1602  720 46112  1
    1701  694 86221  1
    1702  655 86221  2
    1801  540 85231  1
    1802  708 47711  1
    1902  695 13813  1
    2001  482 87614  1
    2002  540 41531  1
    2003  851 47711  1
    2004  527 45523  1
    2101  513 84221  1
    2102  672 33812  1
    2103  527 55927  1
    2402  484 25246  1
    2403  532 47711  1
    2504 1062 41531  1
    2601  579 86222  1
    2602  683 35242  1
    2802  471 24522  1
    2901  757 28246  1
    3202  533 84411  1
    3302  630 55624  3
    3402  650 64242  1
    3501  759 68354  1
    3601  760 85313  1
    3602  818 58247  1
    3701 1032 29313  3
    3801  557 84214  1
    3802  579 85612  1
    3901  826 90342  1
    3902  837 71212  1
    3904 1274 86612  6
    3905 1389 30541  6
    4001  844 86323 10
    4301  581 86321  1
    4302  834 85612  1
    4402  730 85231  3
    4501  848 96622  1
    4502  904 47354  1
    4602  698 85231  1
    4603  982 33524  1
    4604 1123 52311  1
    4702 1309 69353  1
    4703 1136 69353  1
    4801  746 85543  3
    4802  794 49821  1
    4901  761 85231  1
    4902  631 85231  1
    5001  611 68354  1
    5102  753 71212  1
    5301  609 62113  1
    5401  833 47119  1
    5402 1347 62213  1
    5601  603 85924  1
    5602  634 52712  1
    5701  641 84331  1
    5702  674 24125  1
    5801  860 85213  1
    5904 1129 68323  1
    6001  494 85231  1
    6002  683 71112  1
    6101  494 86223  1
    6102  693 85231  1
    6501  590 59341  1
    6502  632 59341  1
    6701  485 42411  1
    6702  545 25354  1
    6703  631 47711  1
    6801  751 55814  3
    6802  716 87321  3
    6803  915 93122  1
    6804  926 85231 10
    7003 1159 41912  1
    7103  488 86927  1
    7201  672 46113  1
    7203  906 93344  1
    7301  719 49821  1
    7303  959  1613  1
    7502  562 22113  1
    end
    label values weight PWT18
    label values ETHNIC ETHGBEUL
    label def ETHGBEUL 1 "White British", modify
    label def ETHGBEUL 2 "White Irish", modify
    label def ETHGBEUL 3 "Other White", modify
    label def ETHGBEUL 6 "Pakistani", modify
    label def ETHGBEUL 10 "Black/African/Caribbean/Black British", modify
    label def ETHGBEUL 11 "Other ethnic group", modify

  • #2
    Hello Daria,

    Excuse me for bothering you. I don't know how to submit my problem; can you help me submit it?


    I want to calculate allocative efficiency. The command in Stata is as follows:

    dea_allocative lcapital1 Lcapital4 lcoùt_travail= LVA if lcapital1!=. & Lcapital4!=. & lcoùt_travail!=. & LVA!=., model(cost) numlist (3) unitvars(Lcapital4 lcoùt_travail lcapital1) rts(crs) saving(base_allo)

    I run the command, but Stata does not recognize it.


    • #3
      #2 Yvette Djoha Please open a new thread.

      #1 duncan is from SSC, as you are asked to explain (FAQ Advice #12: you are asked to tell us where community-contributed commands you refer to come from). I have never used it.

      ineq is also from SSC. I know more about it, but I don't see how to relate it to anything in your post.

      I don't understand what you want to do. You could look at occupations pairwise and compare them in terms of their ethnicity distributions, or you could look at ethnicities pairwise and compare them in terms of their occupation mix. Those questions aren't the same question. I see in your data example alone 76 distinct occupation codes and I can't easily imagine that you want a matrix with thousands of distinct entries.

      In any case what are the weights?

      With a clear explanation, the computation will be fairly easy for any experienced Stata user. I only wrote ineq (and dissim (SSC) and entropyetc (SSC)) because I appreciate that you need some experience before that is true, and because I wanted to make calculation very easy for myself.


      • #4
        Thank you for the response Nick. I did not explain it very well. I am trying to calculate how ethnically diverse a particular industry/occupation pair is (I have many industry/occupation pairs as you pointed out).

        From what I understand, I need to calculate a dissimilarity index (i.e. one that ranges from 0-1 to indicate higher/lower ethnic diversity in each industry/occupation pair).

        The weight is a population weight.
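
        One possible reading of this, sketched here under stated assumptions: for each IN_OC cell, compare the weighted ethnic shares inside the cell with the weighted shares in the whole sample, and take half the sum of the absolute differences. That gives 0 when the cell mirrors the overall ethnic mix and moves towards 1 as it departs from it. This is a by-hand calculation, not the output of duncan or ineq. All the new variable names are placeholders, and the code assumes there are no missing values in IN_OC, ETHNIC or weight.

        ```stata
        * Weighted totals: cell-by-ethnicity, cell, ethnicity, whole sample
        egen double w_ce  = total(weight), by(IN_OC ETHNIC)
        egen double w_c   = total(weight), by(IN_OC)
        egen double w_e   = total(weight), by(ETHNIC)
        egen double w_all = total(weight)

        * One observation per cell-by-ethnicity combination that occurs
        egen byte combo = tag(IN_OC ETHNIC)
        generate double gap  = abs(w_ce/w_c - w_e/w_all) if combo
        generate double seen = w_e/w_all if combo

        * Ethnicities absent from a cell each contribute their overall share,
        * which together add up to 1 minus the overall shares of those present
        egen double gapsum  = total(gap),  by(IN_OC)
        egen double seensum = total(seen), by(IN_OC)
        generate double D_cell = 0.5 * (gapsum + 1 - seensum)

        * Show one row per cell
        egen byte onecell = tag(IN_OC)
        list IN_OC D_cell if onecell
        ```

        With cells as small as those in the data example, the result will mostly reflect how few people are in each cell, so it may make sense to coarsen IN_OC first.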


        • #5
          So, and again if I understand you correctly, you seek a half-matrix with thousands of distinct entries, each a dissimilarity index for a pair of occupations. (76 x 75 / 2 = 2850 for your data example alone). The help for duncan explains you can't do that above a certain limit. What's the strategy here? What are you going to do with the matrix? Do you want the results in a new dataset? I'd suspect that the detailed output here would be overwhelming without a plan, or even with it.
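
          The count of distinct codes, and the number of pairs it implies, can be checked on the full data with a sketch like the following. The variable name first is just a placeholder.

          ```stata
          * Tag one observation per distinct IN_OC code, then count the tags
          egen byte first = tag(IN_OC)
          quietly count if first
          local k = r(N)

          * k codes give k(k - 1)/2 unordered pairs
          display "distinct codes: `k'   unordered pairs: " `k' * (`k' - 1) / 2
          ```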


          • #6
            I was going to save the stored matrix as a separate dataset so that for each industry-occupation pair, I have in the new dataset a dissimilarity index.

            I don't think I can manage this with "duncan". With "duncan2" (also from SSC) there is no limit on the number of categories, but the group variable (in my case ethnicity) needs to be dichotomous.
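
            With a two-group split, the Duncan index can also be computed by hand in a few lines, which sidesteps any limit on the number of categories. The following is only a sketch: it assumes a hypothetical split of White British (ETHNIC == 1) against everyone else, and it treats weight as a plain multiplier. It returns a single D for the whole sample across all IN_OC cells, that is, half the sum over cells of |a_c/A - b_c/B|, not one value per cell. All new variable names are placeholders.

            ```stata
            * Hypothetical two-group split: White British (code 1) vs everyone else
            generate byte other = ETHNIC != 1 if !missing(ETHNIC)

            * Weighted size of each group within each IN_OC cell
            egen double wt_ref = total(weight * (other == 0)), by(IN_OC)
            egen double wt_oth = total(weight * (other == 1)), by(IN_OC)

            * Weighted size of each group in the whole sample
            egen double tot_ref = total(weight * (other == 0))
            egen double tot_oth = total(weight * (other == 1))

            * Keep one term per cell, then D = 0.5 * sum of absolute differences
            egen byte cell = tag(IN_OC)
            generate double absdiff = abs(wt_ref/tot_ref - wt_oth/tot_oth) if cell
            quietly summarize absdiff, meanonly
            display "Duncan D = " 0.5 * r(sum)
            ```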

            So I am not sure where to go from here
