  • Calculating proportions of several categories

    Hello dear all,
    I am having serious trouble using the -proportion- command. I have a series of neighborhoods (psu), each within a milieu, which is in turn within a province. I am trying, so far in vain, to get the proportion of people of the same tribe in each neighborhood. In my dataset, tribe is coded as a three-digit number. Whenever I try
    by prov milieu psu, sort: prop tribe
    Stata refuses to run it and responds that -proportion- does not work with "by". When I change it to
    proportion (tribe), over (psu)
    Stata says I have too many options, although the same command works with a variable like "sex", which has only two categories, in place of "psu", which has 660. I have tried to write loops for this, but I still cannot get it right. I would much appreciate your help.
    Thanks in advance

  • #2
    Well, if -proportion- could be combined with -by-, think what you would get: you would get 661 outputs of proportion. If you have, say, a total of 5 tribes, that's over 3300 individual proportions listed out in your Results window and log file. It would be nearly impossible to make any use of that data in that form.

    You didn't provide example data, so I'm emulating a tiny toy version of your data set by changing the names of some variables in the auto.dta that comes with your Stata installation. This only has 2 "psu"s and 5 "tribe"s but it will work no matter how many of these things you have:

    Code:
    sysuse auto, clear
    rename rep78 tribe
    rename foreign psu
    
    keep psu tribe
    
    levelsof tribe, local(tribes)
    foreach t of local tribes {
       by psu, sort: egen tribe`t'_prop = mean(`t'.tribe)
    }
    The code assumes that tribe is a numeric variable. If it is a string, then -encode- it to create a numeric equivalent and use that instead.

    At the end of this code, your data set will contain new variables, one for each tribe. The value of a tribe's variable in any observation will be that tribe's proportion in that observation's psu. Evidently this information is very repetitious, and if what you ultimately want is one observation per psu containing the psu identifier and the tribe proportions, use the -collapse- command at the end.
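
    A minimal sketch of that last -collapse- step, assuming the loop above has just been run on the toy data, so the new variables all match the pattern tribe*_prop. Because each proportion is constant within a psu, taking the first value per psu loses nothing:

    ```stata
    * Reduce to one observation per psu, keeping the psu identifier and
    * the per-tribe proportions created by the loop above.
    * (first) is safe only because each tribe*_prop is constant within psu.
    collapse (first) tribe*_prop, by(psu)
    list
    ```

    Note that -collapse- replaces the data in memory, so save or -preserve- first if the individual-level observations are still needed.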


    • #3
      Many many thanks Clyde. I do appreciate your prompt reaction to my request.

      I am sorry, my request was not very clear, whereas the advice you provided is. However, it does not yet give me a single variable containing the computed proportion for each tribe. My final aim is a variable such that, in any psu, all individuals from the same tribe have the same value: the proportion of members of their own tribe in that psu. I have tried to mimic the structure of my data below, using the same coding for tribe and psu as in the dataset.
      id prov milieu psu tribe
      1 lemba city 1 11
      2 lemba city 1 33
      3 lemba city 1 11
      4 lemba village 1 36
      5 lemba village 1 39
      6 lemba village 1 39
      7 lemba village 1 39
      8 lemba Town 2 11
      9 lemba Town 2 11
      10 lemba Town 3 11
      11 lemba city 4 11
      12 lemba city 4 36
      13 lemba city 4 36
      14 lemba Town 3 39
      15 lemba Town 3 39
      16 lemba Town 3 33
      17 lemba Town 3 38
      18 lemba Town 2 40
      19 Kazozo Town 6 68
      20 Kazozo Town 6 68
      21 Kazozo Town 6 68
      22 Kazozo Town 6 39
      23 Kazozo Town 6 38
      24 Kazozo village 6 39
      25 Kazozo village 7 11
      26 Kazozo village 7 11
      27 Kazozo village 7 11
      28 Kazozo village 7 11
      29 Kazozo village 7 11
      30 Kazozo village 7 36
      31 Kazozo village 7 36
      32 Kazozo city 8 36
      33 Kazozo city 8 36
      34 Kazozo city 8 36
      35 Kazozo city 8 39
      36 Kazozo city 8 39
      37 Kazozo city 8 39
      38 Kazozo city 8 38
      39 Kazozo village 9 38
      40 Kazozo village 9 11
      41 Kazozo village 10 68
      42 Kazozo village 10 68
      43 Kazozo Town 10 11
      The idea is thus to have an additional column giving, for each individual, the proportion of people in the same PSU who belong to that individual's tribe.
      Many thanks again


      • #4
        id prov milieu psu tribe
        1 lemba city 1 11
        2 lemba city 1 33
        3 lemba city 1 11
        4 lemba village 1 36
        5 lemba village 1 39
        6 lemba village 1 39
        7 lemba village 1 39
        8 lemba Town 2 11
        9 lemba Town 2 11
        10 lemba Town 3 11
        11 lemba city 4 11
        12 lemba city 4 36
        13 lemba city 4 36
        14 lemba Town 3 39
        15 lemba Town 3 39
        16 lemba Town 3 33
        17 lemba Town 3 38
        18 lemba Town 2 40
        19 Kazozo Town 6 68
        20 Kazozo Town 6 68
        21 Kazozo Town 6 68
        22 Kazozo Town 6 39
        23 Kazozo Town 6 38
        24 Kazozo village 6 39
        25 Kazozo village 7 11
        26 Kazozo village 7 11
        27 Kazozo village 7 11
        28 Kazozo village 7 11
        29 Kazozo village 7 11
        30 Kazozo village 7 36
        31 Kazozo village 7 36
        32 Kazozo city 8 36
        33 Kazozo city 8 36
        34 Kazozo city 8 36
        35 Kazozo city 8 39
        36 Kazozo city 8 39
        37 Kazozo city 8 39
        38 Kazozo city 8 38
        39 Kazozo village 9 38
        40 Kazozo village 9 11
        41 Kazozo village 10 68
        42 Kazozo village 10 68
        43 Kazozo Town 10 11


        • #5
          Please consider the second table. Thanks.


          • #6
            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input byte id str6 prov str7 milieu byte(psu tribe)
             1 "lemba"  "city"     1 11
             2 "lemba"  "city"     1 33
             3 "lemba"  "city"     1 11
             4 "lemba"  "village"  1 36
             5 "lemba"  "village"  1 39
             6 "lemba"  "village"  1 39
             7 "lemba"  "village"  1 39
             8 "lemba"  "Town"     2 11
             9 "lemba"  "Town"     2 11
            10 "lemba"  "Town"     3 11
            11 "lemba"  "city"     4 11
            12 "lemba"  "city"     4 36
            13 "lemba"  "city"     4 36
            14 "lemba"  "Town"     3 39
            15 "lemba"  "Town"     3 39
            16 "lemba"  "Town"     3 33
            17 "lemba"  "Town"     3 38
            18 "lemba"  "Town"     2 40
            19 "Kazozo" "Town"     6 68
            20 "Kazozo" "Town"     6 68
            21 "Kazozo" "Town"     6 68
            22 "Kazozo" "Town"     6 39
            23 "Kazozo" "Town"     6 38
            24 "Kazozo" "village"  6 39
            25 "Kazozo" "village"  7 11
            26 "Kazozo" "village"  7 11
            27 "Kazozo" "village"  7 11
            28 "Kazozo" "village"  7 11
            29 "Kazozo" "village"  7 11
            30 "Kazozo" "village"  7 36
            31 "Kazozo" "village"  7 36
            32 "Kazozo" "city"     8 36
            33 "Kazozo" "city"     8 36
            34 "Kazozo" "city"     8 36
            35 "Kazozo" "city"     8 39
            36 "Kazozo" "city"     8 39
            37 "Kazozo" "city"     8 39
            38 "Kazozo" "city"     8 38
            39 "Kazozo" "village"  9 38
            40 "Kazozo" "village"  9 11
            41 "Kazozo" "village" 10 68
            42 "Kazozo" "village" 10 68
            43 "Kazozo" "Town"    10 11
            end
            
            gen proportion = .
            levelsof tribe, local(tribes)
            foreach t of local tribes {
               by psu, sort: egen tribe`t'_prop = mean(`t'.tribe)
               replace proportion = tribe`t'_prop if tribe == `t'
            }
            drop tribe*_prop
            In the future, when showing data examples, please use the -dataex- command to do so, as I have in this example. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
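
            The loop above can also be cross-checked without looping over tribes. The following is only a sketch: it assumes psu and tribe have no missing values and that the variable proportion from the code above is still in memory. The names n_tribe, n_psu and proportion2 are hypothetical and used only for illustration.

            ```stata
            * Number of people of the same tribe within the same psu.
            bysort psu tribe: gen long n_tribe = _N
            * Number of people in the psu.
            bysort psu: gen long n_psu = _N
            * Share of the psu that belongs to each person's own tribe.
            gen proportion2 = n_tribe / n_psu

            * Optional check against the loop-based variable.
            assert abs(proportion - proportion2) < 1e-6
            drop n_tribe n_psu
            ```

            With 660 psus and many tribe codes, this version creates no temporary per-tribe variables, which may matter in a large data set.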


            • #7
              Dear Clyde,
              Many thanks for the guidance and the advice on the use of -dataex-. This is much appreciated. Many thanks.


              • #8
                Dear all,

                I am sorry to come back with another request for help. I want to build a variable that, for each observation, gives the number of migrants who arrived before the observed individual. The duration of stay at the migrant's destination (durstay) is in the last column of the data set below. The idea is thus to have, for each observation, the number of other individuals whose duration of stay is greater than its own. I have tried several things without success.

                I would much appreciate your help.


                clear
                input byte id str6 prov str7 milieu byte(psu tribe durstay)
                1 "lemba" "city" 1 11 2
                2 "lemba" "city" 1 33 4
                3 "lemba" "city" 1 11 4
                4 "lemba" "village" 1 36 2
                5 "lemba" "village" 1 39 1
                6 "lemba" "village" 1 39 2
                7 "lemba" "village" 1 39 3
                8 "lemba" "Town" 2 11 5
                9 "lemba" "Town" 2 11 10
                10 "lemba" "Town" 3 11 11
                11 "lemba" "city" 4 11 1
                12 "lemba" "city" 4 36 4
                13 "lemba" "city" 4 36 4
                14 "lemba" "Town" 3 39 7
                15 "lemba" "Town" 3 39 8
                16 "lemba" "Town" 3 33 3
                17 "lemba" "Town" 3 38 1
                18 "lemba" "Town" 2 40 1
                19 "Kazozo" "Town" 6 68 2
                20 "Kazozo" "Town" 6 68 3
                21 "Kazozo" "Town" 6 68 4
                22 "Kazozo" "Town" 6 39 2
                23 "Kazozo" "Town" 6 38 4
                24 "Kazozo" "village" 6 39 5
                25 "Kazozo" "village" 7 11 2
                26 "Kazozo" "village" 7 11 2
                27 "Kazozo" "village" 7 11 3
                28 "Kazozo" "village" 7 11 2
                29 "Kazozo" "village" 7 11 3
                30 "Kazozo" "village" 7 36 1
                31 "Kazozo" "village" 7 36 5
                32 "Kazozo" "city" 8 36 2
                33 "Kazozo" "city" 8 36 6
                34 "Kazozo" "city" 8 36 8
                35 "Kazozo" "city" 8 39 7
                36 "Kazozo" "city" 8 39 3
                37 "Kazozo" "city" 8 39 4
                38 "Kazozo" "city" 8 38 9
                39 "Kazozo" "village" 9 38 1
                40 "Kazozo" "village" 9 11 2
                41 "Kazozo" "village" 10 68 3
                42 "Kazozo" "village" 10 68 2
                43 "Kazozo" "Town" 10 11 5
                end


                • #9
                  Code:
                  rangestat (count) num_with_longer_stay = durstay, interval(durstay 1 .)
                  Note, you do not say you want to do this separately by prov, so the code above does a count over the entire data set. If you want it separately by prov, just add a -by()- option to the command.

                  -rangestat- is written by Robert Picard, Nick Cox, and Roberto Ferrer, and is available from SSC.
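
                  For illustration, here is a sketch of the installation step and of the -by()- variant mentioned above. It assumes prov is the grouping wanted; the new variable name num_longer_by_prov is hypothetical, chosen so it does not clash with the variable created earlier.

                  ```stata
                  * One-time installation from SSC.
                  ssc install rangestat

                  * interval(durstay 1 .) looks at observations whose durstay lies
                  * between the current durstay + 1 and no upper bound, i.e. strictly
                  * longer stays when durstay is an integer.
                  * by(prov) restricts the count to observations in the same prov.
                  rangestat (count) num_longer_by_prov = durstay, interval(durstay 1 .) by(prov)
                  ```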

                  This question is not really relevant to the topic that started this thread. In the future, when switching topic, please start a new thread. While it is tempting to think of these threads as a dialog between a questioner and an answerer, in fact, other people read along, and still others come and search this Forum for previous answers to their questions. Their ability to do that relies upon the titles of the threads being correct. So please always choose a descriptive title for your thread, and start a new one if you want to go off topic.


                  • #10
                    Dear Clyde, Many thanks for all. I really appreciate your help.


                    • #11
                      &
                      Last edited by Kamala Kaghoma; 31 Dec 2019, 15:55.


                      • #12
                        Dear Clyde,
                        Sorry to come back to you on the same issue. I have the impression that -rangestat- does not work with more than one condition, or does not allow some other standard Stata syntax. Building on my earlier example, I wanted to apply two conditions: within each psu, give each individual a new variable holding the number of people of the same tribe with a longer stay than his or her own. I have tried several things without success, such as:

                        Code:
                        rangestat (count) num_longer_stay = durstay, interval(durstay 1 .) & tribe== tribe[`i'] by(psu)
                        or

                        Code:
                        keep if num_longer_stay!=.
                        The first one does not work. While the second runs as a Stata command, it does not really give me what I want. Is there any way of combining other standard Stata syntax with -rangestat-?

                        Many thanks once more for your help.


                        • #13
                          -rangestat- does not allow the kind of syntax you proposed. BUT if I grasp what you want here, you can get it very simply with:

                          Code:
                          rangestat (count) num_longer_stay = durstay, interval(durstay 1 .) by(psu tribe)
                          That should do what I think you are getting at in #12.

                          As for -keep if num_longer_stay != .- not doing what you want, what is it that you do want? I'm sure there is some way to get it if you make it clear.
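
                          One way to convince yourself of what the -rangestat- call above counts is a brute-force check on the small example. This is only a sketch: the variable name n_longer_check is hypothetical, and the loop is far too slow for a large data set. As far as I know, -rangestat- leaves the result missing, rather than zero, when no observation falls in the interval, whereas the loop below stores 0 in that case.

                          ```stata
                          * For each observation, count the people in the same psu and tribe
                          * whose durstay is strictly greater than that observation's durstay.
                          gen long n_longer_check = .
                          forvalues i = 1/`=_N' {
                              quietly count if psu == psu[`i'] & tribe == tribe[`i'] ///
                                  & durstay > durstay[`i'] & !missing(durstay)
                              quietly replace n_longer_check = r(N) in `i'
                          }

                          * Inspect the two counts side by side.
                          list id psu tribe durstay num_longer_stay n_longer_check, sepby(psu)
                          ```

                          The /// line continuation only works in a do-file or the Do-file Editor, not when typed into the Command window.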


                          • #14
                            .
                            Last edited by Kamala Kaghoma; 01 Jan 2020, 00:55.


                            • #15
                              Dear Clyde, Many thanks.
                              I tried
                              Code:
                               rangestat (count) num_longer_stay = durstay, interval(durstay 1 .) by(psu tribe)
                              several times and was misreading the results. I've just tried it again and see that it really gives the figures I need. Many thanks once more and Happy new year 2020.
