
  • Generating "leave-out-mean" variable from DHS data

    Hi,

    I am working with DHS data for countries in South Asia and I want to generate a leave-out mean of "age at first marriage" within each PSU. I can easily get the mean age at first marriage for every PSU with the command egen mean_age = mean(v511), by(v001), where v511 is age at first marriage and v001 is the PSU number.

    However, I want each observation's mean to exclude that observation's own value.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long v001 byte v511
    1 16
    1 18
    1 20
    1 19
    1 17
    1 25
    2 20
    2 14
    2 17
    2 14
    2 20
    2 13
    3 22
    3 34
    3 20
    3 18
    3 24
    3 13
    3 16
    3 15
    3 19
    3 16
    4 17
    4 20
    4 22
    4 23
    4 29
    4 23
    5 23
    5 21
    5 23
    5 27
    5 26
    end

  • #2
    Check whether the following does the trick:

    Code:
    . egen total = total(v511), by(v001)
    
    . egen count = count(v511), by(v001)
    
    . gen meanage = (total - v511)/(count - 1)
    
    . list, sepby(v001)
    
         +----------------------------------------+
         | v001   v511   total   count    meanage |
         |----------------------------------------|
      1. |    1     16     115       6       19.8 |
      2. |    1     18     115       6       19.4 |
      3. |    1     20     115       6         19 |
      4. |    1     19     115       6       19.2 |
      5. |    1     17     115       6       19.6 |
      6. |    1     25     115       6         18 |
         |----------------------------------------|
      7. |    2     20      98       6       15.6 |
      8. |    2     14      98       6       16.8 |
      9. |    2     17      98       6       16.2 |
     10. |    2     14      98       6       16.8 |
     11. |    2     20      98       6       15.6 |
     12. |    2     13      98       6         17 |
         |----------------------------------------|
     13. |    3     22     197      10   19.44444 |
     14. |    3     34     197      10   18.11111 |
     15. |    3     20     197      10   19.66667 |
     16. |    3     18     197      10   19.88889 |
     17. |    3     24     197      10   19.22222 |
     18. |    3     13     197      10   20.44444 |
     19. |    3     16     197      10   20.11111 |
     20. |    3     15     197      10   20.22222 |
     21. |    3     19     197      10   19.77778 |
     22. |    3     16     197      10   20.11111 |
         |----------------------------------------|
     23. |    4     17     134       6       23.4 |
     24. |    4     20     134       6       22.8 |
     25. |    4     22     134       6       22.4 |
     26. |    4     23     134       6       22.2 |
     27. |    4     29     134       6         21 |
     28. |    4     23     134       6       22.2 |
         |----------------------------------------|
     29. |    5     23     120       5      24.25 |
     30. |    5     21     120       5      24.75 |
     31. |    5     23     120       5      24.25 |
     32. |    5     27     120       5      23.25 |
     33. |    5     26     120       5       23.5 |
         +----------------------------------------+
    
    .



    • #3
      See also rangestat from SSC, which includes a hook for this kind of calculation.



      • #4
        Thank you both! I managed to get it done with the code shared by Joro.



        • #5
          Joro's code is fine, so long as you have no missing values on your age variable.
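
A minimal sketch of how the approach in #2 could be made robust to missing values, using only built-in commands. The toy data and the names psu_total, psu_n and psu_lom are illustrative assumptions, not part of the thread. The idea is to subtract an observation's own value (and reduce the count by one) only when that value is non-missing, so that observations with a missing age still receive the mean of the others in their PSU.

```stata
* Toy data (assumed for illustration): PSU 1 has one missing age,
* PSU 2 has a single observation
clear
input long v001 byte v511
1 16
1 18
1 .
2 20
end

* egen's total() treats missing as 0 and count() counts non-missing values
egen double psu_total = total(v511), by(v001)
egen psu_n = count(v511), by(v001)

* Remove the observation's own contribution only if it has one;
* a PSU with no other non-missing values divides by zero and yields missing
gen double psu_lom = (psu_total - cond(missing(v511), 0, v511)) / (psu_n - !missing(v511))

list, sepby(v001)
```

With this version the third observation gets the mean of the two observed ages in PSU 1, whereas (total - v511)/(count - 1) would return missing for it.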



          • #6
            FYI: rangestat defaults to calculating means, so this is all you need. There aren't missing values in the data example, but rangestat would ignore them.

            Code:
            . rangestat v511, int(v001 0 0) excludeself 
            
            . list if v001 == 1, sep(0)
            
                 +------------------------+
                 | v001   v511   v511_m~n |
                 |------------------------|
              1. |    1     16       19.8 |
              2. |    1     18       19.4 |
              3. |    1     20         19 |
              4. |    1     19       19.2 |
              5. |    1     17       19.6 |
              6. |    1     25         18 |
                 +------------------------+



            • #7
              Hey Nick, thank you for the further explanation.

              Apparently, I did not understand the definition of the variable I actually need. While I do need the leave-out mean age at first marriage by PSU as stated above, more importantly I also need the PSU leave-out mean (share) of child marriage (CM) at a particular age (ranging from 12 to 17), e.g., the PSU leave-out mean (share) of CM at 14.

              Here is an explanation by the author of the paper:
              "The instruments are the leave-out-mean contemporaneous and past incidence of child marriage in the primary sampling unit where a girl lives, as measured through the share of girls marrying at ages 12 through 17"
              So, how do I go about calculating the mean for each age of early marriage by PSU?



              • #8
                The mean for each age? I don't follow that.



                • #9
                  Originally posted by Nick Cox
                  The mean for each age? I don't follow that.
                  Sorry, I didn't phrase it correctly. Basically, I need the leave-out mean of the incidence of marriage at each age (12, 13, ..., 17) by PSU.



                  • #10
                    Another excerpt from the paper regarding these variables:
                    "the leave-out-mean rates at the level of the primary sampling unit in which the girls lives of early marriage at various ages"



                    • #11
                      Presumably some people never marry, so missings need to be considered. If you create


                      Code:
                      gen young = inrange(v511, 12, 17) if v511 < .
                      then it seems that you want the mean of that variable (which has values 0, 1, missing); so it's then the same technique as in #6.
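
As a hedged sketch of how this could be extended to one share per age, the loop below builds an indicator for first marriage at each age from 12 to 17 and takes its leave-out mean within PSU with rangestat (from SSC), as in #6. The toy data and the names cm12 to cm17 and lom_cm12 to lom_cm17 are assumptions for illustration. So is the reading of "marrying at age a" as exactly age a: if the paper means married by age a, the condition would be v511 <= a instead. Women with missing v511 are left out of the shares here, which is also a choice to check against the paper.

```stata
* Requires rangestat from SSC: ssc install rangestat
* Toy data (assumed for illustration)
clear
input long v001 byte v511
1 14
1 14
1 20
1 .
2 13
2 17
end

forvalues a = 12/17 {
    * 1 if first married at exactly this age, 0 if at another age,
    * missing if age at first marriage is missing
    gen byte cm`a' = (v511 == `a') if v511 < .
    * Leave-out PSU mean of the indicator: the share among the other women
    rangestat (mean) cm`a', interval(v001 0 0) excludeself
    rename cm`a'_mean lom_cm`a'
}

list v001 v511 lom_cm14, sepby(v001)
```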



                      • #12
                        Originally posted by Nick Cox
                        Presumably some people never marry, so missings need to be considered. If you create


                        Code:
                        gen young = inrange(v511, 12, 17) if v511 < .
                        then it seems that you want the mean of that variable (which has values 0, 1, missing); so it's then the same technique as in #6.
                        I want to create a leave-out average at the district level, with a few conditions:

                        1. Suppose there are 50 districts.
                        2. Each district has two strata (rural, urban).
                           (a) 5 districts have only urban areas.
                        3. The variable for which I want to calculate the leave-out average is binary, i.e., 1 = success, 0 = otherwise.
                        4. Finally, I want to calculate the district-level leave-out average for the rural area of each district, and then assign that rural leave-out average to the urban area of the same district.

                        Best regards,
                        Mukesh
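
One possible reading of the steps above, sketched with built-in commands only. The variable names district, urban, success and lo_avg and the toy data are hypothetical. The sketch assumes that rural observations get the district's rural mean excluding their own value, that urban observations get the plain rural mean of their district (they contribute nothing to the rural sum, so there is nothing to leave out), and that urban-only districts are left missing. Other interpretations of step 4 are possible.

```stata
* Toy data (assumed): district 1 has rural and urban rows, district 2 is urban only
clear
input int district byte urban byte success
1 0 1
1 0 0
1 0 1
1 1 0
1 1 1
2 1 1
2 1 0
end

* Sum and count of the outcome over rural rows only, copied to every row of
* the district (cond() sets urban rows to missing, which egen ignores)
egen double r_total = total(cond(urban == 0, success, .)), by(district)
egen r_n = count(cond(urban == 0, success, .)), by(district)

* Rural rows: leave the row's own value out of the district's rural mean
gen double lo_avg = (r_total - success) / (r_n - 1) if urban == 0 & !missing(success)

* Urban rows: plain rural mean of the district; missing when there are no rural rows
replace lo_avg = r_total / r_n if urban == 1

list, sepby(district)
```

Rural rows with a missing outcome are left missing in this sketch; the cond() adjustment used for missing values earlier in the thread's egen approach could be applied to them as well.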

