  • Generating cluster average

    Hi
    I would like to create a variable that measures the cluster average of women’s working status. I am using a survey dataset that is divided into geographical units, so-called “clusters”, which are census enumeration areas/villages.

    Women’s employment status is captured by a binary variable (work) that takes the value 0 if unemployed and 1 if employed.

    The cluster variable is a numeric identifier (clusterno).

    I am trying to capture the average employment rate in the vicinity of each woman. I am a bit lost on where to start with generating this variable. Can anyone advise?

    An example of my data:


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str15 hhid int(psu strata) long clusterno double perweight int awfactt byte work
    "       11306  6"  1  1  11306 1.564386 145 0
    "       11306 34"  1  1  11306 1.564386 104 0
    "       11306 91"  1  1  11306 1.564386 101 0
    "       11306119"  1  1  11306 1.564386 104 0
    "       12604 13"  1  1  12604 1.564386 130 0
    "       12604 45"  1  1  12604 1.564386 104 0
    "       20506 58"  2  2  20506 1.564386 115 0
    "       20506135"  2  2  20506 1.564386 102 1
    "       20506150"  2  2  20506 1.564386 103 0
    "       20506165"  2  2  20506 1.564386 104 0
    "       20506181"  2  2  20506 1.564386 110 0
    "       21202 32"  2  2  21202 1.564386 104 0
    "       21202 85"  2  2  21202 1.564386 103 0
    "       21202164"  2  2  21202 1.564386 163 1
    "       30204 43"  3  3  30204 1.564386 130 0
    "       30204 73"  3  3  30204 1.564386 103 1
    "       30204104"  3  3  30204 1.564386 104 0
    "       30608 19"  3  3  30608 1.564386 112 0
    "       30608 92"  3  3  30608 1.564386 110 0
    "       40604 24"  4  4  40604 1.564386 101 1
    "       40604 96"  4  4  40604 1.564386 118 0
    "       40604114"  4  4  40604 1.564386 105 0
    "       40604149"  4  4  40604 1.564386 103 0
    "       40604167"  4  4  40604 1.564386 105 0
    "       41701 58"  4  4  41701 1.564386 110 0
    "       41701128"  4  4  41701 1.564386 118 0
    "       50102 90"  5  5  50102 1.564386 104 1
    "       50306 29"  5  5  50306 1.564386 163 0
    "       50306110"  5  5  50306 1.564386 110 0
    "       50306164"  5  5  50306 1.564386 105 0
    "       60106155"  6  6  60106 1.564386 101 0
    "       60802 47"  6  6  60802 1.564386 110 1
    "       60802102"  6  6  60802 1.564386 105 0
    "       60802184"  6  6  60802 1.564386 101 0
    "       71204 64"  7  7  71204 1.564386 115 1
    "       71204 94"  7  7  71204 1.564386 101 1
    "       71204154"  7  7  71204 1.564386 101 0
    "       72801 22"  7  7  72801 1.564386 103 0
    "       72801 78"  7  7  72801 1.564386 101 1
    "       84302 64"  8  8  84302 1.564386 101 0
    "       84302101"  8  8  84302 1.564386 105 0
    "       85806114"  8  8  85806 1.564386 104 0
    "       90511  9"  9  9  90511 1.564386 103 1
    "       90511 24"  9  9  90511 1.564386 102 1
    "       90511 39"  9  9  90511 1.564386 103 0
    "       90511 53"  9  9  90511 1.564386 104 0
    "       90511 68"  9  9  90511 1.564386 103 1
    "       90511 83"  9  9  90511 1.564386 103 0
    "       90511 97"  9  9  90511 1.564386 234 0
    "       90511112"  9  9  90511 1.564386 105 1
    "       90511126"  9  9  90511 1.564386 104 0
    "       90511185"  9  9  90511 1.564386 104 0
    "       91013 58"  9  9  91013 1.564386 103 0
    "       91013 70"  9  9  91013 1.564386 103 1
    "       91013 82"  9  9  91013 1.564386 104 0
    "       91013106"  9  9  91013 1.564386 110 0
    "       91013118"  9  9  91013 1.564386 101 1
    "      100206  3" 10 10 100206 1.564386 102 1
    "      100206 26" 10 10 100206 1.564386 104 0
    "      100206 50" 10 10 100206 1.564386 101 0
    "      100206 74" 10 10 100206 1.564386 104 1
    "      100206 97" 10 10 100206 1.564386 185 0
    "      100206133" 10 10 100206 1.564386 103 1
    "      100206168" 10 10 100206 1.564386 106 0
    "      100214 45" 10 10 100214 1.564386 103 0
    "      100214 83" 10 10 100214 1.564386 106 0
    "      100214111" 10 10 100214 1.564386 104 0
    "      100214121" 10 10 100214 1.564386 101 0
    "      110808 14" 11 11 110808 1.564386 115 0
    "      110808 32" 11 11 110808 1.564386 104 0
    "      110808 67" 11 11 110808 1.564386 115 0
    "      110808 84" 11 11 110808 1.564386 130 0
    "      110808136" 11 11 110808 1.564386 103 0
    "      112905  2" 11 11 112905 1.564386 130 0
    "      112905 39" 11 11 112905 1.564386 103 1
    "      112905 58" 11 11 112905 1.564386 118 0
    "      112905133" 11 11 112905 1.564386 105 0
    "      112905170" 11 11 112905 1.564386 185 1
    "      121002 61" 12 12 121002 1.564386 163 1
    "      122003114" 12 12 122003 1.564386 102 1
    "      122003155" 12 12 122003 1.564386 126 0
    "      122003196" 12 12 122003 1.564386 102 1
    "      130702  6" 13 13 130702 1.564386 102 0
    "      130702 57" 13 13 130702 1.564386 104 0
    "      130702108" 13 13 130702 1.564386 104 1
    "      131804 65" 13 13 131804 1.564386 163 0
    "      140306 65" 14 14 140306 1.564386 115 0
    "      140306 88" 14 14 140306 1.564386 106 0
    "      140306134" 14 14 140306 1.564386 106 0
    "      140903  2" 14 14 140903 1.564386 104 0
    "      140903 79" 14 14 140903 1.564386 101 0
    "      140903 98" 14 14 140903 1.564386 277 0
    "      140903155" 14 14 140903 1.564386 104 0
    "      150303 12" 15 15 150303 1.564386 103 0
    "      150303122" 15 15 150303 1.564386 104 0
    "      150303150" 15 15 150303 1.564386 106 1
    "      151001 12" 15 15 151001 1.564386 101 0
    "      151001 37" 15 15 151001 1.564386 105 0
    "      151001 88" 15 15 151001 1.564386 104 0
    "      151001100" 15 15 151001 1.564386 102 0
    end

  • #2
    Isaac:
    are you looking for something along the following lines?
    Code:
    . bysort clusterno: egen wanted=mean(work)
    
    . list clusterno work wanted in 1/10
    
         +--------------------------+
         | cluste~o   work   wanted |
         |--------------------------|
      1. |    11306      0        0 |
      2. |    11306      0        0 |
      3. |    11306      0        0 |
      4. |    11306      0        0 |
      5. |    12604      0        0 |
         |--------------------------|
      6. |    12604      0        0 |
      7. |    20506      0       .2 |
      8. |    20506      1       .2 |
      9. |    20506      0       .2 |
     10. |    20506      0       .2 |
         +--------------------------+
    
    .
    Kind regards,
    Carlo
    (Stata 18.0 SE)


    • #3
      Hi Carlo,

      Thank you for your response. I believe that works. I forgot to mention that my dataset includes two periods, so would this code generate what I am looking for?

      Code:
      bysort year clusterno: egen avgwork=mean(work)
      A follow-up question:
      Do I need to construct the cluster average in a way that excludes the woman being considered in each observation to avoid an in-built correlation?



      • #4
        Isaac:
        1) your code should work as expected
        2) I do not think so, as clustering has no effect on the sample mean, only on dispersion.
        Kind regards,
        Carlo
        (Stata 18.0 SE)
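
        On question 2, if a version of the average that leaves out each woman's own response is wanted anyway, a leave-one-out mean is one way to sketch it. This is only an illustration, not a recommendation: it assumes the variable names year, clusterno and work from the posts above, and sum_work, n_work and loo_avgwork are made-up names.

        ```stata
        * Leave-one-out cluster mean: average of work among the OTHER women
        * in the same year and cluster (assumed variables: year, clusterno, work).
        * sum_work, n_work and loo_avgwork are hypothetical names.

        * Number employed in each year-cluster cell (missing work counts as 0)
        bysort year clusterno: egen sum_work = total(work)

        * Number of women with non-missing work in each year-cluster cell
        bysort year clusterno: egen n_work = count(work)

        * Subtract the woman's own value and divide by the remaining count.
        * Cells with a single woman give a missing value (division by zero).
        generate loo_avgwork = (sum_work - work) / (n_work - 1) if !missing(work)
        ```

        Clusters with a single observed woman, such as 50102 in the example data, would get a missing value here, so it is worth checking how many such clusters there are before using the variable.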


        • #5
          Thank you for your help Carlo.
