  • Calculating proportion by cluster instead of individual observations

    The survey collected data on income (low = 0, average = 5, higher = 12) and access to electricity (yes = 1, no = 0) for 8,516 households from 383 communities (clusters). I can calculate the proportion of households with and without access to electricity, and so on, at the individual level using the tabulate command. Is it possible to do the same at the community level in Stata 14? There are many other variables like this, and my goal is to convert them so that they behave as if they had been measured at the community level.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str15 caseid int cluster_no byte(income electricity)
    "       1   9  2" 1  0 1
    "       1   9  3" 1  0 1
    "       1  17  4" 1 12 1
    "       1  17  7" 1  0 1
    "       1  25  3" 1 12 1
    "       1  29  2" 1 12 1
    "       1  33  2" 1  5 1
    "       1  37  2" 1  5 1
    "       1  49  2" 1  0 1
    "       1  53  6" 1  0 1
    "       1  57  2" 1 12 1
    "       1  61  2" 1  0 1
    "       1  69  4" 1  0 1
    "       1  73  2" 1  5 1
    "       1  73  3" 1  5 1
    "       1  81  4" 1  5 1
    "       1  85  2" 1  0 1
    "       1  89  2" 1  0 1
    "       1  97  3" 1  5 1
    "       1 101  2" 1  5 1
    "       1 117  2" 1 12 1
    "       2  23  2" 2  5 1
    "       2  23  3" 2  0 1
    "       2  23  4" 2 12 1
    "       2  36 10" 2  0 0
    "       2  49  3" 2 12 1
    "       2  75  2" 2  5 1
    "       2  82  4" 2  0 1
    "       2  95  4" 2  5 1
    "       2 121  2" 2  0 1
    "       2 127  3" 2  0 1
    "       2 153  1" 2  0 1
    "       2 153  2" 2 12 1
    "       2 160  3" 2 12 1
    "       2 160  4" 2  0 1
    "       2 160  6" 2 12 1
    "       2 166  2" 2  0 1
    "       2 173  2" 2 12 1
    "       2 192  2" 2  5 1
    "       3  10  3" 3  5 1
    "       3  17  2" 3  0 1
    "       3  24  2" 3  0 1
    "       3  31  6" 3 12 1
    "       3  45  4" 3  5 1
    "       3  52  1" 3 12 1
    "       3  52  2" 3 12 1
    "       3  59  3" 3  0 1
    "       3  66  3" 3  5 1
    "       3  87  2" 3  5 1
    "       3  87  3" 3  0 1
    "       3  94  3" 3  0 1
    "       3 101  3" 3  5 1
    "       3 108  4" 3  5 1
    "       3 115  4" 3 12 1
    "       3 122  2" 3  0 1
    "       3 122  4" 3 12 1
    "       3 136  2" 3  0 1
    "       3 143  1" 3  0 1
    "       3 150  2" 3  0 1
    "       3 157  1" 3 12 1
    "       3 164  2" 3 12 1
    "       3 171  2" 3  5 1
    "       3 171  6" 3  0 1
    "       3 178  4" 3  5 1
    "       3 178  7" 3  0 1
    "       3 185  3" 3  0 1
    "       3 185  6" 3  0 1
    "       3 192  2" 3  0 1
    "       3 206  3" 3 12 1
    "       3 206  5" 3  0 1
    "       4   7  2" 4  0 1
    "       4  20  1" 4  0 1
    "       4  48  1" 4  0 1
    "       4  69  1" 4  5 1
    "       4  76  1" 4  0 1
    "       4  90  4" 4  0 0
    "       4  97  3" 4  5 1
    "       4  97  4" 4  0 1
    "       4 104  2" 4  0 1
    "       4 111  1" 4  0 1
    "       4 118  3" 4 12 1
    "       4 118  4" 4  0 1
    "       4 118  5" 4  5 1
    "       4 118  6" 4 12 0
    "       4 118  7" 4  0 0
    "       4 125  4" 4  5 1
    "       4 125  7" 4  5 1
    "       4 132  2" 4  0 1
    "       4 139  3" 4 12 1
    "       4 146  3" 4  0 1
    "       4 160  3" 4  0 1
    "       4 167  1" 4  0 1
    "       4 181  2" 4 12 1
    "       4 181  3" 4  0 1
    "       4 188  1" 4  5 1
    "       4 195  4" 4  0 1
    "       4 195  9" 4 12 1
    "       4 202  3" 4  0 1
    "       4 209  4" 4  0 1
    "       5  21  2" 5  0 1
    end

  • #2
    Please clarify. Within the same cluster, some caseid's can report electricity access and others not. Do you want to consider a cluster to have electricity access if any case within it does, or only if all do, or is there some other definition?

    • #3
      Thanks, Clyde, for the helpful questions. I know I have not managed to ask the right question here, so this is a big step. Below is the description of a method that was published using a similar DHS dataset. The authors used different variables, but they are all categorical, so the calculation should be similar. I highlighted the method they used to generate the new variables.
      [Attached image: Capture.PNG]

      To give you some more detail: many researchers use this dataset in many ways, but this paper is unique in calculating community-level factors from the Primary Sampling Units (PSUs)/clusters, something generally ignored in individual-level analysis. https://www.ncbi.nlm.nih.gov/pmc/art...JDB-11-161.pdf

      Please also note that -dataex- seems to have copied only observations with the value 1 for electricity and no zeros. How can I make a reproducible data example that includes all possible values?
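
One possible way to get a data example that covers both values is sketched below. It assumes -dataex- is installed (ssc install dataex, as noted in the example above) and that it accepts the usual `if` and `in` qualifiers. The toy dataset, the seed, and the helper variable `u` are made up for illustration only; with the real survey data you would skip the toy-data step.

```stata
* Toy stand-in for the survey data (hypothetical values, for illustration only)
clear
set obs 200
set seed 12345                          // arbitrary seed so the example is reproducible
generate int cluster_no = ceil(_n/20)   // 10 clusters of 20 households each
generate byte electricity = runiform() < 0.9

* Option 1: show the households without electricity explicitly
dataex cluster_no electricity if electricity == 0

* Option 2: show a random subset instead of the first observations in the file
generate double u = runiform()
sort u
dataex cluster_no electricity in 1/50
drop u
```

Option 2 tends to give a more representative example, since the first rows of a survey file often come from just a few clusters.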

      • #4
        Well, actually there are a few zeroes for the electricity variable in the data example you gave. Not so many that any of the clusters had a median of no electricity use, but that's ok here. The following does what you want:

        Code:
        assert inlist(electricity, 0, 1) | missing(electricity)
        by cluster_no, sort: egen electricity_access = median(electricity)
        by cluster_no: replace electricity_access = !!electricity_access
        egen cluster_flag = tag(cluster_no)
        tab electricity_access if cluster_flag
        The first -egen- command identifies the median of the 0/1 electricity responses within each cluster. The -replace- command then replaces a median of 1/2, should one arise, by 1. The second -egen- command tags one observation per cluster, so that the final -tab- counts each cluster once.
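
To see the median rule at work, here is a small sketch on made-up data (three hypothetical clusters, one of them with a tie). It also computes the within-cluster share of households with electricity via egen's mean() function, which is one reading of "proportion by cluster"; whether the median-based flag or the share is the right community-level measure depends on the definition you adopt. The variable `prop_electricity` and the `if !missing()` guard are additions for illustration, not part of the code above.

```stata
* Toy data: three clusters with different mixes of 0/1 responses
clear
input int cluster_no byte electricity
1 1
1 1
1 0
2 1
2 0
3 0
3 0
3 1
end

* Share of households with electricity within each cluster
bysort cluster_no: egen prop_electricity = mean(electricity)

* Majority rule via the median; the tie in cluster 2 (median 0.5) becomes 1
bysort cluster_no: egen electricity_access = median(electricity)
replace electricity_access = !!electricity_access if !missing(electricity_access)

* Show one row per cluster
egen cluster_flag = tag(cluster_no)
list cluster_no prop_electricity electricity_access if cluster_flag, noobs
```

In this toy example clusters 1 and 2 end up flagged as having access and cluster 3 does not.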

        • #5
          Thanks so much, Clyde. This works like magic!
          If you don't mind, could you please help me understand one more thing? The paper I cited reports a slightly higher number of observations for the community variables than for the individual-level variables. Shouldn't it be smaller for the community variables, since there are fewer communities than individuals?


          [Attached image: Capture.PNG]

          • #6
            It seems the authors calculated the proportion of individuals living in areas with/without electricity. This is why the community-level factors have a similar n (the differences are due to missing values). It didn't occur to my slow brain until now. Is it possible to do this with your code as well, that is, to calculate the proportion of individuals living in clusters with/without electricity?
            Last edited by Sonnen Blume; 05 Mar 2019, 20:51.

            • #7
              Code:
              by cluster_no, sort: egen has_electricity = median(electricity)
              replace has_electricity = !!has_electricity

              tab has_electricity
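
The difference between counting clusters and counting individuals can be illustrated with a small sketch on made-up data (two hypothetical clusters of unequal size). Tabulating over all observations weights each cluster by its number of households, while restricting to one tagged observation per cluster counts each cluster once. The `if !missing()` guard is an addition for illustration.

```stata
* Toy data: cluster 1 has four households, cluster 2 has two
clear
input int cluster_no byte electricity
1 1
1 1
1 1
1 0
2 0
2 0
end

* Cluster-level flag, copied onto every household in the cluster
bysort cluster_no: egen has_electricity = median(electricity)
replace has_electricity = !!has_electricity if !missing(has_electricity)
egen cluster_flag = tag(cluster_no)

* Unit of analysis = cluster: one cluster with access, one without
tabulate has_electricity if cluster_flag

* Unit of analysis = household: four households live in a cluster
* with access, two in a cluster without
tabulate has_electricity
```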

              • #8
                Thank you for the idea! What does the !! do in !!has_electricity?
                Last edited by Sonnen Blume; 05 Mar 2019, 21:27.

                • #9
                  The variable electricity is a 0/1 variable. If the number of 0 and 1 responses in a cluster comes out exactly equal, then the median will be 0.5. But we want has_electricity to also be 0 or 1. I could have written it as -replace has_electricity = 1 if has_electricity == 0.5-. But -replace has_electricity = !!has_electricity- has the same effect here and is shorter to type. Let's unpack it.

                  Stata, in evaluating logical expressions, treats 0 as false, and anything other than 0 as true. The ! operator converts true to false and false to true: but it does it in a special way. The output of the ! operator is always 0 when the result is false, and 1 when the result is true. So !0 = 1 and !1 = 0. What is !0.5? Well 0.5 is understood to be true. So !0.5 must be false, i.e. 0. So !!0.5 = !0 = 1. Of course !!0 = !1 = 0, and !!1 = !0 = 1. So !!x maps 0 to 0, 1 to 1, and any other non-zero value (including 0.5) to 1.
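
The mapping described above can be checked directly in a do-file. The sketch below also covers one case the explanation does not mention: Stata treats a missing value as nonzero, hence true, so !! turns a missing median (for example, from a cluster in which every response is missing) into 1. The variable names `med` and `flag` and the `if !missing()` guard are illustrative assumptions, not part of the original code.

```stata
* Negation always returns 0 or 1
display !0        // 1
display !1        // 0
display !0.5      // 0
display !!0.5     // 1
display !!0       // 0
display !!1       // 1
display !!(.)     // 1, because missing counts as true

* Guarded version: keep missing medians missing instead of recoding them to 1
clear
input float med
0
0.5
1
.
end
generate byte flag = !!med if !missing(med)
list, noobs
```

Here `flag` is 0, 1, 1 for the first three rows and stays missing in the last one.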

                  • #10
                    Wonderful! It seems Stata reveals its high-level secrets only to a select few...

                    • #11
                      No, there's nothing secret about it. You can find all of the underlying principles spelled out in the [U] User's Guide volume of the PDF manuals that come with your Stata installation.

                      The use of !! to convert 0/non-zero to 0/1 is not specifically mentioned there, but, in principle, anybody could figure it out. In truth, I did not figure it out: I learned it from somebody as a trick in C programming (where logical expressions work the same way they do in Stata) a long time back.

                      • #12
                        Thank you very much for sharing this information. !! is special knowledge indeed.
