
  • Grouping variables to generate summary variables

    Hi all, I'm using Stata 13.1 on Windows 7.

    I have a set of variables _0, _1, _2, _3, _4 up to _90. I want to generate a set of summary variables that group them into 5-year age bands: _0, _1, _2, _3 and _4 would become _04; _5, _6, _7, _8 and _9 would become _59; and so on up to _85, _86, _87, _88 and _89, which would become _8589.

    Is there a way I can automate this in Stata? I wondered whether I could reshape the data and try and summarise in a different manner.


    Code:
    clear
    input str11 ons_ccg_code int(_0 _1 _2 _3 _4 _5 _6 _7 _8 _9 _10 _11 _12 _13 _14 _15 _16 _17 _18 _19)
    "E38000056"  982 1100 1103 1150 1135 1034 1124 1156 1161 1104 1091 1071 1086 1040 1050 1113 1192 1161 1138  805
    "E38000068"  792  818  828  871  860  825  815  885  829  895  806  753  700  696  696  779  812  765  727  688
    "E38000091" 1063  941  994  999 1022  973  910  921  930  920  899  784  819  782  861  915  917  971  935  935
    "E38000101" 3062 2940 2850 2990 2838 2683 2684 2595 2568 2465 2271 2242 2213 2144 2303 2349 2475 2544 3024 4516
    "E38000151"  963  976 1018 1082 1011 1096 1073 1178 1074 1101 1000 1037  961  982  950  998 1080 1124 1140 1028
    end


  • #2
    Your question is unclear. You want to somehow combine _0, _1, _2, _3, and _4 into _04. But in what way do you want to combine them? What will be in _04? The sum? The product? The alternating sum? The sum of squares? The square of the sum? The minimum? The maximum? The median? The average... OK, you get the point.



    • #3
      Thanks Clyde,

      You are right, apologies for the lack of information - I tried to edit the original post after I submitted but couldn't!

      Each of the _0, _1, etc. variables contains the number of people at that single year of age. I wish to sum the variables in 5-year age groups, i.e.

      _0 + _1 + _2 + _3 + _4 = _04
      _5 + _6 + _7 + _8 + _9 = _59
      _10 + _11 + _12 + _13 + _14 = _1014

      and so on so that I have the total number of people in _04, _59, _1014 and so on.



      • #4
        So, like this:

        Code:
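        * Outer loop: i is the first age in each band (0, 5, ..., 85)
        forvalues i = 0(5)85 {
            * ilast is the last age in the band
            local ilast = `i' + 4
            * Start the band total at zero
            gen _`i'_`ilast' = 0
            * Inner loop: add each single-year variable to the band total
            forvalues j = `i'/`ilast' {
                replace _`i'_`ilast' = _`i'_`ilast' + _`j'
            }
        }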
        Notes:

        1. Your naming scheme for the new variables will fail. You can't create a new variable _59, because there is already a variable _59. So I have modified it: the new variables will be named _0_4, _5_9, _10_14, etc.

        2. The variable _90 never gets included in any of the new variables. You'll have to figure out what you want to do about that, if anything.
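
        A variant of the same grouping can be sketched with -egen, rowtotal()-, which treats missing values as zero, whereas the running sum above turns a band total to missing if any single-year count is missing. The toy data at the top and the name _90plus for the open-ended top band are assumptions made only for illustration; on real data, skip the first block.

        ```stata
        * Toy data (assumption, for illustration only): 3 rows, variables _0 ... _90
        clear
        set obs 3
        forvalues k = 0/90 {
            gen _`k' = `k' + _n
        }

        * Build each band's variable list in a local macro, then sum across it
        forvalues i = 0(5)85 {
            local ilast = `i' + 4
            local band
            forvalues j = `i'/`ilast' {
                local band `band' _`j'
            }
            * rowtotal() treats missing values as zero
            egen _`i'_`ilast' = rowtotal(`band')
        }

        * Hypothetical name: keep _90 as its own open-ended band
        gen _90plus = _90
        ```

        Whether treating a missing count as zero is appropriate depends on the data; if it is not, the running-sum version is the safer choice.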






        • #5
          Clyde,

          That's amazing! Many thanks for this. I think I can find a workaround for _90 that is perhaps a little more manual, but acceptable for one calculation.
          Would you mind expanding a little on what is actually happening in the code, so that I understand the mechanics a little more?

          Many thanks
          Tim



          • #6
            So, you need to read up in the online manuals on -forvalues- and on local macros.

            The outer loop creates an index i that runs from 0 through 85 in increments of 5. Within that outer loop we calculate a boundary value, ilast, which is i + 4. Then comes the inner loop on an index j that runs from i through ilast. So the first time through the outer loop, the inner loop goes from 0 through 4. The second time through the outer loop the inner loop goes from 5 through 9, etc. Within the inner loop, a running sum of the corresponding variables is accumulated, the sum having been initialized to zero in the outer loop.
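
            One way to watch the macro arithmetic without touching any data is to print the bounds instead of generating variables. A minimal sketch, limited to the first three bands purely to keep the output short:

            ```stata
            * Print what each pass of the outer loop would sum, and into which variable
            forvalues i = 0(5)10 {
                local ilast = `i' + 4
                display "outer i = `i': inner loop sums _`i' through _`ilast' into _`i'_`ilast'"
            }
            * Output:
            * outer i = 0: inner loop sums _0 through _4 into _0_4
            * outer i = 5: inner loop sums _5 through _9 into _5_9
            * outer i = 10: inner loop sums _10 through _14 into _10_14
            ```

            Stata substitutes `i' and `ilast' with their current values before each line runs, which is why the same text produces a different variable name on every pass.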



            • #7
              Thank you!
