
  • collapsing by multiple variables/matrix

    Hello all,
    What I'd like to do is run collapse (sum) and collapse (mean) within overlapping groups. Normally, the by(x) option at the end of the command groups observations by unique values of x, and by(x y) does the same for unique combinations of x and y. What I'm trying to do amounts to counting each value of a variable in several groups at once, such that group 3 would sum/average over values 1-5, group 4 over values 2-6, group 5 over values 3-7, and so on. As a potentially useful step, I've created a set of dummies from the variable, such that dummy d equals 1 when the variable lies in [d-2, d+2], assuming that would help with what I'm trying to do. I can conceptually envision creating four additional byte variables for each dummy variable, so that I could group each 5 units individually, but I'm not sure where to go from there. On top of that, I'd like to collapse further by education group (ed, from 1-8). For reference, I've pasted the code in question below. In this context, the variable whose values I'd like to group by is exper, and "expm" is the dummy set I create from it.


    forval x = 3/43{
    gen expm`x' = 0
    replace expm`x' = 1 if exper == `x'
    replace expm`x' = 1 if exper == (`x'+1)
    replace expm`x' = 1 if exper == (`x'+2)
    replace expm`x' = 1 if exper == (`x'-1)
    replace expm`x' = 1 if exper == (`x'-2)
    }

    tab educr, gen (ed)
    rename ed1 ed0
    rename ed2 ed1
    rename ed3 ed2
    rename ed4 ed3

    sort educ experience
    forval ed = 0/3 {
    forval x = 3/43{
    gen ed`ed'xm`x' = 0
    }
    }
    forval ed = 0/3 {
    forval x = 3/43 {
    replace ed`ed'xm`x' =1 if expm`x' ==1
    }
    }
    forval ed= 0/3{
    recode ed`ed'xm3-ed`ed'xm43 (nonmissing = 0) if educr != `ed'
    }
    drop expm3-expm43
    drop ed0-ed3

    collapse (sum) var1 var2 var3 , by(????)


    I'm open to any ideas. Perhaps I'm coming at this the wrong way. I'm an undergraduate, and by no means an expert.

    Thank you kindly,
    --Dan



    • #3
      I don't fully follow what you're trying to do. But it seems you have a grouping variable, which I'll call group, which ranges from 1 to 45. And it seems that you have another variable (or perhaps several) that I'll call v, and you want to calculate means and sums of v aggregated over overlapping windows of 5 consecutive values of group (1-5, 2-6, ..., 41-45). And you want to do this within values of an education variable, ed.

      Code:
      forvalues g = 3/43 {
          by ed, sort: egen v_sum`g' = total(cond(inrange(group, `g'-2, `g'+2), v, .))
          by ed, sort: egen v_mean`g' = mean(cond(inrange(group, `g'-2, `g'+2), v, .))
      }
      keep ed v_sum* v_mean*
      by ed, sort: keep if _n == 1
      reshape long v_sum v_mean, i(ed) j(g5center)
      gen g_from = g5center - 2
      gen g_to = g5center + 2
      I think the above will be what you want, or at least close enough that you can take it from there.

      Notes:

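      If there are several variables v you want to do this with, you can wrap the -forvalues g =...- loop in a loop over that list of variables, building the new variable names from the loop variable, and expand the variable lists in -keep- and -reshape- accordingly. The -keep- command has to come after that outer loop; otherwise the first pass would drop the variables still waiting to be processed.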
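
      A minimal sketch of that multi-variable version, run on made-up toy data so the result can be checked by hand. The names var1 and var2 and the toy data itself are assumptions for illustration only; group and ed are the placeholder names used above. In this sketch -keep- sits after the outer loop, so the variables still to be processed are not dropped on the first pass.

      ```stata
      * Toy data (hypothetical): 2 education groups, group = 1..45 in each
      clear
      set obs 90
      gen ed    = ceil(_n/45)
      gen group = _n - 45*(ed - 1)
      gen var1  = group
      gen var2  = 2*group

      * Outer loop over the variables, inner loop over the window centers
      local vlist var1 var2
      local stubs
      local keepvars
      foreach v of local vlist {
          forvalues g = 3/43 {
              by ed, sort: egen `v'_sum`g'  = total(cond(inrange(group, `g'-2, `g'+2), `v', .))
              by ed, sort: egen `v'_mean`g' = mean(cond(inrange(group, `g'-2, `g'+2), `v', .))
          }
          * Collect the stub names for -reshape- and the patterns for -keep-
          local stubs    `stubs' `v'_sum `v'_mean
          local keepvars `keepvars' `v'_sum* `v'_mean*
      }

      * One row per ed, then one row per ed and window center
      keep ed `keepvars'
      by ed, sort: keep if _n == 1
      reshape long `stubs', i(ed) j(g5center)
      ```

      With this toy data every window holds five consecutive integers, so var1_sum should equal 5 times g5center and var1_mean should equal g5center, which makes a mistake in the windowing easy to spot.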

      The last two lines of code are just a convenience to identify the lower and upper values of the grouping variable in a row. If you don't need those, you can skip them, as they contain no information that isn't implicit in the value of g5center, which identifies the midpoint of that range.
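
      A different route, closer to the -collapse ..., by(????)- line in the original question. This is a swapped-in technique, not what the code above does, and it is only a sketch: each observation is copied with -expand- once for every window it belongs to, each copy is tagged with its window center, and an ordinary -collapse- then does the rest. The variable obsid and the toy data are hypothetical; group, v, ed and g5center are the names used above.

      ```stata
      * Toy data (hypothetical): 2 education groups, group = 1..45 in each
      clear
      set obs 90
      gen ed    = ceil(_n/45)
      gen group = _n - 45*(ed - 1)
      gen v     = group

      * Five copies of every observation, one per window containing it
      gen long obsid = _n
      expand 5
      bysort obsid: gen g5center = group - 3 + _n

      * Drop windows that would run past the ends of the 1..45 range
      keep if inrange(g5center, 3, 43)

      * An ordinary collapse now works on the overlapping groups
      collapse (sum) v_sum = v (mean) v_mean = v, by(ed g5center)
      ```

      The cost is that -expand 5- makes the dataset five times as long before collapsing, which may matter with large data.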
