  • Count a dummy variable for each group

    Hey everybody,

    I want to count how many times a dummy variable equals one when an identifier has a specific value.
    In the end I want to compute the percentage share of observations for which the dummy equals one, given that the identifier has that value.

    I tried this code:
    Code:
    gen freq = _N if id==1
    egen dummy  = count(mpg) if mpg==1 & id==1
    egen cdummy = sum(dummy) & id==1
    gen share = dummy/freq if id==1
    It seems to work out.


    But it would be even better if I could compute the percentage share for every value of id at once, so that I don't have to repeat the commands for each identifier.

    Any suggestions?

    Many thanks in advance.

    Bene

  • #2
    Let's take some of the details here one by one. Most pedantically,

    Code:
    egen cdummy = sum(dummy) & id==1
    looks illegal to me and should not have worked. Perhaps the parentheses are just in the wrong place, but I can't be confident of what you were intending.

    Now let's be more positive. I can't happily read mpg as meaning an indicator, so I use index with that meaning below.

    First, be aware that the count command does what its name says. It leaves its result in r(N), and it is often a good idea to save that in a local macro before it is overwritten by the next command.

    Code:
    count if id == 1
    local N = r(N)
    It seems that what you seek, for one identifier, is two such counts combined into a ratio.

    Code:
    count if id == 1  
    local N = r(N)  
    count if index == 1 & id == 1
    local N1 = r(N)  
    di `N1'/`N'
    That's a good approach for one identifier, and saves on the creation of variables that just hold constants. But we need something better otherwise.
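    To see why repeating the two counts does not scale, here is a minimal sketch that loops the same calculation over every identifier. It assumes id is numeric; the toy data set is invented purely for illustration.

    ```stata
    * Hypothetical toy data: id is the group identifier, index is a 0/1 indicator
    clear
    input id index
    1 1
    1 0
    1 1
    1 0
    2 1
    2 0
    2 0
    3 1
    3 1
    end

    * Collect the distinct values of id, then repeat the two counts for each
    levelsof id, local(ids)
    foreach i of local ids {
        quietly count if id == `i'
        local N = r(N)
        quietly count if index == 1 & id == `i'
        di "id `i': " r(N) / `N'
    }
    ```

    This only displays the shares and stores nothing in the data set, which is one reason to prefer a single command that works by group.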

    Second, the egen function sum() still works, but since Stata 9 it has been undocumented in favour of total(). More crucially, total() is very flexible: its argument can be any expression, not just a variable name. The same is true of mean().
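    As a sketch of that flexibility, total() can be fed a true-or-false expression directly, which gives a per-group count of ones without creating an intermediate variable. The data and the variable names n1, n and share are made up for illustration.

    ```stata
    * Hypothetical toy data: id is the group identifier, index is a 0/1 indicator
    clear
    input id index
    1 1
    1 0
    1 1
    1 0
    2 1
    2 0
    2 0
    end

    * Number of observations with index == 1 in each id group
    egen n1 = total(index == 1), by(id)

    * Number of nonmissing values of index in each id group
    egen n = count(index), by(id)

    * Share of ones within each group
    gen share = n1 / n
    ```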

    Now a proportion (share) is just the mean of an indicator. So, I guess you want

    Code:
    egen prop = mean(index == 1), by(id)
    and a factor of 100 converts that proportion to a percentage

    Code:
    egen pc = mean(100 * (index == 1)), by(id)
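    A small worked example may make this concrete. The data below are invented, and tag is just a helper variable used to list one row per identifier.

    ```stata
    * Hypothetical toy data: id is the group identifier, index is a 0/1 indicator
    clear
    input id index
    1 1
    1 0
    1 1
    1 0
    2 1
    2 0
    2 0
    3 1
    3 1
    end

    * Proportion and percentage of observations with index == 1 within each id
    egen prop = mean(index == 1), by(id)
    egen pc = mean(100 * (index == 1)), by(id)

    * Mark one observation per id and list the group results
    egen tag = tag(id)
    list id prop pc if tag, noobs
    ```

    With these data the proportions are 0.5 for id 1, one third for id 2 and 1 for id 3. Note that index == 1 evaluates to 0 when index is missing, so missing values would count in the denominator unless they are excluded with an if qualifier.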
    Last edited by Nick Cox; 24 Nov 2015, 14:37.



    • #3
      Thank you for the answer.

      I will try it out tomorrow.


      You are also right about the illegal code.

      This is the correct version:
      Code:
      egen cdummy = sum(dummy) if id==1
