  • Egen mean using only one value per subgroup?

    Hi!

    I want to use the egen command to create group means, using the by() option.
    Basically, "egen MEAN = mean(YearlyVariable), by(country)"
    But, one twist - I want the command to consider only one value per subgroup.

    I have three-level data: individuals nested in country-years nested in countries. I have variables at each level. Now I want to take a country-year variable (which is same for all individuals in that country-year but different across years within a country) and compute a mean within each country. But I don't want years with more individuals to get more emphasis. Thus, I want to base the mean on only one value per country-year, but I want the variable to extend to all individuals within that country.

    Is there a way to do this without a workaround such as egen's tag() function?
    And if I use tag() anyway, how do I assign the value of the new variable to all individuals within the same country?

    This is how I would do it with tag():
    egen TAGcountryyear = tag (country year)
    egen MEAN = mean(YearlyVariable) if TAGcountryyear == 1, by(country)
    ----- but then the new variable is assigned only to those individuals who were tagged. How do I extend it to all individuals in the same country?

  • #2
    Code:
    egen TAGcountryyear = tag (country year)
    
    egen MEAN = mean(cond(TAGcountryyear == 1, YearlyVariable, .)), by(country)
    See also Section 9 in https://www.stata-journal.com/articl...article=dm0055



    • #3
      Or a two-pass approach: more lines, but one does not need to know what the cond() function does.

      Code:
      egen TAGcountryyear = tag (country year)
      
      egen MEANtemp = mean(YearlyVariable) if TAGcountryyear == 1, by(country)
      
      egen MEAN = mean(MEANtemp), by(country)
      Or the famous "Nick's approach of division by zero"

      Code:
      egen TAGcountryyear = tag (country year)
      
      egen MEAN = mean(YearlyVariable/(TAGcountryyear == 1)), by(country)






      • #4
        I don't know how famous that approach is, but for the record

        * it was written up in Section 10 of the paper cited in #2

        * some people have pushed back against that as too tricksy and depending too much on knowing what Stata does precisely, so I tend not to mention it any more.

        It is an accident of notation that the syntax is a little like a standard mathematical notation for conditional quantities

        Code:
        mean(YearlyVariable | TAGcountryyear == 1)
        Conversely, that expression is legal in Stata, but there | is the logical "or" operator, so here (and usually) it means something completely different and gives the wrong answer.
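
        To see why, here is a minimal sketch on made-up data (the values are hypothetical, not from this thread). With | read as logical "or", the expression is true (1) whenever YearlyVariable is nonzero or the observation is tagged, so mean() averages a 0/1 indicator and not the variable itself.

        ```stata
        * Hypothetical toy data: one country, two country-years
        clear
        input country year YearlyVariable
        1 2000 10
        1 2000 10
        1 2001 20
        end

        egen TAGcountryyear = tag(country year)

        * == binds more tightly than |, so this is
        * YearlyVariable | (TAGcountryyear == 1), a true/false expression
        egen wrong = mean(YearlyVariable | TAGcountryyear == 1), by(country)

        * Every YearlyVariable here is nonzero, so the expression is always 1
        assert wrong == 1
        ```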

        If you're puzzled but still interested, here is the story.

        For the ratio

        variable / (something == 1)


        there are two possible results.

        If something == 1, the denominator (parenthesised) expression is true, so returned as 1, and the expression is just variable / 1 or variable.

        The other way round, if something != 1, the denominator expression is false, so returned as 0, and the expression is then variable / 0.

        As your teachers told you years ago, "you can't divide by 0", except that Stata's version is that it shrugs its nanoscale shoulders and returns missing.

        Now your values have been segregated into actual values (presumably those you want) and missing values (presumably those you want to ignore).

        Stata ignores missing values to the extent possible. In this particular case the mean() function of egen will ignore the missings and only return missing as a result if all presented values are missing.

        And the use of cond() within #2 means exactly the same thing.
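
        As a rough check of that equivalence, the sketch below runs both forms on made-up data (the values are hypothetical): one country with three individuals in 2000 and one in 2001. A mean over all individuals would weight 2000 three times; the tagged versions use one value per country-year.

        ```stata
        * Division by zero yields missing, not an error
        assert missing(5 / 0)

        * Hypothetical toy data
        clear
        input country year YearlyVariable
        1 2000 10
        1 2000 10
        1 2000 10
        1 2001 20
        end

        egen TAGcountryyear = tag(country year)

        * Mean over individuals: (10 + 10 + 10 + 20) / 4 = 12.5
        egen naive = mean(YearlyVariable), by(country)

        * cond() form from #2: untagged observations become missing
        egen MEAN_cond = mean(cond(TAGcountryyear == 1, YearlyVariable, .)), by(country)

        * Division form from #3: untagged observations become x / 0 = missing
        egen MEAN_div = mean(YearlyVariable / (TAGcountryyear == 1)), by(country)

        * Both use one value per country-year: (10 + 20) / 2 = 15
        assert naive == 12.5
        assert MEAN_cond == 15
        assert MEAN_div == MEAN_cond
        ```

        Either way the result is filled in for every individual in the country, because by(country) spreads the group mean to all observations in the group.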

        A general point, widely applicable, is that many egen functions will accept expressions, which can be more general than single variable names.
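
        Two small illustrations of that point, as a sketch only. The variable names group and x and the data are made up for this example.

        ```stata
        * Hypothetical toy data
        clear
        input group x
        1 -1
        1 2
        1 3
        2 -5
        2 4
        end

        * total() of a true/false expression counts the observations where it
        * is true (note that x > 0 would also be true for missing x; there are
        * no missings here)
        egen n_pos = total(x > 0), by(group)

        * max() of a function of a variable: largest absolute value per group
        egen max_abs = max(abs(x)), by(group)

        assert n_pos == 2 if group == 1
        assert n_pos == 1 if group == 2
        assert max_abs == 3 if group == 1
        assert max_abs == 5 if group == 2
        ```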
        Last edited by Nick Cox; 01 Oct 2020, 06:59.



        • #5
          Thanks a lot!!
          I went with the first suggestion, the cond() function. It works very nicely. Thanks!

