
  • Using EGEN to Create Count Variable

    Hello Statalisters,

    I am trying to create a variable that counts the number of times a particular county (denoted by FIPS and County Name) received funding over a five-year period. I've included a data example for review:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long(fips county) int year double empgfunds
    1049 305 2014         0
    1049 305 2015         0
    1049 305 2016         0
    1049 305 2017     77426
    1049 305 2018     50000
    1049 305 2019     50000
    1049 305 2020     50000
    2068 314 2014    8533.3
    2068 314 2015   9289.25
    2068 314 2016         0
    2068 314 2017  10054.93
    2068 314 2018     12000
    2068 314 2019     12000
    2068 314 2020     12000
    2090 368 2014    142000
    2090 368 2015 139133.45
    2090 368 2016         0
    2090 368 2017    175000
    2090 368 2018    160000
    2090 368 2019    160000
    2090 368 2020    160000
    2110 559 2014    110000
    2110 559 2015    160000
    2110 559 2016         0
    2110 559 2017    160000
    2110 559 2018    160000
    2110 559 2019 154081.93
    2110 559 2020    160000
    2122 570 2014    142000
    2122 570 2015    160000
    end
    label values county county
    label def county 305 "DeKalb", modify
    label def county 314 "Denali Borough", modify
    label def county 368 "Fairbanks North Star Borough", modify
    label def county 559 "Juneau City Borough", modify
    label def county 570 "Kenai Peninsula Borough", modify
    As you can see from the data, some places received funding in every year and others received funding only once, twice, or more. I'd like to create a variable that counts, by fips code, the number of times funding was received, the maximum number of times being 5 for all years in the dataset. I believe egen will do this; after reading the help file I think either anycount() or count() is the function I need, but I'm not sure. So something like
    Code:
    egen count_empg= count(empgfunds), by fips
    for each year in the dataset, so I would end up with 7 variables, each 0 if the county did not receive funds and 1 if it did in a given year. However, the help file states that count() can't be combined with by, so I'm stumped. Any guidance would be great.
    Last edited by Davia Downey; 17 Jun 2022, 14:07.

  • #2
    Actually, the help file specifically says that the count() function can be combined with by. There are typos in your code anyway, and it is not consistent with what you say. Try this:
    Code:
    egen count_empg= count(empgfunds) if empgfunds>0, by(fips county)



    • #3
      count() counts non-missing values, and 0 is a perfectly fine non-missing value, which was your problem in #1.


      count(exp)    (allows by varlist:)
      creates a constant (within varlist) containing the number of nonmissing observations of exp. Also see rownonmiss() and rowmiss().
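To see this on the data example from #1, here is a minimal sketch using only the first two counties (rows copied from #1; n_nonmiss is an illustrative name, not from the thread). Both counties have zero funding in some years, yet count() returns 7 for each, one per nonmissing year:

```stata
* Subset of the -dataex- example from #1 (first two counties only)
clear
input long(fips county) int year double empgfunds
1049 305 2014        0
1049 305 2015        0
1049 305 2016        0
1049 305 2017    77426
1049 305 2018    50000
1049 305 2019    50000
1049 305 2020    50000
2068 314 2014   8533.3
2068 314 2015  9289.25
2068 314 2016        0
2068 314 2017 10054.93
2068 314 2018    12000
2068 314 2019    12000
2068 314 2020    12000
end

* count() counts every nonmissing value, zeros included,
* so each county gets 7 here
egen n_nonmiss = count(empgfunds), by(fips county)

list fips year empgfunds n_nonmiss, sepby(fips)
```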

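      @Rich Goldstein's code will count only the positive values (missing values also satisfy empgfunds > 0, but count() does not count them), and it will populate only those observations with positive or missing values; observations with zero funding get a missing result.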

      Code:
      egen count_empg= total(empgfunds > 0), by(fips county)
      may be closer to what you want, and
      Code:
      egen count_empg= total(empgfunds > 0 & empgfunds < .), by(fips county)
      is more cautious yet, excluding missing values if you have any outside your data example. For observations uniquely defined by fips and year, egen will work as above but is not needed to get a per-year indicator:
      Code:
      bysort fips year : gen wanted = empgfunds > 0 if empgfunds < .

      gets you an indicator variable, 1 if positive, 0 if zero, and missing if missing. I assume that negative values do not occur.
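To turn that indicator into the per-county count asked for in #1, one option is to add it up within county. This is a sketch on the first two counties of the #1 data example (n_funded is an illustrative name); with no missing values it should agree with total(empgfunds > 0):

```stata
* Subset of the -dataex- example from #1 (first two counties only)
clear
input long(fips county) int year double empgfunds
1049 305 2014        0
1049 305 2015        0
1049 305 2016        0
1049 305 2017    77426
1049 305 2018    50000
1049 305 2019    50000
1049 305 2020    50000
2068 314 2014   8533.3
2068 314 2015  9289.25
2068 314 2016        0
2068 314 2017 10054.93
2068 314 2018    12000
2068 314 2019    12000
2068 314 2020    12000
end

* 1 if funded that year, 0 if zero, missing if empgfunds is missing
bysort fips year : gen wanted = empgfunds > 0 if empgfunds < .

* total() treats missing as zero, so this counts funded years per county
egen n_funded = total(wanted), by(fips county)

list fips year empgfunds wanted n_funded, sepby(fips)
```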
      What Rich and I are using, just like you, is the older syntax with a by() option, which is undocumented but still works. But as Rich implies, you left out the parentheses: by(fips), not by fips. As the quotation from the help says, by varlist: is supported with count(), and also with total(). If you have a version of Stata older than 17, you should tell us, as the FAQ Advice explains, but my impression is that this egen function has worked this way for many years now. anycount() is a distraction here, but I will suppress the details.
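For comparison, the documented form puts by varlist: in front of egen instead of using a by() option. This minimal sketch uses two counties from #1 (in the example shown, 2122 has only two years) and the cautious version of the total() expression; the counts should match the by() version:

```stata
* Two counties from the #1 example; 2122 has only two years of data
clear
input long(fips county) int year double empgfunds
2090 368 2014    142000
2090 368 2015 139133.45
2090 368 2016         0
2090 368 2017    175000
2090 368 2018    160000
2090 368 2019    160000
2090 368 2020    160000
2122 570 2014    142000
2122 570 2015    160000
end

* Documented syntax: by prefix rather than the older by() option.
* The empgfunds < . condition keeps missing values from being counted.
bysort fips county : egen count_empg = total(empgfunds > 0 & empgfunds < .)

list fips year empgfunds count_empg, sepby(fips)
```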
