  • Generating count variable for each county year if they fulfill a certain condition

    I have county-wise annual data on windspeed. Now I want to count how many events (an event is defined as when a county faces windspeed > 25m/s for that year) are experienced by each county, for each year.

    I am trying to get something like this:

    by fips (year), sort: gen Num_of_Hurr = count(vmax_sust>25)

    Where ideally the new variable will return the number of events experienced by each county for each year.

  • #2
    Code:
    bysort fips: egen num_of_events = total(vmax_sust > 25 & !missing(vmax_sust))
    No need to sort by year here, since the order of observations does not affect a total. Stata treats a numeric missing value as larger than any nonmissing number, so vmax_sust > 25 evaluates to true when vmax_sust is missing; the expression therefore has to exclude that case explicitly. Alternatively, if you want a running count year-over-year, sort by fips code and year and then use the sum() function on the expression.
    Last edited by Daniel Schaefer; 01 Jan 2024, 14:28.
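
    As an illustration of the running-count alternative mentioned in #2, here is a minimal sketch. The toy data and the variable name running_events are made up for the example; only fips, year, and vmax_sust come from the thread.

    ```stata
    * Toy data (hypothetical values), one observation per county-year
    clear
    input long fips int year double vmax_sust
    1001 2000 30
    1001 2001 20
    1001 2002 .
    1001 2003 27
    1003 2000 10
    1003 2001 26
    end

    * Within each county, ordered by year, -generate- with sum() keeps a running total.
    * The !missing() guard stops the missing value in 2002 from counting as > 25.
    bysort fips (year): gen running_events = sum(vmax_sust > 25 & !missing(vmax_sust))

    list, sepby(fips)
    ```

    With these toy values, running_events is 1, 1, 1, 2 for county 1001 and 0, 1 for county 1003. Here the parentheses around year are appropriate: year only controls the order within each county, it does not define the groups.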



    • #3
      Assuming your data are suitably organized for this purpose (which I cannot tell, as you show no example data), what you want is
      Code:
      by fips year, sort: egen num_of_hurr = total(vmax_sust > 25)
      Notes:
      1. Pay particular attention to the absence of parentheses around year. With the parentheses you would get a single count for each county, a count of all the hurricane events in all the years combined. You say you want a separate count per year, so the parentheses must not be there.

      2. There is no -count()- function that works with -gen-. There is an -egen- -count()- function, but it does not do what you need. That -count()- function gives a count of the number of times its argument expression evaluates to a non-missing value. Now, vmax_sust>25 is a logical expression, and it always evaluates to either 0 or 1, both of which are non-missing. So -by fips (year), sort: egen new_variable = count(vmax_sust>25)- would just give you a count of the number of observations in each county. What you want, the number of observations in which vmax_sust > 25 is true, requires a different function. vmax_sust>25 will be 1 whenever it is true and 0 when it isn't. So if you add up all the 1's and 0's you would get the number of observations where vmax_sust > 25 is true. Hence the use of the -total()- function in the code I show above.

      Added: Crossed with #2. Note that the solution there gives the total number of vmax_sust>25 events the county experiences in all years combined, whereas the code offered here produces a separate total for each year for each county, which is what I understand O.P. to want. However, his inclusion of -& !missing(vmax_sust)- is a real improvement on my code. I had tacitly assumed that vmax_sust is never missing, but if it can be, then it would (erroneously) be counted as > 25 by the code here.
      Last edited by Clyde Schechter; 01 Jan 2024, 14:37.
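
      To make the difference between -count()- and -total()- described in #3 concrete, here is a small sketch on made-up data. It assumes, purely for illustration, that there are several observations per county-year; the new variable names n_count, n_total, and n_guarded are hypothetical.

      ```stata
      * Toy data (hypothetical values), several observations per county-year
      clear
      input long fips int year double vmax_sust
      1001 2000 30
      1001 2000 20
      1001 2000 .
      1001 2001 27
      1001 2001 28
      end

      * count() counts nonmissing results of the expression; a logical expression
      * is always 0 or 1, so this is just the number of observations in the group
      by fips year, sort: egen n_count = count(vmax_sust > 25)

      * total() adds up the 0/1 results, but a missing vmax_sust counts as > 25
      by fips year, sort: egen n_total = total(vmax_sust > 25)

      * Same total with missing values excluded explicitly
      by fips year, sort: egen n_guarded = total(vmax_sust > 25 & !missing(vmax_sust))

      list, sepby(fips year)
      ```

      For the year 2000 group in this toy data, n_count is 3, n_total is 2, and n_guarded is 1; for 2001 all three are 2. Only the guarded version matches the intended definition of an event when vmax_sust can be missing.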



      • #4
        Just to follow up on #3, based on OP's description of the data as "county-wise annual" I came away with the impression that there should be exactly one observation for each county in each year - in which case a total number of events should not be greater than 1. If OP has a different data structure such that there are multiple observations per county/year, that wasn't clear to me from #1. A -dataex- example could have resolved this particular ambiguity.
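
        One way the original poster could settle the question of data structure is to check whether fips and year jointly identify observations. The following is a sketch on made-up data, not a statement about the actual dataset:

        ```stata
        * Toy data (hypothetical values) with a repeated county-year pair
        clear
        input long fips int year double vmax_sust
        1001 2000 30
        1001 2000 20
        1001 2001 27
        1003 2000 10
        end

        * isid fails if fips and year do not uniquely identify the observations;
        * capture noisily shows the error message without halting a do-file
        capture noisily isid fips year

        * Tabulate how many observations share each fips-year combination
        duplicates report fips year

        * Produce a postable data example for the forum
        dataex fips year vmax_sust
        ```

        If -isid- passes on the real data, there is one observation per county-year and the per-year total can only be 0 or 1, as noted above.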



        • #5
          And my interpretation of the original post was influenced by "Now I want to count how many events (an event is defined as when a county faces windspeed > 25m/s for that year) are experienced by each county, for each year" and "Where ideally the new variable will return the number of events experienced by each county for each year."

          Clearly O.P. has contradicted himself, as his data description is inconsistent with his stated purpose. And Daniel Schaefer is quite correct that having example data would have made the situation clearer. But still, O.P. needs to word his post more carefully.

