  • Max occurrence of variable in an observation

    Software:
    OSX, Stata 13.1

    Problem:
    I want to generate a variable that contains the maximum number of times an event occurs in an observation. I will try to explain my problem, but please ask questions if I am unclear; I believe it is complicated, and this is my first post on Statalist.

    Data Information and problem elaborated:
    I have cross-sectional data with variables that can be described as uniqueid, locations, years, and events. One of the events is a variable "numuadded", and another variable, "x", describes numuadded. I want to display the total number of times an event described by x occurs in a particular location and year, in a variable called xTot. When the event numuadded occurs once in a particular location and year, there is a uniqueid. When the event numuadded occurs more than once in a particular location and year, there are numuadded + 1 observations, but only one of them has a uniqueid. That observation, with multiple numuadded and a uniqueid, combines all the different identifiers "x" from the other observations, but it is not what I am examining. When numuadded>1 and uniqueid is missing, the observation contains only a single x that describes the numuadded in that year. I am concerned with two kinds of observations: those with numuadded equal to 1 in a particular year and location and a uniqueid, and those with numuadded>1 in a particular year and location and no uniqueid. This structure is purposeful, albeit complicated, so that I can transfer my results into a panel dataset in the future.

    An example: LocationA has observations in years 1, 3, and 5; numuadded with x occurs twice in year 1 and once in year 3. In year 1 the variable xTot should report 2, and in year 3 it should report 1. This number should repeat in xTot for every observation in a particular location and year.
    As of now, the variable I'm interested in counts up to the total number of x in a year, so it's partly right in that at least one observation shows the highest value. For example, in PointA Year1, there were 2 actions described by x, but one observation for PointA Year1 states "1" for xTot and the other observation for PointA Year1 states "2" for xTot. I want them both to say 2.

    Variables:
    year "a year"
    point "a location"
    uniqueid "unique string identifier for an observation"
    numuadded "an event"
    x "describes event numuadded"
    xTot "the total number of times of x occurring in a particular point and year"

    Code:
    gen x=0
    bys year point: replace x=1 if action=="x" & numuadded == 1 | action=="x" & numuadded>1 & missing(uniqueid)
    gen xTot = 0
    sort point year
    bys year point: replace xTot = sum(x) if numuadded == 1 | numuadded>1 & missing(uniqueid)

    I have tried the max(var) function without success, and I am confused by it: it appears to work only when multiple variables are given as input (I want the maximum of one variable, not the maximum across several variables).
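
The behaviour described in this post can be reproduced on a small toy dataset. The sketch below is only an illustration: the data values and the names xRun and rowMax are made up for the example and are not part of the original dataset. It shows that sum() is a running sum within each by-group, and that the max() function compares its arguments within a single observation rather than across observations.

```stata
* Toy data (hypothetical values): two x events at point A in year 1, one in year 3
clear
input str1 point year x
"A" 1 1
"A" 1 1
"A" 3 1
end

* sum() is a running sum, so within a point-year group it counts up: 1, 2
bysort point year: generate xRun = sum(x)

* max(a, b) compares its arguments within one observation (row-wise);
* it does not look across the observations of a group
generate rowMax = max(xRun, x)

list, sepby(point year)
```

In this toy case the two observations for year 1 end up with different values of xRun, which matches the "1" and "2" seen for xTot in the post.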

    Any help is greatly appreciated. Thank you.
    Last edited by Chris Daigle; 13 Sep 2015, 23:38.

  • #2
    Chris:
    have you already taken a look at -help egen-, especially the -max()- function, and the related entry in the Stata .pdf manual?
    Kind regards,
    Carlo
    (Stata 18.0 SE)
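
A minimal sketch of what the -egen- suggestion might look like, assuming the running count has already been built with sum() as in the original post. The toy data and the name xRun are hypothetical. Unlike the max() function, egen's max() works across observations, and by() restricts it to each point-year group, so the group maximum is repeated on every observation in the group.

```stata
* Toy data (hypothetical values)
clear
input str1 point year x
"A" 1 1
"A" 1 1
"A" 3 1
"A" 5 0
end

* Running count within each point-year group, as in the original post
bysort point year: generate xRun = sum(x)

* egen's max() takes the maximum across the observations of each group
egen xTot = max(xRun), by(point year)

list, sepby(point year)
```

Here both year 1 observations carry xTot equal to 2, the year 3 observation carries 1, and the year 5 observation carries 0.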



    • #3
      Carlo:

      Thanks for your response! I have tried to use the max function, but I have not attempted egen with it; perhaps that is my problem. I reviewed the documentation and my problem is fixed!

      Thank you so much!
      -Chris
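
The thread does not show the final code, so the following is only one possible route, not necessarily what was used. It is a sketch that skips the running sum and computes the group total directly with egen's total(). The toy values are invented for illustration; the variable names follow the original post, and the flagging condition is an interpretation of the rule described there.

```stata
* Toy data (hypothetical values); the first row is the aggregate
* observation that has a uniqueid and numuadded > 1
clear
input str1 point year str1 action numuadded str3 uniqueid
"A" 1 "x" 2 "id1"
"A" 1 "x" 2 ""
"A" 1 "x" 2 ""
"A" 3 "x" 1 "id2"
"A" 5 "y" 1 "id3"
end

* Flag the observations of interest: single events, or the component
* rows (no uniqueid) of a multiple event
generate x = action == "x" & (numuadded == 1 | (numuadded > 1 & missing(uniqueid)))

* Group total, repeated on every observation of the point-year group
egen xTot = total(x), by(point year)

list, sepby(point year)
```

Note that with this approach the aggregate row also receives the group total, which seems consistent with the request that the number repeat for every observation in a location and year.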

