  • generating variable that =1 if past variables equal something

    Hi,

    Below is some data with provider names, the quarter, and the units provided by each provider in that quarter. I want to create variables that equal 1 when there is a provider "entry" or a provider "exit."

    So, essentially, my new variable provider_enter would be 0 in most quarters but 1 in qtr 14 for Beth, which is when she enters the market (she was providing 0 units in qtr 13). I don't want it to be 1 in the quarters after that, even though she is still providing units. In other words, it should be 1 only in a quarter where she provided some units and the previous quarter had 0 units.

    My new variable provider_exit would likewise be 0 except in qtr 19, where it would be 1, because that is when she drops from providing 236 units to 0 units. It should not be 1 in any quarter after that. TIA

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str66 provider_name byte qtr double units
    " Beth C." 11   0
    "Beth C." 12   0
    "Beth C." 13   0
    "Beth C." 14  24
    "Beth C." 15  88
    " Beth C." 16 112
    "Beth C." 17 124
    "Beth C." 18 236
    "Beth C." 19   0
    end

  • #2
    This is pretty straightforward, but there are two ways you can easily get in trouble here. One is if there is typographic variation in the provider_name variable. Your example exhibits this phenomenon, or at least it does if I am correct in assuming that " Beth C." is supposed to be the same person as "Beth C." Stata will not recognize " Beth C." as the same as "Beth C." So the first step is to make the typography uniform by eliminating leading and trailing blanks as well as serial internal blanks. Also, anticipating that there may also be capitalization errors, putting everything in upper case will guard against that. What cannot be guarded against in a fully automated way is misspelling. You need to check your data set carefully for that.
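
    As a rough check for name variants, one could tabulate the names before and after cleaning. This is only a sketch: clean_name is a hypothetical scratch variable introduced for the check, not something in the original data.

    Code:
    //    DISTINCT SPELLINGS BEFORE CLEANING
    tabulate provider_name
    //    clean_name IS A HYPOTHETICAL SCRATCH VARIABLE USED ONLY FOR THIS CHECK
    generate clean_name = trim(itrim(upper(provider_name)))
    //    DISTINCT SPELLINGS AFTER CLEANING
    tabulate clean_name
    drop clean_name

    In the example data, the first table should have two rows (the name with and without the leading blank) and the second only one. Variants that survive the cleaning, such as misspellings, still have to be found by reading the second table.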

    The second way you can get in trouble is if there are gaps in the time series. For example, if there were no observation for quarter 13, we would have Beth C. with 0 units provided in quarter 12 and 24 units provided in quarter 14. But we would not really know whether she entered in quarter 14, or perhaps did so in quarter 13 but we lack the evidence.
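
    One possible way to look for such gaps is sketched below. It assumes qtr is a consecutive integer quarter counter, as in the example; name_check and gap_before are hypothetical scratch variables that are dropped at the end.

    Code:
    //    GROUP ON A CLEANED COPY OF THE NAME SO SPELLING VARIANTS DO NOT SPLIT A PROVIDER
    generate name_check = trim(itrim(upper(provider_name)))
    //    FLAG OBSERVATIONS WHOSE PREVIOUS OBSERVATION IS MORE THAN ONE QUARTER EARLIER
    bysort name_check (qtr): generate byte gap_before = (qtr - qtr[_n-1] > 1) if _n > 1
    list name_check qtr units if gap_before == 1
    drop name_check gap_before

    In the example data nothing should be listed, because quarters 11 through 19 are all present.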

    Code:
    //    FIRST FIX FORMATTING ERRORS IN PROVIDER_NAME VARIABLE
    replace provider_name = trim(itrim(upper(provider_name)))
    //    CREATE A NUMERIC PROVIDER ID TO USE WITH -xtset-
    encode provider_name, gen(provider)
    
    xtset provider qtr
    
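    //    ENTRY: NON-ZERO UNITS NOW, AND EITHER 0 UNITS LAST QUARTER OR FIRST OBSERVATION
    by provider (qtr): gen byte provider_enter = ///
        (units != 0) & !missing(units) & (L1.units == 0 | _n == 1)
    //    SET TO MISSING WHERE LAST QUARTER'S UNITS ARE UNKNOWN (E.G., A GAP)
    by provider (qtr): replace provider_enter = . if missing(L1.units) & _n > 1
    
    //    EXIT: 0 UNITS NOW, NON-ZERO UNITS LAST QUARTER
    by provider (qtr): gen byte provider_exit = ///
        (units == 0) & !missing(L1.units) & (L1.units != 0 & _n > 1)
    by provider (qtr): replace provider_exit = . if missing(L1.units) & _n > 1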
    Note: If the full data set does have gaps in the time series between a 0 and a non-zero units observation, provider_exit and provider_enter are set to missing there.
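
    To confirm the result on the example data, something like the following could be run after the code above. It is a sketch that uses only the variables created there.

    Code:
    //    SHOW THE NEW INDICATORS NEXT TO THE ORIGINAL DATA, ONE BLOCK PER PROVIDER
    list provider qtr units provider_enter provider_exit, sepby(provider) noobs
    //    COUNT THE ENTRY AND EXIT EVENTS
    count if provider_enter == 1
    count if provider_exit == 1

    For the example data, provider_enter should be 1 only in qtr 14 and provider_exit only in qtr 19, so each count should be 1.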
    Last edited by Clyde Schechter; 23 Oct 2023, 15:49.


    • #3
      Thank you!
