  • creating an indicator variable using relational operators referring to multiple variables

    How do I create an indicator variable that refers to multiple variables? All the relevant variables share the same prefix in their names, for example “ev_mon_” in this example data:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long id float(month ff ev_mon_1 ev_mon_2)
    2708 657     -.005559 658 670
    2708 658 -.0023612394 658 670
    2708 659  .0009129993 658 670
    2708 660 -.0016764477 658 670
    2708 661 -.0025546604 658 670
    end
    format %tmCCYY_Mon month
    format %tmCCYY_Mon ev_mon_1
    format %tmCCYY_Mon ev_mon_2
    The new indicator variable should be 1 where the month observation is equal to the date in the “ev_mon_” variables, but 0 if it is more than 6 months after the “ev_mon_” date (and 0 if it is before that date). What I am having difficulty with is getting the expression to apply relational operators across the full sequence of “ev_mon_” variables (i.e. ev_mon_1, ev_mon_2, ev_mon_3, etc., out to 120 variables).
    I know how to write the command for one of the “ev_mon_” variables, e.g.
    Code:
    gen byte new_var=0
    replace new_var = (month>ev_mon_1 & month<(ev_mon_1+7))
    But I don’t yet see the appropriate way to refer to multiple variables. For example, this attempt failed:
    Code:
    gen byte exog2 = (month>ev_mon_* & month<(ev_mon_*+7))
    I hope that makes sense. I am using Stata 15. Thank you for your help, Dan

  • #2
    I don't understand what you want. Suppose that month is after ev_mon_1 and no later than ev_mon_1 + 6, but the corresponding statement is not true for ev_mon_2 and ev_mon_2 + 6. Is the new variable 1 (because it works for one of the ev_mon_* variables) or is it 0 (because it doesn't work for all of them)? Are you looking for all of the ev_mon conditions to be met, or only for any one or more of them to be met? Or maybe you have something entirely different in mind?

    • #3
      Thank you for your response Clyde. Yes, if a month is after ev_mon_1 and no later than ev_mon_1 + 6, then the new variable will be 1. If the corresponding statement is not true for ev_mon_2 and ev_mon_2+6, the new variable will still be 1. Yes, you are correct in saying "because it works for one of the ev_mon_* variables". i.e. I am only looking for at least one of the ev_mon conditions to be met. Does that help clarify my objective? Thanks again, Dan

      • #4
        I can recommend two different approaches:

        Code:
        // METHOD 1
        gen byte new_var = 0
        foreach v of varlist ev_mon_* {
            replace new_var = 1 if inrange(month, `v'+1, `v'+6)
        }
        OR
        Code:
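        // METHOD 2
        gen long obs_no = _n
        // the stub ev_mon leaves the suffixes _1, _2, ... as string values of _j
        reshape long ev_mon, i(obs_no) j(_j) string
        // within each original observation, flag whether any event window contains month
        by obs_no, sort: egen new_var = max(inrange(month, ev_mon + 1, ev_mon + 6))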
        Which to choose depends on what else you will be doing with this data. As you have no doubt seen mentioned in many threads on Statalist, most things in Stata are most easily done with data in long layout. So it is likely that Method 2, because it converts your data to long layout, will position you to move forward efficiently with the rest of your work. But if this step is near the end of your analysis plan and you find the wide layout more convenient for data display or other reasons, then Method 1 is best.

        Note that both methods work regardless of the number of ev_mon_* variables.
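
        As a quick check, here is a sketch of Method 1 run on the example data from #1. The `!missing()` guard is an addition, not part of the answer above: as far as the documentation for inrange() reads, missing bounds are treated as open-ended, so an empty ev_mon_* slot could otherwise flag every month. Whether any of the 120 variables are actually missing is an assumption; the thread does not say.

        ```stata
        * Sketch only: Method 1 applied to the example data from #1
        clear
        input long id float(month ff ev_mon_1 ev_mon_2)
        2708 657     -.005559 658 670
        2708 658 -.0023612394 658 670
        2708 659  .0009129993 658 670
        2708 660 -.0016764477 658 670
        2708 661 -.0025546604 658 670
        end
        format %tmCCYY_Mon month ev_mon_1 ev_mon_2

        gen byte new_var = 0
        foreach v of varlist ev_mon_* {
            * !missing() is an added guard (assumption: some event slots may be empty)
            replace new_var = 1 if !missing(`v') & inrange(month, `v' + 1, `v' + 6)
        }

        * new_var is 0 for months 657-658 and 1 for months 659-661
        list month ev_mon_1 ev_mon_2 new_var, noobs
        ```

        Only the ev_mon_1 window (659 to 664) overlaps these five months; the ev_mon_2 window (671 to 676) lies beyond them, so it contributes nothing here.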

        • #5
          Thank you Clyde, both are great and yes, I see your point. I'll apply Method 1 for now, as some regression analysis is necessary at this stage. Thank you also for your guidance on creating a long layout (i.e. Method 2); it will be very helpful for subsequent development of the data set. When I get to that stage I'll review the long-to-wide commands to get the long layout back to wide layout for regressions and graphs. Thank you, Dan
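
          For the long-to-wide step mentioned here, a minimal sketch on the example data from #1, assuming Method 2 from #4 was used as posted and nothing else was changed in between. It relies on `reshape wide` with the same stub, `i()`, and `j()` as the earlier `reshape long`; every variable other than ev_mon must be constant within obs_no for this to work, which holds here.

          ```stata
          * Sketch only: Method 2 from #4, then back to wide layout
          clear
          input long id float(month ff ev_mon_1 ev_mon_2)
          2708 657     -.005559 658 670
          2708 658 -.0023612394 658 670
          2708 659  .0009129993 658 670
          2708 660 -.0016764477 658 670
          2708 661 -.0025546604 658 670
          end

          * Method 2 as posted: go long and build the indicator
          gen long obs_no = _n
          reshape long ev_mon, i(obs_no) j(_j) string
          by obs_no, sort: egen new_var = max(inrange(month, ev_mon + 1, ev_mon + 6))

          * reverse the reshape: ev_mon_1 and ev_mon_2 come back as separate variables
          reshape wide ev_mon, i(obs_no) j(_j) string
          list month ev_mon_1 ev_mon_2 new_var, noobs
          ```

          After the second reshape the data are back to one row per original observation, with new_var carried along.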
