  • Creation of a variable with multiple conditions : what is the best suited format?

    Hi everyone,

    I need some feedback, and potentially some code corrections please.

    Basically, I want to create a new variable called -tariff_3_more_50000_w-. It should be equal to 1 if values are greater than 50,000, and 0 otherwise.
    But, this new variable depends on a lot of conditions:
    1. It should be 1 if at least one (or all) of the values of power_p1 power_p2 power_p3 power_p4 power_p5 power_p6 for a given observation is greater than 50,000.
    2. It also depends on another variable, -tariff_3-, a dummy variable equal to 1 if the contract in question has tariff 3.0, 3.0A or 3.1a, and 0 otherwise.
    3. Under no circumstances should this new variable be equal to 1 if all or some of the values for each observation for the variables power_p1 to power_p6 are missing.
    4. Finally, let's imagine that power_p1 equals 100, and power_p6 equals 125,000. The new variable tariff_3_more_50000_w should be equal to 1, even if only one of the values is greater than 50,000.
    Below is the code that I tried, but I am not sure it does what I need:

    Code:
    gen tariff_3 = 1 if inrange(tariff_ekon_id_encod, 11,13) // tariffs 3.0, 3.0A and 3.1a
    replace tariff_3 = 0 if tariff_3 == .
    gen tariff_3_more_50000_w = tariff_3 == 1 & ((power_p1 >50e3 & !missing(power_p1))  ///
                                              | (power_p2 >50e3  & !missing(power_p2))  ///
                                              | (power_p3 >50e3  & !missing(power_p3))  ///
                                              | (power_p4 >50e3  & !missing(power_p4))  ///
                                              | (power_p5 >50e3  & !missing(power_p5))  ///
                                              | (power_p6 >50e3  & !missing(power_p6)))
    
    
    replace tariff_3_more_50000_w = . if tariff_3 == 1 & (missing(power_p1)  ///
                                                       &  missing(power_p2)  ///
                                                       &  missing(power_p3)  ///
                                                       &  missing(power_p4)  ///
                                                       &  missing(power_p5)  ///
                                                       &  missing(power_p6))
    
    
    drop if tariff_3 == 1 & tariff_3_more_50000_w == 1
    The last line leaves the sample that I want to keep, i.e. contracts whose contracted powers per period (variables -power_p`i'-) are below 50,000.

    Could you please suggest how to improve the code so that it does what I want, and whether any styling improvements are possible?

    Thanks in advance.

    Best,

    Michael

  • #2
    You want, I think, a relative of

    Code:
    gen wanted = max(power_p1, power_p2) > 50000 if !missing(power_p1, power_p2)
    where the condition

    Code:
    if !missing(power_p1, power_p2)
    ensures that the result will be missing if any variable is missing and otherwise

    Code:
    max(power_p1, power_p2) > 50000
    will be true (1) if any variable exceeds 50000 and false (0) otherwise.

    Naturally, you should type in the other four variable names as needed.
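
    As a minimal sketch of the six-variable version, the block below uses toy data invented purely for illustration; only the variable names come from the question. In Stata, max() ignores missing arguments unless all of them are missing, and a missing value compares as larger than any number, which is why the -if !missing()- qualifier is needed to get a missing result.

    ```stata
    * Toy data: hypothetical values, for illustration only
    clear
    input power_p1 power_p2 power_p3 power_p4 power_p5 power_p6
    100 200 300 400 500 125000
    100 200 300 400 500 600
    100 .   300 400 500 125000
    .   .   .   .   .   .
    end

    * 1 if any power exceeds 50000, 0 if none does, missing if any power is missing
    gen wanted = max(power_p1, power_p2, power_p3, power_p4, power_p5, power_p6) > 50000 ///
        if !missing(power_p1, power_p2, power_p3, power_p4, power_p5, power_p6)

    list, noobs
    ```

    With these values, wanted should be 1 in the first observation, 0 in the second, and missing in the last two.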

    The precise rules for tariff_3 are a little unclear to me but I surmise that

    Code:
    gen wanted = (tariff_3 == 1) & (max(power_p1, power_p2) > 50000) if !missing(power_p1, power_p2)
    will be correct or close to correct.
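
    One way to write this out for all six variables with the names from the question is sketched below. The toy data and the local macro name -powers- are assumptions made for illustration, and the rule for tariff_3 is the surmise stated above, not a confirmed specification. The macro only avoids typing the variable list twice.

    ```stata
    * Toy data: hypothetical values, for illustration only
    clear
    input tariff_3 power_p1 power_p2 power_p3 power_p4 power_p5 power_p6
    1 100 200 300 400 500 125000
    1 100 200 300 400 500 600
    0 100 200 300 400 500 125000
    1 100 .   300 400 500 125000
    end

    * Comma-separated list, so it can be passed to max() and missing() directly
    local powers power_p1, power_p2, power_p3, power_p4, power_p5, power_p6

    * 1 only if tariff_3 is 1 and some power exceeds 50000; missing if any power is missing
    gen tariff_3_more_50000_w = (tariff_3 == 1) & (max(`powers') > 50000) if !missing(`powers')

    list tariff_3 power_p6 tariff_3_more_50000_w, noobs
    ```

    Here the new variable should be 1 in the first observation, 0 in the second and third, and missing in the fourth. Whether observations with missing powers should be kept or dropped afterwards is a separate decision.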


    • #3
      Hi Nick Cox,

      Beautiful! It is what I needed.

      I realise that I've wasted an enormous amount of time, even though the code is... 1 line.

      Thank you so much for your help. Really appreciated.
      Best.

      Michael


      • #4
        Brevity can be over-valued. I once spent some time working with a language called J, which prizes brevity above all things.

        The entire documentation for its embedded code editor was -- I do not exaggerate -- one sentence long. It took me two hours, and a lot of messing around, to read that sentence in the right way. You can read that as a measure of my stupidity too.

        mean =. +/ % #

        is an entire program in that language.

        Clarity is what matters! (Oh, and correctness too.)


        • #5
          Yes, that's true.

          Still, it's lovely to see how you, and the others who excel at Stata and with whom I have had the privilege of exchanging on this forum, manage to simplify things and be as brief as possible.

          Thank you for the existence of this platform.

          Michael
