Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Dummy variable dependent on multiple conditions

    Hi everyone!

    I want to create a dummy variable (Dum2) that is 1 based on the condition that another dummy (Dum1) is 1 in a certain condition (year; Cond1) for all observations of ID.

    This table should be helpful to illustrate my problem and to show how I want Dum2 to behave:

    ID Cond1 (year) Cond2 (value; 0-1) Dum1 (Cond2>.5) Dum2
    A 2001 .5 0 0
    A 2002 .8 1 0
    A 2003 .2 0 0
    B 2001 .3 0 1
    B 2002 .3 0 1
    B 2003 .6 1 1
    C 2001 .6 1 0
    C 2002 .7 1 0
    C 2003 .6 1 0

    To generate Dum1 I did
    Code:
    generate Dum1 = Cond2>.5 if !missing(Cond2)
    Dum2 should be 1 for all ID, if Dum1 is 1 under Cond1==2002 (in year 2002)

    Put differently, Dum2 should be 1 for all Cond1, if Cond2 is met under Cond1==2002.

    I tried to generate Dum2 by

    Code:
    generate Dum2 = Cond2<=.5 if Cond1==2002 & !missing(Cond2)
    or constructions like

    Code:
    levelsof ID, local(levels)
    foreach ID of local levels {
    generate Dum2=1 if Dum1==1 & Cond1==2002
    }
    but this causes Dum2==1 only when Cond1==2002 and thus not for all Cond1 of ID, which is what I want.

    I am a novice to Stata and statistical software in general so I hope this is not too dull to be posted here, but I did search through all the documentation and manuals I could find and wasn't able to come up with anything that worked.

    For your help thank you very much in advance!


    Best regards,
    Roman




  • #2
    Thanks for your example. Note that using dataex (SSC) as we request in the FAQ Advice would have been even more helpful.

    I'm having a hard time seeing what you ask as consistent.

    Consider this. I won't use the name Cond2 for year!

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str1 ID float(year Cond2 Dum1 wanted)
    "A" 2001 .5 0 0
    "A" 2002 .8 1 0
    "A" 2003 .2 0 0
    "B" 2001 .3 0 1
    "B" 2002 .3 0 1
    "B" 2003 .6 1 1
    "C" 2001 .6 1 0
    "C" 2002 .7 1 0
    "C" 2003 .6 1 0
    end
    
    egen Dum2 = total(year == 2002 & Dum1 ==1), by(D)
    
    list, sepby(ID)
    
         +------------------------------------------+
         | ID   year   Cond2   Dum1   wanted   Dum2 |
         |------------------------------------------|
      1. |  A   2001      .5      0        0      1 |
      2. |  A   2002      .8      1        0      1 |
      3. |  A   2003      .2      0        0      1 |
         |------------------------------------------|
      4. |  B   2001      .3      0        1      0 |
      5. |  B   2002      .3      0        1      0 |
      6. |  B   2003      .6      1        1      0 |
         |------------------------------------------|
      7. |  C   2001      .6      1        0      1 |
      8. |  C   2002      .7      1        0      1 |
      9. |  C   2003      .6      1        0      1 |
         +------------------------------------------+
    I am counting, for each ID, whether Dum1 is 1 in 2002. That count will be either 1 (it happened) or 0 (it didn't). So, the count is automatically an indicator variable. (I won't use the poor term "dummy" of my own free will.)

    I think that's what you are asking for, but it doesn't tally with your listing.

    Note that the code for this has to work with blocks of observations for each identifier.
    Last edited by Nick Cox; 20 Jan 2017, 05:56.

    Comment


    • #3
      Note that the by(D) in the code should be by(ID). The code does work; I just had a version with D not ID that wasn't completely edited.

      Comment


      • #4
        Dear Nick,

        Thank you very much for your immediate and comprehensive reply! It's been very helpful as I wouldn't have come up with this solution by myself.

        It became clear to me that it is suboptimal to post abstract instead of practical examples.

        Thanks also for the note that your code's "by-option" of egen was lacking an "I". I realised that when playing around with it. No harm done.

        Taking the chance to make use dataex (thanks for the hint and congrats on that nice tool of yours!) here is my example again in a more practical manner, in hope this might be helpful to someone else in the future.

        The variable PCInBoard is the percentage of independent directors on a board. CompliantYear is an indicator variable that is 1 in case the value of PCIndBoard suggests compliance (which is >.5) in a given year. As I am undertaking an event study I am interested in a variable that indicates whether a company (variable ID) has been noncompliant in year 2002 (variable NonCompl2002).


        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        
        * This is your code applied: 
        
        input str1 ID float(year PCIndBoard CompliantYear wanted NonCompl2002)
        "A" 2001 .5 0 0 0
        "A" 2002 .8 1 0 0
        "A" 2003 .2 0 0 0
        "B" 2001 .3 0 1 1
        "B" 2002 .3 0 1 1
        "B" 2003 .6 1 1 1
        "C" 2001 .6 1 0 0
        "C" 2002 .7 1 0 0
        "C" 2003 .6 1 0 0
        end
        
        egen NonCompl2002_pre = total(year == 2002 & CompliantYear ==1), by(ID)
        
        * This is what I added to come up with what I needed.
        * (It can be done way more efficiently, I am affraid, but it does the trick.)
        
        generate NonCompl2002 = NonCompl2002_pre==0 if !missing(NonCompl2002_pre)
        drop NonCompl2002_pre
        
        list, sepby(ID)
        
        
             +-----------------------------------------------------+
             | ID   year   PCIndB~d   Compli~r   wanted   Non~2002 |
             |-----------------------------------------------------|
          1. |  A   2001         .5          0        0          0 |
          2. |  A   2002         .8          1        0          0 |
          3. |  A   2003         .2          0        0          0 |
             |-----------------------------------------------------|
          4. |  B   2001         .3          0        1          1 |
          5. |  B   2002         .3          0        1          1 |
          6. |  B   2003         .6          1        1          1 |
             |-----------------------------------------------------|
          7. |  C   2001         .6          1        0          0 |
          8. |  C   2002         .7          1        0          0 |
          9. |  C   2003         .6          1        0          0 |
             +-----------------------------------------------------+
        Thank you very much again and have a nice weekend!


        Regards,
        Roman

        Comment


        • #5
          It seems that you just to reverse the indicator, which I think is .

          Code:
           
           egen NonCompl2002_pre = max(year == 2002 & CompliantYear == 0), by(ID)

          Comment


          • #6
            Lightning fast reply that is perfectly working!
            Thanks! Very appreciated.


            BTW, am I supposed to close or tag this post solved or something similar? If so: where can this be done? Thanks in advance.

            Comment


            • #7
              It is sufficient to report when a problem is solved. There is no closure otherwise; indeed a thread remains open for other posts on the same subject.

              Comment

              Working...
              X