
  • Dropping the whole group based on condition relating to missing variable

    Dear all,

    Today I ran into a problem with conditionally dropping a whole group. I think I have the right idea, but my syntax is wrong.

    My dataset is

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float year1 long TYPE2 float(treat1 TAT_w1 FAT_w1 post33 post33treat1)
    2019  4 1         .         . 1 1
    2020  4 1  1.575724  6.876883 1 1
    2017  6 1         .         . 0 0
    2018  6 1 1.4682583 2.2230647 0 0
    2019  6 1  1.369082 2.0798314 1 1
    2020  6 1  1.083867  1.665593 1 1
    2021  6 1  .9985812 1.4405484 1 1
    2016  7 1         .         . 0 0
    2017  7 1 1.4990585  9.328573 0 0
    2018  7 1 1.4415038 14.143867 0 0
    2019  7 1  .9044999 16.777643 1 1
    2020  7 1  .9892513 19.836254 1 1
    2021  7 1  .6773922 13.363945 1 1
    2016 10 0         .         . 0 0
    2019 10 0         .         . 1 0
    2020 10 0 1.2700052 10.358336 1 0
    2016 11 0         .         . 0 0
    2017 11 0  .5638924  .6820889 0 0
    2018 11 0  .6587502  .8348646 0 0
    2019 11 0  .8550852 1.1720647 1 0
    2020 11 0  .9116079 1.3943592 1 0
    2021 11 0  .9081573  1.464206 1 0
    2017 13 0         .         . 0 0
    2018 13 0  1.738603 14.170573 0 0
    2019 13 0 2.1920629 17.122456 1 0
    2020 13 0 1.5401927 10.125214 1 0
    end
    label values TYPE2 TYPE2
    label def TYPE2 4 "2563UZ", modify
    label def TYPE2 6 "2580PG", modify
    label def TYPE2 7 "25846A", modify
    label def TYPE2 10 "2621N5", modify
    label def TYPE2 11 "2622UU", modify
    label def TYPE2 13 "2625KH", modify
    In this dataset, I want to delete all observations of any TYPE2 that has FAT_w1 or TAT_w1 missing in year1 2019 and has treat1 equal to 1. In the example above, observations 1 and 2 (TYPE2 4) would be deleted.

    Based on what I have learnt from Statalist so far, I tried to write the code myself (shown below), but it fails with this error:
    Code:
    invalid syntax
    r(198);
    My code is:

    Code:
    bysort TYPE2 post33treat1 (year1): generate indicator2=1 if year1=2019 & (FAT_w1=.|TAT_w1=.)
    egen dropmiss = total(indicator2==1 & treat1=1) , by(TYPE2)
    drop if dropmiss
    Could you please help me to sort it out?

    Thanks in advance and best regards.

    Last edited by Phuc Nguyen; 16 Sep 2022, 16:47.

  • #2
    In Stata, you have to distinguish = from ==. The single = is used in commands like -generate-, -replace-, and -egen- to separate the variable to be created (or, in the case of -replace-, modified) from the expression to be assigned to it. By contrast, the operator designating that two things are equal in value is denoted with the double-equals operator ==. They are not interchangeable--they have different meanings, and in most situations*, using the wrong one will also result in a syntax error.

    Code:
    bysort TYPE2 post33treat1 (year1): generate indicator2=1 if year1==2019 & (FAT_w1==.|TAT_w1==.)
    egen dropmiss = total(indicator2==1 & treat1==1) , by(TYPE2)
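    To check the corrected lines end to end, here is a minimal sketch on a trimmed copy of the example data (three TYPE2 groups, two years each), with the `drop if dropmiss` step from the original attempt added back. The trimmed dataset and the final `list` are assumptions made only to keep the illustration short.

    ```stata
    clear
    input float year1 long TYPE2 float(treat1 TAT_w1 FAT_w1 post33 post33treat1)
    2019  4 1         .         . 1 1
    2020  4 1  1.575724  6.876883 1 1
    2019  6 1  1.369082 2.0798314 1 1
    2020  6 1  1.083867  1.665593 1 1
    2019 10 0         .         . 1 0
    2020 10 0 1.2700052 10.358336 1 0
    end

    * The single = after indicator2 assigns; every == compares values
    bysort TYPE2 post33treat1 (year1): generate indicator2=1 if year1==2019 & (FAT_w1==.|TAT_w1==.)

    * Count the flagged treated rows within each TYPE2 and copy that count to every row
    egen dropmiss = total(indicator2==1 & treat1==1) , by(TYPE2)

    * Any nonzero count marks the whole group for deletion
    drop if dropmiss
    list year1 TYPE2 treat1 TAT_w1 FAT_w1, sepby(TYPE2)
    ```

    In this trimmed data only TYPE2 4 is removed. TYPE2 10 also has missing values in 2019, but it has treat1 equal to 0, so it is kept.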
    As an aside, you are creating the variable indicator2 as a 1/. variable. The way you are using it, this will give correct results. But in general it is best to create yes/no variables as 1/0 variables, and you should form that habit. In most cases, as here, this can be done in one step.
    Code:
    bysort TYPE2 post33treat1 (year1): generate indicator2 = year1==2019 & (FAT_w1==.|TAT_w1==.)
    You actually used this way of creating a 1/0 expression in the -egen dropmiss- command, so you are familiar with it. You should make it a habit to do this whenever you create yes/no variables.
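    One reason the 1/0 habit pays off can be sketched as follows. The toy data and the names `flag_miss` and `flag_01` are assumptions for illustration and are not part of the original dataset. Stata treats any nonzero value, including missing, as true, so a 1/. variable used directly in an `if` condition selects every observation.

    ```stata
    clear
    set obs 4
    generate x = _n

    * 1/. coding: 1 where the condition holds, missing elsewhere
    generate flag_miss = 1 if x == 2

    * 1/0 coding: the logical expression itself evaluates to 1 or 0
    generate byte flag_01 = x == 2

    * Missing is nonzero and therefore true, so all 4 observations are counted
    count if flag_miss
    local n_miss = r(N)

    * Only the observation with x == 2 is counted here
    count if flag_01
    local n_01 = r(N)

    display "1/. coding selects " `n_miss' ", 1/0 coding selects " `n_01'
    ```

    The code in this thread avoids the problem because it tests `indicator2==1` explicitly, but a bare `if indicator2` on the 1/. version would not behave the same way.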



    • #3
      It worked nicely, and I learned another lesson. Thanks a heap, Clyde Schechter.

