  • Flagging one observation per ID with specific conditions

    Hello all,

    I would like to know how many individuals have at least one observation on the variables RevenuMoyenE11, PC and QPC6. I expect the three tag variables to have the same distribution of 0s and 1s, which makes me doubt my code. Please note that I have 937 individuals in my sample, with anywhere from 1 to 16 measurements each. The variable RevenuMoyenE11 PC and QPC6 should have the same value at each measurement time, therefore I expect fewer than 937 cases where my code flags a 1. But that is not what is happening. I provide sample data below.

    This is the code I used to generate variables that tag the first observation for each ID:

    Code:
      egen tagrev = tag(ID RevenuMoyenE11), missing
      egen tagpc = tag(ID PC), missing
      egen tagQPC6 = tag(ID QPC6), missing
    This gives the following results, which seem to be wrong. Does anyone have any suggestions?

    Code:
    tagrev -- tag(ID RevenuMoyenE11)
    -----------------------------------------------------------
                  |      Freq.    Percent      Valid       Cum.
    --------------+--------------------------------------------
    Valid   0     |       8085      84.42      84.42      84.42
            1     |       1492      15.58      15.58     100.00
            Total |       9577     100.00     100.00           
    -----------------------------------------------------------
    
    tagpc -- tag(ID PC)
    -----------------------------------------------------------
                  |      Freq.    Percent      Valid       Cum.
    --------------+--------------------------------------------
    Valid   0     |       6662      69.56      69.56      69.56
            1     |       2915      30.44      30.44     100.00
            Total |       9577     100.00     100.00           
    -----------------------------------------------------------
    
    tagQPC6 -- tag(ID QPC6)
    -----------------------------------------------------------
                  |      Freq.    Percent      Valid       Cum.
    --------------+--------------------------------------------
    Valid   0     |       8076      84.33      84.33      84.33
            1     |       1501      15.67      15.67     100.00
            Total |       9577     100.00     100.00           
    -----------------------------------------------------------



    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double(ID cycle) strL PC double(RevenuMoyenE11 QPC6)
    11000  1 ""                       . .
    11000  2 "..."                    . .
    11000  3 "..."                    . .
    11000  4 "..."                    . .
    11000  5 "..."                    . .
    11000  7 ""                       . 1
    11000  8 "..."                    . .
    11000  9 "E1A8V1"             46642 1
    11000 10 "E1A8V1"             46642 1
    11000 11 "..."                    . .
    11000 12 "..."                    . .
    11001  1 ""                       . .
    11001  2 "..."                    . .
    11001  3 "..."                    . .
    11001  4 "..."                    . .
    11001  5 "..."                    . .
    11001  7 "E..."                   . .
    11002  1 ""                       . .
    11002  2 "..."                    . .
    11002  3 "..."                    . .
    11002  5 "..."                    . .
    11002 10 "E1A8T4"             50240 4
    11002 13 "..."                    . .
    11002 14 "..."                    . .
    11002 15 "..."                    . .
    11003  1 ""                       . .
    11003  3 "..."                    . .
    11003  4 "..."                    . .
    11003  5 "..."                    . .
    11003  7 "E1A6V5" 55795.46300832925 5
    end
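A likely reason the counts exceed 937: tag(ID RevenuMoyenE11) tags one observation for each distinct combination of ID and RevenuMoyenE11, not one observation per ID. With the missing option, a missing value is treated as one more value, so an individual whose income is missing in some waves and observed in others is tagged twice (and more often if the income changes over time). PC has even more distinct values per ID ("", "..." and the actual code), which is consistent with tagpc flagging more observations than the other two. The sketch below, assuming the -dataex- example above is in memory, compares the two kinds of tag; tag_pair and tag_id are hypothetical names.

```stata
* Sketch, assuming the -dataex- example above is in memory.
* tag(ID RevenuMoyenE11) tags one observation per distinct (ID, income) pair.
* With -missing-, a missing income counts as one more value, so an ID whose
* income is missing in some waves and observed in others is tagged twice.
egen byte tag_pair = tag(ID RevenuMoyenE11), missing

* tag(ID) tags exactly one observation per individual
egen byte tag_id = tag(ID)

count if tag_pair   // number of distinct ID-income pairs
count if tag_id     // number of individuals
```

On the example data the first count should be 7 (IDs 11000, 11002 and 11003 each have both a missing and a non-missing income, and ID 11001 has only missing income) and the second 4.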

    Thank you for your time

    Best wishes
    Patrick

  • #2
    Code:
    * number of non-missing values per ID
    by ID, sort : egen byte mycount1 = count(RevenuMoyenE11)
    by ID, sort : egen byte mycount2 = count(QPC6)
    * tag one observation per ID so that each individual is counted once
    egen byte mytag = tag(ID)
    tab mycount1 if mytag == 1
    tab mycount2 if mytag == 1
    Best regards,

    Marcos


    • #3
      The variable RevenuMoyenE11 PC and QPC6 should have the same value at each measurement time
      This makes no sense. First of all, PC is a string, whereas RevenuMoyenE11 and QPC6 are numeric, so they can't possibly have the same value at each measurement time. Second, looking at your data example, it is quite clear in, for example, observation 17 that PC is non-missing but RevenuMoyenE11 and QPC6 are missing. Moreover, the values of RevenuMoyenE11 and QPC6 are never the same, except when they are both missing. In fact, they are not even remotely of the same order of magnitude. Finally, if it were true that these three variables are all the same, why would you carry them as three separate variables?

      I would like to know how many individuals have at least one observation on the variables RevenuMoyenE11 PC and QPC6.
      To determine that, you could do something like:
      Code:
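      foreach v of varlist RevenuMoyenE11 PC QPC6 {
          * number of non-missing values of `v' for each ID
          * (for the string PC, only "" counts as missing)
          by ID, sort: egen has_valid_obs_`v' = count(`v')
          * recode to 1 if the ID has at least one non-missing value, else 0
          replace has_valid_obs_`v' = (has_valid_obs_`v' > 0)
      }
      * tag one observation per ID so each individual is counted once
      egen id_flag = tag(ID)
      tab1 has_valid_obs_* if id_flag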
      Added: Crossed with #2.
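If the question is instead how many individuals have at least one wave in which all three variables are observed together, a per-observation indicator can be reduced to the ID level with egen max(). A sketch, assuming the variable names of the -dataex- example; all3_obs, any_all3 and one_per_id are hypothetical names.

```stata
* Sketch, assuming the -dataex- example is in memory.
* Note: missing() is true for a string only when it is "",
* so placeholder entries such as "..." in PC count as observed here.
gen byte all3_obs = !missing(PC) & !missing(RevenuMoyenE11) & !missing(QPC6)

* any_all3 = 1 if at least one wave of the ID has all three observed
by ID, sort: egen byte any_all3 = max(all3_obs)

* count each individual once
egen byte one_per_id = tag(ID)
tab any_all3 if one_per_id
```

In the data example, IDs 11000, 11002 and 11003 should get any_all3 = 1 and ID 11001 should get 0, because RevenuMoyenE11 and QPC6 are missing in every one of its waves.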


      • #4
        Thank you for your help, Marcos and Clyde.

        The variable RevenuMoyenE11 PC and QPC6 should have the same value at each measurement time
        I agree with you, Clyde: the sentence does not make sense. I meant to say that if one variable has data, then the others should have data too.

        Patrick
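To check that claim directly, one can count, for each wave, how many of the three variables are missing and list the waves where that count is neither 0 (all observed) nor 3 (all missing). A sketch assuming the -dataex- example is in memory; n_missing is a hypothetical name. Since missing() treats a string as missing only when it is empty, entries such as "..." in PC count as present; whether they should be recoded as missing first is a judgment about the data that the code cannot make.

```stata
* Sketch, assuming the -dataex- example is in memory.
* Number of the three variables that are missing in each wave
gen byte n_missing = missing(PC) + missing(RevenuMoyenE11) + missing(QPC6)

* 0 = all observed, 3 = all missing; 1 or 2 = the variables disagree
count if inrange(n_missing, 1, 2)
list ID cycle PC RevenuMoyenE11 QPC6 if inrange(n_missing, 1, 2), sepby(ID) noobs
```

In the example this flags, among others, wave 7 of ID 11000, where QPC6 is 1 while PC is empty and RevenuMoyenE11 is missing.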
