  • How can I use the assert command to determine whether the dataset is set up correctly?

    I have a sample in long format, laid out as shown below.
    I want to use the assert command to verify that the dataset is set up correctly:

    Within id, if the "state" variable contains a 1 and has no missing values, then "retain" should equal 1 within id.
    Within id, if all values of "state" equal 0 and none are missing, then "retain" should equal 0 within id.
    Within id, if "state" has any missing value, then "retain" should be missing within id.

    Can someone help me do this with Stata code?
    Thank you!



    clear
    input str10 id byte (grade state retain)
    1 1 0 0
    1 2 0 0
    1 3 0 0
    2 1 0 1
    2 1 1 1
    2 2 0 1
    3 1 0 .
    3 2 0 .
    3 . . .
    3 4 0 .
    4 1 0 1
    4 2 0 1
    4 2 1 1
    end
    Last edited by smith Jason; 15 Nov 2022, 15:38.

  • #2
    Code:
    . sort id, stable
    
    . 
    . by id: egen max_state = max(state), missing
    (4 missing values generated)
    
    . list, noobs sepby(id)
    
      +----------------------------------------+
      | id   grade   state   retain   max_st~e |
      |----------------------------------------|
      |  1       1       0        0          0 |
      |  1       2       0        0          0 |
      |  1       3       0        0          0 |
      |----------------------------------------|
      |  2       1       0        1          1 |
      |  2       1       1        1          1 |
      |  2       2       0        1          1 |
      |----------------------------------------|
      |  3       1       0        .          . |
      |  3       2       0        .          . |
      |  3       .       .        .          . |
      |  3       4       0        .          . |
      |----------------------------------------|
      |  4       1       0        1          1 |
      |  4       2       0        1          1 |
      |  4       2       1        1          1 |
      +----------------------------------------+
    
    . 
    . assert retain==max_state
    
    .


    • #3
      Originally posted by William Lisowski
      Code:
      . sort id, stable
      
      . 
      . by id: egen max_state = max(state), missing
      (4 missing values generated)
      
      . list, noobs sepby(id)
      
        +----------------------------------------+
        | id   grade   state   retain   max_st~e |
        |----------------------------------------|
        |  1       1       0        0          0 |
        |  1       2       0        0          0 |
        |  1       3       0        0          0 |
        |----------------------------------------|
        |  2       1       0        1          1 |
        |  2       1       1        1          1 |
        |  2       2       0        1          1 |
        |----------------------------------------|
        |  3       1       0        .          . |
        |  3       2       0        .          . |
        |  3       .       .        .          . |
        |  3       4       0        .          . |
        |----------------------------------------|
        |  4       1       0        1          1 |
        |  4       2       0        1          1 |
        |  4       2       1        1          1 |
        +----------------------------------------+
      
      . 
      . assert retain==max_state
      
      .
      Thanks for your help.
      However, I just realized that the rules I stated above are wrong.
      The correct rules are as follows:
      Within id, if the "state" variable contains a 1, then "retain" should equal 1 within id, whether or not "state" has missing values.
      Within id, if all values of "state" equal 0 and none are missing, then "retain" should equal 0 within id.
      Within id, if "state" consists only of missing values and zeros, then "retain" should be missing within id.
      Also, I don't understand why you added a comma and missing after max(state) in your initial response.

      Below is the corrected dataset.

      clear
      input str10 id byte (grade state retain)
      1 1 0 0
      1 2 0 0
      1 3 0 0
      2 1 0 1
      2 1 1 1
      2 2 0 1
      3 1 0 .
      3 2 0 .
      3 . . .
      3 4 0 .
      4 1 0 1
      4 2 0 1
      4 . . 1
      4 4 0 1
      4 4 1 1
      end

      Thank you!
      Last edited by smith Jason; 15 Nov 2022, 17:02.


      • #4
        Also, I don't understand why you added a comma and missing after max(state) in your initial response.
        You should start by removing the missing option from the command (in Stata commands, a comma separates the options from the main part of the command), rerunning the example, and seeing what the difference is.

        You will find that the egen max() function ignores missing values (unless all values are missing), so max_state will be zero rather than missing for id 3 when the missing option is omitted. Adding the missing option causes egen max() to treat a missing value as larger than any nonmissing value, which is how Stata orders missing values in comparisons, giving the results shown in post #2. (The standard max() function, by contrast, also ignores missing values unless all of its arguments are missing.)
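
        To see the effect of the missing option in isolation, here is a minimal sketch that runs egen max() both ways on ids 1 and 3 from the example data. The variable names max_default and max_missing are made up for this illustration. Going by the explanation above, id 3 should show 0 in max_default and missing in max_missing.

        ```stata
        * Sketch: egen max() with and without the missing option
        clear
        input str10 id byte (grade state retain)
        1 1 0 0
        1 2 0 0
        1 3 0 0
        3 1 0 .
        3 2 0 .
        3 . . .
        3 4 0 .
        end

        sort id, stable

        * Default: missing values of state are ignored within each id
        by id: egen max_default = max(state)

        * With the missing option, as used in post #2
        by id: egen max_missing = max(state), missing

        * Compare the two results side by side
        list id state max_default max_missing, noobs sepby(id)
        ```

        For id 1, which has no missing values of state, the two new variables should agree.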

        Code:
        . sort id, stable
        
        . 
        . by id: egen state_1 = max(state==1)
        
        . by id: egen state_m = max(state==.), missing
        
        . 
        . generate byte test = .
        (15 missing values generated)
        
        . replace       test = 1 if state_1==1
        (8 real changes made)
        
        . replace       test = 0 if state_1==0 & state_m==0 
        (3 real changes made)
        
        . 
        . list, noobs sepby(id)
        
          +--------------------------------------------------------+
          | id   grade   state   retain   state_1   state_m   test |
          |--------------------------------------------------------|
          |  1       1       0        0         0         0      0 |
          |  1       2       0        0         0         0      0 |
          |  1       3       0        0         0         0      0 |
          |--------------------------------------------------------|
          |  2       1       0        1         1         0      1 |
          |  2       1       1        1         1         0      1 |
          |  2       2       0        1         1         0      1 |
          |--------------------------------------------------------|
          |  3       1       0        .         0         1      . |
          |  3       2       0        .         0         1      . |
          |  3       .       .        .         0         1      . |
          |  3       4       0        .         0         1      . |
          |--------------------------------------------------------|
          |  4       1       0        1         1         1      1 |
          |  4       2       0        1         1         1      1 |
          |  4       .       .        1         1         1      1 |
          |  4       4       0        1         1         1      1 |
          |  4       4       1        1         1         1      1 |
          +--------------------------------------------------------+
        
        . 
        . assert retain==test
        
        .


        • #5
          Thank you. Is there another way to do this, with the condition written directly in the assert, something like this?
          bys id: egen xx= if blah blah==.....
          I don't know whether the code can be written this way.


          • #6
            Restating the rules in post #3, within each id:
            • state should be 0, 1, or . (missing)
            • If any state==1 then retain==1
            • otherwise if any state==. then retain==.
            • otherwise retain==0
            Code:
            . sort id, stable
            
            . 
            . assert inlist(state,0,1,.)
            
            . 
            . by id: egen rank = min( cond(state==1,1,0) + cond(state==.,2,0) + cond(state==0,3,0) )
            
            . list, noobs sepby(id)
            
              +------------------------------------+
              | id   grade   state   retain   rank |
              |------------------------------------|
              |  1       1       0        0      3 |
              |  1       2       0        0      3 |
              |  1       3       0        0      3 |
              |------------------------------------|
              |  2       1       0        1      1 |
              |  2       1       1        1      1 |
              |  2       2       0        1      1 |
              |------------------------------------|
              |  3       1       0        .      2 |
              |  3       2       0        .      2 |
              |  3       .       .        .      2 |
              |  3       4       0        .      2 |
              |------------------------------------|
              |  4       1       0        1      1 |
              |  4       2       0        1      1 |
              |  4       .       .        1      1 |
              |  4       4       0        1      1 |
              |  4       4       1        1      1 |
              +------------------------------------+
            
            . 
            . assert (rank==1 & retain==1) | (rank==2 & retain==.) | (rank==3 & retain==0)
            
            .
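
            On the question in #5 about stating the rule more directly in assert, one possible variant is sketched below. It is an illustration, not the only way to do it. It builds two per-id flags and then writes the expected value of retain inside the assert with cond(). The names has_1 and has_miss are hypothetical, and the data are those from post #3.

            ```stata
            * Sketch: per-id flags, then a single assert built with cond()
            clear
            input str10 id byte (grade state retain)
            1 1 0 0
            1 2 0 0
            1 3 0 0
            2 1 0 1
            2 1 1 1
            2 2 0 1
            3 1 0 .
            3 2 0 .
            3 . . .
            3 4 0 .
            4 1 0 1
            4 2 0 1
            4 . . 1
            4 4 0 1
            4 4 1 1
            end

            * 1 if any observation in the id has state==1, else 0
            bysort id: egen byte has_1 = max(state == 1)

            * 1 if any observation in the id has state missing, else 0
            bysort id: egen byte has_miss = max(missing(state))

            * Expected retain: 1 if any 1; otherwise missing if any missing; otherwise 0
            assert retain == cond(has_1, 1, cond(has_miss, ., 0))
            ```

            egen functions are not available inside an assert expression, so the flags still have to be created first. If the assertion fails, listing the observations where retain differs from the cond() expression shows which ids break the rule.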
