Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Generating a variable that tracks the first time 2 IDs appear simultaneously

    Hi,

    I am new to Stata, so I am not very familiar with the codes. I have a dataset with multiple variables. For each subject, they have multiple visits. I am interested in people that showed a variable x score >0 at 2 time points a year apart. Because they have multiple visits, there are individuals who could show variable x score >0 at year 1, 2, 3, 4 OR at year 2,3 OR 2,4 . I want to track the first 2 time point that they scored >0 without dropping the other visits. How could I proceed with this?

    An example of my data set:
    patientID_all visit_num variable x ID_metcriteria
    1 1 2 1
    1 2 1 1
    1 3 0
    1 4 0
    2 1 0
    2 2 3 2
    2 3 2 2
    2 4 3
    3 1 0
    3 2 0
    4 1 0
    4 2 2
    4 3 0
    4 4 3

    Apologies, if my question is a little confusing.

  • #2
    Please use dataex. That is indeed confusing: 5 (or is it 4) variable names; variously 3 or 4 values. Too much like hard work to decode.

    Comment


    • #3
      I don't know that this will get you exactly what you want, but this should point you in the right direction. Also, for pid==4, wouldn't that patient meet the criteria because visit#2 & #4 are greater than 1?

      If you need help using dataex to share data, I created a tutorial video on Youtube here

      Code:
      dataex pid visit_num x_score met_criteria
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input byte(pid visit_num x_score met_criteria)
      1 1 2 1
      1 2 1 1
      1 3 0 .
      1 4 0 .
      2 1 0 .
      2 2 3 2
      2 3 2 2
      2 4 3 .
      3 1 0 .
      3 2 0 .
      4 1 0 .
      4 2 2 .
      4 3 0 .
      4 4 3 .
      end
      
      rename patientid_all pid  // just shortening
      gen x_1 = (x_score >=1)
      bysort pid (visit_num): egen count = total(x_1)
      
      . list, sepby(pid) abbrev(15)
      
           +--------------------------------------------------------+
           | pid   visit_num   x_score   met_criteria   x_1   count |
           |--------------------------------------------------------|
        1. |   1           1         2              1     1       2 |
        2. |   1           2         1              1     1       2 |
        3. |   1           3         0              .     0       2 |
        4. |   1           4         0              .     0       2 |
           |--------------------------------------------------------|
        5. |   2           1         0              .     0       3 |
        6. |   2           2         3              2     1       3 |
        7. |   2           3         2              2     1       3 |
        8. |   2           4         3              .     1       3 |
           |--------------------------------------------------------|
        9. |   3           1         0              .     0       0 |
       10. |   3           2         0              .     0       0 |
           |--------------------------------------------------------|
       11. |   4           1         0              .     0       2 |
       12. |   4           2         2              .     1       2 |
       13. |   4           3         0              .     0       2 |
       14. |   4           4         3              .     1       2 |
           +--------------------------------------------------------+
      Last edited by David Benson; 15 Jan 2019, 18:21.

      Comment


      • #4
        Your request is not entirely clear. I'll assume that David Benson has properly interpreted your data and am using his -dataex- output as a starting point. In addition, I don't understand your met_criteria variable: look at pat_id 4. That person has x > 2 at visit_num 2 and 4: so the criteria should be met at those visits--but you don't mark that.

        Also, there are only limited situations in Stata where it is helpful to create a variable with 1 and . as its values, and this isn't one of them. It is likely to get you into trouble later when you use it. Better to create the variable with 1 and 0. I have done so here:

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input byte(pid visit_num x_score met_criteria)
        1 1 2 1
        1 2 1 1
        1 3 0 .
        1 4 0 .
        2 1 0 .
        2 2 3 2
        2 3 2 2
        2 4 3 .
        3 1 0 .
        3 2 0 .
        4 1 0 .
        4 2 2 .
        4 3 0 .
        4 4 3 .
        end
        
        //    COUNT NUMBER OF TIMES X > 0
        by pid (visit_num), sort: gen pos_x = sum(x > 0)
        
        //    MARK THE FIRST TWO SUCH OBSERVATIONS IF THERE ARE AT LEAST 2
        by pid (visit_num): gen mark = inlist(pos_x, 1, 2) & pos_x != pos_x[_n-1] & pos_x[_N] >= 2
        If this is not what you were looking for, post back with a -dataex- from your actual data, and a clearer explanation.

        Comment


        • #5
          I think there was some confusion because of how I wrote my data. This is what part of my data looks like:
          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          *dataex  naccid naccvnum MBI_total newid
          clear
          input str10 id byte naccvnum float(MBI_total newid)
          "000011" 1  2 1
          "000011" 2  1 1
          "000011" 3  0 0
          "000011" 4  0 0
          "000162" 1  1 1
          "000162" 2  0 0
          "000162" 3  0 0
          "000162" 4  0 0
          "000162" 5  0 0
          "000162" 6  0 0
          "000162" 7  0 0
          "000162" 8  0 0
          "000162" 9  3 1
          "000271" 1  0 0
          "000271" 2  0 0
          "000271" 3  1 1
          "000271" 4  6 1
          "000271" 5  4 1
          "000271" 6  3 1
          "000304" 1  0 0
          "000304" 2  0 0
          "000304" 3  0 0
          "000304" 4  0 0
          "000304" 5  0 0
          "000304" 6  0 0
          "000304" 7  0 0
          "000382" 1  4 1
          "000382" 2  4 1
          "000382" 3  5 1
          "000382" 4  4 1
          "000382" 5  2 1
          "000385" 1  1 1
          "000385" 2  0 0
          "000385" 3  0 0
          "000385" 4  3 1
          "000385" 5  1 1
          "000385" 6  2 1
          "000385" 7  0 0
          
          end

          I want to be able to exclude all IDs of a person, if they scored MBI_total >0 even once. I was wondering how I could do that. I introduced new column with newid=1 if MBI_total >0 and newid=0 if MBI_total=0. However, when I drop newid=1, it only drops that specific row. How would I be able to drop all the rows with that ID. So for instance: drop all 000011 ids, but keep 000162 because they scored 0 at all visits.

          Comment


          • #6
            Well, your data contradicts your explanation. In the data you show, ID 000162 scores 1 at the very first visit.

            Assuming that you really do want to drop those and only those IDs where MBI_total > 0 on any observation, (which means you do not want to keep 000162), the code would be:

            Code:
            by id, sort: egen to_drop = max(MBI_total > 0)
            drop if to_drop

            Comment


            • #7
              Thank you, that works. I missed the 000162 score 1. Based on my explanation ID 000304 is representative of what I was trying to do.

              Comment

              Working...
              X