Generating a variable that tracks the first time 2 IDs appear simultaneously

Sascha Gill

Join Date: Jan 2019

Posts: 7
#1

Generating a variable that tracks the first time 2 IDs appear simultaneously

15 Jan 2019, 11:28

Hi,

I am new to Stata, so I am not very familiar with the codes. I have a dataset with multiple variables. For each subject, they have multiple visits. I am interested in people that showed a variable x score >0 at 2 time points a year apart. Because they have multiple visits, there are individuals who could show variable x score >0 at year 1, 2, 3, 4 OR at year 2,3 OR 2,4 . I want to track the first 2 time point that they scored >0 without dropping the other visits. How could I proceed with this?

An example of my data set:
patientID_all visit_num variable x ID_metcriteria
1 1 2 1
1 2 1 1
1 3 0
1 4 0
2 1 0
2 2 3 2
2 3 2 2
2 4 3
3 1 0
3 2 0
4 1 0
4 2 2
4 3 0
4 4 3

Apologies, if my question is a little confusing.
Tags: None
Nick Cox

Join Date: Mar 2014

Posts: 35700
#2

15 Jan 2019, 11:35

Please use dataex. That is indeed confusing: 5 (or is it 4) variable names; variously 3 or 4 values. Too much like hard work to decode.
Comment

David Benson

Join Date: Oct 2018
Posts: 489

15 Jan 2019, 18:18

I don't know that this will get you exactly what you want, but this should point you in the right direction. Also, for pid==4, wouldn't that patient meet the criteria because visit#2 & #4 are greater than 1?

If you need help using dataex to share data, I created a tutorial video on Youtube here

Code:

dataex pid visit_num x_score met_criteria
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(pid visit_num x_score met_criteria)
1 1 2 1
1 2 1 1
1 3 0 .
1 4 0 .
2 1 0 .
2 2 3 2
2 3 2 2
2 4 3 .
3 1 0 .
3 2 0 .
4 1 0 .
4 2 2 .
4 3 0 .
4 4 3 .
end

rename patientid_all pid  // just shortening
gen x_1 = (x_score >=1)
bysort pid (visit_num): egen count = total(x_1)

. list, sepby(pid) abbrev(15)

     +--------------------------------------------------------+
     | pid   visit_num   x_score   met_criteria   x_1   count |
     |--------------------------------------------------------|
  1. |   1           1         2              1     1       2 |
  2. |   1           2         1              1     1       2 |
  3. |   1           3         0              .     0       2 |
  4. |   1           4         0              .     0       2 |
     |--------------------------------------------------------|
  5. |   2           1         0              .     0       3 |
  6. |   2           2         3              2     1       3 |
  7. |   2           3         2              2     1       3 |
  8. |   2           4         3              .     1       3 |
     |--------------------------------------------------------|
  9. |   3           1         0              .     0       0 |
 10. |   3           2         0              .     0       0 |
     |--------------------------------------------------------|
 11. |   4           1         0              .     0       2 |
 12. |   4           2         2              .     1       2 |
 13. |   4           3         0              .     0       2 |
 14. |   4           4         3              .     1       2 |
     +--------------------------------------------------------+

Last edited by David Benson; 15 Jan 2019, 18:21.

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 30103
#4

15 Jan 2019, 22:18

Your request is not entirely clear. I'll assume that David Benson has properly interpreted your data and am using his -dataex- output as a starting point. In addition, I don't understand your met_criteria variable: look at pat_id 4. That person has x > 2 at visit_num 2 and 4: so the criteria should be met at those visits--but you don't mark that.

Also, there are only limited situations in Stata where it is helpful to create a variable with 1 and . as its values, and this isn't one of them. It is likely to get you into trouble later when you use it. Better to create the variable with 1 and 0. I have done so here:

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input byte(pid visit_num x_score met_criteria) 1 1 2 1 1 2 1 1 1 3 0 . 1 4 0 . 2 1 0 . 2 2 3 2 2 3 2 2 2 4 3 . 3 1 0 . 3 2 0 . 4 1 0 . 4 2 2 . 4 3 0 . 4 4 3 . end // COUNT NUMBER OF TIMES X > 0 by pid (visit_num), sort: gen pos_x = sum(x > 0) // MARK THE FIRST TWO SUCH OBSERVATIONS IF THERE ARE AT LEAST 2 by pid (visit_num): gen mark = inlist(pos_x, 1, 2) & pos_x != pos_x[_n-1] & pos_x[_N] >= 2

If this is not what you were looking for, post back with a -dataex- from your actual data, and a clearer explanation.
Comment

Sascha Gill

Join Date: Jan 2019
Posts: 7

22 Jan 2019, 13:12

I think there was some confusion because of how I wrote my data. This is what part of my data looks like:

Code:

* Example generated by -dataex-. To install: ssc install dataex
*dataex  naccid naccvnum MBI_total newid
clear
input str10 id byte naccvnum float(MBI_total newid)
"000011" 1  2 1
"000011" 2  1 1
"000011" 3  0 0
"000011" 4  0 0
"000162" 1  1 1
"000162" 2  0 0
"000162" 3  0 0
"000162" 4  0 0
"000162" 5  0 0
"000162" 6  0 0
"000162" 7  0 0
"000162" 8  0 0
"000162" 9  3 1
"000271" 1  0 0
"000271" 2  0 0
"000271" 3  1 1
"000271" 4  6 1
"000271" 5  4 1
"000271" 6  3 1
"000304" 1  0 0
"000304" 2  0 0
"000304" 3  0 0
"000304" 4  0 0
"000304" 5  0 0
"000304" 6  0 0
"000304" 7  0 0
"000382" 1  4 1
"000382" 2  4 1
"000382" 3  5 1
"000382" 4  4 1
"000382" 5  2 1
"000385" 1  1 1
"000385" 2  0 0
"000385" 3  0 0
"000385" 4  3 1
"000385" 5  1 1
"000385" 6  2 1
"000385" 7  0 0

end

I want to be able to exclude all IDs of a person, if they scored MBI_total >0 even once. I was wondering how I could do that. I introduced new column with newid=1 if MBI_total >0 and newid=0 if MBI_total=0. However, when I drop newid=1, it only drops that specific row. How would I be able to drop all the rows with that ID. So for instance: drop all 000011 ids, but keep 000162 because they scored 0 at all visits.

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 30103
#6

22 Jan 2019, 14:16

Well, your data contradicts your explanation. In the data you show, ID 000162 scores 1 at the very first visit.

Assuming that you really do want to drop those and only those IDs where MBI_total > 0 on any observation, (which means you do not want to keep 000162), the code would be:

Code:

by id, sort: egen to_drop = max(MBI_total > 0) drop if to_drop
Comment
Sascha Gill

Join Date: Jan 2019

Posts: 7
#7

22 Jan 2019, 15:01

Thank you, that works. I missed the 000162 score 1. Based on my explanation ID 000304 is representative of what I was trying to do.
Comment

Announcement