  • explicit subscripting for groups of variables in Stata 17

    Dear Statalist,

    I'm working with patient hospitalization data and hoping to group multiple separations that occur on sequential dates into the same overall admission.

    Specifically, I'm hoping to create a new variable that looks like "desired_var" in the example below, for each individual patient.

    [Screenshot: image_statalist_4.png, showing the desired_var column for the example data]

    Using the following data:

    Code:
    clear
    input id date_start date_end dummy_overlap
    1    100     200     1
    1    200     300     1
    1    500     600     .
    1    700     800     .
    1    900     1000     .
    1    1100    1200    1
    1    1200    1300    1
    1    1300    1400    1
    1    1700    1800    .
    1    1900    2000    .
    1    2200    2300    .
    1    2400    2500    1
    1    2500    2600    1
    1    2600    2700    1
    2    100     200     1
    2    200     300     1
    2    500     600     .
    2    700     800     .
    2    900     1000     .
    2    1100    1200    1
    2    1200    1300    1
    2    1300    1400    1
    2    1700    1800    .
    2    1900    2000    .
    2    2200    2300    .
    2    2400    2500    1
    2    2500    2600    1
    2    2600    2700    1
    end

    I've tried the following code, but it doesn't create the variable I'm hoping for:

    Code:
    bysort id (date_end date_start): gen desiredvar = _n if dummy_overlap==1


    Kind regards,
    Ben


  • #2
    Try something along these lines:
    Code:
    generate byte desiredvar = !missing(dummy_overlap) * missing(dummy_overlap[_n-1])
    replace desiredvar = cond(missing(dummy_overlap), ., sum(desiredvar))



    • #3
      Forgot about your IDs. Amend what I show above:
      Code:
      generate long row = _n
      
      bysort id (row): generate byte desiredvar = !missing(dummy_overlap) * missing(dummy_overlap[_n-1])
      by id: replace desiredvar = cond(missing(dummy_overlap), ., sum(desiredvar))
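
      To spell out the logic above (a commented restatement of the same approach, not new code from the thread; the names spell_start and desiredvar2 are illustrative):

      Code:
      * flag the first row of each run of non-missing dummy_overlap;
      * under -by-, dummy_overlap[0] is missing, so a run starting at
      * the top of an id's group is flagged as well
      generate long row = _n
      bysort id (row): generate byte spell_start = ///
          !missing(dummy_overlap) & missing(dummy_overlap[_n-1])
      
      * number the runs with a running sum of the start flags, then
      * blank out the rows that belong to no run
      by id: generate desiredvar2 = sum(spell_start)
      replace desiredvar2 = . if missing(dummy_overlap)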



      • #4
        I read this as being about groups of observations (not variables). Also, in your data example the intervals don't overlap in time; they are adjacent, in that the start of one interval equals the end of the previous one.

        So one way of thinking about it, which seems consistent with Joseph Coveney's ideas, is to consider this as labelling spells of observations. Using tsspell from SSC I get this:


        Code:
        clear
        input id date_start date_end dummy_overlap
        1    100     200     1
        1    200     300     1
        1    500     600     .
        1    700     800     .
        1    900     1000     .
        1    1100    1200    1
        1    1200    1300    1
        1    1300    1400    1
        1    1700    1800    .
        1    1900    2000    .
        1    2200    2300    .
        1    2400    2500    1
        1    2500    2600    1
        1    2600    2700    1
        2    100     200     1
        2    200     300     1
        2    500     600     .
        2    700     800     .
        2    900     1000     .
        2    1100    1200    1
        2    1200    1300    1
        2    1300    1400    1
        2    1700    1800    .
        2    1900    2000    .
        2    2200    2300    .
        2    2400    2500    1
        2    2500    2600    1
        2    2600    2700    1
        end
        
        bysort id (date_start) : gen pstime = _n
        tsset id pstime
        
        tsspell, cond(dummy_overlap == 1)
        replace _spell = . if _spell == 0
        list, sepby(id)
        
             +---------------------------------------------------------------------+
             | id   date_s~t   date_end   dummy_~p   pstime   _seq   _spell   _end |
             |---------------------------------------------------------------------|
          1. |  1        100        200          1        1      1        1      0 |
          2. |  1        200        300          1        2      2        1      1 |
          3. |  1        500        600          .        3      0        .      0 |
          4. |  1        700        800          .        4      0        .      0 |
          5. |  1        900       1000          .        5      0        .      0 |
          6. |  1       1100       1200          1        6      1        2      0 |
          7. |  1       1200       1300          1        7      2        2      0 |
          8. |  1       1300       1400          1        8      3        2      1 |
          9. |  1       1700       1800          .        9      0        .      0 |
         10. |  1       1900       2000          .       10      0        .      0 |
         11. |  1       2200       2300          .       11      0        .      0 |
         12. |  1       2400       2500          1       12      1        3      0 |
         13. |  1       2500       2600          1       13      2        3      0 |
         14. |  1       2600       2700          1       14      3        3      1 |
             |---------------------------------------------------------------------|
         15. |  2        100        200          1        1      1        1      0 |
         16. |  2        200        300          1        2      2        1      1 |
         17. |  2        500        600          .        3      0        .      0 |
         18. |  2        700        800          .        4      0        .      0 |
         19. |  2        900       1000          .        5      0        .      0 |
         20. |  2       1100       1200          1        6      1        2      0 |
         21. |  2       1200       1300          1        7      2        2      0 |
         22. |  2       1300       1400          1        8      3        2      1 |
         23. |  2       1700       1800          .        9      0        .      0 |
         24. |  2       1900       2000          .       10      0        .      0 |
         25. |  2       2200       2300          .       11      0        .      0 |
         26. |  2       2400       2500          1       12      1        3      0 |
         27. |  2       2500       2600          1       13      2        3      0 |
         28. |  2       2600       2700          1       14      3        3      1 |
             +---------------------------------------------------------------------+
        This assumes that you already have dummy_overlap. I fear that I may be misunderstanding what you want and/or that your real data are more complicated than your data example.
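
        If dummy_overlap is not already in your data, a hedged sketch of how it might be derived from the dates themselves (assuming, as in the data example, that records belong together exactly when one starts on the day the previous one ends; touches is an illustrative name):

        Code:
        bysort id (date_start): generate byte touches = ///
            (date_start == date_end[_n-1]) | (date_end == date_start[_n+1])
        replace touches = . if touches == 0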
        Last edited by Nick Cox; 20 Jan 2023, 02:31.



        • #5
          This is fantastic, thanks very much Nick Cox and Joseph Coveney!
