  • explicit subscripting for groups of variables in Stata 17

    Dear Statalist,

    I'm working with patient hospitalization data and hoping to group multiple separations that occur on sequential dates into the same overall admission.

    Specifically, I'm hoping to create a new variable that looks like "desired_var" in the example below, for each individual patient.

    [Screenshot: image_statalist_4.png, showing the desired_var column for the example data]

    Using the following data:

    Code:
    clear
    input id date_start date_end dummy_overlap
    1    100     200     1
    1    200     300     1
    1    500     600     .
    1    700     800     .
    1    900     1000     .
    1    1100    1200    1
    1    1200    1300    1
    1    1300    1400    1
    1    1700    1800    .
    1    1900    2000    .
    1    2200    2300    .
    1    2400    2500    1
    1    2500    2600    1
    1    2600    2700    1
    2    100     200     1
    2    200     300     1
    2    500     600     .
    2    700     800     .
    2    900     1000     .
    2    1100    1200    1
    2    1200    1300    1
    2    1300    1400    1
    2    1700    1800    .
    2    1900    2000    .
    2    2200    2300    .
    2    2400    2500    1
    2    2500    2600    1
    2    2600    2700    1
    end

    I've tried the following code, but it doesn't create the variable I'm hoping for:

    Code:
    bysort id (date_end date_start): gen desiredvar = _n if dummy_overlap==1


    Kind regards,
    Ben


  • #2
    Try something along these lines:
    Code:
    generate byte desiredvar = !missing(dummy_overlap) * missing(dummy_overlap[_n-1])
    replace desiredvar = cond(missing(dummy_overlap), ., sum(desiredvar))



    • #3
      Forgot about your IDs. Amend what I show above:
      Code:
      generate long row = _n
      
      bysort id (row): generate byte desiredvar = !missing(dummy_overlap) * missing(dummy_overlap[_n-1])
      by id: replace desiredvar = cond(missing(dummy_overlap), ., sum(desiredvar))
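
      To spell out the logic above (a commented restatement of the same approach, not new code from the thread; the names spell_start and desiredvar2 are illustrative):

      Code:
      * flag the first row of each run of non-missing dummy_overlap;
      * under -by-, dummy_overlap[0] is missing, so a run starting at
      * the top of an id's group is flagged as well
      generate long row = _n
      bysort id (row): generate byte spell_start = ///
          !missing(dummy_overlap) & missing(dummy_overlap[_n-1])
      
      * number the runs with a running sum of the start flags, then
      * blank out the rows that belong to no run
      by id: generate desiredvar2 = sum(spell_start)
      replace desiredvar2 = . if missing(dummy_overlap)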



      • #4
        I read this as being about groups of observations (not variables). Also, in your data example the intervals don't overlap in time; they are adjacent, in that the start of one interval equals the end of the previous one.

        So one way of thinking about it, which seems consistent with Joseph Coveney's ideas, is to consider this as labelling spells of observations. Using tsspell from SSC I get this:


        Code:
        clear
        input id date_start date_end dummy_overlap
        1    100     200     1
        1    200     300     1
        1    500     600     .
        1    700     800     .
        1    900     1000     .
        1    1100    1200    1
        1    1200    1300    1
        1    1300    1400    1
        1    1700    1800    .
        1    1900    2000    .
        1    2200    2300    .
        1    2400    2500    1
        1    2500    2600    1
        1    2600    2700    1
        2    100     200     1
        2    200     300     1
        2    500     600     .
        2    700     800     .
        2    900     1000     .
        2    1100    1200    1
        2    1200    1300    1
        2    1300    1400    1
        2    1700    1800    .
        2    1900    2000    .
        2    2200    2300    .
        2    2400    2500    1
        2    2500    2600    1
        2    2600    2700    1
        end
        
        bysort id (date_start) : gen pstime = _n
        tsset id pstime
        
        tsspell, cond(dummy_overlap == 1)
        replace _spell = . if _spell == 0
        list, sepby(id)
        
             +---------------------------------------------------------------------+
             | id   date_s~t   date_end   dummy_~p   pstime   _seq   _spell   _end |
             |---------------------------------------------------------------------|
          1. |  1        100        200          1        1      1        1      0 |
          2. |  1        200        300          1        2      2        1      1 |
          3. |  1        500        600          .        3      0        .      0 |
          4. |  1        700        800          .        4      0        .      0 |
          5. |  1        900       1000          .        5      0        .      0 |
          6. |  1       1100       1200          1        6      1        2      0 |
          7. |  1       1200       1300          1        7      2        2      0 |
          8. |  1       1300       1400          1        8      3        2      1 |
          9. |  1       1700       1800          .        9      0        .      0 |
         10. |  1       1900       2000          .       10      0        .      0 |
         11. |  1       2200       2300          .       11      0        .      0 |
         12. |  1       2400       2500          1       12      1        3      0 |
         13. |  1       2500       2600          1       13      2        3      0 |
         14. |  1       2600       2700          1       14      3        3      1 |
             |---------------------------------------------------------------------|
         15. |  2        100        200          1        1      1        1      0 |
         16. |  2        200        300          1        2      2        1      1 |
         17. |  2        500        600          .        3      0        .      0 |
         18. |  2        700        800          .        4      0        .      0 |
         19. |  2        900       1000          .        5      0        .      0 |
         20. |  2       1100       1200          1        6      1        2      0 |
         21. |  2       1200       1300          1        7      2        2      0 |
         22. |  2       1300       1400          1        8      3        2      1 |
         23. |  2       1700       1800          .        9      0        .      0 |
         24. |  2       1900       2000          .       10      0        .      0 |
         25. |  2       2200       2300          .       11      0        .      0 |
         26. |  2       2400       2500          1       12      1        3      0 |
         27. |  2       2500       2600          1       13      2        3      0 |
         28. |  2       2600       2700          1       14      3        3      1 |
             +---------------------------------------------------------------------+
        This assumes that you already have dummy_overlap. I fear that I may be misunderstanding what you want and/or that your real data are more complicated than your data example.
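
        If dummy_overlap is not already in your data, a hedged sketch of how it might be derived from the dates themselves (assuming, as in the data example, that records belong together exactly when one starts on the day the previous one ends; touches is an illustrative name):

        Code:
        bysort id (date_start): generate byte touches = ///
            (date_start == date_end[_n-1]) | (date_end == date_start[_n+1])
        replace touches = . if touches == 0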
        Last edited by Nick Cox; 20 Jan 2023, 02:31.



        • #5
          This is fantastic, thanks very much Nick Cox and Joseph Coveney!
