
  • data management lags leads...

    Hello everyone, I have a small dataset with diseases A, B and C, each of which can have a severity score ranging from 1 to 4. Each combination of disease and severity score has a theoretical length of stay. In addition there is a social precariousness score coded 0 or 1.
    I would like to generate a new length of stay that corresponds to the length of stay in the next severity category when the social precariousness score is 1. I have been struggling with explicit subscripting, replace, and the st and ts commands...
    Does anyone have a clean way of doing this?
    Here is an example dataset (the real one is much, much larger...)
    Thanks
    Mat
    Disease severityscore length of stay social score new length of stay
    A 1 3 1
    A 2 4 0
    A 2 4 1
    A 3 5 0
    A 3 5 1
    A 3 5 1
    A 4 7 1
    B 1 2 1
    B 2 4 0
    B 2 4 1
    B 2 4 1
    B 3 5 1
    C 1 2 0
    C 2 4 0
    C 3 5 1
    C 4 6 1

  • #2
    Welcome to Statalist.

    Your data is not as helpful as it could be. Can you explain why there are two identical observations for disease A severity 3 social 1? And similarly for disease B severity 2 social 1?

    In general, your example data is not presented helpfully. "length of stay" is not the name of a Stata variable, nor is "social score". You leave those of us who hope to help you having to make assumptions and explanations just to get to the point of having useful data to base sample code on.

    You will increase the likelihood of a helpful response by showing some example data output from Stata. Be sure to use the dataex command to do this. If you are running version 15.1 or a fully updated version 14.2, dataex is already part of your official Stata installation. If not, run ssc install dataex to get it. Either way, run help dataex and read the simple instructions for using it. dataex will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use dataex.


    • #3
      OK, I get it; I was trying to clarify the names... Each line is a patient with a given disease, which can have different levels of severity (so different lines can have the same disease and severity). The days in hospital figure is how many days this type of patient stays on average nationally. The idea is that social precariousness should count as an additional severity factor, adding 1 to the severity score, so the expected hospital stay should be longer and should be compared with that of patients one severity level higher. I hope it is clearer.
      so here is the data
      patientid Disease severityscore daysinhosp socialscore
      1 A 1 3 1
      2 A 2 4 0
      3 A 2 4 1
      4 A 3 5 0
      5 A 3 5 1
      6 A 3 5 1
      7 A 4 7 1
      8 B 1 2 1
      9 B 2 4 0
      10 B 2 4 1
      11 B 2 4 1
      12 B 3 5 1
      13 C 1 2 0
      14 C 2 4 0
      15 C 3 5 1
      16 C 4 6 1

      here are some of the commands i tried
      gen daysinhosp2=.
      bysort dis: replace daysinhosp2=daysinhosp[_n+1] if dissevr[_n+1]!=dissevr[_n] & socialscore==1

      *does not do the trick so i tried something else
      bysort dis: gen disrank=_n
      bysort dis:gen disN=_N
      bysort dis severity: gen dissevrank=_n
      bysort dis severity: gen dissevlead=_n+1
      bysort dis severity: gen dissevlag=_n-1
      bysort dis severity: gen dissevN=_N



      bysort dis: gen failure=1 if dissevrank== dissevN
      stset severity, id( dis) fail(failure)
      recode failure .=1
      gen daysinhosp2=daysinhosp[_n+1] if dissevrank[_n]!=dissevrank[_n-1] if socialscore==1

      stfill daysinhosp2 if socialscore==1, forward
      bysort dis severity: replace daysinhosp2=daysinhosp2[_n+1] if daysinhosp2==. if socialscore==1

      the results are not pretty...
      thanks for suggestions
      mat


      • #4
        Can you provide:
        1) The data using the dataex command (including all variables used in your code -dissevr is not in your example data) see: https://www.statalist.org/forums/help#stata
        2) Examples of the values you expect in daysinhosp2 for several observations and how you would get to that outcome by hand
        Stata/MP 14.1 (64-bit x86-64)
        Revision 19 May 2016
        Win 8.1


        • #5
          Oops, I copied bits in the wrong order.
          Here is some of the syntax I tried that does not work:

          bysort diseas: gen disrank=_n
          bysort diseas:gen disN=_N
          bysort diseas severity: gen dissevrank=_n
          bysort diseas severity: gen dissevlead=_n+1
          bysort diseas severity: gen dissevlag=_n-1
          bysort disease severity: gen dissevN=_N

          gen daysinhosp2=.
          bysort diseas: replace daysinhosp2=daysinhosp[_n+1] if dissevr[_n+1]!=dissevr[_n] & socialscore==1

          *does not do the trick so i tried something else




          bysort diseas: gen failure=1 if dissevrank== dissevN
          stset severity, id( dis) fail(failure)
          recode failure .=1
          gen daysinhosp2=daysinhosp[_n+1] if dissevrank[_n]!=dissevrank[_n-1] if socialscore==1

          stfill daysinhosp2 if socialscore==1, forward
          bysort diseas severity: replace daysinhosp2=daysinhosp2[_n+1] if daysinhosp2==. & socialscore==1


          • #6
            OK, here is the data with daysinhosp2 filled in manually, as I would like to instruct Stata to do:
            patientid Disease severityscore daysinhosp socialscore daysinhosp2
            1 A 1 3 1 4
            2 A 2 4 0 2
            3 A 2 4 1 5
            4 A 3 5 0 5
            5 A 3 5 1 7
            6 A 3 5 1 7
            7 A 4 7 1 7
            8 B 1 2 1 4
            9 B 2 4 0 4
            10 B 2 4 1 5
            11 B 2 4 1 5
            12 B 3 5 1 5
            13 C 1 2 0 2
            14 C 2 4 0 4
            15 C 3 5 1 6
            16 C 4 6 1 6


            • #7
              patientid Disease severityscore daysinhosp socialscore daysinhosp2
              1 A 1 3 1 4
              2 A 2 4 0 4
              3 A 2 4 1 5
              4 A 3 5 0 5
              5 A 3 5 1 7
              6 A 3 5 1 7
              7 A 4 7 1 7
              8 B 1 2 1 4
              9 B 2 4 0 4
              10 B 2 4 1 5
              11 B 2 4 1 5
              12 B 3 5 1 5
              13 C 1 2 0 2
              14 C 2 4 0 4
              15 C 3 5 1 6
              16 C 4 6 1 6


              • #8
                daysinhosp2 is the same as daysinhosp when the severity score is at its maximum for that disease


                • #9
                  In the future, please provide all the output from the dataex command referenced in #4:
                  Code:
                  * Example generated by -dataex-. To install: ssc install dataex
                  clear
                  input byte patientid str1 disease byte(severityscore daysinhosp socialscore daysinhosp2)
                   1 "A" 1 3 1 4
                   2 "A" 2 4 0 4
                   3 "A" 2 4 1 5
                   4 "A" 3 5 0 5
                   5 "A" 3 5 1 7
                   6 "A" 3 5 1 7
                   7 "A" 4 7 1 7
                   8 "B" 1 2 1 4
                   9 "B" 2 4 0 4
                  10 "B" 2 4 1 5
                  11 "B" 2 4 1 5
                  12 "B" 3 5 1 5
                  13 "C" 1 2 0 2
                  14 "C" 2 4 0 4
                  15 "C" 3 5 1 6
                  16 "C" 4 6 1 6
                  end

                  I still do not understand the logic for daysinhosp2. Where does the value come from in each observation, for example 4 in obs 1?

                  Code:
                  . list, sepby(disease)
                  
                       +----------------------------------------------------------------+
                       | patien~d   disease   severi~e   daysin~p   social~e   daysin~2 |
                       |----------------------------------------------------------------|
                    1. |        1         A          1          3          1          4 |
                    2. |        2         A          2          4          0          4 |
                    3. |        3         A          2          4          1          5 |
                    4. |        4         A          3          5          0          5 |
                    5. |        5         A          3          5          1          7 |
                    6. |        6         A          3          5          1          7 |
                    7. |        7         A          4          7          1          7 |
                       |----------------------------------------------------------------|
                    8. |        8         B          1          2          1          4 |
                    9. |        9         B          2          4          0          4 |
                   10. |       10         B          2          4          1          5 |
                   11. |       11         B          2          4          1          5 |
                   12. |       12         B          3          5          1          5 |
                       |----------------------------------------------------------------|
                   13. |       13         C          1          2          0          2 |
                   14. |       14         C          2          4          0          4 |
                   15. |       15         C          3          5          1          6 |
                   16. |       16         C          4          6          1          6 |
                       +----------------------------------------------------------------+


                  • #10
                    with dataex
                    input byte patientid str1 disease byte(severityscore daysinhosp socialscore) float(disrank disN dissevrank dissevlead dissevlag dissevN daysinhosp2 failure)
                    1 "A" 1 3 1 1 7 1 2 0 1 . 1
                    2 "A" 2 4 0 2 7 1 2 0 2 . .
                    3 "A" 2 4 1 3 7 2 3 1 2 5 1
                    4 "A" 3 5 0 4 7 1 2 0 3 . .
                    5 "A" 3 5 1 5 7 2 3 1 3 5 .
                    6 "A" 3 5 1 6 7 3 4 2 3 7 1
                    7 "A" 4 7 1 7 7 1 2 0 1 . 1
                    8 "B" 1 2 1 1 5 1 2 0 1 . 1
                    9 "B" 2 4 0 2 5 1 2 0 3 . .
                    10 "B" 2 4 1 3 5 2 3 1 3 4 .
                    11 "B" 2 4 1 4 5 3 4 2 3 5 1
                    12 "B" 3 5 1 5 5 1 2 0 1 . 1
                    13 "C" 1 2 0 1 4 1 2 0 1 . 1
                    14 "C" 2 4 0 2 4 1 2 0 1 . 1
                    15 "C" 3 5 1 3 4 1 2 0 1 . 1
                    16 "C" 4 6 1 4 4 1 2 0 1 . 1
                    end


                    • #11
                      For the first line/patient, because socialscore is 1, I switched the daysinhosp reference value to that of the next severity level for that disease, which is the daysinhosp value in the 2nd line: 4 days instead of 3.


                      • #12
                        for the second line because social score is zero the daysinhosp2 is the same as daysinhosp
                        for line 3 the social score is one so daysinhosp2 takes the value of the next severity level 5 days (the value of daysinhosp line 4)
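
The rule described in #11 and #12 can also be read as a lookup: for each disease, find the stay recorded at severity + 1 and use it when socialscore is 1. The following is only a sketch of that reading, not code from the thread. It assumes daysinhosp is constant within each disease and severity level (true in the example data), and the names nextdays, lookup and wanted are illustrative. Unlike a "next group present" approach, it leaves the stay unchanged when severity + 1 does not occur for that disease.

```stata
* Sketch: look up daysinhosp at severity + 1 within the same disease.
* Assumes daysinhosp does not vary within a disease-severity cell.
clear
input byte patientid str1 disease byte(severityscore daysinhosp socialscore daysinhosp2)
 1 "A" 1 3 1 4
 2 "A" 2 4 0 4
 3 "A" 2 4 1 5
 4 "A" 3 5 0 5
 5 "A" 3 5 1 7
 6 "A" 3 5 1 7
 7 "A" 4 7 1 7
 8 "B" 1 2 1 4
 9 "B" 2 4 0 4
10 "B" 2 4 1 5
11 "B" 2 4 1 5
12 "B" 3 5 1 5
13 "C" 1 2 0 2
14 "C" 2 4 0 4
15 "C" 3 5 1 6
16 "C" 4 6 1 6
end

* Build one row per disease-severity cell, then shift severity down by 1
* so that the row keyed on severity s carries the stay for severity s + 1.
preserve
keep disease severityscore daysinhosp
duplicates drop
replace severityscore = severityscore - 1
rename daysinhosp nextdays
tempfile lookup
save `lookup'
restore

* Attach the next level's stay; unmatched patients get nextdays == .
merge m:1 disease severityscore using `lookup', keep(master match) nogenerate

* Use the next level's stay only for precarious patients with a next level
gen wanted = cond(socialscore == 1 & !missing(nextdays), nextdays, daysinhosp)

sort patientid
list, sepby(disease)
```

If daysinhosp did vary within a cell, the merge m:1 would stop with an error rather than pick a value silently, which is one reason to prefer a lookup over positional subscripts on a large dataset.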


                        • #13
                          I believe the following will get you what you want:

                          Code:
                          * Example generated by -dataex-. To install: ssc install dataex
                          clear
                          input byte patientid str1 disease byte(severityscore daysinhosp socialscore daysinhosp2)
                           1 "A" 1 3 1 4
                           2 "A" 2 4 0 4
                           3 "A" 2 4 1 5
                           4 "A" 3 5 0 5
                           5 "A" 3 5 1 7
                           6 "A" 3 5 1 7
                           7 "A" 4 7 1 7
                           8 "B" 1 2 1 4
                           9 "B" 2 4 0 4
                          10 "B" 2 4 1 5
                          11 "B" 2 4 1 5
                          12 "B" 3 5 1 5
                          13 "C" 1 2 0 2
                          14 "C" 2 4 0 4
                          15 "C" 3 5 1 6
                          16 "C" 4 6 1 6
                          end
                          
                          
                          bysort disease severityscore: gen n1=_N
                          bysort disease severityscore: gen n2=_n
                          egen group=group(disease severityscore)
                          bysort disease: egen lastgroup=max(group)
                          gen wanted=daysinhosp
                          replace wanted=daysinhosp[_n+n1-n2+1] if socialscore==1 & group!=lastgroup
                          list, sepby(disease)
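
A note on the subscript in the replace line: n1-n2 is the number of observations still to come in the current disease and severity cell, so _n+n1-n2 is the last row of that cell and adding 1 lands on the first row of the next one. That arithmetic relies on the data still being sorted by disease and severityscore when the replace runs. A minimal sketch of the same idea done under by disease: is given here; the names togo, next and wanted2 are illustrative and not part of the code above.

```stata
* Sketch of a by-group variant of the subscripting idea.
clear
input byte patientid str1 disease byte(severityscore daysinhosp socialscore daysinhosp2)
 1 "A" 1 3 1 4
 2 "A" 2 4 0 4
 3 "A" 2 4 1 5
 4 "A" 3 5 0 5
 5 "A" 3 5 1 7
 6 "A" 3 5 1 7
 7 "A" 4 7 1 7
 8 "B" 1 2 1 4
 9 "B" 2 4 0 4
10 "B" 2 4 1 5
11 "B" 2 4 1 5
12 "B" 3 5 1 5
13 "C" 1 2 0 2
14 "C" 2 4 0 4
15 "C" 3 5 1 6
16 "C" 4 6 1 6
end

* Observations remaining after the current one in its disease-severity cell
bysort disease severityscore (patientid): gen togo = _N - _n

* Under by disease:, subscripts count within the disease, so
* _n + togo + 1 is the first observation of the next severity level;
* past the end of the disease the expression evaluates to missing.
bysort disease (severityscore patientid): gen next = daysinhosp[_n + togo + 1]

* Take the next level's stay only when precarious and a next level exists
gen wanted2 = cond(socialscore == 1 & !missing(next), next, daysinhosp)

* Compare against the hand-filled values from #7
assert wanted2 == daysinhosp2
list, sepby(disease)
```

Because the subscript is evaluated within each disease, no separate flag for the last severity group is needed, and listing patientid in parentheses makes the sort order reproducible.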


                          • #14
                            Great thank you so much for your help!
