Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Panel data - Following the death of a participant, how to remove all following observations for them

    Hi everyone!

    I currently am cleaning a very big dataset (52 variables, 82284 observations) for longitudinal analysis. The dataset is based on information returned from 6 different surveys. I have converted the dataset to long format so currently there are about 6 different observations (in years) for each ID. There are approximately 13,000 unique ID variables. This dataset is confidential so I have created a fake example dataset to use for this question (hopefully inserted correctly below).

    So this is my issue - I have tried to create a "death after this wave" variable - which would indicate that this was the last wave of data from the person before dying. Therefore, I need to delete the waves that the person didn't participate in (so if someone only participated in three waves and then died == then only have 3 rows of data, whereas someone who was alive for all waves, will have 6 rows of data), however I am struggling to find a code that will achieve this. Does anyone have any ideas? Apologies, I am quite a novice!

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double idalias int year float(wave_sg Death_After_This_Wave)
    1 1901 0 .
    1 1904 1 .
    1 1907 2 1
    1 1910 3 .
    1 1913 4 .
    1 1916 5 .
    2 1901 0 .
    2 1904 1 .
    2 1907 2 .
    2 1910 3 .
    2 1913 4 1
    2 1916 5 .
    3 1901 0 .
    3 1904 1 .
    3 1907 2 1
    3 1910 3 .
    3 1913 4 .
    3 1916 5 .
    4 1901 0 .
    4 1904 1 .
    4 1907 2 .
    4 1910 3 .
    4 1913 4 .
    4 1916 5 1
    5 1901 0 .
    5 1904 1 .
    5 1907 2 1
    5 1910 3 .
    5 1913 4 .
    5 1916 5 .
    6 1901 0 .
    6 1904 1 .
    6 1907 2 .
    6 1910 3 .
    6 1913 4 1
    6 1916 5 .
    7 1901 0 .
    7 1904 1 1
    7 1907 2 .
    7 1910 3 .
    7 1913 4 .
    7 1916 5 .
    end

    I was thinking something like this: by idalias, sort: drop in 2/5 if _n=1 for Death_After_This_Wave (which to me means: for each ID, drop the years 1904 1907 1910 1913 1916 (i.e. observations 2 to 5) if the person has died just after the first observation (1901). I could then just edit this code and repeat it for the remaining years.


    Thanks for taking the time to read my query.
    Warm regards,
    Sarah
    Last edited by Sarah Gnib; 12 Aug 2020, 18:19.

  • #2



    Deleted
    Last edited by Sarah Gnib; 12 Aug 2020, 18:00. Reason: Data example uploaded correctly

    Comment


    • #3
      Click image for larger version

Name:	dataexample.PNG
Views:	2
Size:	99.0 KB
ID:	1568324

      Ideally, I would be able to click "drop selected data" for the example in the image but I'm unable to. Please let me know if there is anything I can do to remedy this.

      Comment


      • #4
        Click image for larger version

Name:	dataexample 2.PNG
Views:	2
Size:	99.0 KB
ID:	1568326

        Ideally, I would be able to click "drop selected data" for the example in the image but I'm unable to. Please let me know if there is anything I can do to remedy this.

        Comment


        • #5
          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input double idalias int year float(wave_sg Death_After_This_Wave)
          1 1901 0 .
          1 1904 1 .
          1 1907 2 1
          1 1910 3 .
          1 1913 4 .
          1 1916 5 .
          2 1901 0 .
          2 1904 1 .
          2 1907 2 .
          2 1910 3 .
          2 1913 4 1
          2 1916 5 .
          3 1901 0 .
          3 1904 1 .
          3 1907 2 1
          3 1910 3 .
          3 1913 4 .
          3 1916 5 .
          4 1901 0 .
          4 1904 1 .
          4 1907 2 .
          4 1910 3 .
          4 1913 4 .
          4 1916 5 1
          5 1901 0 .
          5 1904 1 .
          5 1907 2 1
          5 1910 3 .
          5 1913 4 .
          5 1916 5 .
          6 1901 0 .
          6 1904 1 .
          6 1907 2 .
          6 1910 3 .
          6 1913 4 1
          6 1916 5 .
          7 1901 0 .
          7 1904 1 1
          7 1907 2 .
          7 1910 3 .
          7 1913 4 .
          7 1916 5 .
          end
          
          bys idalias (year): gen todrop= sum(sum(Death_After_This_Wave))
          drop if todrop>1
          Res.:

          Code:
          . l, sepby(idalias)
          
               +----------------------------------------------+
               | idalias   year   wave_sg   Death_~e   todrop |
               |----------------------------------------------|
            1. |       1   1901         0          .        0 |
            2. |       1   1904         1          .        0 |
            3. |       1   1907         2          1        1 |
               |----------------------------------------------|
            4. |       2   1901         0          .        0 |
            5. |       2   1904         1          .        0 |
            6. |       2   1907         2          .        0 |
            7. |       2   1910         3          .        0 |
            8. |       2   1913         4          1        1 |
               |----------------------------------------------|
            9. |       3   1901         0          .        0 |
           10. |       3   1904         1          .        0 |
           11. |       3   1907         2          1        1 |
               |----------------------------------------------|
           12. |       4   1901         0          .        0 |
           13. |       4   1904         1          .        0 |
           14. |       4   1907         2          .        0 |
           15. |       4   1910         3          .        0 |
           16. |       4   1913         4          .        0 |
           17. |       4   1916         5          1        1 |
               |----------------------------------------------|
           18. |       5   1901         0          .        0 |
           19. |       5   1904         1          .        0 |
           20. |       5   1907         2          1        1 |
               |----------------------------------------------|
           21. |       6   1901         0          .        0 |
           22. |       6   1904         1          .        0 |
           23. |       6   1907         2          .        0 |
           24. |       6   1910         3          .        0 |
           25. |       6   1913         4          1        1 |
               |----------------------------------------------|
           26. |       7   1901         0          .        0 |
           27. |       7   1904         1          1        1 |
               +----------------------------------------------+

          Comment


          • #6
            This worked perfectly, thank-you so much Andrew. I have spent most of the day going through excel and doing this manually so you have saved me another day's worth of work. Thank-you again. I really appreciate it.

            Comment

            Working...
            X