  • Longitudinal data management: only keep follow up test values and dates obtained 1 year after a 1-year baseline study

    Hi everyone. I am stuck on a data management task. After reading Stata books, reading old posts in this Forum, watching YouTube videos, and drinking a lot of coffee, I decided to ask for help again. With the help of a statistician, I have already merged / appended a baseline study data set with a follow up / registry-based data set. Now I am on my own with the data management. I have already skillfully dropped the duplicates from the follow up / baseline == 0 / registry-based data set. My task now is to keep only the follow up / registry-based HbA1c values and test dates that were done 1 year (364.25 days) after each participant's baseline HbA1c value (which was collected in over 1000 participants from October 1, 2008 to April 1, 2010). After merging and dropping duplicates, I have about 21,000 observations.

    Before merging, the baseline data set was labeled baseline == 1 and had the following variables: two different ID numbers, the date of the HbA1c test (status_dato), the HbA1c value (HbA1c_mmolmol), the year the HbA1c test was done (y_status_dato), the absolute number of days between each participant's birthday and the date of the HbA1c test (new_diff_days), the "baseline" variable (coded 1 or 0), and a bunch of useless string variables. The follow up / registry-based data set is labeled baseline == 0 and has all of the same variables, except that it does not include any of the string variables that were in the baseline data set.

    My question: what code should I use to keep only the follow up / baseline == 0 / registry-based HbA1c values and test dates that were done 1 year (364.25 days) after each participant's baseline study HbA1c test date?


    [CODE]
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input double(ID1_number status_dato HbA1c_mmolmol) float(y_status_dato new_diff_days baseline) int ID_number str12 string_ID1 str9(string_ID string_status_dato)
    000000000 17875 61 2008 343 0 . "" "" ""
    111111111 17979 61 2009 81 1 180 "111111111" "0180" "23mar2009"
    222222222 18071 66 2009 173 0 . "" "" ""
    333333333 18281 64 2017 18 0 . "" "" ""
    444444444 18788 62 2019 160 0 . "" "" ""
    555555555 19025 59 20009 32 0 . "555555555" "0180" "20nov2009"
    end
    [/CODE]

    The above dataex example is a dummy data set. It is not my real data.
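
    To make the task concrete, here is a minimal sketch of one way the filtering could be approached. It rests on several assumptions that are not confirmed above: that ID1_number identifies the same participant in both data sets, that status_dato is a numeric Stata daily date, that each participant has exactly one baseline == 1 observation, and that "1 year after" means "at least 364.25 days after". The toy data and the names window, baseline_date, and days_since_baseline are hypothetical and only there so the sketch runs on its own.

```stata
* Hypothetical toy data (not the real data): two participants, one baseline row each
clear
input double(ID1_number status_dato) float baseline
1 17900 1
1 18000 0
1 18300 0
2 18000 1
2 18400 0
2 18100 0
end
format status_dato %td

* Assumed cutoff in days after the baseline test; value taken from the post
local window = 364.25

* Copy each participant's baseline test date onto all of that participant's rows
bysort ID1_number: egen double baseline_date = max(cond(baseline == 1, status_dato, .))
format baseline_date %td

* Days elapsed between the baseline test and each test
generate double days_since_baseline = status_dato - baseline_date

* Keep baseline rows plus follow up tests done at least that many days later;
* the !missing() check stops rows without a baseline date from slipping through
keep if baseline == 1 | (baseline == 0 & days_since_baseline >= `window' & !missing(days_since_baseline))

list, sepby(ID1_number)
```

    On real data the input block would be skipped and the remaining commands run on the merged file. Whether the baseline rows should be kept, and whether only the first qualifying follow up test per participant is wanted, depends on the analysis and is left open here.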