Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merging two datasets to calculate an amount of days between two dates

    Dear Statalisters,

    I'd like my data to be confidential so please forgive me if I remain vague during this thread.

    My first dataset is organized by country and by daily date. So basically, each country has 365 lines that represent a day from March 2020 to March 2021. Now, my main variable of interest is event, which is equal to 1 if something happened during the day displayed by fulldate.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str28 country_name float(date event)
    "Albania" 21975 0
    "Albania" 21976 0
    "Albania" 21977 0
    "Albania" 21978 0
    "Albania" 21979 0
    "Albania" 21980 0
    "Albania" 21981 0
    "Albania" 21982 0
    "Albania" 21983 0
    "Albania" 21984 0
    "Albania" 21985 0
    "Albania" 21986 0
    "Albania" 21987 1
    "Albania" 21988 1
    "Albania" 21989 1
    "Albania" 21990 1
    "Albania" 21991 1
    "Albania" 21992 1
    "Albania" 21993 1
    "Albania" 21994 1
    "Albania" 21995 1
    "Albania" 21996 1
    "Albania" 21997 1
    "Albania" 21998 1
    "Albania" 21999 1
    end
    format %td date
    My second dataset is at the individual level, which means there is a unique ID for each individual. There is also a variable datesurvey indicating the date the individual was surveyed.


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str28 country_name float(id datesurvey)
    "Albania"  1 22075
    "Albania"  2 22081
    "Albania"  3 22077
    "Albania"  4     .
    "Albania"  5 22084
    "Albania"  6 22077
    "Albania"  7 22078
    "Albania"  8 22082
    "Albania"  9 22076
    "Albania" 10 22084
    "Albania" 11 22089
    "Albania" 12 22083
    "Albania" 13 22078
    "Albania" 14     .
    "Albania" 15 22081
    "Albania" 16 22071
    "Albania" 17 22079
    "Albania" 18 22074
    "Albania" 19 22092
    "Albania" 20 22075
    "Albania" 21 22089
    "Albania" 22 22082
    "Albania" 23 22091
    "Albania" 24 22090
    "Albania" 25 22081
    end
    format %d datesurvey
    My purpose is to calculate the number of days where event equals to 1 (by doing a simple -egen = total()- command) between the first of March of year 2020 and the date displayed by datesurvey. It seems complicated to me as I can't find a variable that might be used to merge the two datasets and I don't want to use a many-to-many merge. I'd like to avoid a case-by-case solution where I look at the datesurvey for individual x and report it in my first dataset. I would be very grateful if someone could help me finding an answer to my request.

    Regards,
    Last edited by Julia Simon; 14 Jun 2022, 19:25.

  • #2
    Code:
    use dataset1, clear
    bysort country (date): gen wanted = sum(event)
    drop event
    merge 1:m country date using dataset2, keep(2 3)
    Last edited by Ali Atia; 14 Jun 2022, 19:38.

    Comment


    • #3
      Dear Ali,
      The code worked well. Thank you very much for your help.

      Comment

      Working...
      X