Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Calculating time between dates when data formatted long

    Hello Statalist; I have a dataset formatted long, with survey waves and corresponding date administered. How can I efficiently create a new variable that calculates the number of days between the date of Wave 12 and each previous date?



    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(person_id survey_wave survey_date)
    2  1 20784
    2  2 20811
    2  2 20809
    2  2 20787
    2  2 20810
    2  2 20800
    2  2 20802
    2  2 20788
    2  2 20812
    2  2 20797
    2  2 20793
    2  2 20813
    2  2 20792
    2  2 20808
    2  2 20798
    2  2 20801
    2  2 20806
    2  2 20807
    2  2 20789
    2  2 20786
    2  2 20805
    2  2 20794
    2  2 20790
    2  2 20791
    2  2 20785
    2  2 20799
    2  2 20795
    2  2 20803
    2  2 20804
    2  2 20814
    2  2 20796
    2  3 20819
    2  4 20980
    2  5 20994
    2  5 20988
    2  5 20985
    2  5 20992
    2  5 20984
    2  5 20982
    2  5 20990
    2  5 20991
    2  5 20989
    2  5 20993
    2  5 20981
    2  5 20987
    2  5 20986
    2  5 20983
    2  7 21013
    2  8 21069
    2 12 21449
    3  1 20784
    3  2 20811
    3  2 20813
    3  2 20799
    3  2 20802
    3  2 20809
    3  2 20785
    3  2 20789
    3  2 20810
    3  2 20807
    3  2 20795
    3  2 20793
    3  2 20794
    3  2 20805
    3  2 20800
    3  2 20798
    3  2 20790
    3  2 20814
    3  2 20812
    3  2 20801
    3  2 20792
    3  2 20791
    3  2 20787
    3  2 20803
    3  2 20797
    3  2 20788
    3  2 20806
    3  2 20804
    3  2 20786
    3  3 20817
    3  4 20979
    3  5 20984
    3  5 20987
    3  5 20986
    3  5 20985
    3  5 20988
    3  5 20981
    3  5 20982
    3  5 20989
    3  5 20983
    3  5 20993
    3  5 20994
    3  6 21132
    3  7 21013
    3  8 21069
    3  9 21209
    3 10 21226
    3 10 21222
    3 10 21220
    3 12 21434
    end
    format %td survey_date

  • #2
    In your example data, wave 12 is always the last wave in any person's data. The code below relies on this regularity, and verifies it before proceeding. If this is not true in your full data set, the code will break at the -assert- command. In that case, post back with an example where the final wave is not wave 12 and more general code will be forthcoming.

    Code:
    by person_id (survey_wave), sort: assert survey_wave[_N] == 12
    by person_id (survey_wave): gen wanted = survey_date[_N] - survey_date

    Comment


    • #3
      Thanks Clyde Schechter . You are prescient here, as Wave 12 is not always the last wave in my full dataset--I was thinking about this as I posted, actually. Here is some data where there are waves after Wave 12:

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input float(person_id survey_wave survey_date)
      2 1 20784
      2 2 20811
      2 2 20809
      2 2 20787
      2 2 20810
      2 2 20800
      2 2 20802
      2 2 20788
      2 2 20812
      2 2 20797
      2 2 20793
      2 2 20813
      2 2 20792
      2 2 20808
      2 2 20798
      2 2 20801
      2 2 20806
      2 2 20807
      2 2 20789
      2 2 20786
      2 2 20805
      2 2 20794
      2 2 20790
      2 2 20791
      2 2 20785
      2 2 20799
      2 2 20795
      2 2 20803
      2 2 20804
      2 2 20814
      2 2 20796
      2 3 20819
      2 4 20980
      2 5 20994
      2 5 20988
      2 5 20985
      2 5 20992
      2 5 20984
      2 5 20982
      2 5 20990
      2 5 20991
      2 5 20989
      2 5 20993
      2 5 20981
      2 5 20987
      2 5 20986
      2 5 20983
      2 7 21013
      2 8 21069
      2 12 21449
      2 13 21509
      2 14 21513
      3 1 20784
      3 2 20811
      3 2 20813
      3 2 20799
      3 2 20802
      3 2 20809
      3 2 20785
      3 2 20789
      3 2 20810
      3 2 20807
      3 2 20795
      3 2 20793
      3 2 20794
      3 2 20805
      3 2 20800
      3 2 20798
      3 2 20790
      3 2 20814
      3 2 20812
      3 2 20801
      3 2 20792
      3 2 20791
      3 2 20787
      3 2 20803
      3 2 20797
      3 2 20788
      3 2 20806
      3 2 20804
      3 2 20786
      3 3 20817
      3 4 20979
      3 5 20984
      3 5 20987
      3 5 20986
      3 5 20985
      3 5 20988
      3 5 20981
      3 5 20982
      3 5 20989
      3 5 20983
      3 5 20993
      3 5 20994
      3 6 21132
      3 7 21013
      3 8 21069
      3 9 21209
      3 10 21226
      3 10 21222
      3 10 21220
      3 12 21434
      3 13 21438
      end
      format %td survey_date
      Last edited by Dakota McAvoy; 25 Aug 2023, 20:56.

      Comment


      • #4
        Code:
        by person_id (survey_wave), sort: egen wave_12_date = max(cond(survey_wave == 12, survey_date, .))
        gen wanted = wave_12_date - survey_date if survey_wave < 12
        I assumed that you do not want to calculate the difference from the date of the wave 12 survey for waves that come after 12. However, if I assumed wrong, you can just remove the -if wave < 12- clause from the last command. Do understand that in that situation, the difference will be negative.

        Comment


        • #5
          For more on @Clyde Schechter's method in #4 see Section 9 of https://journals.sagepub.com/doi/pdf...867X1101100210

          The whole paper may be useful as a ragbag of tricks.

          Comment


          • #6
            Thank you to Clyde Schechter and Nick Cox

            Comment

            Working...
            X