Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Date Reformatting

    Hello,

    I have a dataset that looks like this:

    id date1 date2
    10180 3/14/18 1/1/18
    10180 9/3/18 6/5/18
    10180 9/1/18

    Basically, my goal is to find a date2 that is the closest to each of the date1 observations, in this case, 9/3/18 in date1 and 9/1/18 in date2.

    and I want to make it look like this so I can actually calculate the day difference between date1 and date2 and pick the smallest (absolute) value.

    id date1 date2 day_diff
    10180 3/14/18 1/1/18 ..
    10180 3/14/18 6/5/18 ..
    10180 3/14/18 9/1/18 ..
    10180 9/3/18 1/1/18 ..
    10180 9/3/18 6/5/18 ..
    10180 9/3/18 9/1/18 ..

    Bu I do not know of any codes that could help me achieve this transformation. I already tried searching for solutions but I am not even sure of what should I call this problem that I am having.

    Any input is largely appreciated. Thank you!
    Last edited by David Shi; 06 Jul 2023, 14:21.

  • #2
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int id str7 date1 str6 date2
    10180 "3/14/18" "1/1/18"
    10180 "9/3/18"  "6/5/18"
    10180 ""        "9/1/18"
    end
    
    //  REPLACE THE STRING VARIABLES THAT READ LIKE DATES WITH ACTUAL STATA DATE VARIABLES
    foreach v of varlist date* {
        gen _`v' = daily(`v', "MD20Y"), after(`v')
        assert missing(_`v') == missing(`v')
        format _`v' %td
        drop `v'
        rename _`v' `v'
    }
    
    // PUT DATE2 INTO A SEPARATE DATA SET
    preserve
    keep id date2
    tempfile second_dates
    save `second_dates'
    
    //    PAIR UP EACH DATE1 WITH EVERY DATE2 FROM THE SAME ID
    restore
    drop date2
    drop if missing(date1)
    joinby id using `second_dates', unmatched(master)
    drop _merge
    
    //    PICK A PAIR WITH MINIMUM DIFFERENCE BETWEEN DATE1 AND DATE2
    //    FOR EACH ID DATE1 PAIR
    gen delta = abs(date2-date1)
    by id date1 (delta), sort: keep if _n == 1
    Your question leaves several important details to the imagination. Because you set up a table of example data rather than using the -dataex- command, we cannot tell if you have real Stata date variables in your data set, or just strings that look like dates but cannot be used in calculation. The code above assumes the latter. Skip over the conversion if that assumption is wrong. We also cannot tell which century your dates are from. The code assumes the current century. You have an id variable in your tableau, but you do not state whether you only want to consider date1-date2 combinations from the same id, or if id is just irrelevant for present purposes. The code assumes the former. Finally, it is possible in your full data set that there will be two date2 values that are equally far from date1, but in opposite directions. For example, we could have date1 = 6 July 2023, and date2 = 4 July 2023 and another date2 = 8 July 2023. Both are two days apart from date1. You don't say which one you would want to pick. The code assumes you truly don't care and picks one at random (and irreproducibly on rerunning the code).

    Finally, either I am seriously misunderstanding what you want to do, or there is an error in your example data. In your end result tableau, you show 9/1/18 as one of the values of date2. But in the first tableau it is a value of date1 instead. In the code above, I structured the input to have it as a value of date2, not date1.

    Some of the unanswered questions in your post would have been answered had you used the -dataex- command to post your example data. In the future, please always do that. If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.



    Last edited by Clyde Schechter; 06 Jul 2023, 15:02.

    Comment

    Working...
    X