matching dates

Carla Hope

Join Date: Nov 2018

Posts: 25
#1

matching dates

26 Jan 2021, 11:37

Hi,

I need some help please aligning two different date variables. I have the date people starting training (startgrade) and assessement (outcome) data. I have the time period the assessment covered (axstart). These do not match up.

I need to match axstart and axend (assessment start and end date) to the appropriate startgrade/ endgrade. Ie. if axstart is 2012 then match to startgrade in 2012. This way I can match the outcome to the appropriate grade. Ideally axstart needs to be wtihing +/-90 of one of the startgrades.

I hope that makes sense! I tried switching the data to wide but am still missing something.
I've attached a sample dataset, the actual dataset has >8000 outcomes and around 1500 individual trainees.

Grateful for any help,
Thanks,
Carla

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input long id float(axstart startgrade) long(outcome level) float axend byte grade float endgrade 1 19242 17806 5 12 19603 3 18176 1 19604 18177 1 12 19625 4 18540 1 19626 18541 6 13 19997 5 18904 1 . 18905 1 11 . 6 19268 1 . 19269 1 11 . 7 19633 1 . 19634 1 11 . 8 19997 end format %td axstart format %td startgrade format %td axend format %td endgrade label values outcome outcome label def outcome 1 " 1.0", modify label def outcome 5 " 5.0", modify label def outcome 6 " 6.0", modify label values level level_at_arcp1 label def level_at_arcp1 11 "ST6", modify label def level_at_arcp1 12 "ST7", modify label def level_at_arcp1 13 "ST8", modify

------------------ copy up to and including the previous line -
Tags: None

Andrew Musau

Join Date: Oct 2014
Posts: 10190

26 Jan 2021, 16:29

Here is a start for start grades. The procedure is analogous for end grades.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long id float(axstart startgrade) long(outcome level) float axend byte grade float endgrade
1 19242 17806 5 12 19603 3 18176
1 19604 18177 1 12 19625 4 18540
1 19626 18541 6 13 19997 5 18904
1     . 18905 1 11     . 6 19268
1     . 19269 1 11     . 7 19633
1     . 19634 1 11     . 8 19997
end
format %td axstart
format %td startgrade
format %td axend
format %td endgrade
label values outcome outcome
label def outcome 1 "  1.0", modify
label def outcome 5 "  5.0", modify
label def outcome 6 "  6.0", modify
label values level level_at_arcp1
label def level_at_arcp1 11 "ST6", modify
label def level_at_arcp1 12 "ST7", modify
label def level_at_arcp1 13 "ST8", modify

frame put id startgrade grade, into(grades)
frame grades{
    gen start= year(startgrade)
    drop startgrade
    drop if missing(start)
    rename grade start_grade
}
gen start= year(axstart)
frlink m:1 id start, frame(grades)
frget start_grade, from(grades)
frame drop grades

Res.:

Code:

. l id axstart startgrade start grade start_grade, sepby(id)

     +-------------------------------------------------------+
     | id     axstart   startgr~e   start   grade   start_~e |
     |-------------------------------------------------------|
  1. |  1   06sep2012   01oct2008    2012       3          7 |
  2. |  1   03sep2013   07oct2009    2013       4          8 |
  3. |  1   25sep2013   06oct2010    2013       5          8 |
  4. |  1           .   05oct2011       .       6          . |
  5. |  1           .   03oct2012       .       7          . |
  6. |  1           .   03oct2013       .       8          . |
     +-------------------------------------------------------+

Comment

Carla Hope

Join Date: Nov 2018

Posts: 25
#3

27 Jan 2021, 04:57

thank you very much, I hadn't come across frames before. this works on my example data but when I use the actual dataset, I cannot do the frlink because I get the following error message:

'invalid match variables for 1:1 or m:1 match
The variables you specified for matching do not uniquely identify
the observations in frame grades. Each observation in the current
frame default must link to one observation in grades.'

Thanks,
Carla
Comment
Andrew Musau

Join Date: Oct 2014

Posts: 10190
#4

27 Jan 2021, 05:19

Then you have cases where "startgrade" is observed more than once in a given year, e.g.,

Code:

06oct2010 12dec2010

In such cases, what is your rule for matching an "axstart" observation in 2010?
Comment
Carla Hope

Join Date: Nov 2018

Posts: 25
#5

27 Jan 2021, 07:01

Ah I see. In that case the axstart would cover all the observations beginning in the same year. So one (axstart) assessment for the 2 observations (startgrades) occurring within 2010. If that's possible?
Comment
Andrew Musau

Join Date: Oct 2014

Posts: 10190
#6

27 Jan 2021, 07:27

You can check running the following code:

Code:

gen year= year(startgrade) bys id startgrade year: gen tag= _N>=2 l id startgrade grade if tag, sepby(id)

There is no issue if you have axstart year repeated for an individual, e.g., observation 2 and 3 in #1. The issue is that if you have several grades in the same startyear for an individual, then which one should you use to match the axstart observations in that year? In #3, for the year 2013, the grade was 8, so there is no ambiguity. What if we have 8 and 10? You need to specify a selection rule in such cases. For example, a rule could be the grade that is closest to that axstart date.

Last edited by Andrew Musau; 27 Jan 2021, 07:30.
Comment
Carla Hope

Join Date: Nov 2018

Posts: 25
#7

27 Jan 2021, 07:44

yes the grade closest to the axstart date would be good. Really appreciate your help with this
Comment

Andrew Musau

Join Date: Oct 2014
Posts: 10190

27 Jan 2021, 10:24

I have modified your example somewhat to have multiple startgrades in the same year for an individual. What the code does is form all pairwise combinations between axstart and startgrade and then to keep the observation that represents the minimum absolute difference between these two date pairs. The implication is that an axstart can be matched with a startgrade that occurs before or after that date, whichever is closer in terms of number of days. If you want to impose that the match occurs only before or after, then the modification is simple. I think this should do it, but do report if you see any anomalous results.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long id float(axstart startgrade) long(outcome level) float axend byte grade float endgrade
1 19242 17806 5 12 19603 3 18176
1 19604 18177 1 12 19625 4 18540
1 19626 18541 6 13 19932 5 18904
1     . 18905 1 11 19941 6 19268
1     . 19269 1 11     . 7 19633
1     . 19634 1 11     . 8 19997
2 19242 17806 5 12 19603 3 18176
2 19604 19605 1 12 19625 4 18540
2 19638 19643 6 13 19997 5 18904
2 19642 19639 1 11 19999 6 19268
2     . 19269 1 11     . 7 19633
2     . 19634 1 11     . 8 19997
end
format %td axstart
format %td startgrade
format %td axend
format %td endgrade
label values outcome outcome
label def outcome 1 "  1.0", modify
label def outcome 5 "  5.0", modify
label def outcome 6 "  6.0", modify
label values level level_at_arcp1
label def level_at_arcp1 11 "ST6", modify
label def level_at_arcp1 12 "ST7", modify
label def level_at_arcp1 13 "ST8", modify

tempfile start
preserve
keep id grade startgrade
rename * *2
drop if missing(startgrade2)
rename (id2 startgrade2) (id axstart2)
save `start'
restore
joinby id using `start'
bys id axstart axstart2: gen diff= abs(axstart - axstart2) if !missing(axstart)
bys id axstart (diff): drop if _n>1 & !missing(axstart)
bys id startgrade: keep if _n==1
replace grade2=. if missing(axstart)

"diff" represents the absolute difference in days, "axstart2" represents the closest startgrade and "grade2" is your wanted variable below.

Res.:

Code:

. l id axstart startgrade grade axstart2 diff grade2, sepby(id)

     +----------------------------------------------------------------+
     | id     axstart   startgr~e   grade    axstart2   diff   grade2 |
     |----------------------------------------------------------------|
  1. |  1   06sep2012   01oct2008       3   03oct2012     27        7 |
  2. |  1   03sep2013   07oct2009       4   03oct2013     30        8 |
  3. |  1   25sep2013   06oct2010       5   03oct2013      8        8 |
  4. |  1           .   05oct2011       6   07oct2009      .        . |
  5. |  1           .   03oct2012       7   06oct2010      .        . |
  6. |  1           .   03oct2013       8   03oct2012      .        . |
     |----------------------------------------------------------------|
  7. |  2   06sep2012   01oct2008       3   03oct2012     27        7 |
  8. |  2           .   03oct2012       7   03oct2012      .        . |
  9. |  2   03sep2013   04sep2013       4   04sep2013      1        4 |
 10. |  2           .   03oct2013       8   12oct2013      .        . |
 11. |  2   11oct2013   08oct2013       6   12oct2013      1        5 |
 12. |  2   07oct2013   12oct2013       5   08oct2013      1        6 |
     +----------------------------------------------------------------+

Comment

Carla Hope

Join Date: Nov 2018

Posts: 25
#9

29 Jan 2021, 06:49

That is brilliant, thanks very much for all your help
carla
Comment

Announcement

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment