Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • matching dates

    Hi,

    I need some help please aligning two different date variables. I have the date people starting training (startgrade) and assessement (outcome) data. I have the time period the assessment covered (axstart). These do not match up.

    I need to match axstart and axend (assessment start and end date) to the appropriate startgrade/ endgrade. Ie. if axstart is 2012 then match to startgrade in 2012. This way I can match the outcome to the appropriate grade. Ideally axstart needs to be wtihing +/-90 of one of the startgrades.

    I hope that makes sense! I tried switching the data to wide but am still missing something.
    I've attached a sample dataset, the actual dataset has >8000 outcomes and around 1500 individual trainees.


    Grateful for any help,
    Thanks,
    Carla

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long id float(axstart startgrade) long(outcome level) float axend byte grade float endgrade
    1 19242 17806 5 12 19603 3 18176
    1 19604 18177 1 12 19625 4 18540
    1 19626 18541 6 13 19997 5 18904
    1     . 18905 1 11     . 6 19268
    1     . 19269 1 11     . 7 19633
    1     . 19634 1 11     . 8 19997
    end
    format %td axstart
    format %td startgrade
    format %td axend
    format %td endgrade
    label values outcome outcome
    label def outcome 1 "  1.0", modify
    label def outcome 5 "  5.0", modify
    label def outcome 6 "  6.0", modify
    label values level level_at_arcp1
    label def level_at_arcp1 11 "ST6", modify
    label def level_at_arcp1 12 "ST7", modify
    label def level_at_arcp1 13 "ST8", modify
    ------------------ copy up to and including the previous line -

  • #2
    Here is a start for start grades. The procedure is analogous for end grades.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long id float(axstart startgrade) long(outcome level) float axend byte grade float endgrade
    1 19242 17806 5 12 19603 3 18176
    1 19604 18177 1 12 19625 4 18540
    1 19626 18541 6 13 19997 5 18904
    1     . 18905 1 11     . 6 19268
    1     . 19269 1 11     . 7 19633
    1     . 19634 1 11     . 8 19997
    end
    format %td axstart
    format %td startgrade
    format %td axend
    format %td endgrade
    label values outcome outcome
    label def outcome 1 "  1.0", modify
    label def outcome 5 "  5.0", modify
    label def outcome 6 "  6.0", modify
    label values level level_at_arcp1
    label def level_at_arcp1 11 "ST6", modify
    label def level_at_arcp1 12 "ST7", modify
    label def level_at_arcp1 13 "ST8", modify
    
    frame put id startgrade grade, into(grades)
    frame grades{
        gen start= year(startgrade)
        drop startgrade
        drop if missing(start)
        rename grade start_grade
    }
    gen start= year(axstart)
    frlink m:1 id start, frame(grades)
    frget start_grade, from(grades)
    frame drop grades
    Res.:

    Code:
    . l id axstart startgrade start grade start_grade, sepby(id)
    
         +-------------------------------------------------------+
         | id     axstart   startgr~e   start   grade   start_~e |
         |-------------------------------------------------------|
      1. |  1   06sep2012   01oct2008    2012       3          7 |
      2. |  1   03sep2013   07oct2009    2013       4          8 |
      3. |  1   25sep2013   06oct2010    2013       5          8 |
      4. |  1           .   05oct2011       .       6          . |
      5. |  1           .   03oct2012       .       7          . |
      6. |  1           .   03oct2013       .       8          . |
         +-------------------------------------------------------+

    Comment


    • #3
      thank you very much, I hadn't come across frames before. this works on my example data but when I use the actual dataset, I cannot do the frlink because I get the following error message:

      'invalid match variables for 1:1 or m:1 match
      The variables you specified for matching do not uniquely identify
      the observations in frame grades. Each observation in the current
      frame default must link to one observation in grades.'



      Thanks,
      Carla

      Comment


      • #4
        Then you have cases where "startgrade" is observed more than once in a given year, e.g.,

        Code:
        06oct2010
        12dec2010
        In such cases, what is your rule for matching an "axstart" observation in 2010?

        Comment


        • #5
          Ah I see. In that case the axstart would cover all the observations beginning in the same year. So one (axstart) assessment for the 2 observations (startgrades) occurring within 2010. If that's possible?

          Comment


          • #6
            You can check running the following code:

            Code:
            gen year= year(startgrade)
            bys id startgrade year: gen tag= _N>=2
            l id startgrade grade if tag, sepby(id)

            There is no issue if you have axstart year repeated for an individual, e.g., observation 2 and 3 in #1. The issue is that if you have several grades in the same startyear for an individual, then which one should you use to match the axstart observations in that year? In #3, for the year 2013, the grade was 8, so there is no ambiguity. What if we have 8 and 10? You need to specify a selection rule in such cases. For example, a rule could be the grade that is closest to that axstart date.
            Last edited by Andrew Musau; 27 Jan 2021, 07:30.

            Comment


            • #7
              yes the grade closest to the axstart date would be good. Really appreciate your help with this

              Comment


              • #8
                I have modified your example somewhat to have multiple startgrades in the same year for an individual. What the code does is form all pairwise combinations between axstart and startgrade and then to keep the observation that represents the minimum absolute difference between these two date pairs. The implication is that an axstart can be matched with a startgrade that occurs before or after that date, whichever is closer in terms of number of days. If you want to impose that the match occurs only before or after, then the modification is simple. I think this should do it, but do report if you see any anomalous results.

                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input long id float(axstart startgrade) long(outcome level) float axend byte grade float endgrade
                1 19242 17806 5 12 19603 3 18176
                1 19604 18177 1 12 19625 4 18540
                1 19626 18541 6 13 19932 5 18904
                1     . 18905 1 11 19941 6 19268
                1     . 19269 1 11     . 7 19633
                1     . 19634 1 11     . 8 19997
                2 19242 17806 5 12 19603 3 18176
                2 19604 19605 1 12 19625 4 18540
                2 19638 19643 6 13 19997 5 18904
                2 19642 19639 1 11 19999 6 19268
                2     . 19269 1 11     . 7 19633
                2     . 19634 1 11     . 8 19997
                end
                format %td axstart
                format %td startgrade
                format %td axend
                format %td endgrade
                label values outcome outcome
                label def outcome 1 "  1.0", modify
                label def outcome 5 "  5.0", modify
                label def outcome 6 "  6.0", modify
                label values level level_at_arcp1
                label def level_at_arcp1 11 "ST6", modify
                label def level_at_arcp1 12 "ST7", modify
                label def level_at_arcp1 13 "ST8", modify
                
                tempfile start
                preserve
                keep id grade startgrade
                rename * *2
                drop if missing(startgrade2)
                rename (id2 startgrade2) (id axstart2)
                save `start'
                restore
                joinby id using `start'
                bys id axstart axstart2: gen diff= abs(axstart - axstart2) if !missing(axstart)
                bys id axstart (diff): drop if _n>1 & !missing(axstart)
                bys id startgrade: keep if _n==1
                replace grade2=. if missing(axstart)
                "diff" represents the absolute difference in days, "axstart2" represents the closest startgrade and "grade2" is your wanted variable below.

                Res.:

                Code:
                . l id axstart startgrade grade axstart2 diff grade2, sepby(id)
                
                     +----------------------------------------------------------------+
                     | id     axstart   startgr~e   grade    axstart2   diff   grade2 |
                     |----------------------------------------------------------------|
                  1. |  1   06sep2012   01oct2008       3   03oct2012     27        7 |
                  2. |  1   03sep2013   07oct2009       4   03oct2013     30        8 |
                  3. |  1   25sep2013   06oct2010       5   03oct2013      8        8 |
                  4. |  1           .   05oct2011       6   07oct2009      .        . |
                  5. |  1           .   03oct2012       7   06oct2010      .        . |
                  6. |  1           .   03oct2013       8   03oct2012      .        . |
                     |----------------------------------------------------------------|
                  7. |  2   06sep2012   01oct2008       3   03oct2012     27        7 |
                  8. |  2           .   03oct2012       7   03oct2012      .        . |
                  9. |  2   03sep2013   04sep2013       4   04sep2013      1        4 |
                 10. |  2           .   03oct2013       8   12oct2013      .        . |
                 11. |  2   11oct2013   08oct2013       6   12oct2013      1        5 |
                 12. |  2   07oct2013   12oct2013       5   08oct2013      1        6 |
                     +----------------------------------------------------------------+

                Comment


                • #9
                  That is brilliant, thanks very much for all your help
                  carla

                  Comment

                  Working...
                  X