Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to join parts of a data to another data based on some conditions?

    Hi all,

    Please consider the following example data:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str3 id float year str1 pcd float school
    "111" 2011 "" 123
    "111" 2012 "" 124
    "111" 2013 "" 124
    "112" 2010 "" 123
    "112" 2011 "" 123
    "112" 2012 "" 123
    "113" 2014 "" 126
    "113" 2015 "" 127
    "113" 2016 "" 128
    "113" 2017 "" 128
    end
    save "trialdata1.dta"
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str3 id float year str3 pcd float school
    "111" 2011 ""    123
    "111" 2012 ""    124
    "111" 2013 ""    124
    "112" 2010 ""    123
    "112" 2011 ""    123
    "112" 2012 ""    123
    "113" 2014 ""    126
    "113" 2015 ""    127
    "113" 2016 ""    128
    "113" 2017 ""    128
    "114" 2013 "AAA" 123
    "114" 2014 "AAB" 128
    "114" 2015 "AAA" 124
    "115" 2014 "AAA" 124
    "115" 2015 "AAC" 127
    "115" 2016 "AAB" 128
    "115" 2017 "AAC" 128
    "116" 2015 "AAC" 127
    "116" 2016 "AAA" 124
    "116" 2017 "AAB" 126
    end
    save "trialdata2.dta"
    Now, I want to join parts of trialdata2 to trialdata1, based on the following condition: keep only those observations from trialdata2 that have the year-school combination available in trialdata1, and then append those observations to trialdata1 to form trialdata3 as below:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str3 id float year str3 pcd float school
    "115" 2015 "AAC" 127
    "115" 2016 "AAB" 128
    "115" 2017 "AAC" 128
    "116" 2015 "AAC" 127
    end
    I generated the trialdata3 by
    Code:
    keep if year==2011 & school==123|year==2012 & school==124|year==2013 & school==124|year==2010 & school==123|year==2011 & school==123|year==2012 & school==123|year==2014 & school==126|year==2015 & school==127|year==2016 & school==128|year==2017 & school==128
    But ideally I want the code to select these observations from trialdata2 based on the combination from trialdata1, and create the subset data, which I then append to trialdata1 and create trialdata3.

    I tried
    Code:
    joinby  year school
    but that did not give desired result.

    Would appreciate any help, thanks



  • #2
    Hi Titir Bhattacharya ,

    I'm not sure I fully understand your aim. In Stata terms, append means 'add observations to the dataset'. Your resulting trialdata3 does not seem to reflect trialdata1 with added observations. Do you rather want to merge additional variables to the existing dataset? Or do you want to create an intersection of the observations in both datasets?

    In case you really want to append observations, here's an approach that does this:
    Code:
    **# declare temporary files
    tempfile trialdata1 trialdata2 trialdata3
    
    **# given trialdata 1
    clear
    input str3 id float year str1 pcd float school
    "111" 2011 "" 123
    "111" 2012 "" 124
    "111" 2013 "" 124
    "112" 2010 "" 123
    "112" 2011 "" 123
    "112" 2012 "" 123
    "113" 2014 "" 126
    "113" 2015 "" 127
    "113" 2016 "" 128
    "113" 2017 "" 128
    end
    save "`trialdata1'"
    
    **# given trialdata 2
    clear
    input str3 id float year str3 pcd float school
    "111" 2011 ""    123
    "111" 2012 ""    124
    "111" 2013 ""    124
    "112" 2010 ""    123
    "112" 2011 ""    123
    "112" 2012 ""    123
    "113" 2014 ""    126
    "113" 2015 ""    127
    "113" 2016 ""    128
    "113" 2017 ""    128
    "114" 2013 "AAA" 123
    "114" 2014 "AAB" 128
    "114" 2015 "AAA" 124
    "115" 2014 "AAA" 124
    "115" 2015 "AAC" 127
    "115" 2016 "AAB" 128
    "115" 2017 "AAC" 128
    "116" 2015 "AAC" 127
    "116" 2016 "AAA" 124
    "116" 2017 "AAB" 126
    end
    save "`trialdata2'"
    
    **# auto-create trialdata 3
    clear
    use year school using `"`trialdata1'"'
    duplicates drop
    merge 1:m year school using `"`trialdata2'"', keep(match) nogenerate
    save "`trialdata3'"
    
    **# append trialdata 3 to trialdata 1
    clear
    append using `trialdata1' `trialdata3'
    
    **# beware of duplicates
    **  some observations are present in both trialdata 1 and trialdata 2
    duplicates drop
    
    **# let's see the result
    list
    The 'tricky' part is to create a mini-dataset that just contains year and school from trialdata1, and make all combinations unique. This can be used to merge with trialdata2, only keeping the matching observations. The result can be added to trialdata1.

    All the best
    Bela

    Comment

    Working...
    X