Hi. I have MEPS data files ranging from 1996-2016. I want to run the following code on the files for 2007-2015. The code involves cleaning the data, merging it with a file named “MMS_WAC_data”, performing a bunch of data manipulations and saving multiple subsets of the data along the way before the final cleaned dataset. The code below is only for the 2007 dataset. Instead of replicating it independently for each of the files from 2007-2015, I know that it is possible to write a program to have one code file that runs across each of these files. I have never written a program before and all my attempts so far have failed. What I want to do which needs to be changed in the code below is:
- import the MEPS files from 2007-2015
- some of the variable names in each of the files from 2007-2015 have the year as the suffix – eg rxsf07x rxmr07x rxmd07x. Ensure that the right variables in each year file is retained.
- run the same code on each of them, and in each of the subsets of the data saved change “2007” to the relevant year name
Code:
set more off clear all capture log close set matsize 10000 set maxvar 10000 * set graph parameters set scheme s1mono graph set window fontface "Palatino Linotype" global blue "75 156 211" global orange "211 122 75" global green "76 184 72" *** MAIN DIRECTORY: EDIT THIS LINE TO RUN THE CODE AS A DIFFERENT USER *** global db "C:\Users\ricks\Dropbox" * global directories global meps "${db}\MEPS" global drug "${db}\UNC\Dissertation\List Price" global output "${db}\UNC\Dissertation\List Price\Cleaned Data" use "${meps}\MEPS Prescription Raw\P2007.dta" *keeping only the variables that matter - id, weight, payment vars & drug code and name keep DUPERSID RXNDC PERWT07F RXSF07X RXMR07X RXMD07X RXPV07X RXXP07X RXNAME *converting var names to lowercase rename *, lower //Data is in long form by id and drug code - collapsing on drug code condl on id //calculating totals and averages by id and drug code collapse (sum) rxsf07x rxmr07x rxmd07x rxpv07x rxxp07x (mean) rxsf07x_mean=rxsf07x /// rxmr07x_mean=rxmr07x rxmd07x_mean=rxmd07x rxpv07x_mean=rxpv07x rxxp07x_mean=rxxp07x /// (first) perwt07f rxname, by (dupersid rxndc) gen year = 2007 //to merge with MMS data //rename rxname Product gen id2=_n drop if rxname=="-9" //drop all missing product name obs drop if real(substr(rxname,1,1))<. //drop all obs w product names starting with a number save "${output}\P2007_dummy.dta", replace *************Combinig with the MMS dataset*********** /*The code below identifies drug names that have similarities e.g. "ALLERX"/"ALLERX (AM/PM DOSE PACK 30)" It provides a similarity score - similscore - ranging between 0-1 as a measure of the overlap */ matchit id2 rxname using "${drug}\MMS_WAC_data.dta", idu(id1) txtu(Product) //1- matchit first by the most relevant pair of columns - drug name // 2- bring back the other columns joinby id1 using "${drug}\MMS_WAC_data.dta" joinby id2 using "${output}\P2007_dummy.dta" save "${output}\2007+MMS.dta", replace clear use "${output}\2007+MMS.dta", replace //year1 is the year var in the MMS dataset. Restricting to 2007 meds keep if year1==2007 gen keep1=1 if similscore==1 gen keep7=1 if similscore>=0.7 & similscore<=1 /*Match observations if: 1. rxname is contained in Product, OR 2. Product is contained in rxname (e.g. this would match TRICOR with TRICOR / TRILIPIX) Match on the first word of rxname and Product (e.g. this would match DIOVAN W/HCTZ with DIOVAN / HCT */ //3 - rxname is contained in Product or Product is contained in rxname gen byte either_contains = (strpos(Product, rxname) > 0) | (strpos(rxname, Product) > 0) /*Alternate gen overlap1=strpos(rxname, Product) > 0 gen overlap2=strpos(Product, rxname) > 0 //Product is contained in rxname gen overlap=1 if overlap1==1 | overlap2==1 //single var combining the above 2 */ //4- Match on the first word of rxname and Product gen rxname1 = rxname replace rxname1 = subinstr(rxname1, ",", " ",.) replace rxname1 = subinstr(rxname1, "/", " ",.) replace rxname1 = subinstr(rxname1, "(", " ",.) replace rxname1 = subinstr(rxname1, ")", " ",.) replace rxname1 = subinstr(rxname1, "-", " ",.) gen Product1 = Product replace Product1 = subinstr(Product1, ",", " ",.) replace Product1 = subinstr(Product1, "/", " ",.) replace Product1 = subinstr(Product1, "(", " ",.) replace Product1 = subinstr(Product1, ")", " ",.) replace Product1 = subinstr(Product1, "-", " ",.) //removing special characters that mess up the fist name match gen byte firstsame = lower(word(rxname1,1)) == lower(word(Product1, 1)) preserve keep if keep7==1 export excel rxname rxname1 Product Product1 keep1 keep7 either_contains /// firstsame similscore using "${drug}\EXported checks\MMS_MEPS_2007.xls", replace restore /*Next: limit to >.7 pull into a csv and look to see. can delete dups. Delete useless entries. Merge into main file. And keep only merged ones. */ keep if keep1==1 | keep7==1 | either_contains==1 | firstsame==1 save "${output}\2007_clean.dta", replace