  • Merging many datasets - gen newvar if variable was numeric in one set and string in the other

    Hello! This probably isn't the most efficient way to merge multiple datasets (suggestions are welcome). My problem is that Stata turns my nonmissing-conflict values (_merge==5) into missing when I merge with the replace, update, and force options.

    Context:
    I'm working with health system data that changes over time: patients are added and dropped, and statuses are updated (e.g. "most recent pain assessment rating"). The example code below runs through roughly 25 datasets, each with 3000 observations and 30 variables. A variable like the pain rating above is sometimes coded as "3" on a Likert-type scale and sometimes as "Moderate" pain, so it is numeric (double) in some datasets and string (str3) in others. I'm looking for a way to generate a new variable in the middle of the merge process, something like "gen newvar = oldvar if _merge==5", so that the values of oldvar are not overwritten or turned into missing data and the conflicting values are retained in their own variable.

    Code:
    * Merge B into A; on conflicts the using (B) values win
    use "DatabaseA.dta", clear
    merge 1:1 studyId using "DatabaseB.dta", replace update force
    drop _merge
    save "DatabaseAB.dta", replace
    clear
    
    * Merge C into the A+B result
    use "DatabaseAB.dta"
    merge 1:1 studyId using "DatabaseC.dta", replace update force
    drop _merge
    save "DatabaseABC.dta", replace
    clear
    
    * Merge D into the A+B+C result
    use "DatabaseABC.dta"
    merge 1:1 studyId using "DatabaseD.dta", replace update force
    drop _merge
    save "DatabaseABCD.dta", replace
    clear

    This post seems similar to what I'm asking, but was not quite resolved:
    https://www.statalist.org/forums/for...udinal-dataset

    Thanks!

  • #2
    Hi! I'm not sure I've understood your problem correctly. Have you tried converting your string variables to numeric? Would that solve your problem?
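
A minimal sketch of that conversion, done before the merge so that force is not needed for this variable. The variable name painRating, the file name DatabaseB_clean.dta, and the mapping of "Moderate" to 3 are assumptions for illustration, not taken from your data. It also assumes painRating is already numeric in DatabaseA.dta. The original text is kept in painRating_str, so nothing is overwritten.

```stata
* Hypothetical names: painRating, painRating_str, DatabaseB_clean.dta
use "DatabaseB.dta", clear

* Only convert if painRating is stored as a string in this file
capture confirm string variable painRating
if _rc == 0 {
    * Keep the original text in its own variable
    rename painRating painRating_str
    * real() returns missing for text that is not a number
    gen double painRating = real(painRating_str)
    * Assumed mapping from label to scale value; extend as needed
    replace painRating = 3 if painRating_str == "Moderate"
}
save "DatabaseB_clean.dta", replace

* Merge with matching types; force is no longer required for painRating
use "DatabaseA.dta", clear
merge 1:1 studyId using "DatabaseB_clean.dta", update replace
```

With 25 files, the conversion step could be wrapped in a loop over the file names, but the idea stays the same: make the types agree first, then merge.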
