Merging when two distinct variables have the same name

Andy El-Zayaty

Join Date: Jun 2019

Posts: 17
#1

Merging when two distinct variables have the same name

12 Jul 2019, 15:30

Hi Statalist,

I'm currently merging 72 datasets together using the following loop:

Code:

clear use "S:\elzayaty\Mobility Paper\prodclean1.dta", clear forvalues i = 1/72 { merge m:m investorid using "S:\elzayaty\Mobility Paper\prodclean`i'.dta" drop _merge }

This works fine but I have one issue. The contents of these 72 files is produced by the loop you see here:

Code:

clear* use "S:\elzayaty\Mobility Paper\NAICS thing with industry pairings.dta", clear forvalues i = 1/72 { capture preserve if !inlist(c(rc), 0, 621) { display as error "Unexpected error preserving data set" exit c(rc) } forvalues j = 1/72 { gen prod_`i'`j' = ever_`i'*ever_`j' } keep investorid prod* save "S:\elzayaty\Mobility paper\prod`i'.dta", replace restore, preserve }

So when this loop creates a particular prod_`i'`j' variable it names it after the combination of the two numbers represented by i and j. this means when multiplying "ever_1" and "ever_12", for example, it produces a variable called "prod_112". Unfortunately, when it multiplies "ever_11" and "ever_2" it ALSO produces a variable called "prod_112". These two variables with the same name are distinct from one another with different values.

In the individual 72 dta files that this process produces, that isn't an issue, since the loop basically produces 72 files where each is one value of i (for example, 1) paired with every value of j (1-72). But since I need to merge them into a master file, Stata prioritizes the values of the "master" file in the merging process and includes only one of the ambiguously named variables.

I would like to achieve one of the following things: either get Stata to keep and rename all ambiguous variables during the merging process (the first loop) or, even better, find a way to create the variables in a way that avoids the problem in the first place (the second loop).

Ideally I would like to have a leading zero on the single digit portions of the 1/72 values so that, in my example above, I would be be creating "prod0112" and "prod1102", thus avoiding the whole issue. But I do not know how to (or if i can) force the loop to use leading zeroes.

Obviously I can manually rename the variables as a I go, which I will do if I have to, but I'm hoping for a better and less time consuming solution! Thank you!

-Andy
Tags: None
Sarah Edgington

Join Date: Apr 2014

Posts: 284
#2

12 Jul 2019, 16:44

Why not just change how you name the variable? Something like prod_`i'x`j' would get you distinct variables, right?

There are ways to add the leading zero but unless there's a compelling reason to do it that way it's probably going to be easier to read both your code and the resulting variable names if you just change the naming convention.
1 like
Comment
Andy El-Zayaty

Join Date: Jun 2019

Posts: 17
#3

12 Jul 2019, 16:52

Sarah - this is one of those moments where you just have to shake your head...sometimes you get so set in your ways you overlook the simple solution! I'm a bit embarrassed, but thank you, that works wonderfully!
Comment

Announcement

Merging when two distinct variables have the same name

Comment

Comment