Hi Statalist,
I'm currently merging 72 datasets together using the following loop:
This works fine but I have one issue. The contents of these 72 files is produced by the loop you see here:
So when this loop creates a particular prod_`i'`j' variable it names it after the combination of the two numbers represented by i and j. this means when multiplying "ever_1" and "ever_12", for example, it produces a variable called "prod_112". Unfortunately, when it multiplies "ever_11" and "ever_2" it ALSO produces a variable called "prod_112". These two variables with the same name are distinct from one another with different values.
In the individual 72 dta files that this process produces, that isn't an issue, since the loop basically produces 72 files where each is one value of i (for example, 1) paired with every value of j (1-72). But since I need to merge them into a master file, Stata prioritizes the values of the "master" file in the merging process and includes only one of the ambiguously named variables.
I would like to achieve one of the following things: either get Stata to keep and rename all ambiguous variables during the merging process (the first loop) or, even better, find a way to create the variables in a way that avoids the problem in the first place (the second loop).
Ideally I would like to have a leading zero on the single digit portions of the 1/72 values so that, in my example above, I would be be creating "prod0112" and "prod1102", thus avoiding the whole issue. But I do not know how to (or if i can) force the loop to use leading zeroes.
Obviously I can manually rename the variables as a I go, which I will do if I have to, but I'm hoping for a better and less time consuming solution! Thank you!
-Andy
I'm currently merging 72 datasets together using the following loop:
Code:
clear use "S:\elzayaty\Mobility Paper\prodclean1.dta", clear forvalues i = 1/72 { merge m:m investorid using "S:\elzayaty\Mobility Paper\prodclean`i'.dta" drop _merge }
Code:
clear* use "S:\elzayaty\Mobility Paper\NAICS thing with industry pairings.dta", clear forvalues i = 1/72 { capture preserve if !inlist(c(rc), 0, 621) { display as error "Unexpected error preserving data set" exit c(rc) } forvalues j = 1/72 { gen prod_`i'`j' = ever_`i'*ever_`j' } keep investorid prod* save "S:\elzayaty\Mobility paper\prod`i'.dta", replace restore, preserve }
In the individual 72 dta files that this process produces, that isn't an issue, since the loop basically produces 72 files where each is one value of i (for example, 1) paired with every value of j (1-72). But since I need to merge them into a master file, Stata prioritizes the values of the "master" file in the merging process and includes only one of the ambiguously named variables.
I would like to achieve one of the following things: either get Stata to keep and rename all ambiguous variables during the merging process (the first loop) or, even better, find a way to create the variables in a way that avoids the problem in the first place (the second loop).
Ideally I would like to have a leading zero on the single digit portions of the 1/72 values so that, in my example above, I would be be creating "prod0112" and "prod1102", thus avoiding the whole issue. But I do not know how to (or if i can) force the loop to use leading zeroes.
Obviously I can manually rename the variables as a I go, which I will do if I have to, but I'm hoping for a better and less time consuming solution! Thank you!
-Andy
Comment