Hi guys,
I have a problem matching my sample using "joinby".
I have two data sets, a case data set and a control data set, and both have the same variables. I want to find matches (possibly more than one) for each individual in the case data set.
I searched Statalist and found several threads on this topic; I chose "joinby" given what I want to do.
The problem is that when I use "joinby", Stata gets extremely slow or even hangs, although both files are smaller than 50MB.
When I use the whole data sets, each containing 42 variables and no more than 10,000 observations, Stata gives me this:
Code:
op. sys. refuses to provide memory
I then kept only the four variables I need in both data sets, and Stata still gets kind of stuck.
Any suggestions? Below is a sketch of my data and code.
Code:
* Example generated by -dataex-. To install: ssc install dataex
* this is a sketch from the case data set (privatize-sample1215)
clear
input str9 firmid float industrycode2 double inventory float IR
"001584837" 2600 647 .09568175
"001584837" 2600 1037 .1985069
"001584837" 2600 1990 .0987593
"001584837" 2600 3570 .04087942
"001596563" 1300 1200 .14754704
"001596563" 1300 3942 .07206186
"001596563" 1300 14566 .20515493
"001596563" 1300 4287 .0958846
"001596563" 1300 14287 .906864
"001596563" 1300 1206 .08642064
"001596694" 1500 13239 .9516929
"001596694" 1500 15056 .24014674
"001596694" 1500 16560 .4221582
"001596694" 1500 17328 .5035599
"001596694" 1500 14320 .5021566
"00159885X" 1500 101 .01288594
"00159885X" 1500 372 .015686937
"00159885X" 1500 517 .016111441
"00159885X" 1500 2457 .08355153
"00159885X" 1500 0 .0025296365
end
Code:
* Example generated by -dataex-. To install: ssc install dataex
* this is a sketch from the control data set (privatize-control1009)
clear
input str9 firmid float industrycode2 double inventory float IR
"016016226" 3100 1034 .6454432
"016016226" 3100 1012 1.1269488
"016016226" 3100 896 .688701
"016016226" 3100 653 .449415
"016016226" 3100 1685 .6443595
"016016226" 3100 2036 .7181658
"100003401" 3300 1091 .4313958
"100003401" 3300 945 .2691541
"100003401" 3300 1134 .2837838
"100003401" 3300 1470 .23374145
"101109750" 3900 2608 .9709606
"101109750" 3900 1952 .4393428
"101109750" 3900 2806 .54034275
"101109750" 3900 4310 .611521
"101131499" 4100 296 .186398
"101131499" 4100 228 .11149144
"101131499" 4100 220 .15182884
"101131499" 4100 0 .0025296365
"101133400" 3600 450 .11792453
"101133400" 3600 507 .26120555
end
Code:
clear
use "E:\Research\privatization\privatize-sample1215.dta"
joinby industrycode2 using "E:\Research\privatization\privatize-control1009.dta"