
  • How to merge multiple data sets without an id variable?

    Hi, I want to merge multiple cross-sectional micro data files, one per year (1992-2003), into a single unbalanced panel. The files sit in my directory as, for example, "D:\mdis2\firm.1992.dta".
    All the data sets are identical in terms of variables.
    The only problem is that there is no identifier, so I want to create a unique id from locations, sectors, and starting_day.
    Is there an efficient way to merge all those files after creating a new id variable?
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
     
    input float year byte location1 int location2 byte location3 str1 sector1 byte sector2 float starting_day byte v7 int v8 byte(v9 v10)
    1992 11 10 11 "D" 19    . . . . .
    1992 11 10 12 "D" 15    . . . . .
    1992 11 10 12 "D" 17    . . . . .
    1992 11 10 12 "D" 22    . . . . .
    1992 11 10 13 "D" 17    . . . . .
    1992 11 10 13 "D" 18    . . . . .
    1992 11 10 13 "D" 22 5706 1 0 0 0
    1992 11 10 13 "D" 22 6010 1 0 0 0
    1992 11 10 13 "D" 22 6205 1 0 0 0
    1992 11 10 13 "D" 22 6606 1 0 0 0
    1992 11 10 13 "D" 22 6906 1 0 0 0
    1992 11 10 13 "D" 22 7105 1 0 0 0
    1992 11 10 13 "D" 22 7209 1 0 0 0
    1992 11 10 13 "D" 22 7304 1 0 0 0
    1992 11 10 13 "D" 22 7305 1 0 0 0
    1992 11 10 13 "D" 22 7601 3 1 1 0
    1992 11 10 13 "D" 22 8101 1 0 0 0
    1992 11 10 13 "D" 22 8601 3 1 1 0
    1992 11 10 13 "D" 22 8607 3 1 0 1
    1992 11 10 13 "D" 22 8808 3 1 1 0
    1992 11 10 13 "D" 22 8903 1 0 0 0
    1992 11 10 13 "D" 22 9103 3 1 1 0
    1992 11 10 13 "D" 22 9105 1 0 0 0
    1992 11 10 13 "D" 22 9106 1 0 0 0
    1992 11 10 13 "D" 36    . . . . .
    1992 11 10 14 "D" 22 6302 3 1 1 0
    1992 11 10 14 "D" 22 7804 3 1 1 0
    1992 11 10 14 "D" 22 7910 3 1 1 0
    1992 11 10 14 "D" 22 8712 3 1 1 0
    1992 11 10 14 "D" 22 8906 3 1 1 0
    1992 11 10 14 "D" 22 8909 3 1 0 1
    1992 11 10 14 "D" 22 9104 1 0 0 0
    1992 11 10 14 "D" 32    . . . . .
    1992 11 10 14 "D" 33    . . . . .
    1992 11 10 14 "D" 36    . . . . .
    1992 11 10 15 "D" 17    . . . . .
    1992 11 10 15 "D" 21    . . . . .
    1992 11 10 15 "D" 22    . . . . .
    1992 11 10 15 "D" 22    . . . . .
    1992 11 10 16 "D" 17    . . . . .
    1992 11 10 16 "D" 18    . . . . .
    1992 11 10 16 "D" 18    . . . . .
    1992 11 10 16 "D" 21    . . . . .
    1992 11 10 16 "D" 22    . . . . .
    1992 11 10 16 "D" 24    . . . . .
    1992 11 10 16 "D" 30    . . . . .
    1992 11 10 17 "D" 22 6807 3 1 1 0
    1992 11 10 17 "D" 22 8003 3 1 1 0
    1992 11 10 17 "D" 22 8211 3 1 1 0
    1992 11 10 17 "D" 22 8802 3 1 1 0
    1992 11 10 18 "D" 19    . . . . .
    1992 11 10 18 "D" 22 5211 3 1 1 0
    1992 11 10 18 "D" 22 6503 3 1 1 0
    1992 11 10 18 "D" 22 7202 3 1 1 0
    1992 11 10 18 "D" 22 7204 3 2 2 0
    1992 11 10 18 "D" 22 7507 1 0 0 0
    1992 11 10 18 "D" 22 7508 3 1 1 0
    1992 11 10 18 "D" 22 7708 3 1 1 0
    1992 11 10 18 "D" 22 7709 3 1 1 0
    1992 11 10 18 "D" 22 7809 3 1 1 0
    1992 11 10 18 "D" 22 7902 3 1 1 0
    1992 11 10 18 "D" 22 8005 3 1 1 0
    1992 11 10 18 "D" 22 8007 3 1 1 0
    1992 11 10 18 "D" 22 8403 3 1 1 0
    1992 11 10 18 "D" 22 8504 3 1 1 0
    1992 11 10 18 "D" 22 9108 3 1 1 0
    1992 11 10 18 "D" 22 9212 3 1 1 0
    1992 11 10 18 "D" 24    . . . . .
    1992 11 10 18 "D" 25    . . . . .
    1992 11 10 18 "D" 28    . . . . .
    1992 11 10 18 "D" 29    . . . . .
    1992 11 10 18 "D" 32    . . . . .
    1992 11 10 18 "D" 36    . . . . .
    1992 11 10 19 "D" 17    . . . . .
    1992 11 10 19 "D" 22  808 2 0 0 0
    1992 11 10 19 "D" 22 5305 2 0 0 0
    1992 11 10 19 "D" 22 5611 3 1 1 0
    1992 11 10 19 "D" 22 6404 3 1 1 0
    1992 11 10 19 "D" 22 7107 3 1 1 0
    1992 11 10 19 "D" 22 8104 1 0 0 0
    1992 11 10 19 "D" 22 8306 3 1 1 0
    1992 11 10 19 "D" 22 8608 3 1 1 0
    1992 11 10 19 "D" 22 8809 3 1 1 0
    1992 11 10 19 "D" 22 8902 1 0 0 0
    1992 11 10 19 "D" 28    . . . . .
    1992 11 10 19 "D" 28    . . . . .
    1992 11 10 19 "D" 29    . . . . .
    1992 11 10 20 "D" 22    . . . . .
    1992 11 10 21 "D" 15    . . . . .
    1992 11 10 21 "D" 17    . . . . .
    1992 11 10 21 "D" 17    . . . . .
    1992 11 10 21 "D" 18 6901 3 1 0 1
    1992 11 10 21 "D" 18 8404 3 1 0 1
    1992 11 10 21 "D" 18 9004 1 0 0 0
    1992 11 10 21 "D" 22 2606 2 0 0 0
    1992 11 10 21 "D" 22 4512 1 0 0 0
    1992 11 10 21 "D" 22 5011 3 1 1 0
    1992 11 10 21 "D" 22 5309 3 1 1 0
    1992 11 10 21 "D" 22 5406 1 0 0 0
    1992 11 10 21 "D" 22 5409 2 0 0 0
    end
    Last edited by Chul-Kyoo Jung; 12 Nov 2021, 14:06.

  • #2
    Although I recommend you append the files, here is a merge-based approach:
    Code:
    forvalues j = 1992/2003 {
        use firm`j', clear
        // one id per combination of start date, location, and sector
        egen id = group(starting_day location* sector*)
        tempfile firm`j'
        save `firm`j''
    }

    use `firm1992', clear
    forvalues k = 1993/2003 {
        merge 1:1 id using `firm`k'', nogenerate
    }
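
    For the append route, a minimal sketch (assuming the per-year files are named firm1992.dta through firm2003.dta as above): stacking the years first and then building the id on the pooled data guarantees that the same combination of locations, sectors, and starting_day gets the same id in every year.
    Code:
    * stack all years, then create one id across the pooled data
    clear
    forvalues j = 1992/2003 {
        append using firm`j'
    }
    egen id = group(starting_day location* sector*)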



    • #3
      @Sinlsberg, thank you. You are right; I had completely forgotten about append. Let me try it. I also appreciate your code.



      • #4
        Good luck!



        • #5
          This is the ID variable from my household file:


          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input double ID
          1
          2
          3
          4
          5
          6
          7
          8
          9
          10
          11
          12
          13
          14
          15
          16
          17
          18
          19
          20
          21
          22
          23
          24
          25
          26
          27
          28
          29
          30
          31
          32
          33
          34
          35
          36
          37
          38
          39
          40
          41
          42
          43
          44
          45
          46
          47
          48
          49
          50
          51
          52
          53
          54
          55
          56
          57
          58
          59
          60
          61
          62
          63
          64
          65
          66
          67
          68
          69
          70
          71
          72
          73
          74
          75
          76
          77
          78
          79
          80
          81
          82
          83
          84
          85
          86
          87
          88
          89
          90
          91
          92
          93
          94
          95
          96
          97
          98
          99
          100
          end

          Listed 100 out of 14037 observations
          Use the count() option to list more



          This is my individual file:

          . dataex ID person_number

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input double(ID person_number)
           1 3
           1 2
           1 3
           1 1
           1 2
           1 1
           2 1
           2 3
           2 3
           2 2
           3 2
           3 3
           3 1
           4 1
           4 2
           5 1
           5 2
           5 3
           6 2
           6 3
           6 1
           7 1
           8 1
           9 1
          10 2
          10 4
          10 1
          10 3
          11 3
          11 2
          11 1
          12 3
          12 3
          12 1
          12 2
          13 5
          13 2
          13 5
          13 4
          13 3
          13 1
          14 1
          15 1
          15 4
          15 3
          15 2
          16 2
          16 4
          16 4
          16 3
          16 1
          17 2
          17 1
          18 2
          18 1
          19 1
          20 1
          21 3
          21 2
          21 1
          22 3
          22 4
          22 4
          22 2
          22 5
          22 1
          23 1
          23 2
          23 3
          23 5
          23 4
          24 2
          24 1
          25 3
          25 1
          25 2
          25 4
          26 2
          26 4
          26 3
          26 1
          27 1
          28 1
          28 2
          28 3
          29 3
          29 1
          29 2
          29 4
          30 1
          31 2
          31 3
          31 4
          31 5
          31 1
          32 3
          32 4
          32 2
          32 1
          32 5
          end
          label values person_number person_number
          label def person_number 1 "1", modify
          label def person_number 2 "2", modify
          label def person_number 3 "3", modify
          label def person_number 4 "4", modify
          label def person_number 5 "5", modify

          Listed 100 out of 53636 observations
          Use the count() option to list more




          I have merged them, but I cannot do any analysis at the household level because I get more than 14037 household observations. What am I doing wrong?






          • #6
            Well, your person-level file contains multiple persons per household, so, of course, the merged data set contains more observations than the household data set did. Depending on your research questions, it might be appropriate to reduce the merged data set to one observation per household by combining the responses of the different people within a given household. Combining might mean averaging, or the median, or the max, or the min, or the response of the oldest person, or that of the youngest, or...there are so many possibilities here! A different approach, suitable for some types of questions, would be to do the analysis with a multi-level model. You do not say what your research question is, so it is impossible to give specific advice.
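
            For example, here is a minimal sketch of the collapse route, assuming a hypothetical numeric person-level variable x and hypothetical file names individual.dta and household.dta (substitute your own):
            Code:
            * merge the one household record onto its many person records,
            * then reduce to one observation per household by averaging x
            * (x, individual.dta, and household.dta are placeholders)
            use individual, clear
            merge m:1 ID using household, nogenerate
            collapse (mean) x, by(ID)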

            By the way, it is important to keep threads on topic so that people who come searching the Forum, or who come to browse specific topics, can find what they are looking for without wasting a lot of time weeding out irrelevant material. Your question is really only tangentially related to the main topic of this thread--they both mention "id," but really deal with very different problems. In the future, please start a new thread when your question is not a true continuation of the established thread.

