
  • How to merge multiple data sets without an id variable?

    Hi, I want to merge multiple cross-sectional micro data files, one per year (1992-2003), into a single unbalanced panel. The files sit in my directory as, for example, "D:\mdis2\firm.1992.dta".
    All the data sets are identical in terms of variables.
    The only problem is that there is no identifier, so I want to create a unique id from locations, sectors, and starting_day.
    Is there an efficient way to merge all those files after creating a new id variable?
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
     
    input float year byte location1 int location2 byte location3 str1 sector1 byte sector2 float starting_day byte v7 int v8 byte(v9 v10)
    1992 11 10 11 "D" 19    . . . . .
    1992 11 10 12 "D" 15    . . . . .
    1992 11 10 12 "D" 17    . . . . .
    1992 11 10 12 "D" 22    . . . . .
    1992 11 10 13 "D" 17    . . . . .
    1992 11 10 13 "D" 18    . . . . .
    1992 11 10 13 "D" 22 5706 1 0 0 0
    1992 11 10 13 "D" 22 6010 1 0 0 0
    1992 11 10 13 "D" 22 6205 1 0 0 0
    1992 11 10 13 "D" 22 6606 1 0 0 0
    1992 11 10 13 "D" 22 6906 1 0 0 0
    1992 11 10 13 "D" 22 7105 1 0 0 0
    1992 11 10 13 "D" 22 7209 1 0 0 0
    1992 11 10 13 "D" 22 7304 1 0 0 0
    1992 11 10 13 "D" 22 7305 1 0 0 0
    1992 11 10 13 "D" 22 7601 3 1 1 0
    1992 11 10 13 "D" 22 8101 1 0 0 0
    1992 11 10 13 "D" 22 8601 3 1 1 0
    1992 11 10 13 "D" 22 8607 3 1 0 1
    1992 11 10 13 "D" 22 8808 3 1 1 0
    1992 11 10 13 "D" 22 8903 1 0 0 0
    1992 11 10 13 "D" 22 9103 3 1 1 0
    1992 11 10 13 "D" 22 9105 1 0 0 0
    1992 11 10 13 "D" 22 9106 1 0 0 0
    1992 11 10 13 "D" 36    . . . . .
    1992 11 10 14 "D" 22 6302 3 1 1 0
    1992 11 10 14 "D" 22 7804 3 1 1 0
    1992 11 10 14 "D" 22 7910 3 1 1 0
    1992 11 10 14 "D" 22 8712 3 1 1 0
    1992 11 10 14 "D" 22 8906 3 1 1 0
    1992 11 10 14 "D" 22 8909 3 1 0 1
    1992 11 10 14 "D" 22 9104 1 0 0 0
    1992 11 10 14 "D" 32    . . . . .
    1992 11 10 14 "D" 33    . . . . .
    1992 11 10 14 "D" 36    . . . . .
    1992 11 10 15 "D" 17    . . . . .
    1992 11 10 15 "D" 21    . . . . .
    1992 11 10 15 "D" 22    . . . . .
    1992 11 10 15 "D" 22    . . . . .
    1992 11 10 16 "D" 17    . . . . .
    1992 11 10 16 "D" 18    . . . . .
    1992 11 10 16 "D" 18    . . . . .
    1992 11 10 16 "D" 21    . . . . .
    1992 11 10 16 "D" 22    . . . . .
    1992 11 10 16 "D" 24    . . . . .
    1992 11 10 16 "D" 30    . . . . .
    1992 11 10 17 "D" 22 6807 3 1 1 0
    1992 11 10 17 "D" 22 8003 3 1 1 0
    1992 11 10 17 "D" 22 8211 3 1 1 0
    1992 11 10 17 "D" 22 8802 3 1 1 0
    1992 11 10 18 "D" 19    . . . . .
    1992 11 10 18 "D" 22 5211 3 1 1 0
    1992 11 10 18 "D" 22 6503 3 1 1 0
    1992 11 10 18 "D" 22 7202 3 1 1 0
    1992 11 10 18 "D" 22 7204 3 2 2 0
    1992 11 10 18 "D" 22 7507 1 0 0 0
    1992 11 10 18 "D" 22 7508 3 1 1 0
    1992 11 10 18 "D" 22 7708 3 1 1 0
    1992 11 10 18 "D" 22 7709 3 1 1 0
    1992 11 10 18 "D" 22 7809 3 1 1 0
    1992 11 10 18 "D" 22 7902 3 1 1 0
    1992 11 10 18 "D" 22 8005 3 1 1 0
    1992 11 10 18 "D" 22 8007 3 1 1 0
    1992 11 10 18 "D" 22 8403 3 1 1 0
    1992 11 10 18 "D" 22 8504 3 1 1 0
    1992 11 10 18 "D" 22 9108 3 1 1 0
    1992 11 10 18 "D" 22 9212 3 1 1 0
    1992 11 10 18 "D" 24    . . . . .
    1992 11 10 18 "D" 25    . . . . .
    1992 11 10 18 "D" 28    . . . . .
    1992 11 10 18 "D" 29    . . . . .
    1992 11 10 18 "D" 32    . . . . .
    1992 11 10 18 "D" 36    . . . . .
    1992 11 10 19 "D" 17    . . . . .
    1992 11 10 19 "D" 22  808 2 0 0 0
    1992 11 10 19 "D" 22 5305 2 0 0 0
    1992 11 10 19 "D" 22 5611 3 1 1 0
    1992 11 10 19 "D" 22 6404 3 1 1 0
    1992 11 10 19 "D" 22 7107 3 1 1 0
    1992 11 10 19 "D" 22 8104 1 0 0 0
    1992 11 10 19 "D" 22 8306 3 1 1 0
    1992 11 10 19 "D" 22 8608 3 1 1 0
    1992 11 10 19 "D" 22 8809 3 1 1 0
    1992 11 10 19 "D" 22 8902 1 0 0 0
    1992 11 10 19 "D" 28    . . . . .
    1992 11 10 19 "D" 28    . . . . .
    1992 11 10 19 "D" 29    . . . . .
    1992 11 10 20 "D" 22    . . . . .
    1992 11 10 21 "D" 15    . . . . .
    1992 11 10 21 "D" 17    . . . . .
    1992 11 10 21 "D" 17    . . . . .
    1992 11 10 21 "D" 18 6901 3 1 0 1
    1992 11 10 21 "D" 18 8404 3 1 0 1
    1992 11 10 21 "D" 18 9004 1 0 0 0
    1992 11 10 21 "D" 22 2606 2 0 0 0
    1992 11 10 21 "D" 22 4512 1 0 0 0
    1992 11 10 21 "D" 22 5011 3 1 1 0
    1992 11 10 21 "D" 22 5309 3 1 1 0
    1992 11 10 21 "D" 22 5406 1 0 0 0
    1992 11 10 21 "D" 22 5409 2 0 0 0
    end
    Last edited by Chul-Kyoo Jung; 12 Nov 2021, 14:06.

  • #2
    Although I recommend you append the files, here is a merge-based approach:
    Code:
    forvalues j = 1992/2003 {
        use firm`j', clear
        // one id per combination of start date, location, and sector
        egen id = group(starting_day location* sector*)
        tempfile firm`j'
        save `firm`j''
    }

    use `firm1992', clear
    forvalues k = 1993/2003 {
        merge 1:1 id using `firm`k'', nogenerate
    }
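
    For the append route, a minimal sketch (assuming the per-year files are named firm1992.dta through firm2003.dta as above): stacking the years first and then building the id on the pooled data guarantees that the same combination of locations, sectors, and starting_day gets the same id in every year.
    Code:
    * stack all years, then create one id across the pooled data
    clear
    forvalues j = 1992/2003 {
        append using firm`j'
    }
    egen id = group(starting_day location* sector*)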



    • #3
      @Sinlsberg, thank you. You are right; I had completely forgotten about append. Let me try it. I also appreciate your code.



      • #4
        Good luck!



        • #5
          This is the ID variable from my household file:


          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input double ID
          1
          2
          3
          4
          5
          6
          7
          8
          9
          10
          11
          12
          13
          14
          15
          16
          17
          18
          19
          20
          21
          22
          23
          24
          25
          26
          27
          28
          29
          30
          31
          32
          33
          34
          35
          36
          37
          38
          39
          40
          41
          42
          43
          44
          45
          46
          47
          48
          49
          50
          51
          52
          53
          54
          55
          56
          57
          58
          59
          60
          61
          62
          63
          64
          65
          66
          67
          68
          69
          70
          71
          72
          73
          74
          75
          76
          77
          78
          79
          80
          81
          82
          83
          84
          85
          86
          87
          88
          89
          90
          91
          92
          93
          94
          95
          96
          97
          98
          99
          100
          end

          Listed 100 out of 14037 observations
          Use the count() option to list more



          This is my individual file:

          . dataex ID person_number

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input double(ID person_number)
           1 3
           1 2
           1 3
           1 1
           1 2
           1 1
           2 1
           2 3
           2 3
           2 2
           3 2
           3 3
           3 1
           4 1
           4 2
           5 1
           5 2
           5 3
           6 2
           6 3
           6 1
           7 1
           8 1
           9 1
          10 2
          10 4
          10 1
          10 3
          11 3
          11 2
          11 1
          12 3
          12 3
          12 1
          12 2
          13 5
          13 2
          13 5
          13 4
          13 3
          13 1
          14 1
          15 1
          15 4
          15 3
          15 2
          16 2
          16 4
          16 4
          16 3
          16 1
          17 2
          17 1
          18 2
          18 1
          19 1
          20 1
          21 3
          21 2
          21 1
          22 3
          22 4
          22 4
          22 2
          22 5
          22 1
          23 1
          23 2
          23 3
          23 5
          23 4
          24 2
          24 1
          25 3
          25 1
          25 2
          25 4
          26 2
          26 4
          26 3
          26 1
          27 1
          28 1
          28 2
          28 3
          29 3
          29 1
          29 2
          29 4
          30 1
          31 2
          31 3
          31 4
          31 5
          31 1
          32 3
          32 4
          32 2
          32 1
          32 5
          end
          label values person_number person_number
          label def person_number 1 "1", modify
          label def person_number 2 "2", modify
          label def person_number 3 "3", modify
          label def person_number 4 "4", modify
          label def person_number 5 "5", modify

          Listed 100 out of 53636 observations
          Use the count() option to list more




          I have merged them, but I cannot do any analysis at the household level because I get more than 14037 household observations. What am I doing wrong?






          • #6
            Well, your person-level file contains multiple persons per household, so, of course, the merged data set contains more observations than the household data set did. Depending on your research questions, it might be appropriate to reduce the merged data set to one observation per household by combining the responses of the different people within a given household. Combining might mean averaging, or the median, or the max, or the min, or the response of the oldest person, or that of the youngest, or...there are so many possibilities here! A different approach, suitable for some types of questions, would be to do the analysis with a multi-level model. You do not say what your research question is, so it is impossible to give specific advice.
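
            For example, here is a minimal sketch of the collapse route, assuming a hypothetical numeric person-level variable x and hypothetical file names individual.dta and household.dta (substitute your own):
            Code:
            * merge the one household record onto its many person records,
            * then reduce to one observation per household by averaging x
            * (x, individual.dta, and household.dta are placeholders)
            use individual, clear
            merge m:1 ID using household, nogenerate
            collapse (mean) x, by(ID)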

            By the way, it is important to keep threads on topic so that people who come searching the Forum, or who come to browse specific topics, can find what they are looking for without wasting a lot of time weeding out irrelevant material. Your question is really only tangentially related to the main topic of this thread--they both mention "id," but really deal with very different problems. In the future, please start a new thread when your question is not a true continuation of the established thread.

