Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merging a country-level dataset and an individual-level dataset

    Dear Statalisters:

    I am trying to merge a country-level dataset and an individual-level dataset, as below. test1 has a country-level variable "GDP." test3 has individual-level variables "wage" and "productivity." GDP is a country-level variable, so the same value should appear for each country-year cell. But as you can see, GDP is missing after its first appearance in year. How can I make Stata repeat the same value of GDP for each country-year cell? ("indid" is individual id.)
    Thank you in advance for your help.

    Best wishes,

    Taka

    Code:
    . insheet using test1.csv
    (4 vars, 11 obs)
    
    . l
    
            
    country   year   indid    gdp    
            
    1.        1   1960       1    1    
    2.        2   1961       2    8    
    3.        3   1962       1    4    
    4.        4   1963       1    7    
    5.        5   1964       1    9    
            
    6.        6   1965       1    3    
    7.        7   1966       1    4    
    8.        8   1967       1    7    
    9.        9   1968       1    3    
    10.       10   1969       1    9    
            
    11.       11   1970       1    6
    Code:
    . insheet using test3.csv
    (5 vars, 27 obs)
    
    . l
    
                
    country   year   indid    wage    produc~y    
                
    1.        1   1960       1    123    24    
    2.        1   1960       2    134    4    
    3.        1   1960       3    255    5    
    4.        1   1960       4    222    2    
    5.        1   1960       5    88    6    
                
    6.        1   1960       6    999    7    
    7.        1   1960       7    198    3    
    8.        1   1960       8    33    1    
    9.        1   1960       9    47    4    
    10.        1   1960      10    56    5    
                
    11.        1   1960      11    79    7    
    12.        2   1960       1    9    29    
    13.        3   1960       1    15    19    
    14.        4   1960       1    71    38    
    15.        5   1960       1    8    45    
                
    16.        5   1960       2    21    65    
    17.        5   1960       3    4    34    
    18.        5   1960       4    3    63    
    19.        5   1960       5    76    41    
    20.        5   1960       6    54    79    
                
    21.        5   1960       7    43    41    
    22.        6   1960       1    89    54    
    23.        7   1960       1    8    27    
    24.        8   1960       1    67    87    
    25.        9   1960       1    10    13    
                
    26.       10   1960       1    15    15    
    27.       11   1960       1    13    67
    Code:
    .    merge 1:1  country    year    indid    using test3
    
        Result            Number of obs
                    
        Not matched            36
        from master            10    (_merge==1)
        from using            26    (_merge==2)
    
        Matched            1    (_merge==3)
    Code:
    . sort    country    year indid
    
    . l
    
                                
        country    year   indid    gdp    wage    produc~y    _merge    
                                
    1.    1    1960       1    1    123    24    Matched (3)    
    2.    1    1960       2    .    134    4    Using only (2)    
    3.    1    1960       3    .    255    5    Using only (2)    
    4.    1    1960       4    .    222    2    Using only (2)    
    5.    1    1960       5    .    88    6    Using only (2)    
                                
    6.    1    1960       6    .    999    7    Using only (2)    
    7.    1    1960       7    .    198    3    Using only (2)    
    8.    1    1960       8    .    33    1    Using only (2)    
    9.    1    1960       9    .    47    4    Using only (2)    
    10.    1    1960      10    .    56    5    Using only (2)    
                                
    11.    1    1960      11    .    79    7    Using only (2)    
    12.    2    1960       1    .    9    29    Using only (2)    
    13.    2    1961       2    8    .    .    Master only (1)    
    14.    3    1960       1    .    15    19    Using only (2)    
    15.    3    1962       1    4    .    .    Master only (1)    
                                
    16.    4    1960       1    .    71    38    Using only (2)    
    17.    4    1963       1    7    .    .    Master only (1)    
    18.    5    1960       1    .    8    45    Using only (2)    
    19.    5    1960       2    .    21    65    Using only (2)    
    20.    5    1960       3    .    4    34    Using only (2)    
                                
    21.    5    1960       4    .    3    63    Using only (2)    
    22.    5    1960       5    .    76    41    Using only (2)    
    23.    5    1960       6    .    54    79    Using only (2)    
    24.    5    1960       7    .    43    41    Using only (2)    
    25.    5    1964       1    9    .    .    Master only (1)    
                                
    26.    6    1960       1    .    89    54    Using only (2)    
    27.    6    1965       1    3    .    .    Master only (1)    
    28.    7    1960       1    .    8    27    Using only (2)    
    29.    7    1966       1    4    .    .    Master only (1)    
    30.    8    1960       1    .    67    87    Using only (2)    
                                
    31.    8    1967       1    7    .    .    Master only (1)    
    32.    9    1960       1    .    10    13    Using only (2)    
    33.    9    1968       1    3    .    .    Master only (1)    
    34.    10    1960       1    .    15    15    Using only (2)    
    35.    10    1969       1    9    .    .    Master only (1)    
                                
    36.    11    1960       1    .    13    67    Using only (2)    
    37.    11    1970       1    6    .    .    Master only (1)

  • #2
    Something is wrong with data set test1. It is supposed to be a country-year level data set. It should not even have an indid variable in it. Why is that there?

    Anyway, the linkage you need involves ignoring indid in the matching and doing an m:1 merge on country and year.
    Code:
    use test3, clear
    merge m:1 country year using test1

    Comment


    • #3
      Thank you. It worked. The dataset I'll be compiling is for mixed effects models, which have country-level as well as individual-level variables. indid is a variable identifying individuals in the countries. In light of that, does the following look good?
      Code:
           +------------------------------------------------------------------+
           | country   year   indid   wage   produc~y   gdp            _merge |
           |------------------------------------------------------------------|
        1. |       1   1960       1    123         24     1       Matched (3) |
        2. |       1   1960       2    134          4     1       Matched (3) |
        3. |       1   1960       3    255          5     1       Matched (3) |
        4. |       1   1960       4    222          2     1       Matched (3) |
        5. |       1   1960       5     88          6     1       Matched (3) |
           |------------------------------------------------------------------|
        6. |       1   1960       6    999          7     1       Matched (3) |
        7. |       1   1960       7    198          3     1       Matched (3) |
        8. |       1   1960       8     33          1     1       Matched (3) |
        9. |       1   1960       9     47          4     1       Matched (3) |
       10. |       1   1960      10     56          5     1       Matched (3) |
           |------------------------------------------------------------------|
       11. |       1   1960      11     79          7     1       Matched (3) |
       12. |       2   1960       1      9         29     .   Master only (1) |
       13. |       2   1961       2      .          .     8    Using only (2) |
       14. |       3   1960       1     15         19     .   Master only (1) |
       15. |       3   1962       1      .          .     4    Using only (2) |
           |------------------------------------------------------------------|
       16. |       4   1960       1     71         38     .   Master only (1) |
       17. |       4   1963       1      .          .     7    Using only (2) |
       18. |       5   1960       1      8         45     .   Master only (1) |
       19. |       5   1960       2     21         65     .   Master only (1) |
       20. |       5   1960       3      4         34     .   Master only (1) |
           |------------------------------------------------------------------|
       21. |       5   1960       4      3         63     .   Master only (1) |
       22. |       5   1960       5     76         41     .   Master only (1) |
       23. |       5   1960       6     54         79     .   Master only (1) |
       24. |       5   1960       7     43         41     .   Master only (1) |
       25. |       5   1964       1      .          .     9    Using only (2) |
           |------------------------------------------------------------------|
       26. |       6   1960       1     89         54     .   Master only (1) |
       27. |       6   1965       1      .          .     3    Using only (2) |
       28. |       7   1960       1      8         27     .   Master only (1) |
       29. |       7   1966       1      .          .     4    Using only (2) |
       30. |       8   1960       1     67         87     .   Master only (1) |
           |------------------------------------------------------------------|
       31. |       8   1967       1      .          .     7    Using only (2) |
       32. |       9   1960       1     10         13     .   Master only (1) |
       33. |       9   1968       1      .          .     3    Using only (2) |
       34. |      10   1960       1     15         15     .   Master only (1) |
       35. |      10   1969       1      .          .     9    Using only (2) |
           |------------------------------------------------------------------|
       36. |      11   1960       1     13         67     .   Master only (1) |
       37. |      11   1970       1      .          .     6    Using only (2) |
           +------------------------------------------------------------------+
      Thank you for your help.

      Best,

      Taka

      Comment


      • #4
        Yes, this looks like suitable data for a two-level model.

        Comment


        • #5
          Thank you. Could you tell me why I get the error message when I reverse test1 and test3 (variables country year do not uniquely identify observations in the using data)?

          Comment


          • #6
            Because if you reverse test1 and test3, then test3 becomes the using data, and in test3 there are multiple observations for each country-year combination (each such observation having a different value of individual id). If you wanted to do it in that order, you would have to also change m:1 to 1:m in the code.

            But don't do that because there is another problem that arises if you interchange test1 and test3 in the code. The problem arises because when both data sets have a common variable other than the merge key variable(s), which yours do, namely indid, the value in the master data is preserved and the values in the using data are lost. But your values of indid in test1 are an incomplete set of values--only test3 has the complete set of values of indid. So you need to preserve the values from test3 and lose the ones from test1. So test3 needs to be master and test1 needs to be using.

            Comment


            • #7
              Thank you. I understand.

              Thanks again for everything.

              Best wishes,

              Taka

              Comment

              Working...
              X