Hi All,
I am currently working with two datasets, A and B. Both have exactly the same variables and format, and look like the following:
```
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(year firm employee department score)
1971 1 2 1 21
1971 1 3 1 32
1971 1 4 1 3
1971 2 1 1 32
1971 2 3 1 23
1971 1 2 2 32
1971 1 3 2 12
1971 1 4 2 32
1972 1 2 1 232
1972 1 3 1 12
end
```
In the dataset above, each observation is a rating for one year-firm-employee-department combination. For instance, the first row corresponds to employee 2 of firm 1, in department 1, who received a score of 21 in 1971. The two datasets overlap substantially in the years they cover: they share many observations, but each may also contain many that the other lacks. I would like to combine them so as to keep as much information as possible from both while dropping duplicates.
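A minimal sketch of one way to do the combining, assuming A and B are Stata datasets with identical variables. The small `input` blocks below are hypothetical stand-ins for the real files (replace the `tempfile` names with your own file names), and `source` and `ndup` are illustrative variable names. The idea is to stack the two files with `append`, drop rows that are identical on all five variables, and then tag any remaining rows that share year, firm, employee, and department but differ in score.

```stata
* Hypothetical stand-in for dataset A
clear
input float(year firm employee department score)
1971 1 2 1 21
1971 1 3 1 32
1972 1 2 1 232
end
tempfile A
save `A'

* Hypothetical stand-in for dataset B
clear
input float(year firm employee department score)
1971 1 2 1 21
1971 1 3 1 30
1973 1 2 1 15
end
tempfile B
save `B'

* Stack both datasets; source = 0 for rows from A, 1 for rows from B
use `A', clear
append using `B', generate(source)

* Drop exact duplicates: rows identical on all five variables
duplicates drop year firm employee department score, force

* Tag rows that still share the identifying variables (i.e. conflicting scores)
duplicates tag year firm employee department, generate(ndup)
list if ndup > 0
```

In this toy example the identical 1971 rating for employee 2 is kept once, while the two 1971 rows for employee 3 (scores 32 and 30) both survive and are tagged, so `ndup` gives a quick count of how many conflicts need a decision.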
One issue I can foresee: what should happen to observations that agree on year, firm, employee, and department but have different scores? Hopefully there are not too many of those! In such cases I would not mind taking the value from one dataset over the other, since I have no prior about which dataset is more credible. I would greatly appreciate any help!
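If picking one dataset over the other is acceptable, a `merge` on the identifying variables is one way to sketch it. This assumes that year, firm, employee, and department uniquely identify observations within each file (`merge 1:1` stops with an error otherwise), and again uses hypothetical stand-in data; `score_B` and `conflict` are illustrative names. B's rating is renamed before merging so that both values are visible, conflicts are flagged, and A's score is kept wherever it exists.

```stata
* Hypothetical stand-in for dataset A
clear
input float(year firm employee department score)
1971 1 2 1 21
1971 1 3 1 32
1972 1 2 1 232
end
tempfile A
save `A'

* Hypothetical stand-in for dataset B; keep its rating in a separate variable
clear
input float(year firm employee department score)
1971 1 2 1 21
1971 1 3 1 30
1973 1 2 1 15
end
rename score score_B
tempfile B
save `B'

* Match on the identifying variables
use `A', clear
merge 1:1 year firm employee department using `B'

* Flag combinations present in both files but with different ratings
generate byte conflict = _merge == 3 & score != score_B
list if conflict

* Prefer A's rating; fall back to B's where A has no observation
replace score = score_B if missing(score)
drop score_B _merge
```

Swapping which file is loaded with `use` and which is merged in would make B the preferred source instead; keeping the `conflict` flag lets you check later whether the choice matters.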
Many thanks,
Chinmay