Dear Statalist
I have two files I’d like to merge.
File 1 is in wide format containing accounting data for 300 000 companies (legal entities). Company identifier is variable ID_company, and 1 record = 1 company (no duplicates). The dataset does not contain the variable Employee. File 1 contains no information about subunits.
File 2 is in long format containing all subunits for the companies in file 1 and the number of employees. Identifier for subunits is ID_subunit and the number of employee for each subunits is in the variable Employee. File 2 contains no duplicates on subunits.
Example: File 2 in long format
The value of employees is recorded per subunit and not per company. This explains the data structure as mentioned.
The problem arises in merging these files. It is possible to merge these files, and keep the records from using dataset. But then more problems arises later in the analysis because we need to keep the information about number of employees per subunit (we later split income and costs to geographical location for each subunit and we need Employee to do this – it’s roughly the same procedure as in national accounts).
Does anyone have a solution in STATA as how to transpose File 2 from long to wide format with different number of subunits per company? With same data structure in File 1 and File 2, then it is easier to merge the files and use the variable Employee in the analysis that follows.
I appreciate any thoughts.
Example: File 2 in wide format
Thanks,
Oystein Linnestad
Norway
I have two files I’d like to merge.
File 1 is in wide format containing accounting data for 300 000 companies (legal entities). Company identifier is variable ID_company, and 1 record = 1 company (no duplicates). The dataset does not contain the variable Employee. File 1 contains no information about subunits.
File 2 is in long format containing all subunits for the companies in file 1 and the number of employees. Identifier for subunits is ID_subunit and the number of employee for each subunits is in the variable Employee. File 2 contains no duplicates on subunits.
Example: File 2 in long format
ID_company | ID_subunit | Employee |
998877665 | 766554433 | 5 |
998877665 | 655443322 | 12 |
998877665 | 544332211 | 23 |
The problem arises in merging these files. It is possible to merge these files, and keep the records from using dataset. But then more problems arises later in the analysis because we need to keep the information about number of employees per subunit (we later split income and costs to geographical location for each subunit and we need Employee to do this – it’s roughly the same procedure as in national accounts).
Does anyone have a solution in STATA as how to transpose File 2 from long to wide format with different number of subunits per company? With same data structure in File 1 and File 2, then it is easier to merge the files and use the variable Employee in the analysis that follows.
I appreciate any thoughts.
Example: File 2 in wide format
ID_company | ID_subunit_1 | Employee_1 | ID_subunit_2 | Employee_2 | ID_subunit_3 | Employee_3 |
998877665 | 766554433 | 5 | 655443322 | 12 | 544332211 | 23 |
Thanks,
Oystein Linnestad
Norway
Comment