Hi all,
I'm new to stata and I have been "learning by doing". This forum has been of utmost help. I'm currently working on a panel data, trying to calculate the clean and dirty share of bilateral exports as dependent variables. I am working with 197 countries for 37 years. To achieve this, I first generated a master dataset of total bilateral export containing 595,726 observations. Afterwards, I generated a dataset of exports in "dirty goods" with 2,137,764 observations and another dataset of exports in "clean goods" containing 8,576,162 observations. (The reason for the disparity in the number of observations is because the data from sectoral export is more detailed) N/B: Data from UN Comtrade using HS (For total and clean sectors) and SITC codes (For dirty sectors)
To calculate the clean and dirty share I should sum up sectoral exports and relate them to total exports. I have been on this for a while. The closest I came to getting it right was from a previous reply by Clyde Schechter. I proceeded as follows:
For the "clean" and "dirty" export dataset
1. by reportercode partnercode year, sort: egen cleanvalues = total (values)
2. by reportercode partnercode year, sort: egen dirtyvalues = total (values)
For the total export dataset
1. by reportercode year, sort: egen totalvalues = total(values)
Thereafter, I could only merge each (dirty and clean to the total) of them using a 1:1 by observation, such that all observations in the master dataset were matched and i dropped all those that were not matched. I then took the log of total values, cleanvalues and dirtyvalues before calculating the share by
1. gen cleanshare = ln_cleanvalues/ln_totalvalues
2. gen dirtyshare = ln_dirtyvalues/ln_totalvalues
For the cleanshare, I get a summary of (0 min, 0.6546 mean, 0.125sd and 1.21 max)
For the dirtyshare, I get a summary of (0 min, 0.6541 mean, 0.157sd and 1.45 max)
Concerns
1. The share should bound between 0 and 1. The Maximum values are above 1 in both cases
2. The mean values seem overstated. They should at most equate 1. In this instance, it should be a lot less than 1 as a lot of other sectors were not covered. Calculating manually, the sum of clean over sum of total = 0.038 and that of dirty = 0.089
QUESTIONS
1. Please did I do something wrongly?
2. Please how can I match these correctly?
N/b: Please I would appreciate urgent feedback as I am pressed for time and need to move on to other variables
Many Thanks.
I'm new to stata and I have been "learning by doing". This forum has been of utmost help. I'm currently working on a panel data, trying to calculate the clean and dirty share of bilateral exports as dependent variables. I am working with 197 countries for 37 years. To achieve this, I first generated a master dataset of total bilateral export containing 595,726 observations. Afterwards, I generated a dataset of exports in "dirty goods" with 2,137,764 observations and another dataset of exports in "clean goods" containing 8,576,162 observations. (The reason for the disparity in the number of observations is because the data from sectoral export is more detailed) N/B: Data from UN Comtrade using HS (For total and clean sectors) and SITC codes (For dirty sectors)
To calculate the clean and dirty share I should sum up sectoral exports and relate them to total exports. I have been on this for a while. The closest I came to getting it right was from a previous reply by Clyde Schechter. I proceeded as follows:
For the "clean" and "dirty" export dataset
1. by reportercode partnercode year, sort: egen cleanvalues = total (values)
2. by reportercode partnercode year, sort: egen dirtyvalues = total (values)
For the total export dataset
1. by reportercode year, sort: egen totalvalues = total(values)
Thereafter, I could only merge each (dirty and clean to the total) of them using a 1:1 by observation, such that all observations in the master dataset were matched and i dropped all those that were not matched. I then took the log of total values, cleanvalues and dirtyvalues before calculating the share by
1. gen cleanshare = ln_cleanvalues/ln_totalvalues
2. gen dirtyshare = ln_dirtyvalues/ln_totalvalues
For the cleanshare, I get a summary of (0 min, 0.6546 mean, 0.125sd and 1.21 max)
For the dirtyshare, I get a summary of (0 min, 0.6541 mean, 0.157sd and 1.45 max)
Concerns
1. The share should bound between 0 and 1. The Maximum values are above 1 in both cases
2. The mean values seem overstated. They should at most equate 1. In this instance, it should be a lot less than 1 as a lot of other sectors were not covered. Calculating manually, the sum of clean over sum of total = 0.038 and that of dirty = 0.089
QUESTIONS
1. Please did I do something wrongly?
2. Please how can I match these correctly?
N/b: Please I would appreciate urgent feedback as I am pressed for time and need to move on to other variables
Many Thanks.

Comment