Picking initial log of value added per worker per country in their initial year

Hugo Rocha

Join Date: Feb 2021

Posts: 288
#1

Picking initial log of value added per worker per country in their initial year

09 Oct 2021, 19:12

Hi,

I am trying to figure it out how to set up Stata to pick the first log of real value added per worker per country in the following dataset. For instance, if a certain country starts showing log of real value added per worker in Food Processing in 1963, how to generate a variable that separates this? or if a country starts having that variable (log of real value added per worker in Chemicals in year 1970)? Is there a short-curt top pick the first log of real value added per worker in any sector (isiccomb1) by country?

[/CODE]
* Example generated by -dataex-. To install: ssc install dataex
clear
input int(country year) long isiccomb1 float logrealvalueaddedperworker long(Employment Establishments) double ValueAdded long FemaleEmployees float valueaddedperworker
4 1973 1 . 4321 . . . .
4 1974 1 . 4845 . . . .
4 1975 1 . 5103 . . . .
4 1976 1 . 4540 . . . .
4 1977 1 . 6309 . . . .
4 1978 1 . 6413 . . . .
4 1979 1 . 6006 . . . .
4 1980 1 . 5672 . . . .
4 1981 1 . 8481 62 . . .
4 1982 1 . 6866 69 . . .
4 1983 1 . 5080 22 . 376 .
4 1984 1 . 5187 22 . 419 .
4 1985 1 . 5446 26 . 742 .
4 1986 1 . 5556 30 . 736 .
4 1987 1 . 6177 48 . 865 .
4 1988 1 . 5950 48 . 1299 .
4 1990 1 . . . . 565 .
4 1991 1 . . . . 590 .
4 2002 1 . 1398 57 . 270 .
4 2003 1 . 1986 68 . 384 .
4 2004 1 . 2170 60 . 720 .
4 2005 1 . 4376 152 . 561 .
4 2006 1 . 5278 199 . 901 .
4 2007 1 . 5300 206 . 1237 .
4 2008 1 . 5431 212 . 899 .
4 2009 1 . 5400 199 . 870 .
4 2010 1 . 4893 195 . 912 .
4 2011 1 . 4901 197 . 1008 .
4 2012 1 . 4790 192 . . .
4 2013 1 . 4368 189 . . .
4 2014 1 . . . . . .
4 2015 1 . . . . . .
4 2016 1 . . . . . .
4 2017 1 . . . . . .
4 2018 1 . . . . . .
4 1973 3 . 0 . . . .
4 1974 3 . 0 . . . .
4 1975 3 . 0 . . . .
4 1976 3 . 0 . . . .
4 1977 3 . 0 . . . .
4 1978 3 . 0 . . . .
4 1979 3 . 0 . . . .
4 1980 3 . 0 . . . .
4 1981 3 . 0 0 . . .
4 1982 3 . 0 0 . . .
4 1983 3 . 0 0 . 0 .
4 1984 3 . 0 0 . 0 .
4 1985 3 . 0 0 . 0 .
4 1986 3 . 0 0 . 0 .
4 1987 3 . 0 0 . 0 .
4 1988 3 . 0 0 . 0 .
4 1990 3 . . . . 0 .
4 1991 3 . . . . 0 .
4 2002 3 . . . . . .
4 2003 3 . . . . . .
4 2004 3 . . . . . .
4 2005 3 . . . . . .
4 2006 3 . . . . . .
4 2007 3 . . . . . .
4 2008 3 . . . . . .
4 2009 3 . . . . . .
4 2010 3 . . . . . .
4 2011 3 . . . . . .
4 2012 3 . . . . . .
4 2013 3 . . . . . .
4 2014 3 . . . . . .
4 2015 3 . . . . . .
4 2016 3 . . . . . .
4 2017 3 . . . . . .
4 2018 3 . . . . . .
4 1973 6 . 12555 . . . .
4 1974 6 . 14243 . . . .
4 1975 6 . 17202 . . . .
4 1976 6 . 20520 . . . .
4 1977 6 . 20540 . . . .
4 1978 6 . 22889 . . . .
4 1979 6 . 23733 . . . .
4 1980 6 . 20438 . . . .
4 1981 6 . 15068 56 . . .
4 1982 6 . 13387 63 . . .
4 1983 6 . 10067 73 . 303 .
4 1984 6 . 10208 75 . 290 .
4 1985 6 . 10468 78 . 516 .
4 1986 6 . 11716 59 . 1422 .
4 1987 6 . 12060 71 . 1382 .
4 1988 6 . 11497 72 . 1358 .
4 1990 6 . . . . 1825 .
4 1991 6 . . . . 674 .
4 2002 6 . 830 11 . 710 .
4 2003 6 . 2371 22 . 1800 .
4 2004 6 . 1870 8 . 1709 .
4 2005 6 . 4474 31 . 3900 .
4 2006 6 . 5808 49 . 5000 .
4 2007 6 . 6700 57 . 5908 .
4 2008 6 . 6917 64 . 6010 .
4 2009 6 . 6902 66 . 6030 .
4 2010 6 . 6820 66 . 6048 .
4 2011 6 . 6853 68 . 6120 .
4 2012 6 . 6749 68 . . .
4 2013 6 . 6222 66 . . .
end
label values isiccomb1 isiccomb1
label def isiccomb1 1 "15", modify
label def isiccomb1 3 "16", modify
label def isiccomb1 6 "17", modify
[/CODE]
------------------ copy up to and including the previous line ------------------

Listed 100 out of 104180 observations
Use the count() option to list more
Tags: None

Hugo Rocha

Join Date: Feb 2021
Posts: 288

09 Oct 2021, 19:13

Code:

 input int(country year) long isiccomb1 float logrealvalueaddedperworker long(Employment Establishments) double ValueAdded long FemaleEmployees float valueaddedperworker
4 1973 1 .  4321   . .    . .
4 1974 1 .  4845   . .    . .
4 1975 1 .  5103   . .    . .
4 1976 1 .  4540   . .    . .
4 1977 1 .  6309   . .    . .
4 1978 1 .  6413   . .    . .
4 1979 1 .  6006   . .    . .
4 1980 1 .  5672   . .    . .
4 1981 1 .  8481  62 .    . .
4 1982 1 .  6866  69 .    . .
4 1983 1 .  5080  22 .  376 .
4 1984 1 .  5187  22 .  419 .
4 1985 1 .  5446  26 .  742 .
4 1986 1 .  5556  30 .  736 .
4 1987 1 .  6177  48 .  865 .
4 1988 1 .  5950  48 . 1299 .
4 1990 1 .     .   . .  565 .
4 1991 1 .     .   . .  590 .
4 2002 1 .  1398  57 .  270 .
4 2003 1 .  1986  68 .  384 .
4 2004 1 .  2170  60 .  720 .
4 2005 1 .  4376 152 .  561 .
4 2006 1 .  5278 199 .  901 .
4 2007 1 .  5300 206 . 1237 .
4 2008 1 .  5431 212 .  899 .
4 2009 1 .  5400 199 .  870 .
4 2010 1 .  4893 195 .  912 .
4 2011 1 .  4901 197 . 1008 .
4 2012 1 .  4790 192 .    . .
4 2013 1 .  4368 189 .    . .
4 2014 1 .     .   . .    . .
4 2015 1 .     .   . .    . .
4 2016 1 .     .   . .    . .
4 2017 1 .     .   . .    . .
4 2018 1 .     .   . .    . .
4 1973 3 .     0   . .    . .
4 1974 3 .     0   . .    . .
4 1975 3 .     0   . .    . .
4 1976 3 .     0   . .    . .
4 1977 3 .     0   . .    . .
4 1978 3 .     0   . .    . .
4 1979 3 .     0   . .    . .
4 1980 3 .     0   . .    . .
4 1981 3 .     0   0 .    . .
4 1982 3 .     0   0 .    . .
4 1983 3 .     0   0 .    0 .
4 1984 3 .     0   0 .    0 .
4 1985 3 .     0   0 .    0 .
4 1986 3 .     0   0 .    0 .
4 1987 3 .     0   0 .    0 .
4 1988 3 .     0   0 .    0 .
4 1990 3 .     .   . .    0 .
4 1991 3 .     .   . .    0 .
4 2002 3 .     .   . .    . .
4 2003 3 .     .   . .    . .
4 2004 3 .     .   . .    . .
4 2005 3 .     .   . .    . .
4 2006 3 .     .   . .    . .
4 2007 3 .     .   . .    . .
4 2008 3 .     .   . .    . .
4 2009 3 .     .   . .    . .
4 2010 3 .     .   . .    . .
4 2011 3 .     .   . .    . .
4 2012 3 .     .   . .    . .
4 2013 3 .     .   . .    . .
4 2014 3 .     .   . .    . .
4 2015 3 .     .   . .    . .
4 2016 3 .     .   . .    . .
4 2017 3 .     .   . .    . .
4 2018 3 .     .   . .    . .
4 1973 6 . 12555   . .    . .
4 1974 6 . 14243   . .    . .
4 1975 6 . 17202   . .    . .
4 1976 6 . 20520   . .    . .
4 1977 6 . 20540   . .    . .
4 1978 6 . 22889   . .    . .
4 1979 6 . 23733   . .    . .
4 1980 6 . 20438   . .    . .
4 1981 6 . 15068  56 .    . .
4 1982 6 . 13387  63 .    . .
4 1983 6 . 10067  73 .  303 .
4 1984 6 . 10208  75 .  290 .
4 1985 6 . 10468  78 .  516 .
4 1986 6 . 11716  59 . 1422 .
4 1987 6 . 12060  71 . 1382 .
4 1988 6 . 11497  72 . 1358 .
4 1990 6 .     .   . . 1825 .
4 1991 6 .     .   . .  674 .
4 2002 6 .   830  11 .  710 .
4 2003 6 .  2371  22 . 1800 .
4 2004 6 .  1870   8 . 1709 .
4 2005 6 .  4474  31 . 3900 .
4 2006 6 .  5808  49 . 5000 .
4 2007 6 .  6700  57 . 5908 .
4 2008 6 .  6917  64 . 6010 .
4 2009 6 .  6902  66 . 6030 .
4 2010 6 .  6820  66 . 6048 .
4 2011 6 .  6853  68 . 6120 .
4 2012 6 .  6749  68 .    . .
4 2013 6 .  6222  66 .    . .
end
label values isiccomb1 isiccomb1
label def isiccomb1 1 "15", modify
label def isiccomb1 3 "16", modify
label def isiccomb1 6 "17", modify

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 30001
#3

09 Oct 2021, 21:56

In your example data, the variable logrealvalueperworkeradded is always missing. So it isn't really possible to illustrate the code with that example. Nevertheless, here is code that should give what you want:

Code:

gen byte miss_logreal = missing(logrealvalueaddedperworker) by country isiccomb1 (miss_logreal year), sort: gen wanted = logrealvalueaddedperworker[1]

Note: This will give the first non-missing value of logrealvalueaddedperworker reported for each country/isiccomb1 (if there are any such). Given the high frequency of missing values for this variable, I assume that is what you wanted--that you don't want a missing value when the first year's observation is missing if there is eventually a non-missing value reported sometime.
1 like
Comment
Hugo Rocha

Join Date: Feb 2021

Posts: 288
#4

09 Oct 2021, 22:19

Thank you! This is exactly what I've wanted and if I want instead the last non-missing observation by country and sector, I guess I have to just change the index after sort and create a new variable like wanted1?
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30001
#5

10 Oct 2021, 09:53

Well, it's not enough to just change the index in the final command because the missing values of logrealvalue.... have been sorted to the end. So you actually need to start with a different variable about missingness:

Code:

gen byte non_miss_logreal = !missing(logrealvalueaddedperworker) by country isiccomb1 (non_miss_logreal year), sort: gen wanted1 = logrealvalueaddedperworker[_N]

This way the non-missing values are sorted to the end, and the second command picks out the last of those.
1 like
Comment
Hugo Rocha

Join Date: Feb 2021

Posts: 288
#6

13 Oct 2021, 21:09

After doing the code, I was just wondering: Why are there these stacked observations (i.e., multiple growth rates for one initial level, resulting in dots going up in a straight line? It shouldn't be many different initial levels given that they are grouped by country isiccomb1?

Thank you so much,

Attached Files
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30001
#7

13 Oct 2021, 21:30

Well, there should, in theory, be (at most) one distinct value of log initial manufacturing value added per worker for each combination of country and isiccomb. Now, in your example data shown in #2, in fact you get no values that would show up in a graph because logrealvalueaddedperworker is always missing. Is that supposed to happen, or is that an error in your data? If it's supposed to happen, then you can't expect country 4 to contribute anything to the graph. There may be other countries like that, or other isiccombs within certain countries where logrealvalueaddedperworker is always missing. I would do this:

Code:

tab country isiccomb if !missing(logrealvalueaddedperworker)

Only those country isiccomb combinations that have a non-zero entry in the cell will have a non-missing value for log initial manufacturing value added per worker. And you might even lose some of those if there is missing data for Growth of value added per worker in some of those.
Comment
Hugo Rocha

Join Date: Feb 2021

Posts: 288
#8

14 Oct 2021, 08:05

Yes, apparently dropping the duplicates, drop a lot of logvalueaddedperworker. I am thinking about getting the last decade available by each country by sector...
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30001
#9

14 Oct 2021, 09:44

Yes, apparently dropping the duplicates, drop a lot of logvalueaddedperworker.

Whoa! This is your first mention of dropping duplicates anywhere in this thread, so I don't know what you're referring to. But I'm going to speculate that in building this data set, you had difficulties merging some data sets and you dealt with the problem by dropping observations that were duplicates on some key variable(s) such as perhaps country isiccomb and maybe a year variable or something like that. When you're in a situation like that, where you have to weed out supernumerary observations to get a merge, you have to be very cautious how you do that. You have to first check whether the observations are really complete duplicates, that is, they agree on every variable. Only then is it OK to just drop duplicates. If the observations disagree on other variables, then you have to figure out which one is the correct one to keep or, in some cases, how to combine the discrepant figures you find before retaining some single observation that resolves the discrepancies. If you just arbitrarily drop observations without these precautions, you are then working with a data set that has swept part of the data under the rug and may be completely wrong. In particular, if when you created this data set you had followed these precautions, you would never lose information on a variable. If there were multiple observations agreeing on country and isiccomb (or whatever) but disagreed on logvalueaddedperworker in that some had a real value and others had a missing value, then you can certainly end up with a bad data set that is missing lots of available information. Sounds like you need to do a very careful re-build of your data set, with meticulous attention paid to the handling of supernumerary observations discovered when you attempt to merge (or append) data sets.
Comment
Hugo Rocha

Join Date: Feb 2021

Posts: 288
#10

01 Nov 2021, 17:13

You are correct. I dropped those duplicates because they exactly the same on every observation (based on two different estimates). I am still in the process of figuring it out which estimates to use.

Thanks a lot, and sorry for the late reply!
Comment

Announcement

Picking initial log of value added per worker per country in their initial year

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment