Missing values

Jordi Imbrechts

Join Date: Apr 2022

Posts: 44
#1

29 Apr 2022, 09:25

Dear all,

I have a question about missing values. I need to create a dummy variable that takes the value 1 when dismissals as a share of the workforce (Belgian firms, period 2011-2020) are >= 0.10.
Information about data:
- panel data.
- period 2011-2020
- around 185 000 observations

I have calculated the percentage of dismissals as follows:

Code:

gen Dismissalstototalemployees = Dismissals/L.Totalemployees

In the next step I generate my dummy variable:

Code:

gen Collectiefontslag_10procent = 0

Because the ratio cannot be calculated for the first observation of each firm (there is no lagged value), I use the following command to copy this missing value into my dummy variable:

Code:

replace Collectiefontslag_10procent = . if Dismissalstototalemployees == .

Next, I set the dummy variable to 1 wherever "Dismissalstototalemployees" >= 0.10:

Code:

replace Collectiefontslag_10procent = 1 if Afdankingentovaantalwn >=0.10

The data now look like this:

Code:

input long ID byte Collectiefontslag float(Afdankingentovaantalwn Collectiefontslag_10procent)
1 .            . 1
1 0  .0042328043 0
1 0   .007556675 0
1 0   .009876544 0
1 0    .01178782 0
1 0   .008716707 0
1 0   .007302824 0
1 0   .009398496 0
1 0   .007111111 0
1 0   .004752475 0
2 .            . 1
2 0    .05988024 0
2 0   .064935066 0
2 0    .05072464 0
2 0    .01090909 0
2 0    .04477612 0
2 0   .029661017 0
2 0    .04329005 0
2 0    .01826484 0
2 0   .004464286 0
3 .            . 1
3 0   .008974708 0
3 0   .008898194 0
3 0   .007133421 0
3 0   .006935525 0
3 0   .008522036 0
3 0    .01293661 0
3 0   .005654633 0
3 0   .006094906 0
3 0   .005598622 0
4 .            . 1
4 0   .004157044 0
4 0  .0009191177 0
4 0   .002764977 0
4 0   .003678161 0
4 0   .003628118 0
4 0  .0040964955 0
4 0   .002743484 0
4 0   .002710027 0
4 0   .002604167 0
5 .            . 1
5 0  .0020686088 0
5 0  .0023098793 0
5 0  .0016697588 0
5 0  .0017384585 0
5 0  .0018552876 0
5 0    .00252419 0
5 0   .002361022 0
5 0   .002364066 0
5 0  .0010660981 0
end

"Collectiefontslag" is the dummy that I want to create in Stata. I built it in Excel to check whether the dummy generated in Stata is correct, but for robustness checks I need the code in Stata as well.
"Afdankingentovaantalwn" is the same variable as "Dismissalstototalemployees".

PROBLEM: As you can see, whenever "Dismissalstototalemployees" (Afdankingentovaantalwn) is missing, Collectiefontslag_10procent shows 1. Comparing with the dummy created in Excel (Collectiefontslag) shows that this is not what I want.

How can I solve this problem, so that Stata does not interpret a missing value as ">= 0.10"?

Thanks in advance,
Jordi

Last edited by Jordi Imbrechts; 29 Apr 2022, 09:28.
Jordi Imbrechts

Join Date: Apr 2022

Posts: 44
#2

29 Apr 2022, 09:35

Never mind, stupid question. Fixed it. Thanks.
Clyde Schechter

Join Date: Apr 2014

Posts: 30357
#3

29 Apr 2022, 09:45

Your post is confusing because you switch back and forth between English and another language (Afrikaans? Dutch?) in your variable names. But I think I know what you want to do, and it can be done more simply, in one line. I will stick to English names here:

Code:

gen Collectiefontslag_10procent = (Dismissals/L.Totalemployees >= 0.10) if !missing(Dismissals, L.Totalemployees)
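
As a rough check of how this one-liner behaves, here is a minimal sketch on a toy panel. The variable Year and all data values are hypothetical (they are not from the original post), and the sketch assumes the data have been declared as a panel with -xtset- so that the L. operator works.

```stata
* Toy panel; ID/Year layout and all numbers are made up for illustration
clear
input long ID int Year double(Dismissals Totalemployees)
1 2011  5 100
1 2012 12 100
1 2013  3 110
2 2011  1  50
2 2012  .  50
2 2013  2  40
end
xtset ID Year

* 1/0 where both inputs are observed, missing otherwise
gen byte Collectiefontslag_10procent = (Dismissals/L.Totalemployees >= 0.10) if !missing(Dismissals, L.Totalemployees)

* First year of each firm has no lag, so the dummy stays missing there
list, sepby(ID)
```

In this toy data the dummy is missing in 2011 for both firms and in 2012 for firm 2 (where Dismissals is missing), 1 for firm 1 in 2012 (12/100), and 0 elsewhere. One edge case the sketch does not cover: if the lagged Totalemployees were 0, the ratio would be missing (Stata returns missing for division by zero) while neither input is, so the -if- condition would not exclude it. Guarding on !missing(Dismissals/L.Totalemployees) instead would be one way to handle that, if such observations exist in the data.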

Added: Crossed with #2.

I will also add a general comment: in Stata, missing values are interpreted as larger than any real number. There is no way to get Stata to do otherwise, but, as here, you can use -if- conditions to exclude missing values from the calculation altogether.
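
To make the ordering of missing values concrete, here is a small sketch on a made-up variable (the name ratio and its three values are hypothetical, not taken from the data in this thread). It compares an unguarded comparison with one that uses an -if- condition.

```stata
* System missing (.) compares as larger than any number: this displays 1
display (. >= 0.10)

* Hypothetical toy variable with one missing value
clear
input double ratio
.05
.25
.
end

* Unguarded: the missing observation is coded 1
gen byte naive = ratio >= 0.10

* Guarded: the missing observation stays missing
gen byte guarded = ratio >= 0.10 if !missing(ratio)

list
```

Here naive is 0, 1, 1 while guarded is 0, 1, and missing. The guard could also be written as -if ratio < .-, which relies on the same ordering.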
Jordi Imbrechts

Join Date: Apr 2022

Posts: 44
#4

29 Apr 2022, 11:35

Thank you Clyde!