
  • Missing values

    Dear all,

    I have a question about missing values. I need to create a dummy variable that takes the value "1" when dismissals as a share of the workforce (Belgian firms, period 2011-2020) are >= 0.10.
    Information about data:
    - panel data.
    - period 2011-2020
    - around 185 000 observations

    I have calculated the percentage of dismissals as follows:
    Code:
    gen Dismissalstototalemployees = Dismissals/L.Totalemployees
    In the next step I generate my dummy variable:
    Code:
    gen Collectiefontslag_10procent = 0
    Because the ratio can't be calculated for the first observation of each firm, I use the following command to copy this missing value into my dummy variable:
    Code:
    replace Collectiefontslag_10procent = . if Dismissalstototalemployees == .
    Next, I set this dummy variable to "1" when "Dismissalstototalemployees" >= 0.10:
    Code:
    replace Collectiefontslag_10procent = 1 if Afdankingentovaantalwn >=0.10
    Data would be this right now:
    Code:
    input long ID byte Collectiefontslag float(Afdankingentovaantalwn Collectiefontslag_10procent)
    1 .           . 1
    1 0 .0042328043 0
    1 0  .007556675 0
    1 0  .009876544 0
    1 0   .01178782 0
    1 0  .008716707 0
    1 0  .007302824 0
    1 0  .009398496 0
    1 0  .007111111 0
    1 0  .004752475 0
    2 .           . 1
    2 0   .05988024 0
    2 0  .064935066 0
    2 0   .05072464 0
    2 0   .01090909 0
    2 0   .04477612 0
    2 0  .029661017 0
    2 0   .04329005 0
    2 0   .01826484 0
    2 0  .004464286 0
    3 .           . 1
    3 0  .008974708 0
    3 0  .008898194 0
    3 0  .007133421 0
    3 0  .006935525 0
    3 0  .008522036 0
    3 0   .01293661 0
    3 0  .005654633 0
    3 0  .006094906 0
    3 0  .005598622 0
    4 .           . 1
    4 0  .004157044 0
    4 0 .0009191177 0
    4 0  .002764977 0
    4 0  .003678161 0
    4 0  .003628118 0
    4 0 .0040964955 0
    4 0  .002743484 0
    4 0  .002710027 0
    4 0  .002604167 0
    5 .           . 1
    5 0 .0020686088 0
    5 0 .0023098793 0
    5 0 .0016697588 0
    5 0 .0017384585 0
    5 0 .0018552876 0
    5 0   .00252419 0
    5 0  .002361022 0
    5 0  .002364066 0
    5 0 .0010660981 0
    end
    "Collectiefontslag" = the dummy that I want to create in Stata. I built it in Excel instead, to check whether the dummy generated in Stata is correct, but for robustness checks I need this code in Stata as well.
    Afdankingentovaantalwn = Dismissalstototalemployees

    PROBLEM: As you can see, every time there is a missing value in "Dismissalstototalemployees" (Afdankingentovaantalwn), Collectiefontslag_10procent shows "1". Compared with the dummy created in Excel (Collectiefontslag), this is not what I want.

    How can I solve this problem, so that Stata doesn't interpret a missing value as ">= 0.10"?

    Thanks in advance,
    Jordi
    Last edited by Jordi Imbrechts; 29 Apr 2022, 09:28.

  • #2
    Never mind, stupid question. Fixed it. Thanks.



    • #3
      Your post is confusing because you switch back and forth between English and another language (Afrikaans? Dutch?) in your variable names. But I think I know what you want to do, and it can be done more simply, and in one line. I will stick to English names here:
      Code:
      gen Collectiefontslag_10procent = (Dismissals/L.Totalemployees >= 0.10) if !missing(Dismissals, L.Totalemployees)
      Added: Crossed with #2.
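
      For anyone who wants to check this on a small scale, here is a minimal sketch on a toy panel. The data values and the variables year and check are made up for illustration; the other names follow #1. It also shows that the step-by-step approach from #1 gives the same result once the last -replace- excludes missing values.

      ```stata
      * Toy panel: hypothetical values, not taken from the dataset in #1
      clear
      input long ID int year float(Dismissals Totalemployees)
      1 2011  5 100
      1 2012 12 100
      1 2013  3 110
      2 2011  1  50
      2 2012  2  50
      2 2013 10  40
      end
      xtset ID year

      * Ratio of dismissals to last year's workforce (missing in each firm's first year)
      gen Dismissalstototalemployees = Dismissals/L.Totalemployees

      * Step-by-step version from #1, with a missing-value guard on the last replace
      gen byte Collectiefontslag_10procent = 0
      replace Collectiefontslag_10procent = . if missing(Dismissalstototalemployees)
      replace Collectiefontslag_10procent = 1 if Dismissalstototalemployees >= 0.10 & !missing(Dismissalstototalemployees)

      * One-line version
      gen byte check = (Dismissals/L.Totalemployees >= 0.10) if !missing(Dismissals, L.Totalemployees)

      * Both versions should agree, including on the missing first-year observations
      assert Collectiefontslag_10procent == check
      list, sepby(ID) noobs
      ```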

      I will also add a general comment: in Stata, missing values are interpreted as larger than any real number. There is no way to get Stata to do otherwise, but, as here, you can use -if- conditions to exclude missing values from the calculation altogether.
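
      A small sketch of that behaviour, using a made-up variable ratio with hypothetical values: the unguarded comparison codes the missing observation as 1, while the version with an -if- condition leaves it missing.

      ```stata
      * Hypothetical values, for illustration only
      clear
      input float ratio
      .
      .05
      .15
      .25
      end

      * Unguarded: missing sorts above every number, so the comparison is true
      gen byte naive = ratio >= 0.10

      * Guarded: observations with missing ratio are excluded and stay missing
      gen byte guarded = ratio >= 0.10 if !missing(ratio)

      assert naive == 1 if missing(ratio)
      assert missing(guarded) if missing(ratio)
      list, noobs
      ```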



      • #4
        Thank you Clyde!
