Dear all,
For my thesis I am researching the causes of layoffs at (large) Belgian firms. My data are an unbalanced panel covering the period 2011-2020. In the first part of my results section I present the summary statistics and a bivariate analysis (t-test/ranksum test).
The screenshot gives a table of the summary statistics of my variables. Sorry for it being in Dutch, but my problem should be clear with the information underneath the screenshot.
(Columns: variable; number of observations; mean; median; min; max)
The number of observations varies too much across my variables: for example, 185 147 for Ln(Age) but only 106 419 for my 2 independent variables about productivity. Apparently my regression will use at most those 106 419 observations, so the roughly 80 000 extra observations can skew my sample significantly.
My supervisor advised me to delete observations with a missing value.
After reading the following article ( https://www.stata.com/support/faqs/d...issing-values/ ), I do not quite understand the proposed solution. The article talks about missing values at the beginning and end, but in my case I do not quite understand how to fix the problem.
I understand that this is rather a basic question, but apparently it is complex enough for me. I hope that the problem is clearly described; if not, please let me know.
Originally I thought the following code (repeated for each of my variables) would be OK, but apparently it is not:
Code:
drop if DalingProductiviteit2J >= .
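To make clearer what I am trying to achieve: as far as I understand, dropping every observation that is missing on any of the model variables in one step could be sketched roughly as below. This is only a sketch under assumptions. Apart from DalingProductiviteit2J, the variable names (LnAge, Productiviteit) are hypothetical placeholders for my other variables, and nmiss is a helper variable I made up for the illustration.

```stata
* Hypothetical variable list: only DalingProductiviteit2J is a real name from my data;
* LnAge and Productiviteit are placeholders for the other model variables.
local modelvars DalingProductiviteit2J LnAge Productiviteit

* For each observation, count how many of the listed variables are missing
egen nmiss = rowmiss(`modelvars')

* Check how many observations would be lost before dropping anything
tabulate nmiss

* Keep only observations that are complete on all listed variables
drop if nmiss > 0
drop nmiss
```

The local macro only works if the lines are run together, for example from a do-file. Counting the missing values first lets me see how many observations I would lose before I actually drop them.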
Thanks in advance,
Jordi