Greetings,
I have some questions about generating percentile-based dummy variables. Academic papers often use a variable called "High_Var": a dummy equal to 1 when a firm-year observation is greater than the sample median, and 0 otherwise.
My questions concern how the sample median should be computed:
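The definition above can be sketched in a few lines of Python. This sketch assumes that missing values (`None`) are excluded when the median is computed and are left missing in the dummy rather than coded as 0. That is one possible convention, not a settled answer. The function name `high_var` and the sample numbers are hypothetical.

```python
from statistics import median
from typing import Optional


def high_var(values: list[Optional[float]]) -> list[Optional[int]]:
    """Return 1 if a value is strictly greater than the sample median, else 0.

    Assumption: missing values (None) are dropped before computing the
    median and stay missing (None) in the resulting dummy.
    """
    observed = [v for v in values if v is not None]
    cutoff = median(observed)  # median of the non-missing values only
    return [None if v is None else int(v > cutoff) for v in values]


sample = [3.0, None, 7.0, 1.0, 5.0]
print(high_var(sample))  # [0, None, 1, 0, 1]  (median of 3, 7, 1, 5 is 4.0)
```

Because the comparison is strict ("greater than"), an observation exactly at the median is coded 0.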
1. Say I split my sample into two subgroups, Developed (D1) and Developing (D0) economies. When generating the High_Var dummy, should I use only the full-sample median (F1), or subgroup-specific medians, i.e., one median for each subgroup: Median_Income_D1 and Median_Income_D0?
2. I expect the median to be computed excluding missing values. These values are genuinely missing in the dataset I use. When forming the High_Var dummy, should High_Var also be missing wherever Var is missing, or can it be set to 0?
3. Returning to the missing-value issue: my dataset contains aggregated patents granted at the country level for 2014 to 2023, but some countries have missing values in various years between 2015 and 2017. What is the best way to handle this? Should I simply treat these values as missing, or set them to zero, given that patents granted cannot be negative? And what about aggregated investment in sustainable projects?
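To make questions 1 and 3 concrete, the following sketch uses made-up countries and values. The helpers `cutoff` and `dummy` are hypothetical. It contrasts a full-sample cutoff with separate cutoffs for D1 and D0, and shows how coding a missing value as zero shifts the median. It only illustrates the mechanics and does not decide which choice is appropriate.

```python
from statistics import median
from typing import Optional

# Hypothetical rows: (country, developed flag D, value of Var or None if missing).
rows = [
    ("A", 1, 10.0), ("B", 1, 14.0), ("C", 1, None),
    ("D", 0, 2.0), ("E", 0, 6.0), ("F", 0, 4.0),
]


def cutoff(values: list[Optional[float]]) -> float:
    """Median of the non-missing values."""
    return median([v for v in values if v is not None])


def dummy(value: Optional[float], cut: float) -> Optional[int]:
    """1 if value > cut, 0 otherwise; missing stays missing (an assumption)."""
    return None if value is None else int(value > cut)


# Option 1: one cutoff from the full sample.
full_cut = cutoff([v for _, _, v in rows])
# Option 2: a separate cutoff within each subgroup (D1 and D0).
group_cut = {g: cutoff([v for _, d, v in rows if d == g]) for g in (0, 1)}

high_full = [dummy(v, full_cut) for _, _, v in rows]
high_group = [dummy(v, group_cut[d]) for _, d, v in rows]

print(full_cut, group_cut)  # 6.0 {0: 4.0, 1: 12.0}
print(high_full)            # [1, 1, None, 0, 0, 0]
print(high_group)           # [0, 1, None, 0, 1, 0]

# Coding the missing value as 0 instead changes the full-sample median.
zero_filled = [0.0 if v is None else v for _, _, v in rows]
print(cutoff(zero_filled))  # 5.0
```

In this toy data, country A is "high" against the full-sample median but "low" within the developed subgroup. Replacing the missing value with zero also pulls the full-sample median from 6.0 down to 5.0.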
Thank you so much for your time and assistance!