Hi all,
I hope this email finds you well during these hard times.
I am quite new to this platform and don't yet know how to use it well, but I have learned a lot from reading your replies to students working through Stata problems. I therefore decided to write in the hope that you could help me resolve a coding problem.
I am trying to write code that drops an entire series of daily stock returns (aaa, bbb or ccc) if more than 50% of its daily returns are missing or equal to 0 in any single year (2010, 2011, 2012, …). My data consist of daily stock returns from January 2010 to May 2020.
I tried a piece of code that I found on the forum:
glo p = 0.5
* Loop over variables
foreach var of varlist * {
    count if missing(`var')
    if (r(N)/_N) >= $p drop `var'
}
However, this code deletes the variables that have more than 50% missing values over the whole period (10 years). I would like to delete the whole series if its returns are missing or equal to 0 in more than 50% of the cases in any one-year period, no matter which year it is (2010, 2011 or 2012).
I tried the code:
foreach var of varlist * {
    bysort year: count if missing(`var')
    if r(N) >=128 drop `var'
}
However, although this code counts the missing observations in each year for each company, it doesn't drop the series when the missing values are more than 128 (50% of the trading days) in one of the years. My assumption is that the per-year counts are not stored in r(N), as in the first example, but in some other way.
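To make the target behaviour concrete, here is a rough sketch of the per-year logic I am after. As far as I understand, with the by: prefix the r(N) left behind after the command belongs only to the last group (the last year), so the sketch computes a yearly share for each observation instead. The toy data are made up for illustration; the local macro p and the temporary variables bad and share are my own names, and I am assuming a numeric variable year already exists, as in my attempt above.

```stata
* Toy data, made up for illustration: 2 years with 4 trading days each
clear
set obs 8
generate int year = cond(_n <= 4, 2010, 2011)
* aaa: never missing or zero
generate double aaa = 0.01 * _n
* bbb: zero on 3 of the 4 days in 2011
generate double bbb = cond(year == 2011 & _n < 8, 0, 0.02)
* ccc: missing on 1 of the 4 days in 2010
generate double ccc = cond(_n == 1, ., 0.03)

* Threshold: drop a series if the yearly share exceeds this value
local p = 0.5

foreach var of varlist aaa bbb ccc {
    tempvar bad share
    * 1 if the return is missing or exactly zero, 0 otherwise
    generate byte `bad' = missing(`var') | `var' == 0
    * share of flagged days within each year (bysort re-sorts the data by year)
    bysort year: egen double `share' = mean(`bad')
    * r(max) is the largest yearly share across all years
    summarize `share', meanonly
    if r(max) > `p' {
        drop `var'
    }
    drop `bad' `share'
}

* List the variables that remain
describe, simple
```

Using a share rather than a fixed count of 128 days would also cover the partial year 2020, which only runs until May. With the real data, the varlist would have to list the return variables only, so that the date and year variables are not tested themselves.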
I hope that you will find time to answer my question. Stay safe!
Thank you in advance,
Best regards,
Artan