Hello all! Stata/SE 16.1 user here.
I have been trying to work out how to do this for a while now. I have figured out how to identify my patterns of missingness, and now I am looking for an efficient way to address them so I can replicate different scenarios for a sensitivity analysis. I want to drop state-years whose share of missing observations for a particular variable is above a certain threshold (and keep those below it). All of my variables are categorical. I have read the FAQ and I hope I am in line with its requirements.
This is how I identified my pattern:
foreach state in 1 2 5 6 8 11 12 15 16 17 19 20 22 24 26 28 29 30 32 33 34 35 38 39 45 46 47 48 49 56 {
    tab ADMYR wt1 if STFIPS==`state', nofreq row missing
}
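As an aside, the state list need not be hardcoded: `levelsof` can build it from the data. A minimal sketch, assuming the data set already contains only the states retained in the analysis:

```stata
* Same tabulation, but letting levelsof collect the STFIPS values
* that remain in the data instead of typing them out.
levelsof STFIPS, local(states)
foreach state of local states {
    tab ADMYR wt1 if STFIPS == `state', nofreq row missing
}
```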
Below is a good example from one state of the pattern I would like to address "wholesale" across all states in my analysis. The missingness across states is "random" in the sense that different states have relatively high or low missingness in different years (or no high missingness at all). I have already dropped states whose missingness was too high for them to be included in the analysis at all (e.g., 90% or above).
I would like to be able to systematically drop (or keep) state-years at different missingness thresholds. In the one-state example below, 2011 and 2012 would be dropped, while other states would retain a different set of years (or all years, in many cases).
My initial approach runs into a wall: I cannot create a variable for the percentage missing, because dividing something by missing (or missing by something) yields missing. I realize I may be going about all of this in a very clunky manner. I am trying to get better and faster, and I hope the forthcoming lessons from you all will help to that end.
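For what it is worth, one standard way around the "division by missing" wall is to avoid division entirely: flag missingness with a 0/1 indicator (the `missing()` function never returns missing) and take its mean within each state-year with `egen`. A minimal sketch, assuming `wt1` is the variable of interest and the 40% cutoff is purely illustrative:

```stata
* Flag missing values of wt1; missing() returns 1 if missing and
* 0 otherwise, so the indicator itself is never missing.
generate byte miss_wt1 = missing(wt1)

* Within each state-year, the mean of the indicator is the share missing.
bysort STFIPS ADMYR: egen share_miss = mean(miss_wt1)

* Drop state-years above an illustrative 40% cutoff; vary `cut' for
* the sensitivity analysis (e.g., inside a preserve/restore loop).
local cut 40
drop if share_miss > `cut'/100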
Thank you in advance for your time, and for any and all help.
On another note: I have read about the implications of dropping entire state-years from a data set in the context of difference-in-differences analysis, and the ways to address them, and I may need some additional advice there. I see there is quite a bit on this topic out there (and on this forum). If anyone has specific considerations on those implications based on the original question, and if there is enough context here, I would appreciate that. I realize this may merit a separate topic, and I apologize if I am reaching with this side note.
ONE STATE EXAMPLE:
   Year of |                             ZT1
 admission |      0       1       2       3       4       . |  Total
-----------+-------------------------------------------------+--------
      2009 |  17.66   34.81   20.60   15.72   10.38    0.83 | 100.00
      2010 |  28.01   34.38   16.55   12.77    7.75    0.54 | 100.00
      2011 |  17.58   20.24    6.89    5.03    3.13   47.13 | 100.00
      2012 |  20.88   12.37    6.36    6.18    4.86   49.35 | 100.00
      2013 |  44.93   22.07   12.81   10.10    9.83    0.26 | 100.00
      2014 |  64.03   16.42    8.07    5.32    6.04    0.12 | 100.00
      2015 |  65.46   15.34    6.82    5.33    7.01    0.04 | 100.00
      2016 |  63.85   18.02    6.09    5.44    6.51    0.08 | 100.00
      2017 |  65.66   14.50    6.47    6.28    6.94    0.14 | 100.00
      2018 |  64.18   17.31    6.15    6.42    5.73    0.21 | 100.00
      2019 |  69.85   17.24    5.50    4.55    2.66    0.20 | 100.00
      2020 |  64.86   19.03    6.17    5.63    4.21    0.09 | 100.00
-----------+-------------------------------------------------+--------
     Total |  48.54   20.31    9.18    7.49    6.33    8.14 | 100.00
Thanks again for tips, considerations and code.
Cam Bigler