Dear all,
I am interested in testing the robustness of my logistic regression results given some uncertainty in how cases were classified.
My outcome is binary, for example yes/no regarding whether or not someone attended a university hospital for a heart attack. My dataset includes the variables year (2011, 2012), age, sex, economic status, education status, etc. I can link records between 2011 and 2012, but not before 2011 nor after 2012. I am only interested in the first hospitalization, and I have de-duplicated my dataset accordingly. Within 2012, 10% of "primary cases" were not actually primary, because those individuals had a primary event in 2011. I therefore worry that roughly 10% of the individuals hospitalized in 2011 likewise did not have their first hospitalization in 2011 (their true first event would have occurred before 2011, where I cannot link). To check that my results are robust to these roughly 10% extra cases, I would like to exclude 10% of the 2011 individuals, but I want to exclude them at random, since I do not know which specific individuals should be excluded.
Initially, I thought of using the bootstrap command. My question is how to do this in a loop: randomly exclude 10% of the 2011 population, repeat this many times so that each replication uses a slightly different sample, and then combine the replications into a single overall odds ratio, i.e., some kind of mean OR. I tried writing a program for this, but it did not work because e(b) after a logistic regression is stored as a matrix, not a scalar. Alternatively, I could run a sort of meta-analysis of the odds ratios obtained from the samples excluding different persons, but that does not seem like the cleanest method. This is not my main analysis, only a sensitivity analysis to check that my results would not change materially if the scenario above (i.e., an extra 10% of people in 2011 without a true primary episode) were indeed true.
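For concreteness, here is a minimal sketch of the kind of loop I have in mind, using simulate rather than bootstrap. The variable names (univ_hosp for the outcome, age and sex as covariates) are placeholders for my actual variables, and the 500 replications are an arbitrary choice. The key point is that individual coefficients can be pulled out as scalars with _b[], even though e(b) itself is a matrix:

```stata
capture program drop sens10
program define sens10, rclass
    preserve
    * randomly keep 90% of the 2011 observations (drop 10%);
    * observations from other years are retained untouched
    sample 90 if year == 2011
    logistic univ_hosp age i.sex
    * _b[age] is a scalar (log odds), unlike the matrix e(b)
    return scalar b_age = _b[age]
    restore
end

* refit the model on 500 randomly thinned samples
simulate b_age = r(b_age), reps(500) seed(12345): sens10

* simulate leaves the replication results in memory;
* exponentiate to get the OR in each replication, then summarize
generate or_age = exp(b_age)
summarize or_age, detail
```

The summary of or_age across replications would then give something like a mean OR plus its spread, which is what I am after; whether averaging on the log-odds scale before exponentiating would be more appropriate is part of my question.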
Does anyone know of a better-suited command, or has anyone come across a similar issue?
Thank you for your help!