Hi there,
I am working with survey data and the svy commands, and wondering: how can I predict out-of-sample values after svy logit or svy logistic?
In my dataset:
- The records come from five different survey rounds; the variable "source" indicates which round each record is from.
- The sampling weight is round-specific and is stored in the variable "FQweight".
- The samples are two-stage cluster designs, with urban/rural residence and major region as strata. Because Rounds 1-4 were sampled from a different area than Round 5, the research question is: how well do the R1-R4 data predict the R5 data? In other words, for outcomes that are expected to change between R1 and R5, I would like to see whether the R1-R4 data accurately predict the R5 data and, if they do, how well.
- To address the research question, the plan is to subset the sample to Rounds 1-4 only.
- Then run svy: logistic, where
  - the Y-var is binary and is expected to change over time, and
  - the only X-var is a variable called cmc, which stores the time of interview
  - e.g.: svy: logistic mobile cmc
- Then, use the post-estimation predict command to predict R5 values. This is where my first question comes in (see below).
- Then, compare the predicted R5 values with the actual R5 values (this is where my second question comes in).
- Question 1: The original (non-subsetted) dataset contains the R5 interview times, but I ran svy: logistic only on the R1-R4 sample. Is it possible to plug in the R5 interview times and predict a y-value for Round 5 from the R1-R4 svy: logistic results? How?
- Question 2: After that, how would I assess how accurate the R5 predictions are compared with the actual R5 data in the original dataset?
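One possible way to do this in Stata, offered only as a sketch under stated assumptions: keep Round 5 in memory and restrict estimation with svy's subpop() option instead of dropping the R5 records (dropping them loses the R5 rows you want to predict, and restricting a svy sample by deletion or `if` can give incorrect standard errors). Because predict, by default, computes values for every observation in memory and not just the estimation sample, the R5 rows then receive predicted probabilities from the R1-R4 coefficients. The names psu and strata are hypothetical placeholders for the cluster and stratum identifiers, and the code assumes source is numeric (1-5) and mobile is coded 0/1.

```stata
* Sketch only. ASSUMPTIONS: psu and strata are HYPOTHETICAL names for the
* cluster and stratum identifiers; source is numeric (1-5); mobile is 0/1.
* If the data are already svyset correctly, skip the egen/svyset lines.

* Treat strata as distinct across rounds, since each round is its own sample
egen long rstrata = group(source strata)
svyset psu [pweight = FQweight], strata(rstrata)

* Flags for the estimation subpopulation and the prediction target
generate byte r1to4 = inrange(source, 1, 4)
generate byte r5    = (source == 5)

* Fit on Rounds 1-4 with subpop() instead of dropping Round 5,
* so the Round 5 rows stay in memory for prediction
svy, subpop(r1to4): logistic mobile cmc

* predict fills in every observation with nonmissing cmc, including Round 5
predict double p_hat, pr

* Question 2: design-weighted predicted vs observed prevalence in Round 5
svy, subpop(r5): mean p_hat mobile
lincom p_hat - mobile

* Weighted Brier score (mean squared error of the predicted probabilities)
generate double sqerr = (mobile - p_hat)^2
svy, subpop(r5): mean sqerr
```

Two caveats on this design choice: the standard errors from the last two svy: mean calls treat p_hat as fixed data, so they ignore the uncertainty in the R1-R4 coefficients and will tend to be too small; and with cmc as the only predictor, p_hat is an extrapolation of the R1-R4 time trend to the R5 interview dates, so a close match in prevalence says the trend held, not that individual-level predictions are good (the Brier score speaks more to the latter).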
I'm looking through the Stata 15 Survey Data Reference Manual (link: https://www.stata.com/manuals/svy.pdf), and I see the post-estimation predict command, but I don't see a detailed example of how to predict out-of-sample values.
Thank you so much for your time and input - I appreciate it!! Please let me know if I can be clearer.