Hi there,
I am currently working on a longitudinal analysis of survey data. My analysis is only interested in a subsample of the full sample; therefore, I created a dummy variable that indicates whether an observation meets the criteria for the subsample (1 = eligible observation, 0 = non-eligible observation).
When I use the count function to determine the number of eligible observations, the sample size is 2,576. Yet, 435 of these observations have a 0 weight. Thus, I anticipated that the sample size for the subsample would be 2141.
However, when I go on to generate descriptive and inferential statistics, the sample sizes differ.
When I generate the mean age for my subsample (see below), the sample size for the subsample is 2141, as expected.

However, when I go to generate the proportions for racial discrimination (one of my predictors), the sample size drops to 2127. I thought this may be due to missingness in the racial discrimination variable, and I identified 15 observations that had .a (our code for not applicable) for this variable. However, the removal of these observations for the calculation of the proportions would result in a sample size of 2126, not 2127.

Further to this, when I ran nested logistic regression models on the same subsample, the sample size was reduced even further to 2095. Again, I tested whether this was due to missingness in my analytical variables, and found that 68 observations had .a in one or more of them. However, if these observations were excluded from the logistic models, the sample size would be 2073, not 2095.
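
To make the arithmetic I am relying on explicit, here is a small Python sketch (not Stata, and using made-up rows rather than my survey data). The variable names `eligible`, `weight`, and `discrim` are hypothetical, and `None` stands in for .a. It compares a direct count of usable observations with my approach of subtracting the number of missing observations from the weighted subsample size. The two only agree if none of the missing observations also has a zero weight. I do not know whether that overlap is what is happening in my data, but it is one thing I could rule out.

```python
def usable_n(rows, variables):
    """Count rows that are eligible, have a positive weight, and are
    non-missing on every variable listed in `variables`."""
    return sum(
        1
        for r in rows
        if r["eligible"] == 1
        and r["weight"] > 0
        and all(r[v] is not None for v in variables)
    )


# Hypothetical toy rows, not real survey data; None plays the role of .a
rows = [
    {"eligible": 1, "weight": 1.2, "discrim": 1},     # fully usable
    {"eligible": 1, "weight": 0.0, "discrim": None},  # zero weight AND missing
    {"eligible": 1, "weight": 0.8, "discrim": None},  # missing only
    {"eligible": 1, "weight": 0.0, "discrim": 0},     # zero weight only
    {"eligible": 0, "weight": 1.0, "discrim": 1},     # outside the subsample
]

# Subsample size after dropping zero-weight rows (my "2141" step)
n_weighted = usable_n(rows, [])

# Missing on the predictor among ALL eligible rows (my "15" step)
n_missing = sum(
    1 for r in rows if r["eligible"] == 1 and r["discrim"] is None
)

# Subtracting the two counts, as I did by hand
naive_n = n_weighted - n_missing

# Direct count of rows that are eligible, weighted, and non-missing
direct_n = usable_n(rows, ["discrim"])

print(n_weighted, n_missing, naive_n, direct_n)
```

In this toy example the subtraction gives a smaller number than the direct count, because one row is both zero-weight and missing and so gets removed twice.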

If anybody has any insight into why these sample sizes differ, it would be greatly appreciated.