Hi Everyone,

I am writing a program that uses the mark and markout commands. I noticed that the number of observations I get fluctuates in an odd way. I can replicate the issue with a simplified version of my program:

Code:
sysuse census.dta, clear
replace divorce = . in 1/3
qui regress death marriage divorce
predict yhat if e(sample)
replace yhat = 1 if yhat != .

This is where my confusion is:

Code:
sum yhat
sum medage if yhat
sum medage if yhat == 1

As you can see, when Stata summarizes yhat it reports 47 observations. This is expected.
When Stata summarizes medage if yhat, it reports 50 observations.
But when Stata summarizes medage if yhat == 1, it reports 47 observations.

My question is: how is Stata interpreting the following statement?

Code:
sum medage if yhat

I was incorrectly assuming that it would interpret it the same as:

Code:
sum medage if yhat == 1

My assumption was based on how it parses statements like:

Code:
tab var if `touse'

I am also aware that Stata will interpret non-zero and non-missing values as being included in an if qualifier, as in:

Code:
svy, subpop(filter): mean var

What I'm finding surprising is that Stata summarizes the data even when the filter variable contains missing values. Could someone enlighten me as to why this is happening?
Cheers,
David.
PS: I don't think that this is a bug, but a misunderstanding on my part as to what Stata is doing.
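The following is a minimal Stata sketch, meant to be run as a do-file. It reproduces the counts from the post and illustrates the rule that appears to explain them. It assumes that an if qualifier keeps an observation whenever its expression is nonzero, and that a missing value (.) is not zero, so it counts as "true". The variable name touse is a hypothetical stand-in for the tempvar a program would normally create.

```stata
* Minimal sketch, run as a do-file. Assumption: an if qualifier keeps an
* observation when the expression is nonzero, and missing (.) is nonzero.
sysuse census.dta, clear
replace divorce = . in 1/3
quietly regress death marriage divorce
predict yhat if e(sample)              // yhat is missing in obs 1-3
replace yhat = 1 if yhat != .

* A missing value is not equal to zero, so it evaluates as "true"
display (. != 0)                       // 1

count if yhat                          // 50: the 47 ones plus the 3 missings
count if yhat == 1                     // 47
count if yhat & !missing(yhat)         // 47

* mark/markout produce a 0/1 indicator with no missing values, so
* "if touse" and "if touse == 1" select the same observations.
* (touse is a hypothetical name; inside a program use tempvar.)
mark touse
markout touse yhat                     // touse = 0 where yhat is missing
count if touse                         // 47
```

Under this reading, `if yhat` behaves like `if yhat != 0`. Testing `== 1` or adding `& !missing(yhat)` is what drops the missing predictions. An indicator built with mark/markout only ever holds 0 or 1, which is why `if `touse'` gives the expected count.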