  • How Stata parses if statements (e.g., touse).

    Hi Everyone,

    I am writing a program that uses the mark and markout commands. I noticed that the number of observations I get fluctuates in an odd way. I can replicate the issue with a simplified version of my program.

    Code:
    sysuse census.dta, clear 
    replace divorce = . in 1/3 
    qui regress death marriage divorce 
    predict yhat if e(sample) 
    replace yhat = 1 if yhat != .
    This is where my confusion is:
    Code:
    sum yhat 
    sum medage if yhat 
    sum medage if yhat == 1
    As you can see, when Stata summarizes yhat it reports there are 47 observations. This is expected.

    When Stata summarizes medage if yhat, it reports there are 50 observations.

    But when Stata summarizes medage if yhat == 1, it reports there are 47 observations.

    My question is how is Stata interpreting the following statement:

    Code:
    sum medage if yhat
    I was incorrectly assuming that it would interpret it the same as:

    Code:
    sum medage if yhat == 1
    My assumption was based on how it parses statements like:

    Code:
    tab var if `touse'
    I am also aware that some options, such as subpop(), treat non-zero and non-missing values as included:

    Code:
    svy, subpop(filter): mean var
    What I find surprising is that the if qualifier keeps observations where the filter variable is missing. Could someone enlighten me as to why this is happening?

    Cheers,

    David.

    PS: I don't think that this is a bug, but a misunderstanding on my part as to what Stata is doing.

  • #2
    It has more to do with how Stata evaluates logical true and false. Stata treats 0 as false and any non-zero value (including missing!) as true. When you created -yhat-, the three observations that were excluded from estimation were left missing.

    Code:
    sum medage if yhat
    sum medage if yhat == 1
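    Since -yhat- only has values of missing (.) and 1, both values are interpreted as true by the first command, so it uses all 50 observations. The second command keeps only the observations where -yhat- equals 1, so it uses 47.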
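
    Here is a minimal sketch of the difference, reusing your census example. The counts in the comments are the ones you reported, and !missing() is only one way of writing the condition you probably intended.

    Code:
    sysuse census, clear
    replace divorce = . in 1/3
    quietly regress death marriage divorce
    predict yhat if e(sample)
    replace yhat = 1 if yhat != .
    * missing is non-zero, so it counts as true: 50 observations
    count if yhat
    * explicit comparison drops the missing values: 47 observations
    count if yhat == 1
    * same result, stated in terms of missingness: 47 observations
    count if !missing(yhat)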

    • #3
      Leonardo Guizzetti Ahhhh, that makes more sense. The `touse' variable is coded 0/1, which explains that pattern of exclusion. I was thinking about what Stata included instead of what Stata excluded.
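
      For anyone finding this later, a quick sketch of why a marker variable behaves differently, assuming the same census setup as in my first post. The variable name touse is just an illustrative permanent variable here; inside a program it would normally be a temporary variable.

      Code:
      sysuse census, clear
      replace divorce = . in 1/3
      * mark creates a 0/1 variable; markout sets it to 0 where a listed variable is missing
      mark touse
      markout touse death marriage divorce
      * touse is never missing, so both conditions select the same observations
      count if touse
      count if touse == 1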