
  • No observations error despite observations present for same household

    Dear all,

    I have a panel dataset with a hhid and a dummy reflecting whether the respondent is a husband (1) or wife (0). I have generated two new variables, which reflect the education level of the husband and of the wife. However, when I run tabulate husb_educ if wife_educ != . I get a message that there are no observations, even though, as can be seen in the data example, there are instances where we have observations for both husb_educ and wife_educ in the same household. Moreover, husb_educ and wife_educ were generated using the command: bysort hhid: egen husb_educ = max(educ_level) if husbandwife == 1 (for the husband, and the same procedure for the wife with the dummy = 0). Both education variables take values between 1 and 9. Apologies for any mistakes as I am new to the forum, and thank you in advance.

    input double hhid float(husbandwife wife_educ husb_educ)
    10600 0 . .
    10600 . . .
    10600 . . .
    10600 . . .
    10600 0 . .
    10600 . . .
    10600 . . .
    10600 . . .
    10600 . . .
    10600 . . .
    10600 . . .
    10600 . . .
    10600 . . .
    10600 0 . .
    10600 . . .
    10600 . . .
    10600 . . .
    10600 . . .
    10600 0 . .
    10600 . . .
    10600 . . .
    10600 . . .
    10600 0 . .
    10600 . . .
    10800 . . .
    10800 . . .
    10800 0 2 .
    10800 . . .
    10800 0 2 .
    10800 0 2 .
    10800 . . .
    10800 . . .
    10800 . . .
    10800 . . .
    10800 0 2 .
    10800 . . .
    10800 . . .
    10800 . . .
    10800 . . .
    10800 . . .
    10800 0 2 .
    10800 . . .
    10800 . . .
    10800 . . .
    10800 . . .
    10800 0 2 .
    10800 0 2 .
    10800 1 . 2
    10800 . . .
    10800 1 . 2
    10800 . . .
    10800 . . .
    10800 0 2 .
    10800 . . .
    10800 . . .
    10800 0 2 .
    10800 0 2 .
    10800 . . .
    10800 . . .
    10800 . . .
    10800 . . .
    10800 . . .
    10800 0 2 .
    10800 . . .
    10800 . . .
    10800 . . .
    10800 . . .
    10800 1 . 2
    10800 1 . 2
    10800 . . .
    10800 . . .
    10800 0 2 .
    10800 1 . 2
    10800 . . .
    10800 . . .
    10800 . . .
    10800 . . .
    10800 . . .
    10800 . . .
    10800 0 2 .
    10800 . . .
    10800 . . .
    12200 . . .
    12200 1 . 2
    12200 . . .
    12200 . . .
    12200 . . .
    12200 . . .
    12200 . . .
    12200 . . .
    12200 . . .
    12200 . . .
    12200 . . .
    12200 . . .
    12200 . . .
    12200 . . .
    12200 . . .
    12200 . . .
    12200 . . .
    12200 . . .



  • #2
    In fact, in your example data, there aren't any observations where both husb_educ and wife_educ are non-missing. And given the way you generated those variables, I am quite certain that there aren't any such anywhere in your entire data set. That's because the way you created those variables does not do what you think. What you needed to do is:

    Code:
    by hhid, sort: egen husb_educ = max(cond(husbandwife == 1, educ_level, .))
    by hhid, sort: egen wife_educ = max(cond(husbandwife == 0, educ_level, .))
    egen tag = tag(hhid)
    tab husb_educ if tag & !missing(wife_educ)
    By using the -if- clauses instead of the -cond()- function, you caused Stata to populate the husband's education only into the observation where husbandwife == 1, and the wife's only into the observation where husbandwife == 0. As a result there are no observations that contain both husb_educ and wife_educ non-missing values. And it goes downhill from there.
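
    To see the difference between the two approaches side by side, here is a minimal sketch on a tiny made-up data set. The hhid and educ_level values and the variable names husb_if, wife_if, husb_all and wife_all are invented for illustration and are not from your data.

    ```stata
    clear
    input double hhid float(husbandwife educ_level)
    1 1 3
    1 0 5
    1 . .
    2 1 2
    2 0 2
    end

    * -if- qualifier: the result is stored only in observations meeting the condition
    bysort hhid: egen husb_if = max(educ_level) if husbandwife == 1
    bysort hhid: egen wife_if = max(educ_level) if husbandwife == 0

    * No observation can satisfy both conditions, so this count is zero
    count if !missing(husb_if) & !missing(wife_if)
    assert r(N) == 0

    * cond() inside max(): the result is spread to every observation of the hhid
    bysort hhid: egen husb_all = max(cond(husbandwife == 1, educ_level, .))
    bysort hhid: egen wife_all = max(cond(husbandwife == 0, educ_level, .))

    * Every observation now carries both values, including the row with missing husbandwife
    assert !missing(husb_all) & !missing(wife_all)
    list, sepby(hhid)
    ```

    The listing should show husb_if and wife_if never filled in on the same row, while husb_all and wife_all are constant within each hhid.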

    In the code I show here, the only thing to be careful about is that now husb_educ and wife_educ are populated in every observation of the hhid. Consequently, to avoid counting the same household more than once, you need to tag one observation in each hhid and do your tabulation only on the tagged observations. That's what the last two lines of my code do.
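
    As a rough illustration of why the tagging matters, the sketch below runs the tabulation with and without the tag on a small made-up data set (the values are invented, not taken from your example).

    ```stata
    clear
    input double hhid float(husbandwife educ_level)
    1 1 3
    1 0 5
    1 . .
    2 1 2
    2 0 2
    end

    by hhid, sort: egen husb_educ = max(cond(husbandwife == 1, educ_level, .))
    by hhid, sort: egen wife_educ = max(cond(husbandwife == 0, educ_level, .))

    * Untagged: every observation of each household is counted (5 rows here)
    tab husb_educ if !missing(wife_educ)

    * tag() marks exactly one observation per hhid with 1 and all others with 0
    egen tag = tag(hhid)

    * Tagged: each household contributes once (2 households here)
    tab husb_educ if tag & !missing(wife_educ)
    ```

    The first table totals 5 because it counts observations, whereas the second totals 2 because it counts households.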



    • #3
      This worked perfectly. Thank you so much Clyde.
