  • Work by family instead of individual

    Hello, I am working with a public survey database that gathers data at the household, family and individual level. That is, each household may contain more than one family, and each family more than one individual.

    There is an id 'folio' for each household and I have created a family id code 'folio_nucleo' through:

    gen folio_nucleo=folio*10+nucleo
    I am trying to link children's variables with their parents'. For example, I want to see whether having a teenage mother affects the child's enrolment in primary education (just an example). For this I would have to analyse the presence of a teenage mother and children's enrolment per household.

    I have created variables for 'teenage mother'

    gen madreado1=.
    replace madreado1=1 if madre==1 & s5<20  & (edad-s4)<6
    and for 'children 0 to 6 years not enrolled'

    gen noasis=.
    replace noasis=1 if e3==2  &  edad<7
    The variable for family ID is
    The total number of observations is 216439

    I have tried a few methods but have not yet been successful. I am sorry I am not presenting the code I have tried but I don't think it would help as they were very poor attempts and I didn't really have much idea of what I was trying.

    Thank you very much!
    Last edited by Fernanda Pavez; 07 Sep 2018, 13:17.

  • #2
    It's not clear exactly what you are asking here. So I'll just make some observations that, I hope, will be helpful, but may not be.

    Your approach to generating these indicator variables is misguided. I see this here on Statalist a lot, and I don't know where it comes from, who teaches people to do it, nor why it is so apparently popular, but it's a recipe for trouble.

    NEVER create an indicator variable with:
    gen indicator = .
    replace indicator = 1 if whatever
    You end up with a variable that is coded 1 for yes and missing value for no. But missing value is a dangerous way to code no in Stata because, depending on the command, Stata will interpret missing value to mean either "yes" or "omit this entire observation from the analysis." Either way you will get garbage from your analysis. Always code your indicator variables as 1 for yes and 0 for no. The code would be:

    gen indicator = (whatever)
    (It is sometimes necessary to add an additional line of code to create missing values of indicator in those observations where the data do not allow you to determine the truth or falsity of whatever. Those are genuinely missing values, not missing values pretending to be "no.")
    (Note: the parentheses around the logical condition whatever are usually not necessary, but I think they enhance the readability of the code.)
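
    To make the difference concrete, here is a minimal sketch using the auto data set that ships with Stata, since the variables from post #1 are not documented. The variable names good_bad and good are made up for illustration; they are not from the original data.

    ```stata
    sysuse auto, clear

    * Problematic pattern: 1 for yes, missing for everything else
    gen byte good_bad = .
    replace good_bad = 1 if rep78 >= 4 & !missing(rep78)

    * Any nonzero value, including missing, counts as true in an -if- condition,
    * so this counts every observation, not just the ones coded 1
    count if good_bad

    * Preferred pattern: 1 for yes, 0 for no, and missing only where
    * rep78 itself is missing (so the truth of the condition is unknown)
    gen byte good = (rep78 >= 4) if !missing(rep78)
    tab good, missing
    ```

    The -if !missing(rep78)- qualifier matters because Stata treats missing as larger than any number, so rep78 >= 4 alone would evaluate to 1 where rep78 is missing.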

    If you have other binary variables in your data set that are coded as 1 for yes and missing value for no, you will almost surely get into trouble with those sooner or later. So change them all now, before that happens. (-help recode- may be useful for this.)
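
    As a rough sketch of that clean-up, assuming that missing in madreado1 and noasis (the two variables created in post #1) really does mean "no" and never "unknown":

    ```stata
    * Assumption: in these two variables a missing value always means "no".
    * If some missings are genuinely unknown, do not apply this blindly.
    recode madreado1 noasis (. = 0)

    * Check the result: each variable should now show only 0 and 1
    tab1 madreado1 noasis, missing
    ```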

    I cannot comment on whether the actual conditions you are using in your code are appropriate for identifying an adolescent mother, as you have provided no information about the meaning of any of those variables.
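
    If the underlying goal is to attach a family-level flag to every member of a family, one common pattern is -egen- with a -by- prefix. This is only a sketch under assumptions: that madreado1 and noasis have been recoded to 1/0, that folio_nucleo uniquely identifies families, and that edad is age in years. The new variable names fam_id and fam_madreado are hypothetical.

    ```stata
    * -gen- without a storage type creates a float, which cannot hold integers
    * above 16,777,216 exactly. If folio is large, a grouped id is a safer
    * family identifier than folio*10 + nucleo.
    egen long fam_id = group(folio nucleo)

    * 1 for every member of a family that contains at least one teenage mother,
    * 0 otherwise (requires madreado1 coded 1/0, not 1/missing)
    bysort fam_id: egen byte fam_madreado = max(madreado1)

    * Among young children, cross-tabulate non-enrolment against the family flag
    tab fam_madreado noasis if edad < 7, row
    ```

    Whether this matches what you want depends on the meaning of the variables, which only you know at this point.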

    If these remarks do not resolve your concerns, I suggest you post back with a more clearly posed question, and also provide a) example data using the -dataex- command, and b) information about the variables themselves.

    In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use -dataex-.
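
    For instance, something along these lines would produce a postable example. The variable list is taken from the code in post #1, and the choice of the first 20 observations is arbitrary:

    ```stata
    * Only needed if -dataex- is not already installed
    ssc install dataex

    * Show the relevant variables for the first 20 observations;
    * copy the output from the Results window into your post
    dataex folio nucleo madre edad s4 s5 e3 in 1/20
    ```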


  • #3
    Thank you. I will recode variables as you suggested and I do see the problem you point out. As for my initial question, I have already solved it but will take your suggestions into consideration for future posts. Thanks again.