Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Create a time-variant variable reflecting number of co-habitants

    Hello everyone,

    I am working with cohort data from a DSS. I have time-variant variants such as household or education (people can change were they live or their educational status with time). The database is structued in a way that I have several lines for each individual, with different entry and exit dates.

    I want to create a variable capturing the "number of co-habitants" as a time-variant variable. This is not simple, as each individual in each household have different entry and exit dates. I want to caputure the number of co-habitants for each individuals in a time changing manner. Which code would you use?

    Attached a simplified example of how the data could look like for one household:




    Thank you very much in advance!!
    Attached Files

  • #2
    I'd like to suggest three clarifications:

    1) Please use -dataex- (see FAQ http://www.statalist.org/forums/help) to turn that data example into Stata code so that forum users and import the data and try their codes. Screenshot is hard to apply as it takes time to hand type your cases, which would lead to a low response rate to your question.

    2) Provided are only the in/out dates for each person. And they may not 100% overlap (e.g. a 5 years old grandchild may only have spent 2.6 years with the grandma before she passed away), what are the exact rules here?

    3) Usually, it may be useful if you can provided an extra column of data to show what your "wanted" variable would look like, making it easier to visualize what's the ask.

    Comment


    • #3
      Dear Ken,

      Sorry for not using dataex. Below the same simplified data for 2 households.

      Regarding question 2 this is actually the problem! It would be easy to establish the number of maximum co-habitants per household, or the number of households when an individual starts living in a household. However, the interesting thing here is to consider co-habitants as a time-variant variable because, as you said, they do not overlap 100% (there are deaths, births and migrations). The number of co-habitants is a potential confounder for our outcome of interest and we need to take it into account in the analysis.

      I think that, as with other time-variant variables, it would be necessary to create more lines per individual (one per each addition or substration of a co-habitant and the moment of this reflected by the entry/exit dates). This is in addition to creating a column of "num of cohabitants" summing the number of co-habitants at that timepoint. However, I do not know how to transform this concept into code and this is why I am writting here.

      Thank you!




      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str12 perm_id float dob str1 gender float gender2 str3 starttype float startdate str3 endtype float enddate str9 household float death
      "1" 17809 "M" 1 "BIR" 17809 "DTH" 18124 "1" 1
      "2" 18913 "M" 1 "BIR" 18913 "EXT" 19308 "1" 0
      "2" 18913 "M" 1 "ENT" 20712 "NA"  22421 "1" 0
      "3" 20299 "M" 1 "ENT" 20773 "NA"  22421 "1" 0
      "4" 19822 "M" 1 "ENT" 20004 "EXT" 20253 "1" 0
      "5" 19652 "F" 2 "BIR" 19652 "EXT" 19653 "1" 0
      "5" 19652 "F" 2 "ENT" 20018 "EXT" 20254 "1" 0
      "5" 18461 "F" 2 "ENT" 18635 "NA"  22421 "1" 0
      "6" 17772 "F" 2 "ENT" 17886 "EXT" 20775 "2" 0
      "7" 19456 "F" 2 "BIR" 19456 "EXT" 20985 "2" 0
      end
      format %td dob
      format %td startdate
      format %td enddate

      Comment


      • #4
        Dear Ken,

        Sorry for not using dataex. Below the same simplified data for 2 households.

        Regarding question 2 this is actually the problem! It would be easy to establish the number of maximum co-habitants per household, or the number of households when an individual starts living in a household. However, the interesting thing here is to consider co-habitants as a time-variant variable because, as you said, they do not overlap 100% (there are deaths, births and migrations). The number of co-habitants is a potential confounder for our outcome of interest and we need to take it into account in the analysis.

        I think that, as with other time-variant variables, it would be necessary to create more lines per individual (one per each addition or substration of a co-habitant and the moment of this reflected by the entry/exit dates). This is in addition to creating a column of "num of cohabitants" summing the number of co-habitants at that timepoint. However, I do not know how to transform this concept into code and this is why I am writting here.

        Thank you!




        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str12 perm_id float dob str1 gender float gender2 str3 starttype float startdate str3 endtype float enddate str9 household float death
        "1" 17809 "M" 1 "BIR" 17809 "DTH" 18124 "1" 1
        "2" 18913 "M" 1 "BIR" 18913 "EXT" 19308 "1" 0
        "2" 18913 "M" 1 "ENT" 20712 "NA"  22421 "1" 0
        "3" 20299 "M" 1 "ENT" 20773 "NA"  22421 "1" 0
        "4" 19822 "M" 1 "ENT" 20004 "EXT" 20253 "1" 0
        "5" 19652 "F" 2 "BIR" 19652 "EXT" 19653 "1" 0
        "5" 19652 "F" 2 "ENT" 20018 "EXT" 20254 "1" 0
        "5" 18461 "F" 2 "ENT" 18635 "NA"  22421 "1" 0
        "6" 17772 "F" 2 "ENT" 17886 "EXT" 20775 "2" 0
        "7" 19456 "F" 2 "BIR" 19456 "EXT" 20985 "2" 0
        end
        format %td dob
        format %td startdate
        format %td enddate

        Comment


        • #5
          While you mentioned you'd like it to be "time varying," there isn't enough details on how to go about making the longitudinal data (e.g. Per day? Per year?). What I presented here, hopefully, may be useful to you:

          Code:
          preserve
          * Compute a number of stay (days) for each id-stay pair:
          gen durationday = enddate - startdate + 1
          * Expand the data so that if a member lived for X days, there are X lines:
          expand durationday
          * Create a date variable (day) spanning for startdate to enddate:
          bysort perm_id startdate: gen dayseq = startdate + _n - 1
          format %td dayseq
          * Compute the total family members on every available day:
          gen case = 1
          bysort dayseq: egen totalmember = total(case)
          * Collapse by id-stay pair, generate the min, max, and mean number of members during the stay:
          collapse (min) min_member=totalmember ///
                   (max) max_member=totalmember ///
                   (mean) avg_member=totalmember, by(perm_id startdate)
          save temp01, replace
          restore
          
          * Merge the three numbers back to the main database:
          merge 1:1 perm_id startdate using temp01
          erase temp01.dta // Erase the temp01 data
          
          * Examine the data
          drop _merge
          list startdate enddate *_member, sep(0)
          The minimal, maximal, and average (base on daily data) of family member during the stays are merged to the main data. Output is like this:

          Code:
               +--------------------------------------------------------+
               | startdate     enddate   min_me~r   max_me~r   avg_me~r |
               |--------------------------------------------------------|
            1. | 04oct2008   15aug2009          1          2   1.756329 |
            2. | 13oct2011   11nov2012          3          3          3 |
            3. | 15sep2016   21may2021          3          5   3.161988 |
            4. | 15nov2016   21may2021          3          5   3.130988 |
            5. | 08oct2014   14jun2015          4          5      4.944 |
            6. | 08jan2011   21may2021          2          5   3.090045 |
            7. | 21oct2013   22oct2013          4          4          4 |
            8. | 22oct2014   15jun2015          4          5    4.99578 |
            9. | 20dec2008   17nov2016          1          5   2.609689 |
           10. | 08apr2013   15jun2017          3          5   3.500654 |
               +--------------------------------------------------------+
          If you really want a full-blown time series data, you can take the -preserve- and -restore- lines out and examine if the data "temp01" can be used. It expanded the data into day level, each line represent each member's presence in the household on that day. You can then use collapse to summarize them into whatever temporal resolution you'd like.

          Comment

          Working...
          X