  • Extrapolating ages based on dates

    I have panel data from surveys administered fairly regularly. There is a uniquely annoying situation where the respondents' family members' ages were recorded in only one survey, as you can see in the data below.

    My goal is to use this to extrapolate the family member ages across all other survey dates, and then use that to calculate the percent likelihood that they have what we define as a "young child" in their family, which is a child age 4 or younger. The issue, however, is that we do not have their birthdays.

    My inclination is to do something like the following: if Survey A was administered 100 days before Survey B (in which we observe ages), then there is a 1 - (100/365) ≈ 73% chance that a given person is the same age at Survey A as at Survey B (assuming an equal chance of a birthday on any given day). I would then use that to back out the % likelihood that the family has at least one child age 4 or younger (which will most often be 0 or 1 in my full dataset, but will sometimes be fractional if a child is right around the age 4/age 5 cutoff).
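
    For concreteness, the arithmetic above can be sketched in Stata as follows. It assumes a 365-day year with birthdays spread uniformly across it; the local macro name gap is only illustrative, and the max() guards against gaps longer than a year.

    Code:
    local gap = 100                        // days between the two surveys
    display max(0, 1 - `gap'/365)          // P(no birthday falls inside the gap), about 0.73 here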

    Does this make sense conceptually, or is there a better way to approach this? And if anyone can wrap their head around the most efficient code to begin to execute this, I would be very appreciative.





    Data here:


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(new_id date) byte(new_age1 new_age2 new_age3 new_age4 new_age5 new_age6 new_age7 new_age8)
    28 21768 . . . . . . . .
    28 21769 . . . . . . . .
    28 21770 . . . . . . . .
    28 21771 . . . . . . . .
    28 21772 . . . . . . . .
    28 21773 . . . . . . . .
    28 21774 . . . . . . . .
    28 21775 . . . . . . . .
    28 21776 . . . . . . . .
    28 21777 . . . . . . . .
    28 21778 . . . . . . . .
    28 21779 . . . . . . . .
    28 21781 . . . . . . . .
    28 21782 . . . . . . . .
    28 21783 . . . . . . . .
    28 21784 . . . . . . . .
    28 21785 . . . . . . . .
    28 21786 . . . . . . . .
    28 21787 . . . . . . . .
    28 21788 . . . . . . . .
    28 21789 . . . . . . . .
    28 21790 . . . . . . . .
    28 21791 . . . . . . . .
    28 21793 . . . . . . . .
    28 21794 . . . . . . . .
    28 21795 . . . . . . . .
    28 21796 . . . . . . . .
    28 21797 . . . . . . . .
    28 21798 . . . . . . . .
    28 21801 . . . . . . . .
    28 21963 . . . . . . . .
    28 21965 . . . . . . . .
    28 21966 . . . . . . . .
    28 21967 . . . . . . . .
    28 21968 . . . . . . . .
    28 21969 . . . . . . . .
    28 21970 . . . . . . . .
    28 21971 . . . . . . . .
    28 21972 . . . . . . . .
    28 21973 . . . . . . . .
    28 21977 . . . . . . . .
    28 21978 . . . . . . . .
    28 22116 . . . . . . . .
    28 21997 . . . . . . . .
    28 22053 . . . . . . . .
    28 22193 . . . . . . . .
    28 22195 . . . . . . . .
    28 22196 . . . . . . . .
    28 22197 . . . . . . . .
    28 22198 . . . . . . . .
    28 22199 . . . . . . . .
    28 22200 . . . . . . . .
    28 22201 . . . . . . . .
    28 22202 . . . . . . . .
    28 22203 . . . . . . . .
    28 22204 . . . . . . . .
    28 22205 . . . . . . . .
    28 22206 . . . . . . . .
    28 22207 . . . . . . . .
    28 22208 . . . . . . . .
    28 22209 . . . . . . . .
    28 22210 . . . . . . . .
    28 22211 . . . . . . . .
    28 22212 . . . . . . . .
    28 22213 . . . . . . . .
    28 22214 . . . . . . . .
    28 22215 . . . . . . . .
    28 22216 . . . . . . . .
    28 22217 . . . . . . . .
    28 22218 . . . . . . . .
    28 22219 . . . . . . . .
    28 22220 . . . . . . . .
    28 22221 . . . . . . . .
    28 22222 . . . . . . . .
    28 22223 . . . . . . . .
    28 22224 . . . . . . . .
    28 22225 . . . . . . . .
    28 22368 18 7 6 . . . . .
    28 22508 . . . . . . . .
    28 22718 . . . . . . . .
    29 21769 . . . . . . . .
    29 21769 . . . . . . . .
    29 21770 . . . . . . . .
    29 21771 . . . . . . . .
    29 21772 . . . . . . . .
    29 21773 . . . . . . . .
    29 21774 . . . . . . . .
    29 21775 . . . . . . . .
    29 21776 . . . . . . . .
    29 21777 . . . . . . . .
    29 21778 . . . . . . . .
    29 21779 . . . . . . . .
    29 21780 . . . . . . . .
    29 21781 . . . . . . . .
    29 21782 . . . . . . . .
    29 21783 . . . . . . . .
    29 21784 . . . . . . . .
    29 21787 . . . . . . . .
    29 21788 . . . . . . . .
    29 21964 . . . . . . . .
    29 21989 . . . . . . . .
    29 21990 . . . . . . . .
    29 21991 . . . . . . . .
    29 21993 . . . . . . . .
    29 21994 . . . . . . . .
    29 21995 . . . . . . . .
    29 21996 . . . . . . . .
    29 21997 . . . . . . . .
    29 21998 . . . . . . . .
    29 21999 . . . . . . . .
    29 22000 . . . . . . . .
    29 22001 . . . . . . . .
    29 22002 . . . . . . . .
    29 22036 . . . . . . . .
    29 22004 . . . . . . . .
    29 22074 . . . . . . . .
    29 22210 . . . . . . . .
    29 22212 . . . . . . . .
    29 22213 . . . . . . . .
    29 22214 . . . . . . . .
    29 22215 . . . . . . . .
    29 22216 . . . . . . . .
    29 22217 . . . . . . . .
    29 22218 . . . . . . . .
    29 22219 . . . . . . . .
    29 22220 . . . . . . . .
    29 22221 . . . . . . . .
    29 22222 . . . . . . . .
    29 22223 . . . . . . . .
    29 22224 . . . . . . . .
    29 22225 . . . . . . . .
    29 22226 . . . . . . . .
    29 22227 . . . . . . . .
    29 22228 . . . . . . . .
    29 22229 . . . . . . . .
    29 22230 9 3 10 14 . . . .
    29 22231 . . . . . . . .
    29 22232 . . . . . . . .
    29 22233 . . . . . . . .
    29 22234 . . . . . . . .
    29 22235 . . . . . . . .
    29 22236 . . . . . . . .
    29 22237 . . . . . . . .
    29 22238 . . . . . . . .
    29 22239 . . . . . . . .
    29 22240 . . . . . . . .
    29 22241 . . . . . . . .
    30 21775 . . . . . . . .
    30 21776 . . . . . . . .
    30 21777 . . . . . . . .
    30 21778 . . . . . . . .
    30 21779 . . . . . . . .
    30 21780 . . . . . . . .
    30 21781 . . . . . . . .
    30 21782 . . . . . . . .
    30 21783 . . . . . . . .
    30 21784 . . . . . . . .
    30 21785 . . . . . . . .
    30 21786 . . . . . . . .
    30 21787 . . . . . . . .
    30 21788 . . . . . . . .
    30 21789 . . . . . . . .
    30 21790 . . . . . . . .
    30 21791 . . . . . . . .
    30 21792 . . . . . . . .
    30 21793 . . . . . . . .
    30 21794 . . . . . . . .
    30 21795 . . . . . . . .
    30 21796 . . . . . . . .
    30 21797 . . . . . . . .
    30 21798 . . . . . . . .
    30 21799 . . . . . . . .
    30 21800 . . . . . . . .
    30 21801 . . . . . . . .
    30 21802 . . . . . . . .
    30 21803 . . . . . . . .
    30 21804 . . . . . . . .
    30 21805 . . . . . . . .
    30 21973 . . . . . . . .
    30 21998 . . . . . . . .
    30 21999 . . . . . . . .
    30 22000 . . . . . . . .
    30 22001 . . . . . . . .
    30 22002 . . . . . . . .
    30 22003 . . . . . . . .
    30 22004 . . . . . . . .
    30 22005 . . . . . . . .
    30 22006 . . . . . . . .
    30 22007 . . . . . . . .
    30 22008 . . . . . . . .
    30 22009 . . . . . . . .
    30 22010 . . . . . . . .
    30 22011 . . . . . . . .
    30 22015 . . . . . . . .
    30 22009 4 6 17 . . . . .
    30 22207 . . . . . . . .
    30 22209 . . . . . . . .
    30 22210 . . . . . . . .
    30 22211 . . . . . . . .
    30 22212 . . . . . . . .
    end
    format %td date

  • #2
    This might get you started, but I'm sure there's a better way.

    I'm assuming each recorded age was observed halfway between birthdays (mid-year), hence the +0.5 below.

    Code:
    ren date survey_date  // dataex fix
    format %td survey_date
    
    egen agedate = max(cond(new_age1!=.,survey_date,.)) , by(new_id)  // the date the age is observed
    g between = agedate - survey_date  // days from this survey to the date ages were observed (positive if this survey came earlier)
    
    forv i = 1/8 {
        egen reg_age`i' = max(new_age`i'), by(new_id)  // fill columns with the observed age
        g proxyage`i' = (reg_age`i'+0.5) - between/365 // compute likely age
    }

    • #3
      This is very helpful, thanks! I'm okay with using a probabilistic age (aka not a rounded integer for age), so the proxy_age is really helpful, though I'm trying to figure out how to interpret it in terms of my ultimate goal, which is the % likelihood that one of the children is 4 years old or younger.

      • #4
        I have not really followed this, but given the statement of the "ultimate goal" in #3, you might want to try Chebyshev's inequality; see https://en.wikipedia.org/wiki/Chebyshev's_inequality

        • #5
          A condition statement could be used for the last part, but here's a sketch.

          It assumes the probability falls linearly as the youngest child's proxy age moves from 4 to 5.

          Code:
          ren date survey_date  // dataex fix
          format %td survey_date
          
          egen agedate = max(cond(new_age1!=.,survey_date,.)) , by(new_id)  // the date the age is observed
          g between = agedate - survey_date  // days from this survey to the date ages were observed (positive if this survey came earlier)
          
          forv i = 1/8 {
              egen reg_age`i' = max(new_age`i'), by(new_id)  // fill columns with the observed age
              g proxyage`i' = (reg_age`i'+0.5) - between/365 // compute likely age
          }
          
          egen minage = rowmin(proxyage*)  // wildcard: a proxyage1-proxyage8 range would also pick up the reg_age variables created in between
          
          capture drop prob
          g prob = minage<=4
          replace prob = 1 - (minage-4) if prob==0 & minage<5

          • #6
            Might need to adjust for the fact I assumed a June birthday.

            • #7
              Thanks both George Ford and Rich Goldstein for the assistance.

              • #8
                Thinking about it more: when the recorded age is 4, the child is certainly 4 or younger on that date, so prob should be 1. But after adding 0.5 for the mid-year birthday the proxy age is 4.5, and the code in #5 returns 0.5.

                This may fix that.

                Code:
                capture drop prob
                g prob = minage<=4.5
                replace prob = 1 - (minage-4.5) if prob==0 & minage<5
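
                Another way to look at it, as a rough sketch rather than a tested solution, is to work out the probability per child and then combine across children. This is not the min-age approach from #5. It assumes birthdays are uniform over the year, children's birthdays are independent of each other, and proxyage1 to proxyage8 already exist as created in #5. Under those assumptions a child recorded at integer age a has an exact age uniform on [a, a+1) at the observation date, so P(under 5 at the survey date) is 5.5 - proxyage clipped to [0, 1]: it falls linearly from 1 at a proxy age of 4.5 to 0 at 5.5. The variable names p_young, p_none, prob_any and n_obs are made up for illustration.

                Code:
                g p_none = 1                                  // running P(no child is under 5)
                forv i = 1/8 {
                    * per-child P(under 5); the if keeps it missing when no age was recorded
                    g p_young`i' = max(0, min(1, 5.5 - proxyage`i')) if !missing(proxyage`i')
                    replace p_none = p_none * (1 - p_young`i') if !missing(p_young`i')
                }
                g prob_any = 1 - p_none                       // P(at least one child under 5)
                egen n_obs = rownonmiss(proxyage*)            // number of family members with a recorded age
                replace prob_any = . if n_obs == 0            // no ages recorded for this household

                As a check against the example data: for new_id 30 the child recorded as 4 on date 22009 has a proxy age of about 5.06 at date 22212 (203 days later), giving p_young1 of about 0.44, which matches 1 - 203/365. Note that this sketch does not handle children who would not yet have been born at the earlier survey dates (negative proxy ages are simply treated as under 5).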
