  • Keep running a replace command until (0 real changes made)

    Hello!
    I have a panel dataset with the ages of individuals, around 50 million observations. The age of each individual should increase by 1 year per year, but it doesn't. I'm correcting this by checking whether the increase (variable "delta") is different from 1. I also take the mode of delta (egen tmp = mode(delta), by(ntrab)) to check whether, for at least some individuals, there are multiple years where the age increases at the correct rate. I then generate a new variable tmp2 that flags two consecutive years in which delta equals 1 (gen tmp2 = (delta == 1 & l.delta == 1)), and I set the age variable (idade) to missing if tmp2 == 0. I then start correcting idade with the following code:

    replace idade = f1.idade-1 if f1.idade ~= . & f1.tmp2 == 1

    which takes the next year's correct value and subtracts one. Below is an example of the data after running the code once. Stata has corrected the age for the year (variable ano) 2011, but I would need to run it 3 more times to correct all years. Note that this is only an example; for some individuals I may have around 30 years, so I would like Stata to keep running the code until I get (0 real changes made). Is there any way to do this?

    Thanks for your time!
    Hélder

    ano | workerid | idade | delta | tmp | tmp2
    2008 | 1 | . | -55 | 1 | 0
    2009 | 1 | . | 57 | 1 | 0
    2010 | 1 | . | -55 | 1 | 0
    2011 | 1 | 21 | 1 | 1 | 0
    2012 | 1 | 22 | 1 | 1 | 1
    2013 | 1 | 23 | 1 | 1 | 1
    2014 | 1 | 24 | 1 | 1 | 1
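
    The "repeat until nothing changes" request can be sketched with a while loop that counts missing values of idade before and after each pass. This is a minimal sketch, not the poster's code, and it rests on two assumptions: the panel has been declared (for example with xtset workerid ano) so that f1. works, and only missing values of idade are filled. It also drops the f1.tmp2 == 1 condition, because in the example table tmp2 is 0 in 2011, so with that condition kept a second pass would not reach 2010.

    ```stata
    * Assumption: the data are already declared as a panel, e.g.
    *   xtset workerid ano
    * Assumption: only missing idade is filled, from a non-missing next year.

    local changed = 1
    while `changed' > 0 {
        * number of missing ages before this pass
        quietly count if missing(idade)
        local before = r(N)

        * fill one more year back from the next year's value
        quietly replace idade = f1.idade - 1 if missing(idade) & !missing(f1.idade)

        * each change fills exactly one missing value, so the drop in the
        * missing count equals the number of real changes in this pass
        quietly count if missing(idade)
        local changed = `before' - r(N)
    }
    ```

    The loop stops on the first pass that fills nothing, which corresponds to (0 real changes made). Individuals with a gap in their years stop being filled at the gap, since f1.idade is missing there.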


  • #2
    Hélder, if I understand correctly, you'd like to correct ages for all individuals in all years. I wonder if you have a variable indicating the birth year of each individual. If yes, then "replace age = year - birthyear" will do. If there is no birth year, and the age for an individual is inconsistent over time, then theoretically we don't know which value is right and which is wrong if no other supporting information is available.

    If we assume your mode hypothesis holds, then you can generate the birth year and take the mode of the birth year as the correct value. The code is:

    Code:
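    * birth year implied by each reported age
    gen birthyear = year - age
    * most frequent implied birth year within each person
    bys personid: egen bthyr_mode = mode(birthyear)
    * rebuild age from that single birth year
    replace age = year - bthyr_mode
    drop birthyear bthyr_mode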
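
    For reference, the same idea can be written with the variable names from post #1. This is a sketch under assumptions: idade must still hold the originally reported ages (before any were set to missing), workerid is taken as the person identifier from the example table, and the minmode option is an arbitrary tie-breaking choice. Without such an option, egen's mode() returns missing when a person has more than one mode.

    ```stata
    * Assumption: idade holds the reported ages; workerid and ano are the
    * identifier and year shown in the example table of post #1.
    gen birthyear = ano - idade

    * minmode picks the smallest birth year when several are tied (arbitrary choice)
    bysort workerid: egen bthyr_mode = mode(birthyear), minmode

    * individuals with no usable mode (e.g. all ages missing) stay untouched
    count if missing(bthyr_mode)
    replace idade = ano - bthyr_mode if !missing(bthyr_mode)

    drop birthyear bthyr_mode
    ```

    This corrects every year for an individual in one step, so no repeated replace is needed.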



    • #3
      Thanks Fei Wang
      I don't have the birth year, I only have the reported age in each year. Generating a birthyear variable for the "correct" age from the mode is a good idea!



      • #4
        Originally posted by Helder Costa:
        "The age of each individual should increase at the rate of 1 year per year"

        If a person is born in July, and interviewed first in August and again in June, then that person's age in years should not have increased by 1 year. In my experience, people tend to get their age right when asked; so if this is a frequent issue, I would think that it is due to varying interview dates rather than incorrectly reported ages.
        Last edited by daniel klein; 26 Oct 2021, 10:52.



        • #5
          Thanks daniel klein, but that isn't an issue with the data. The collection date of the data is always the same.



          • #6
            I am not aware of a single survey study that has been able to conduct interviews of all respondents on the same day (week, or often even month). If those ages are not off by more than one year, I would not be suspicious.

