  • How do I assign an individual's 2014 score for variable FL, by ID, to subsequent years for that same individual (2015, 2016, 2017, 2018, 2019)?

    First time posting so apologies for any poor wording.

    I am constructing a database for my sample period (2014-2019) from longitudinal survey data, in which the same individuals are asked the same questions every year. I have a few thousand individuals in my sample, each with a unique ID number that tracks them through each survey year (the ID number stays the same for the same individual every year). I have also taken questions from a supplemental survey that was run only in the first year of my sample period (2014), not in any of the subsequent years. I need to take variable 'FL' from this supplemental 2014 survey and assign the same 'FL' score to each ID for the remaining years of the sample period (2015, 2016, 2017, 2018, 2019).

    The issue is that the supplemental survey data had to be merged with my main survey data using their 2014 Interview Numbers (which differ every year and do not always correspond to the same individual's ID), not their ID.

    Is there any way that I can replace the missing 'FL' values in 2015, 2016, 2017, 2018, and 2019 by telling Stata to assign the 2014 FL values by ID? Again, the issue is that I had to merge the FL variable into the master database using the 2014 Interview Numbers (not the IDs). I have seen code similar to the following in some forum discussions:

    by ID FL, sort: replace SURVEY_YEAR = FL [2014]

    but I cannot quite get it to work for me.

    Any help is greatly appreciated as I have been struggling to overcome this hurdle for a while! Thank you

  • #2
    by ID FL, sort: replace SURVEY_YEAR = FL [2014]
    is not what you have seen. The subscript on a variable is an observation number, not a value like 2014.

    You don't show us sample data, so I'm going to assume you have a variable named ID - the ID number that identifies an individual through each survey year, and a variable named YEAR which takes the value of the survey year - 2014 through 2019.
    Code:
    by ID (YEAR), sort: replace FL = FL[1] if missing(FL)
    will sort your data into groups by ID, and within each group by YEAR, so that 2014 is the first observation for each ID (provided every ID has a 2014 observation). Then, within each ID, it replaces missing values of FL with the value from that first observation (YEAR 2014).
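
    If some IDs might lack a 2014 observation, FL[1] refers to a later year for those IDs. A minimal sketch of a more defensive variant follows. It assumes FL is numeric and that each ID has at most one 2014 observation; FL_2014 is just a hypothetical name for a temporary helper variable:

    ```stata
    * How many non-2014 rows already have an FL value? (ideally 0)
    count if !missing(FL) & YEAR != 2014

    * Within each ID, pick out the FL value recorded in 2014.
    * cond() returns missing for every other year, and egen's max()
    * ignores missing values, so only the 2014 value survives.
    bysort ID: egen FL_2014 = max(cond(YEAR == 2014, FL, .))

    * Fill the gaps, then drop the helper variable.
    replace FL = FL_2014 if missing(FL)
    drop FL_2014
    ```

    With this variant, IDs with no 2014 observation keep a missing FL rather than borrowing a value from another year.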

    See the output of
    Code:
    help by
    for further explanation of how the by and bysort commands work.
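
    To see the difference between a subscript and a value, here is a small sketch on made-up data (the six observations below are invented purely for illustration, and obs, FL_first, and FL_wrong are hypothetical variable names):

    ```stata
    * Toy data, invented for illustration only
    clear
    input ID YEAR FL
    1 2014 7
    1 2015 .
    1 2016 .
    2 2014 3
    2 2015 .
    2 2016 .
    end

    * Within each ID, _n is the observation number inside the group
    by ID (YEAR), sort: generate obs = _n

    * FL[1] is the first observation within the group (YEAR 2014 here)
    by ID (YEAR): generate FL_first = FL[1]

    * FL[2014] asks for observation number 2014 within the group;
    * no group is that long, so the result is missing
    by ID (YEAR): generate FL_wrong = FL[2014]

    list, sepby(ID)
    ```

    In the listing, FL_first holds 7 for ID 1 and 3 for ID 2, while FL_wrong is missing throughout.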


    • #3
      That worked perfectly! Exactly what I wanted to do with the data, thank you so much! I was having a major headache trying to figure it out over the past couple of days.
