  • Create a control variable for an OLS regression

    Hi!
    I have a dataset with multiple variables. My goal is to create a control variable for an OLS regression: the longest time that any member of the migrant's household has been living in Portugal (years_sincemig_hh). This duration is measured up to 2020 or 2021; which of the two applies varies across observations and is stored in the variable bl_year.
    I have marked every variable that is already coded in red.
    In other words, I want the earliest year in which a household member arrived in PT. For example, if the migrant has 2 household members and one arrived in PT in 2014 and the other in 2004, I want to keep 2004. I don't mind whether the control variable holds the year itself (e.g. 2014) or the time spent in PT (2020 - 2014 = 6); it's the same interpretation, right?

    I have:
    c2_h* = the year the member arrived in PT
    c2_h* = 77 -> the member has always lived here; in that case I need to replace the value with (bl_year - c2_d*), i.e. roughly the member's birth year, if I use the year rather than the duration (but I guess the purpose is the same)
    c2_g* = 2 -> the household member moved before the migrant
    c2_g* = 1 -> the household member moved at the same time as the migrant; in that case I replace the value with f6, the year the migrant arrived
    c2_g* = 3 -> the household member moved after the migrant
    c2_g* = 77 -> the member has always lived here; in that case I need to replace the value with (bl_year - c2_d*), i.e. roughly the member's birth year, if I use the year rather than the duration (but I guess the purpose is the same)
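    A minimal sketch of these rules in Stata, under stated assumptions: c2_h* holds the arrival year unless it is 77; c2_g* == 1 means the arrival year should be taken from f6; and a code of 77 in either c2_h* or c2_g* means the member always lived in Portugal, so bl_year - c2_d* is used instead. The toy data and the variable names arr_yr_* are hypothetical.

```stata
* Hypothetical toy data: 3 migrants with up to 2 household members each
clear
input id bl_year f6 c2_h1 c2_g1 c2_d1 c2_h2 c2_g2 c2_d2
1 2020 2010 2004  2 40 2014  3 12
2 2021 2015    .  1 35   77 77 18
3 2020 2008   77 77 30    .  .  .
end

* Arrival year in PT for each member (use 1/10 with the real data)
forvalues i = 1/2 {
    * default: the year the member arrived, taken from c2_h
    gen arr_yr_`i' = c2_h`i'
    * moved at the same time as the migrant: use the migrant's arrival year
    replace arr_yr_`i' = f6 if c2_g`i' == 1
    * always lived in PT (code 77): use bl_year minus the member's age
    replace arr_yr_`i' = bl_year - c2_d`i' if c2_h`i' == 77 | c2_g`i' == 77
}
list id arr_yr_1 arr_yr_2
```

    The order of the replace lines matters only if a member could have both c2_g* == 1 and a 77 code; here the 77 rule is applied last, so it wins.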

    So I want to combine all of these variables and then, among the migrant's household members, keep the one who has lived in PT the longest.


    Notes:
    A migrant can have at most 10 household members, not counting the migrant. Some respondents fill in only up to c2_*2 (2 members), others up to c2_*5 (5 members), and so on.
    (c2_g1, c2_g2, c2_g3, ..., c2_g10 -> whether the member moved before, after or at the same time as the migrant)

    (c2_d1, c2_d2, ..., c2_d10 -> the member's age)
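    A sketch of the final step, assuming each member's arrival year already sits in its own variable (hypothetical names arr_yr_1, arr_yr_2, ...) and that unused member slots are missing. egen rowmin() skips missing values, so households with 2 and with 5 members need no special handling. Because bl_year is a single value within each observation, the longest stay among the members is simply bl_year minus the earliest arrival year.

```stata
* Hypothetical arrival years for up to 3 members (unused slots missing)
clear
input id bl_year arr_yr_1 arr_yr_2 arr_yr_3
1 2020 2014 2004    .
2 2021 2015 2003 2018
3 2020 1990    .    .
end

* Earliest arrival year among the members; rowmin() ignores missing values
egen min_arr_yr = rowmin(arr_yr_1-arr_yr_3)
* Longest stay in PT, measured up to each observation's bl_year
gen years_sincemig_hh = bl_year - min_arr_yr
list
```

    With ten members the same lines work with arr_yr_1-arr_yr_10, provided those variables were created in that order, because a hyphenated varlist follows the order of variables in the dataset.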

    Thank you so much

  • #2
    Welcome to Statalist.

    I cannot fully understand the question, but here are a couple of commands that may get you started. It seems you have wide-form data, where every row is a household and each member's information is spread across columns. Here is some example data to demonstrate the commands:

    Code:
    clear
    input hhid c2_h1 c2_h2 c2_h3
    1 1975 1975 .
    2 1997 2004 1992
    3 1982 . .
    4 2001 2003 1999
    5 2003 2016 .
    end
    First, a forvalues loop lets you go through all 10 of them quickly (3 in this example):

    Code:
    forvalues y = 1/3 {
        gen yr_stay_`y' = 2020 - c2_h`y'
    }
    Results:
    Code:
         +---------------------------------------------------------------+
         | hhid   c2_h1   c2_h2   c2_h3   yr_sta~1   yr_sta~2   yr_sta~3 |
         |---------------------------------------------------------------|
      1. |    1    1975    1975       .         45         45          . |
      2. |    2    1997    2004    1992         23         16         28 |
      3. |    3    1982       .       .         38          .          . |
      4. |    4    2001    2003    1999         19         17         21 |
      5. |    5    2003    2016       .         17          4          . |
         +---------------------------------------------------------------+
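    The question measures time up to bl_year (2020 or 2021 depending on the observation) rather than a fixed 2020, so the same loop can subtract that variable instead. A sketch with a hypothetical bl_year column added to part of the example data; note that codes such as 77 in c2_h* would have to be recoded first, or they would be treated as the year 77.

```stata
clear
input hhid bl_year c2_h1 c2_h2 c2_h3
1 2020 1975 1975    .
2 2021 1997 2004 1992
3 2020 1982    .    .
end

* Years in PT up to each observation's own reference year
forvalues y = 1/3 {
    gen yr_stay_`y' = bl_year - c2_h`y'
}
list
```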
    Then, to pick out the maximum, use egen rowmax():

    Code:
    egen max_yr_stay = rowmax(yr_stay_1 - yr_stay_3)
    Results:
    Code:
         +--------------------------------------------------------------------------+
         | hhid   c2_h1   c2_h2   c2_h3   yr_sta~1   yr_sta~2   yr_sta~3   max_yr~y |
         |--------------------------------------------------------------------------|
      1. |    1    1975    1975       .         45         45          .         45 |
      2. |    2    1997    2004    1992         23         16         28         28 |
      3. |    3    1982       .       .         38          .          .         38 |
      4. |    4    2001    2003    1999         19         17         21         21 |
      5. |    5    2003    2016       .         17          4          .         17 |
         +--------------------------------------------------------------------------+
    For the rest, I'm afraid I can't help, because the question is very unclear. While the description of the variables is helpful, it's hard to follow without seeing a data example. You've been looking at the data for days, but we were not looking over your shoulder, so a verbal description that makes perfect sense to you does not necessarily work for us. Please take a moment to read the FAQ (http://www.statalist.org/forums/help) carefully, paying particular attention to section 12 on how to use the dataex command to provide a data example. Leave out information that is not important to the task, and explain clearly how you would like to use the information from the c2_g* variables. That way, I believe more users will be able to help.
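    As an illustration, dataex prints the data in memory as an input block that others can paste straight into Stata. It is part of recent, fully updated Stata versions; older versions can get it with ssc install dataex. A sketch using the example data from this reply:

```stata
clear
input hhid c2_h1 c2_h2 c2_h3
1 1975 1975 .
2 1997 2004 1992
3 1982 . .
4 2001 2003 1999
5 2003 2016 .
end

* Print a copy-and-paste-ready data example
dataex hhid c2_h1-c2_h3
```

    With the real data, something like dataex c2_h* c2_g* c2_d* f6 bl_year in 1/20 would cover the variables named in the question; the 20-observation limit is an arbitrary choice.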
