Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • how to identify, within groups, the income in the median year?

    Dear Statalists,

    I have a panel data that contains people's income (in 1,000 dollars) in years they are surveyed. In some years the data are missing, also it is not a balanced panel.

    *=================*
    clear

    input str5 id year income
    A 2005 110
    A 2006 160
    A 2007 .
    B 2005 60
    B 2006 80
    B 2007 .
    B 2008 .
    C 2005 200
    C 2006 .
    C 2007 170
    C 2008 .
    D 2005 70
    D 2006 80
    D 2007 90
    D 2008 100
    end

    list, sepby(id)

    *=================*

    I want to create a variable md_income that takes the income value of the median survey year (in case there are two years in the middle, the income of whichever year in which income data is not missing).

    In the above data, I want a variable md_income that takes value 160 for A, 80 for B, 170 for C, and 85 for D.

    How can I do that? I am at my wit's end.
    Last edited by Lucy Zhao; 16 Jun 2023, 05:46.

  • #2
    Code:
         +------------------------------------------------------------+
         | id   year   income   median~r   wanted1   wanted2   wanted |
         |------------------------------------------------------------|
      1. |  A   2006      160       2006       160       160      160 |
      2. |  A   2007        .       2006       160       160      160 |
      3. |  A   2005      110       2006       160       160      160 |
         |------------------------------------------------------------|
      4. |  B   2008        .     2006.5         .        80       80 |
      5. |  B   2006       80     2006.5         .        80       80 |
      6. |  B   2007        .     2006.5         .        80       80 |
      7. |  B   2005       60     2006.5         .        80       80 |
         |------------------------------------------------------------|
      8. |  C   2006        .     2006.5         .       170      170 |
      9. |  C   2008        .     2006.5         .       170      170 |
     10. |  C   2007      170     2006.5         .       170      170 |
     11. |  C   2005      200     2006.5         .       170      170 |
         |------------------------------------------------------------|
     12. |  D   2006       80     2006.5         .        85       85 |
     13. |  D   2005       70     2006.5         .        85       85 |
     14. |  D   2007       90     2006.5         .        85       85 |
     15. |  D   2008      100     2006.5         .        85       85 |
         +------------------------------------------------------------+

    Comment


    • #3
      Sorry: the code got mislaid.

      Code:
      clear
      
      input str5 id year income
      A 2005 110
      A 2006 160
      A 2007 .
      B 2005 60
      B 2006 80
      B 2007 .
      B 2008 .
      C 2005 200
      C 2006 .
      C 2007 170
      C 2008 .
      D 2005 70
      D 2006 80
      D 2007 90
      D 2008 100
      end
      
      bysort id (year) : egen median_year = median(year)
      
      by id : gen wanted1 = income if year == median_year 
      bysort id (wanted1) : replace wanted1 = wanted1[1]
      
      by id : egen wanted2 = mean(cond(inlist(year, 2006, 2007), income, .))
      
      gen wanted = cond(wanted1 < ., wanted1, wanted2)
      
      list, sepby(id)
      
      
           +------------------------------------------------------------+
           | id   year   income   median~r   wanted1   wanted2   wanted |
           |------------------------------------------------------------|
        1. |  A   2006      160       2006       160       160      160 |
        2. |  A   2007        .       2006       160       160      160 |
        3. |  A   2005      110       2006       160       160      160 |
           |------------------------------------------------------------|
        4. |  B   2008        .     2006.5         .        80       80 |
        5. |  B   2006       80     2006.5         .        80       80 |
        6. |  B   2007        .     2006.5         .        80       80 |
        7. |  B   2005       60     2006.5         .        80       80 |
           |------------------------------------------------------------|
        8. |  C   2006        .     2006.5         .       170      170 |
        9. |  C   2008        .     2006.5         .       170      170 |
       10. |  C   2007      170     2006.5         .       170      170 |
       11. |  C   2005      200     2006.5         .       170      170 |
           |------------------------------------------------------------|
       12. |  D   2006       80     2006.5         .        85       85 |
       13. |  D   2005       70     2006.5         .        85       85 |
       14. |  D   2007       90     2006.5         .        85       85 |
       15. |  D   2008      100     2006.5         .        85       85 |
           +------------------------------------------------------------+

      Comment


      • #4
        THANK YOU SO VERY MUCH Nick! It works great!!!

        Comment

        Working...
        X