
  • Generate/replace new variables the same way for all individuals

    Hello, everyone.

    Below is my data. The variable name "salary21" means the salary of PID 21, and each value is that individual's income in a given year (1980 to 1999).

    I am trying to generate a new variable that compares each person's income in each year with the public median income (medi_inc).
    The code below is what I've tried, and it shows what the result should look like.

    It works, but I am struggling to create the same variable for every PID.
    My real data contain thousands of PIDs (individuals), so writing the code below thousands of times is not feasible...

    *The code I've tried*
    Code:
    generate salary_21 = "Unemp" if salary21 == 0
    replace salary_21 = "(1)" if salary21 < medi_inc*2/3 & salary21 != 0
    replace salary_21 = "(2)" if salary21 >= medi_inc*2/3 & salary21 < medi_inc
    replace salary_21 = "(3)" if salary21 >= medi_inc & !missing(salary21)
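If the wide layout has to be kept, a loop over the salary variables avoids writing the block once per PID. This is a sketch of an alternative to reshaping, not what was suggested in the replies. The cat prefix and the inccat label name are hypothetical choices; it assumes the income in the (2) condition was meant to be medi_inc, and it puts a salary exactly equal to the median into category 3. With thousands of PIDs it also creates thousands of new variables, which is one reason the long layout is usually easier to work with.

```stata
* A few rows and columns of the example data from the question
clear
input float year int(salary21 salary22 salary71) double medi_inc
1980 100 160   0 126.0188
1986  50 160 250      275
1990   0 160 250  514.742
1999 150 160   0 1361.167
end

* Numeric codes with value labels (label name is hypothetical)
label define inccat 0 "Unemp" 1 "(1)" 2 "(2)" 3 "(3)"

* salary* is expanded once when the loop starts, so new cat* variables are not looped over
foreach v of varlist salary* {
    * PID = characters after "salary", e.g. salary21 -> 21 (hypothetical cat prefix)
    local pid = substr("`v'", 7, .)
    generate byte cat`pid' = 0 if `v' == 0
    replace cat`pid' = 1 if `v' != 0 & `v' < medi_inc*2/3
    * assumption: "income" in the question meant medi_inc
    replace cat`pid' = 2 if `v' >= medi_inc*2/3 & `v' < medi_inc
    * assumption: a salary equal to the median counts as category 3
    replace cat`pid' = 3 if `v' >= medi_inc & !missing(`v')
    label values cat`pid' inccat
}
```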
    *My data*

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float year int(salary21 salary22 salary41 salary42 salary51 salary52 salary61 salary62 salary71) double medi_inc
    1980 100 160 0 400 420 205 302 0   0 126.0188
    1981 100 160 0 400 420 205 302 0 250 155.0617
    1982 100 160 0 400 420 205 302 0 250  185.158
    1983 100 160 0 400 420 205 302 0 250  210.667
    1984 100 160 0 400 420 205 302 0 250  231.525
    1985 100 160 0 400 420 205 302 0 250  248.425
    1986  50 160 0 400 420 205 302 0 250      275
    1987  50 160 0 400 420 205 302 0 250      300
    1988  50 160 0 400 420 205 302 0 250  359.417
    1989  50 160 0 400 420 205 302 0 250  433.625
    1990   0 160 0 400 420 205 302 0 250  514.742
    1991   0 160 0 400 420 205 302 0 250  620.108
    1992 150 160 0 400 420 205 302 0 250  757.917
    1993 150 160 0 400 420 205 302 0 250  843.417
    1994 150 160 0 400 420 205 302 0 250  940.833
    1995 150 160 0 400 420 205 302 0 250  1065.25
    1996 150 160 0 400 420 205 302 0 250 1188.833
    1997 150 160 0 400 420 205 350 0 250   1310.5
    1998 150 160 0 400 420 205 100 0 250   1322.5
    1999 150 160 0 400 420 205 100 0   0 1361.167
    end

    Thanks a lot!!!

  • #2
    If your person identifiers are the numbers after "salary" in the variable names, this is much easier if you reshape your dataset into long format:
    Code:
    reshape long salary, i(year) j(PID)
    
    generate salary_new = "Unemp" if salary == 0
    replace salary_new = "(1)" if salary < medi_inc*2/3 & salary != 0
    replace salary_new = "(2)" if salary >= medi_inc*2/3 & salary < medi_inc
    replace salary_new = "(3)" if salary >= medi_inc & !missing(salary)
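A minimal sketch of what the reshape does, run on a few rows of the example data: PID is built from the digits after "salary", and medi_inc is simply repeated for every PID within a year because it varies only by year. The isid and list calls are checks added for illustration, and reshape wide reverses the step if the wide layout is needed again after the new variable has been created.

```stata
* A few rows and columns of the example data from the question
clear
input float year int(salary21 salary22 salary71) double medi_inc
1980 100 160   0 126.0188
1990   0 160 250  514.742
end

* One row per year-PID pair; j(PID) holds the numeric suffix 21, 22, 71
reshape long salary, i(year) j(PID)

* year and PID should now jointly identify each observation
isid year PID
* medi_inc is carried along unchanged for every PID in the same year
list year PID salary medi_inc, sepby(year)

* Back to one row per year with salary21, salary22, salary71
reshape wide salary, i(year) j(PID)
```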



    • #3
      A string variable is not a good idea here. For example


      Code:
      . di ("U" < "(")
      0
      shows that "(" sorts before "U", so "Unemp" will sort after "(3)" in tables and on graphs, which I guess is the reverse of what you want. Use numeric values 0, 1, 2, 3 with value labels instead.
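A sketch of the numeric version in the long layout: the codes 0 to 3 carry the order and a value label supplies the original text, so tabulate lists Unemp, (1), (2), (3) in that order. The inc_cat name is hypothetical, the (2) condition assumes "income" meant medi_inc, and a salary equal to the median is put in category 3 by assumption.

```stata
* A few rows and columns of the example data from the question
clear
input float year int(salary21 salary22 salary71) double medi_inc
1980 100 160   0 126.0188
1986  50 160 250      275
1990   0 160 250  514.742
end

reshape long salary, i(year) j(PID)

* Numeric categories (hypothetical name inc_cat)
generate byte inc_cat = 0 if salary == 0
replace inc_cat = 1 if salary != 0 & salary < medi_inc*2/3
* assumption: "income" in the original condition meant medi_inc
replace inc_cat = 2 if salary >= medi_inc*2/3 & salary < medi_inc
* assumption: a salary equal to the median counts as category 3
replace inc_cat = 3 if salary >= medi_inc & !missing(salary)

* Labels keep the original text while the numbers control the order
label define inc_cat 0 "Unemp" 1 "(1)" 2 "(2)" 3 "(3)"
label values inc_cat inc_cat

tabulate inc_cat
```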



      • #4
        It worked!!!!!

        Thank you Jorrit and Nick for your help~!!

