  • Eliminating trend in repeated cross-sectional data

    Dear all

    I am using data from the UK Labour Force Survey (LFS) to compare realised earnings in the labour market with young people's expectations of their earnings at age 30, which come from a different survey. Because the LFS sample would be too small if I used only people aged exactly 30, I am using adults aged 28-32. However, on average those younger than 30 earn less than 30-year-olds, while those older than 30 earn more. Before comparing expected earnings with the realised earnings observed in the LFS, I would like to eliminate this age-related trend. My idea was to take the logarithm of annual earnings, generate a dummy for each age (28, 29, 30, 31, 32) and run the following regression:
    reg logannualpay age28 age29 age30 age31 age32

    and then predict the detrended logannualpay

    predict logannualpay_detrend

    Does this method seem right?


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(annualgrosspay age28 age29 age30 age31 age32) int year float educat byte(SEX GOVTOF21)
     26502.36 1 0 0 0 0 2019 1 2 10
     36101.64 1 0 0 0 0 2019 1 2 10
     36101.64 1 0 0 0 0 2019 3 1 10
     42153.36 0 0 0 0 1 2019 1 1 10
     31458.51 0 0 0 1 0 2019 3 1 10
     17059.59 1 0 0 0 0 2019 3 1 10
     25093.77 0 0 0 0 1 2019 3 1 10
     18676.86 0 0 0 0 1 2019 3 2 10
     35110.41 1 0 0 0 0 2019 1 2 10
     40014.39 0 0 0 1 0 2019 3 1 10
     51491.79 0 0 1 0 0 2019 1 1 10
     28067.46 1 0 0 0 0 2019 1 2 10
     78150.66 0 0 0 0 1 2019 1 2 10
     28902.18 1 0 0 0 0 2019 3 1 10
     18050.82 1 0 0 0 0 2019 3 2 10
     84254.55 0 0 0 0 1 2019 2 1 10
     21859.23 0 1 0 0 0 2019 3 1 10
     28902.18 0 0 0 1 0 2019 1 1 10
        26085 0 1 0 0 0 2019 2 2 10
     30988.98 0 1 0 0 0 2019 3 2 10
     30102.09 1 0 0 0 0 2019 3 2 10
        26085 0 1 0 0 0 2019 3 1 10
     30675.96 1 0 0 0 0 2019 3 1  9
     22850.46 0 0 0 0 1 2019 3 1  9
     72255.45 0 0 1 0 0 2019 3 1  9
     24572.07 1 0 0 0 0 2019 1 2  9
     12051.27 0 0 0 1 0 2019 3 1  9
     38136.27 0 1 0 0 0 2019 1 2  9
     18833.37 0 0 0 1 0 2019 3 1  9
     22120.08 0 0 1 0 0 2019 1 2  9
     27076.23 0 0 0 0 1 2019 1 1  9
     30102.09 0 1 0 0 0 2019 1 1  9
     27597.93 0 0 1 0 0 2019 3 1  9
     35110.41 0 0 0 0 1 2019 1 2  9
     20189.79 0 1 0 0 0 2019 1 2  9
     49300.65 0 0 0 1 0 2019 2 1  9
     41944.68 1 0 0 0 0 2019 1 2  9
     40118.73 0 0 0 1 0 2019 3 1  9
     50187.54 0 0 0 1 0 2019 3 2  9
     33701.82 0 0 0 0 1 2019 1 2  9
     51491.79 0 0 1 0 0 2019 1 2  9
     38136.27 0 0 0 0 1 2019 1 1  9
     25093.77 1 0 0 0 0 2019 3 1  9
     30102.09 0 0 0 1 0 2019 2 1  9
        26085 0 0 0 1 0 2019 1 1  9
      39127.5 0 1 0 0 0 2019 1 2  9
     30102.09 0 1 0 0 0 2019 3 1  9
     36101.64 0 1 0 0 0 2019 1 2  9
     33336.63 0 0 1 0 0 2019 1 2  9
     22067.91 1 0 0 0 0 2019 1 2  9
     32084.55 0 0 0 0 1 2019 1 1  9
     19355.07 1 0 0 0 0 2019 3 1  9
        26085 0 1 0 0 0 2019 3 1  9
     22067.91 0 0 0 1 0 2019 3 1  8
     53161.23 0 0 1 0 0 2019 1 1  8
     85297.95 0 1 0 0 0 2019 1 1  8
     85297.95 1 0 0 0 0 2019 1 1  8
        78255 1 0 0 0 0 2019 1 1  8
     57178.32 0 1 0 0 0 2019 1 2  8
        26085 0 1 0 0 0 2019 1 1  8
     48152.91 1 0 0 0 0 2019 1 2  8
     48152.91 0 0 0 0 1 2019 1 1  8
     44135.82 0 1 0 0 0 2019 1 2  8
        78255 0 1 0 0 0 2019 1 2  8
     34119.18 0 0 0 0 1 2019 1 2  8
    123016.86 0 0 0 1 0 2019 1 1  8
      65212.5 0 0 0 0 1 2019 1 1  8
      65212.5 0 1 0 0 0 2019 1 2  8
     69020.91 0 0 0 0 1 2019 1 1  8
     35110.41 1 0 0 0 0 2019 1 1  8
     25093.77 0 0 0 1 0 2019 1 2  8
     32084.55 0 0 1 0 0 2019 1 2  9
     28067.46 1 0 0 0 0 2019 1 1  9
     21754.89 0 0 1 0 0 2019 1 1  9
      9651.45 0 0 0 1 0 2019 3 1  8
     40118.73 0 0 0 1 0 2019 3 2  9
     49352.82 0 0 0 0 1 2019 1 1  8
     37145.04 0 1 0 0 0 2019 1 1  8
     15024.96 1 0 0 0 0 2019 3 2  8
     90306.27 0 0 0 0 1 2019 1 2  8
     29110.86 0 0 0 1 0 2019 1 2  8
     45127.05 0 0 1 0 0 2019 1 1  8
     16068.36 0 1 0 0 0 2019 3 2  8
     50187.54 0 0 0 0 1 2019 1 2  8
     108357.1 0 0 0 1 0 2019 1 1  8
     36101.64 0 0 0 0 1 2019 2 1  8
     100322.9 0 0 1 0 0 2019 1 2  8
     45127.05 0 0 0 0 1 2019 1 1  8
      5999.55 0 0 0 0 1 2019 1 2  8
     47631.21 0 0 0 0 1 2019 1 2  8
     35892.96 1 0 0 0 0 2019 1 2  8
     69229.59 0 0 1 0 0 2019 1 1  8
     42153.36 0 0 0 0 1 2019 1 2  7
     42153.36 0 1 0 0 0 2019 1 2  8
     29110.86 0 0 1 0 0 2019 3 1  8
     15024.96 1 0 0 0 0 2019 2 2  7
     38501.46 0 0 0 1 0 2019 1 2  8
      65212.5 0 0 0 1 0 2019 3 1  8
     60204.18 1 0 0 0 0 2019 1 1  8
     30102.09 1 0 0 0 0 2019 3 2  7
    end
    label values educat educat
    label def educat 1 "Degree or equivalent", modify
    label def educat 2 "Higher education", modify
    label def educat 3 "GCE A level and below", modify
    label values SEX SEX
    label def SEX 1 "Male", modify
    label def SEX 2 "Female", modify
    label values GOVTOF21 GOVTOF21
    label def GOVTOF21 7 "Eastern", modify
    label def GOVTOF21 8 "London", modify
    label def GOVTOF21 9 "South East", modify
    label def GOVTOF21 10 "South West", modify

    Many thanks in advance!

  • #2
    I think you want to predict the residual. The xb prediction will contain the age effects.
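
    A minimal sketch of the difference, assuming the -dataex- example from #1 is in memory. The names lpay, lpay_xb, lpay_res and agegrp are made up for illustration:

    ```stata
    * log earnings; age30 is left out as the reference category
    generate lpay = ln(annualgrosspay)
    regress lpay age28 age29 age31 age32

    * the linear prediction xb is the constant plus the age effect
    predict lpay_xb, xb

    * the residual is what remains once the age-group means are removed
    predict lpay_res, residuals

    * xb takes one value per age; the residuals average zero within each age
    generate agegrp = 28*age28 + 29*age29 + 30*age30 + 31*age31 + 32*age32
    tabstat lpay_xb lpay_res, by(agegrp) statistics(mean sd)
    ```

    Because the model contains only age dummies, the fitted values are simply the age-specific means of log pay, which is why the residuals are free of the age pattern.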



    • #3
      Thank you very much, George. I predicted the residuals. As I have never done this before: do I then subtract the age effects from the logannualpay variable?



      • #4
        The residual is logannualpay with the age effects removed, so use that.

        But if your goal is to estimate the effect of some X on logannualpay, you can simply include the age dummies in that model, or treat age as a fixed effect.
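
        Either variant could be sketched as follows, assuming the -dataex- example from #1 is in memory. The names lnpay and age_yrs are invented here, and SEX and educat merely stand in for whatever X is of interest:

        ```stata
        * hypothetical names: lnpay, age_yrs
        generate lnpay = ln(annualgrosspay)
        generate age_yrs = 28*age28 + 29*age29 + 30*age30 + 31*age31 + 32*age32

        * option 1: age dummies via factor-variable notation, age 30 as base
        regress lnpay ib30.age_yrs i.SEX i.educat

        * option 2: absorb age as a fixed effect
        areg lnpay i.SEX i.educat, absorb(age_yrs)
        ```

        Both commands give the same coefficients on SEX and educat. The second just does not report the age effects.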



        • #5
          Thank you George. I ran the following code:
          reg logannualpay age28 age29 age30 age31 age32
          predict xb, resid

          However, I am getting some negative values, which I cannot use for earnings, right? (See below.)

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input float xb
            -.0456564
            .26344758
            .26344758
             .3587337
            .05943794
            -.4861776
           -.15996113
           -.45529595
            .23560695
              .300008
             .5436513
           .011720464
             .9760576
            .04102688
            -.4296991
            1.0512615
           -.25349012
           -.02531414
            -.0767533
            .09551843
            .08170395
            -.0767533
             .1005886
           -.25361004
             .8824365
            -.1212799
            -.9000614
             .3030523
            -.4536007
            -.3012851
           -.08392468
            .06648096
           -.08003072
             .1759163
             -.332936
             .5087063
             .4134615
             .3026115
             .5265352
            .13497123
             .5436513
             .2585846
           -.10027047
            .01536293
           -.12787132
             .3287118
            .06648096
             .2482246
            .10888548
            -.2287657
            .08579406
            -.3599359
            -.0767533
            -.2951067
            .57555836
            1.1080374
            1.1232603
            1.0370824
             .7080615
            -.0767533
            .55149156
             .4918009
              .449158
            1.0218595
             .1472784
            1.4230902
             .7950702
             .8395379
             .8518282
            .23560695
            -.1666115
            .07060404
           .011720464
            -.3179324
           -1.1221235
             .3026115
             .5164142
            .27671656
            -.6131775
            1.1206261
          -.018120576
             .4117105
            -.5612609
            .53318554
             1.296201
             .2037569
            1.2106225
             .4269005
            -1.590897
            .48090705
            .25765115
             .8396575
             .3587337
             .4032014
          -.026660247
            -.6131775
             .2614643
             .7884199
             .7748516
            .08170395
          end



          • #6
            It's a residual, so negative values will occur. It still measures (log) income, though. You could rescale it by adding back the constant term.



            • #7
              Code:
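              * log earnings and a single age variable rebuilt from the dummies
              g lannualgrosspay = ln(annualgrosspay)
              g age = 28*age28 + 29*age29 + 30*age30 + 31*age31 + 32*age32
              tabstat lannualgrosspay, by(age)
              * age30 is omitted, so _cons is the mean log pay at age 30
              reg lannualgrosspay age28 age29 age31 age32
              predict lpay_detrend , resid
              * add the constant back so values are on the age-30 log-pay scale
              replace lpay_detrend = lpay_detrend+_b[_cons]
              * the age-specific means should now all be equal
              tabstat lpay_detrend, by(age)
              graph bar lannualgrosspay lpay_detrend , over(age)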
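
              If the detrended values are needed in pounds rather than logs, one possible continuation is to exponentiate them. This is only a sketch: it assumes the code above has just been run, and pay_detrend is a made-up name. Exponentiating always gives positive values, but note that the exponential of a mean of logs is a geometric mean, not an arithmetic one.

              ```stata
              * continues from the code above: lpay_detrend and age are in memory
              * back-transform to pounds on the age-30 equivalent scale
              generate pay_detrend = exp(lpay_detrend)

              * compare raw and detrended earnings
              summarize annualgrosspay pay_detrend
              tabstat pay_detrend, by(age) statistics(mean median)
              ```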



              • #8
                In a log/log model, the scale does not matter.



                • #9
                  Thank you very much George!
