  • quintile per year in panel data

    Dear Statalist,

    I have a problem creating quintiles in Stata 14: the quintiles are based on the whole data set (1994-2015) rather than on each year separately.

    I'm using a data set covering 1994-2015 with a variable for real income deflated to 1994 prices. Real income has been increasing over time, so when I create quintiles using
    Code:
    xtile quint = realincome, nq(5)
    Code:
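    . tabulate catastrophic_30 quint if year==1994

    Catastroph |
         ic 30 |            5 quantiles of realincome
       percent |        1        2        3        4        5 |     Total
    -----------+----------------------------------------------+----------
             0 |    1,048    1,112      817      570      347 |     3,894
             1 |       26        4        2        1        3 |        36
    -----------+----------------------------------------------+----------
         Total |    1,074    1,116      819      571      350 |     3,930

    . tabulate catastrophic_30 quint if year==2015

    Catastroph |
         ic 30 |            5 quantiles of realincome
       percent |        1        2        3        4        5 |     Total
    -----------+----------------------------------------------+----------
             0 |      246      667    1,124    1,630    2,058 |     5,725
             1 |      418      154      213      179      154 |     1,118
    -----------+----------------------------------------------+----------
         Total |      664      821    1,337    1,809    2,212 |     6,843
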
    The number of people in the top (fifth) quintile is much lower in 1994 than in 2015. So it seems that if you earned, for example, 50,000 rubles in 1994, you might belong to the top quintile for that year but overall only to, for example, the third quintile, because incomes increased in the years 2000-2015.
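The pattern in these tables can be reproduced with simulated data. In this hypothetical sketch (the variable names mirror the question, but the three years and the incomes are invented), income rises each year, so quintiles from a single pooled `xtile` put almost no first-year observations in the top group, while quintiles recomputed within each year split every year into five equal-sized groups.

```stata
// Hypothetical data: 3 years x 500 people, real income rising each year
clear
set seed 2018
set obs 1500
generate year = 1994 + floor((_n - 1) / 500)
generate double realincome = exp(10 + 0.5*(year - 1994) + 0.3*rnormal())

// Pooled quintiles: one set of cutoffs computed over all years at once
xtile quint_pooled = realincome, nq(5)

// Within-year quintiles: cutoffs recomputed separately for each year
generate quint_year = .
forvalues y = 1994/1996 {
    xtile temp = realincome if year == `y', nq(5)
    replace quint_year = temp if year == `y'
    drop temp
}

// Pooled: the earliest year piles into the low quintiles
tabulate quint_pooled if year == 1994
// Within-year: 100 observations in each quintile of 1994
tabulate quint_year if year == 1994
```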

    I think I would prefer quintiles computed within each year. (But in my analysis I do not want separate quintiles for every single year.) For example, when I run a probit model (and margins) to estimate the probability of catastrophic health expenditure, I want to have just 5 quintiles.

    Many thanks,
    Rogier
    Last edited by Rogier Jansen; 19 Aug 2018, 05:43.

  • #2
    Here is some code that may start you in a useful direction.
    Code:
    // create pretend data between 1 and 2 in 2001 and between 2 and 3 in 2002
    clear
    set obs 200
    generate year = 2001 in 1/100
    replace  year = 2002 in 101/200
    generate x = 1+runiform() in 1/100
    replace  x = 2+runiform() in 101/200
    // quintiles separately in each year
    generate quint = .
    forvalues y = 2001/2002 {
        xtile temp = x if year==`y', nq(5)
        replace quint = temp if year==`y'
        drop temp
        }
    table quint year, contents(min x max x)
    Code:
    . table quint year, contents(min x max x)
    
    ------------------------------
              |        year       
        quint |     2001      2002
    ----------+-------------------
            1 |  1.01146  2.002565
              | 1.212845   2.23882
              | 
            2 | 1.222928  2.247012
              | 1.413067  2.462382
              | 
            3 | 1.424769  2.465255
              | 1.562749  2.576547
              | 
            4 | 1.565148  2.581064
              | 1.797118  2.869119
              | 
            5 | 1.814271  2.876975
              | 1.983617  2.991456
    ------------------------------
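The loop above hard-codes two years. For data like those in the question, a sketch that picks up every year present via `levelsof` might look like this; the variable names follow the question, but the data are simulated (an assumption, so the block runs on its own). Because `quint` takes the values 1 to 5 in every year, it enters a model as one set of five categories, for example as `i.quint` in `probit` followed by `margins`.

```stata
// Hypothetical stand-in for the panel: 1994-2015, 200 observations per year
clear
set seed 1994
set obs 4400
generate year = 1994 + floor((_n - 1) / 200)
generate double realincome = exp(10 + 0.05*(year - 1994) + rnormal())
// Invented outcome: lower income, higher chance of catastrophic spending
generate catastrophic_30 = runiform() < invlogit(2 - 0.3*log(realincome))

// Within-year quintiles, looping over every year found in the data
generate quint = .
levelsof year, local(years)
foreach y of local years {
    xtile temp = realincome if year == `y', nq(5)
    replace quint = temp if year == `y'
    drop temp
}

// A single set of five quintile categories in the model
probit catastrophic_30 i.quint
// Predicted probability of catastrophic spending in each quintile
margins quint
```

With real data the simulation lines would be dropped and the rest run unchanged; `levelsof` means the loop does not need to know which years are present.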



    • #3
      Following up on William's response, I am trying to apply the same approach to create quintiles based on the observations on a specific date. I am working with panel data that include stock IDs and daily return observations. The date format is %tddd/nn/YY. I tried the following, but I get the error message "invalid syntax r(198)".

      Code:
       generate quint = .
       forvalues y = `28/06/19' {
           xtile temp = x if Date==`y', nq(5)
           replace quint = temp if Date==`y'
           drop temp
       }
      I would be grateful for any help!

      Thank you in advance!
      Nour



      • #4
        That's indeed invalid syntax. 28/06/19 is not a legal local macro name, so you can't have such a local macro. In any case, that is evidently (intended to be) a single daily date, so there is no need for, and no point in, a loop. This should be legal:

        Code:
        generate quint = .    
        xtile temp = x if Date == mdy(6, 28, 2019), nq(5)    
        replace quint = temp if Date== mdy(6, 28, 2019)  
        drop temp
        and this is better style to avoid the puzzling choreography of variables created only to be destroyed.

        Code:
        xtile quint = x if Date == mdy(6, 28, 2019), nq(5)
        I doubt it's what you really want, but your question doesn't imply otherwise.
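The reply above handles one date. If the aim is actually quintiles of `x` within every trading day (an assumption; the question names only 28 June 2019), a minimal sketch reuses the per-year loop from #2 with `Date` in place of `year`. The stocks, dates and `x` below are invented so the block runs on its own. It also shows that a daily date is stored as a count of days since 1 January 1960, so `td(28jun2019)` and `mdy(6, 28, 2019)` are the same number whatever display format is attached.

```stata
// Hypothetical stock-day panel: 50 stocks observed on 5 consecutive days
clear
set seed 2019
set obs 250
generate stock = mod(_n - 1, 50) + 1
generate Date = mdy(6, 24, 2019) + floor((_n - 1) / 50)
format Date %tdDD/NN/YY   // display only; stored values are day counts
generate x = rnormal()

// The same day written two ways: both are the integer 21728, so this shows 1
display td(28jun2019) == mdy(6, 28, 2019)

// Quintiles of x recomputed within each date
generate quint = .
levelsof Date, local(dates)
foreach d of local dates {
    xtile temp = x if Date == `d', nq(5)
    replace quint = temp if Date == `d'
    drop temp
}
```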



        • #5
          That is exactly what I needed, thank you very much!
