
  • Calculating the highest and lowest quartile for a variable of multiple waves in panel data

    Hi Statalist,

    I want to calculate the highest and lowest quartiles of mental health (stdMentalHealth) across the waves of data available before people have a child. An example of the dataset follows:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float wave long pidp float(stdMentalHealth gotFirstChild timingBirthOne)
     7    76165  -.1682427 0  7
     8    76165  -.2205372 0  8
     9    76165  1.4913077 1  9
    10    76165  .29066837 0 10
     3  4794685  -2.734943 0  6
     4  4794685 -2.4201086 0  7
     5  4794685 -1.5108243 0  8
     6  4794685 -1.8128518 1  9
     2 68002049  .29066837 0  4
     7 68002049  -.4489255 1  9
     8 68002049  -.5311026 0 10
     1 68009527   .9406145 0  2
     2 68009527  -.2664283 0  3
     3 68009527 -1.0679218 0  4
     4 68009527   .2127602 0  5
     5 68009527   .6247129 0  6
     6 68009527 -.07859495 0  7
     7 68009527    -.26963 0  8
     8 68009527  -2.099938 1  9
     9 68009527  -.4702702 0 10
    10 68009527  -.8352646 0 11
     1 68035367  -.0252332 0  1
     2 68035367  -.5439094 0  2
     3 68035367  .27572706 0  3
     4 68035367   .3536352 0  4
     5 68035367    .457157 0  5
     6 68035367  -.0252332 0  6
     7 68035367  -.6815827 0  7
     8 68035367  .05587666 0  8
     9 68035367   .3536352 1  9
    10 68035367  .05587666 0 10
     1 68051687  1.0281277 0  6
     2 68051687   .3066769 0  7
     3 68051687  -.4425221 0  8
     4 68051687  .59589756 1  9
     1 68051691   .9224715 0  6
     2 68051691   .6065699 0  7
     3 68051691    .652461 0  8
     4 68051691 -.20346144 1  9
     5 68061288  -.7840373 0  6
     6 68061288  -2.512958 0  7
     7 68061288   .2223653 0  8
     8 68061288   -.215201 1  9
     9 68061288    -.31979 0 10
    10 68061288 -.27710065 0 11
     1 68111527   .8082773 0  8
     2 68111527   .7282347 1  9
     3 68111527  .29066837 0 10
     4 68111527 -1.0860648 0 11
     5 68111527   .9224715 0 12
     6 68111527   .9833038 0 13
     7 68111527  1.2362386 0 14
     8 68111527   .2095585 0 15
     9 68111527  1.0441363 0 16
    10 68111527   1.229835 0 17
     1 68120375   .5329307 0  2
     2 68120375  .59589756 0  3
     3 68120375   .9117991 0  4
     4 68120375   .9117991 0  5
     5 68120375   .6588644 0  6
     6 68120375   .6588644 0  7
     7 68120375   .9117991 0  8
     8 68120375   .6588644 1  9
     9 68120375   -.215201 0 10
    10 68120375 .027061317 0 11
     2 68133289  -.2899075 0  1
     3 68133289  -.6634397 0  2
     4 68133289  -.6901206 0  3
     5 68133289 -1.3710165 0  4
     6 68133289  .13591929 0  5
     7 68133289 -2.2055943 0  6
     8 68133289 -1.0999389 0  7
     9 68133289   -.628221 0  8
    10 68133289 -1.6111444 1  9
     6 68142890 -.05938471 0  6
     7 68142890   .9117991 0  7
     8 68142890  .19568445 0  8
     9 68142890   .6247129 1  9
     1 68163887   .9117991 0  2
     2 68163887   .9117991 0  3
     3 68163887  1.4806354 0  4
     4 68163887  1.4806354 0  5
     5 68163887   .9117991 0  6
     6 68163887   .9117991 0  7
     7 68163887   .9117991 0  8
     8 68163887    .279996 1  9
     9 68163887  -.2600249 0 10
    10 68163887   .9224715 0 11
     1 68174767   1.387786 0  6
     2 68174767  -2.389159 0  7
     3 68174767   .6983521 0  8
     4 68174767   .8904544 1  9
     5 68174767   .9907745 0 10
     6 68174767 -1.0593839 0 11
     7 68174767     .75705 0 12
     8 68174767  .21489467 0 13
     9 68174767 -1.0796614 0 14
    10 68174767  1.0622792 0 15
     1 68202647   -.165041 0  6
     2 68202647  .29066837 0  7
    end
    In my dataset, every individual has their first child when timingBirthOne equals 9. I want to identify which people (pidp) belong to the lowest quartile of mental health (stdMentalHealth) and which belong to the highest quartile, using only the waves before the birth, that is, the observations where timingBirthOne is 1 to 8 (for whichever of those waves are available). Does anyone have an idea how to do this?

    Many thanks in advance!

    Kind regards,

    Vincent van Marrewijk

  • #2
    I am not sure I follow exactly what you want, but this may help.

    If we start with mental health then the upper quartile (value not bin) is at

    Code:
    egen p75 = pctile(cond(timingBirthOne <= 8, stdMentalHealth, .)), p(75)
    and the lower quartile uses 25 not 75.

    If the quartiles are to be calculated separately for each wave, then you need an extra option

    Code:
    by(timingBirthOne)
    or possibly even a variable for each wave.

    Then your bins can be calculated according to where values lie relative to the lower and upper quartile.

    I am aware of xtile and its competitors and complements, but doubt that they make the calculation here much easier, the sting being that you evidently want the results calculated for 1 to 8 to be visible by later observations.
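
    As a minimal sketch of the steps above, using the thread's variable names on a small toy dataset: the data values and the names p25, lowQ, highQ and p25wave are made up for illustration, and this is only one way to set up the bins, not necessarily what the poster needs.

    ```stata
    * Toy data (hypothetical values), same variable names as the thread
    clear
    input long pidp float(timingBirthOne stdMentalHealth)
    1 7 -0.5
    1 8 -0.2
    1 9  1.0
    2 7  0.9
    2 8  1.1
    2 9 -0.3
    3 7 -2.0
    3 8 -1.5
    3 9  0.4
    4 7  0.1
    4 8  0.3
    4 9  0.2
    end

    * Quartile values from pre-birth observations only (timingBirthOne 1-8);
    * egen stores the result in every observation, including later waves
    egen p25 = pctile(cond(timingBirthOne <= 8, stdMentalHealth, .)), p(25)
    egen p75 = pctile(cond(timingBirthOne <= 8, stdMentalHealth, .)), p(75)

    * Observation-level indicators, defined for pre-birth observations only;
    * missing values are excluded because Stata treats missing as larger than any number
    gen byte lowQ  = stdMentalHealth < p25 if timingBirthOne <= 8 & !missing(stdMentalHealth)
    gen byte highQ = stdMentalHealth > p75 if timingBirthOne <= 8 & !missing(stdMentalHealth)

    * Variant: a separate lower quartile value for each wave
    egen p25wave = pctile(stdMentalHealth), p(25) by(timingBirthOne)
    ```

    Note that these indicators classify each observation, so the same person can fall in different bins in different waves.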
    Last edited by Nick Cox; 14 May 2021, 07:46.



    • #3
      Thank you very much Nick, I'm now almost there I think.

      I now have the p25 and p75 values for timingBirthOne 1-8.
      What I actually want now is two dummy variables: one for the lowest quartile, equal to 1 if the individual belongs to the lowest quartile, and one for the highest quartile, equal to 1 if the individual belongs to the highest quartile.
      I think I therefore have to calculate each person's mean stdMentalHealth over the timingBirthOne waves 1-8 and then assign people to the right quartile using the p25 and p75 values.
      Do you know how I could perform this last step?
      I don't necessarily need the results calculated for 1 to 8 to be visible in later observations.

      Thank you in advance!



      • #4
        Not so, I think. The upper quartile (meaning bin) is just those observations for which mental health is greater than the upper quartile (value). The lower quartile (meaning bin) is just those observations for which mental health is less than the lower quartile (value).

        Calculating means of anything whatsoever has nothing to do with such definitions, although it might be useful for other reasons.

        Here is the idea stripped down to a fairly simple example.


        Code:
        . sysuse auto, clear
        (1978 automobile data)
        
        . egen p25 = pctile(mpg), p(25) by(foreign)
        
        . egen p75 = pctile(mpg), p(75) by(foreign)
        
        . gen bin = cond(mpg > p75, 4, cond(mpg < p25, 1, .))
        (40 missing values generated)
        
        . sort foreign mpg
        
        . list foreign mpg bin if bin < . , sepby(foreign bin)
        
             +----------------------+
             |  foreign   mpg   bin |
             |----------------------|
          1. | Domestic    12     1 |
          2. | Domestic    12     1 |
          3. | Domestic    14     1 |
          4. | Domestic    14     1 |
          5. | Domestic    14     1 |
          6. | Domestic    14     1 |
          7. | Domestic    14     1 |
          8. | Domestic    15     1 |
          9. | Domestic    15     1 |
         10. | Domestic    16     1 |
         11. | Domestic    16     1 |
         12. | Domestic    16     1 |
         13. | Domestic    16     1 |
             |----------------------|
         42. | Domestic    24     4 |
         43. | Domestic    24     4 |
         44. | Domestic    24     4 |
         45. | Domestic    25     4 |
         46. | Domestic    26     4 |
         47. | Domestic    26     4 |
         48. | Domestic    28     4 |
         49. | Domestic    28     4 |
         50. | Domestic    29     4 |
         51. | Domestic    30     4 |
         52. | Domestic    34     4 |
             |----------------------|
         53. |  Foreign    14     1 |
         54. |  Foreign    17     1 |
         55. |  Foreign    17     1 |
         56. |  Foreign    18     1 |
         57. |  Foreign    18     1 |
             |----------------------|
         70. |  Foreign    30     4 |
         71. |  Foreign    31     4 |
         72. |  Foreign    35     4 |
         73. |  Foreign    35     4 |
         74. |  Foreign    41     4 |
             +----------------------+



        • #5
          Dear Nick, Many thanks for your explanation!
          Unfortunately, I still can't figure out how to do the following: for timingBirthOne 1-8 (for the waves available per person), I want to classify people into quartile 1 or 4, or as missing (.) if they belong to quartile 2 or 3. But I want each person to belong to the same quartile in all of the waves 1-8, which is why I was thinking of calculating mean scores.
          What I have now classifies people into a quartile separately in each wave, whereas I want them to be classified once, based on their average stdMentalHealth score over all available waves 1-8.
          I hope that I explained it well, thank you very much for your help!
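
          One possible way to sketch the person-level classification described above: take each person's mean over the pre-birth waves, compute the quartile values across persons (one mean per person), and compare. The toy data and the names meanPre, tagPerson, p25m, p75m, lowQ and highQ are assumptions for illustration only; this is not a solution confirmed in the thread.

          ```stata
          * Toy data (hypothetical values), same variable names as the thread
          clear
          input long pidp float(timingBirthOne stdMentalHealth)
          1 7 -0.5
          1 8 -0.2
          1 9  1.0
          2 7  0.9
          2 8  1.1
          2 9 -0.3
          3 7 -2.0
          3 8 -1.5
          3 9  0.4
          4 7  0.1
          4 8  0.3
          4 9  0.2
          end

          * Each person's mean over the available pre-birth waves (timingBirthOne 1-8)
          bysort pidp: egen meanPre = mean(cond(timingBirthOne <= 8, stdMentalHealth, .))

          * Tag one observation per person so each person counts once in the quartiles
          egen byte tagPerson = tag(pidp)
          egen p25m = pctile(cond(tagPerson, meanPre, .)), p(25)
          egen p75m = pctile(cond(tagPerson, meanPre, .)), p(75)

          * Person-level indicators, constant across all of a person's observations
          gen byte lowQ  = meanPre < p25m if !missing(meanPre)
          gen byte highQ = meanPre > p75m if !missing(meanPre)
          ```

          Because meanPre is constant within pidp, each person gets the same classification in every wave. Adding a condition such as timingBirthOne <= 8 to the last two commands would restrict the indicators to the pre-birth waves.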



          • #6
            Does anyone maybe have an idea how I could do this? Or is this simply not possible?
