  • Construct ranks then test change in ranking

    Dear Stata listers,

    For the following dataset, I want to do two things: (1) create a quintile rank for the continuous variable xvar, i.e. sort each firm into one of 5 buckets based on the xvar values of all firms in the same industry and the same year; (2) test whether this rank has changed over time for each firm. Specifically, I would like to create an Uprank dummy set to one if the xvar rank moved up, and a Downrank dummy set to one if the xvar rank moved down.

    If you could offer some sample code for (1), that would be very much appreciated. For (2), I suppose I could use a lag function to create the change dummies. Your suggestions are more than welcome!
    FirmID year xvar sic2
    1001 1990 0.3 41
    1002 1990 0.4 41
    1003 1990 0.3 41
    1004 1990 0.9 41
    1005 1990 1.1 41
    1006 1990 2 41
    1007 1990 1 41
    1008 1990 1.2 41
    1009 1990 0.8 41
    1010 1990 0.1 41
    1001 1991 0.34 41
    1002 1991 0.36 41
    1003 1991 0.41 41
    1004 1991 0.18 41
    1005 1991 0.23 41
    1006 1991 0.21 41
    1007 1991 0.22 41
    1008 1991 1 41
    1009 1991 1.2 41
    1100 1990 0.23 42
    1101 1990 0.21 42
    1102 1990 0.2 42
    1103 1990 0.4 42
    1104 1990 0.26 42
    1105 1990 0.31 42
    1106 1990 0.42 42
    Note: I did not include all panel observations, above is just an illustration.

    Best,
    Rochelle

  • #2
    Would I be correct in assuming that sic2 is some kind of code for industry? Also, in your sample data, all the observations from 1991 are sic2 41, whereas sic2 42 appears only in 1990. Would I be correct in assuming that this is just an accident of your example and that in the larger data set each year includes multiple sic2's? If so:

    Code:
    levelsof year, local(years)
    levelsof sic2, local(sic2s)
    
    tempvar temp_quintile
    gen int quintile = .
    
    foreach y of local years {
        foreach s of local sic2s {
            xtile `temp_quintile' = xvar if year == `y' & sic2 == `s', nq(5)
            replace quintile = `temp_quintile' if year == `y' & sic2 == `s'
            drop `temp_quintile'
        }
    }
    should do what you want.

    Whether you use this particular method or some other, bear in mind that division into quintiles can involve arbitrary choices about breaking ties, and about where to draw boundaries when the number of objects being grouped is not divisible by 5. The -xtile- command handles these in particular ways, which you can read about in the corresponding section of the online user's manual. Other methods of calculating percentiles in Stata can handle these same decisions differently (e.g. the percentiles calculated by -summarize, detail-). I have shown you the -xtile- method because it is the simplest one to code that I could think of. But you may need to do something more complicated if its ways of resolving these inherent difficulties are not suitable for your purposes.
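
    To see how much these conventions matter for a particular cell, something along the following lines might help. This is only a sketch: the variable names cut, q_default and q_altdef are made up for illustration, and the 1990 / sic2 41 cell is simply taken from the example data. It lists the cutpoints for one industry-year cell and cross-tabulates the bins that -xtile- produces with and without its altdef option.

    ```stata
    * Quintile cutpoints for one industry-year cell (stored in the first 4 observations)
    pctile cut = xvar if year == 1990 & sic2 == 41, nq(5)
    list cut in 1/4

    * Bin the same cell under the default and the alternative percentile formula
    xtile q_default = xvar if year == 1990 & sic2 == 41, nq(5)
    xtile q_altdef  = xvar if year == 1990 & sic2 == 41, nq(5) altdef

    * Off-diagonal counts are firms whose bin depends on the convention used
    tabulate q_default q_altdef
    ```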

    Finally, I can't leave a post like this without pointing out that grouping inherently continuous data, no matter how accomplished, throws away information and is often an ill-advised approach. I suspect you probably shouldn't be doing this at all, but that is up to you.



    • #3
      I echo Clyde's comments. Turning measurements into a classification is often just throwing away good information. I realise it is a tribal habit in some quarters.

      Despite a long history of being shocked at a distance by the difficulties of producing quantile-based bins, I was shocked at close quarters when I looked at this myself:

      Cox, N.J. 2012. Matrices as look-up tables. Stata Journal 12: 748-758.

      http://www.stata-journal.com/article...article=pr0054

      Matrices in Stata can serve as look-up tables. Because Stata will accept references to matrix elements within many commands, most notably generate and replace, users can access and use values from a table in either vector or full matrix form. Examples are given for entry of small datasets, recoding of categorical variables, and quantile-based or similar binning of counted or measured variables. In the last case, the device grants easy exploration of the consequences of different binning conventions and the instability of bin allocation.



      • #4
        Dear Clyde,

        Thanks so much for your help !

        You are correct: sic2 is the industry classification code. I ran the program and it worked. In the log file I noticed the message "nquantiles() must be less than or equal to number of observations plus one". I did a Google search and found discussions such as this one:
        http://www.stata.com/statalist/archi.../msg00154.html

        I assume what I got is not a real error, but a bug in xtile. Correct?

        I agree with you concerning the arbitrary choices about breaking ties.

        I plan to create quintiles for two variables, then compute the change in quintiles as I indicated in my earlier post. My plan is to test whether the change in xvar's quintile predicts the change in yvar's quintile. My posted data do not show yvar. If you see a problem with this approach, please also share your view.

        Thanks again,
        Rochelle



        • #5
          To my knowledge, -xtile- has no bugs. The message probably means that you have some year-industry combination with fewer than 4 observations, since nquantiles(5) requires at least 4. Try cross-tabbing sic2 and year to find it. If you want to stick with quintiles, you will have to decide how to make an exception for that year-industry combination.
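
          A sketch of how that check might look, using the variable names from the posted data (n_cell and cell_tag are hypothetical names introduced here). Note that a combination with no observations at all shows up only as a zero in the cross-tabulation, not in the listing.

          ```stata
          * Cross-tabulate industry by year; zeros and small counts mark problem cells
          tabulate sic2 year

          * Count the observations in each year-industry combination
          bysort sic2 year: gen long n_cell = _N

          * Flag one observation per combination, then list the sparse combinations
          egen byte cell_tag = tag(sic2 year)
          list sic2 year n_cell if cell_tag & n_cell < 4, noobs
          ```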



          • #6
            Thank you !!!



            • #7
              @Nick,
              Thank you! I saw other posts regarding the issue with classification using xtile.

              @Clyde,

              I am still bugged by the error message "nquantiles() must be less than or equal to number of observations plus one". See the full log:
              . foreach y of local years {
              2. foreach s of local sic2s {
              3. xtile `temp_quintile' = xvar if year == `y' & sic2 == `s', nq(5)
              4. replace quintile = `temp_quintile' if year == `y' & sic2 == `s'
              5. drop `temp_quintile'
              6. }
              7. }
              (10 real changes made)
              (7 real changes made)
              (9 real changes made)
              nquantiles() must be less than or equal to number of observations plus one


              Using my original dataset, Stata 13 did create the quintiles (see output below), so why does it still display the error message? My freq variable shows that the count is more than 5 for each sic2 and year combination.


              . list

                   +-----------------------------------------------+
                   | FirmID   year   xvar   sic2   freq   quintile |
                   |-----------------------------------------------|
                1. |   1001   1990     .3     41     10          1 |
                2. |   1002   1990     .4     41     10          2 |
                3. |   1003   1990     .3     41     10          1 |
                4. |   1004   1990     .9     41     10          3 |
                5. |   1005   1990    1.1     41     10          4 |
                   |-----------------------------------------------|
                6. |   1006   1990      2     41     10          5 |
                7. |   1007   1990      1     41     10          4 |
                8. |   1008   1990    1.2     41     10          5 |
                9. |   1009   1990     .8     41     10          3 |
               10. |   1010   1990     .1     41     10          1 |
                   |-----------------------------------------------|
               11. |   1001   1991    .34     41      9          3 |
               12. |   1002   1991    .36     41      9          3 |
               13. |   1003   1991    .41     41      9          4 |
               14. |   1004   1991    .18     41      9          1 |
               15. |   1005   1991    .23     41      9          2 |
                   |-----------------------------------------------|
               16. |   1006   1991    .21     41      9          1 |
               17. |   1007   1991    .22     41      9          2 |
               18. |   1008   1991      1     41      9          4 |
               19. |   1009   1991    1.2     41      9          5 |
               20. |   1100   1990    .23     42      7          2 |
                   |-----------------------------------------------|
               21. |   1101   1990    .21     42      7          1 |
               22. |   1102   1990     .2     42      7          1 |
               23. |   1103   1990     .4     42      7          4 |
               24. |   1104   1990    .26     42      7          3 |
               25. |   1105   1990    .31     42      7          3 |
                   |-----------------------------------------------|
               26. |   1106   1990    .42     42      7          5 |
                   +-----------------------------------------------+



              • #8
                If you try -tab sic2 year- on the sample of data you provided, you will find that there are no observations for year 1991 and sic2 42. That is why you are getting the error message. Because that is also the last combination of year and sic2 to be processed in the -foreach- loops, all of your data actually get a quintile computed. But if this were in the midst of your data, the process would terminate there and refuse to go on.
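
                The failure can be reproduced in isolation. A minimal sketch on the posted sample data (q_test is a throwaway name used only here): the count should come back as zero, and -capture noisily- lets -xtile- show its error message without stopping a do-file.

                ```stata
                * How many observations fall in the empty year-industry combination?
                count if year == 1991 & sic2 == 42

                * Run xtile on that combination alone; capture keeps the do-file going,
                * noisily still shows the error message, and _rc holds the return code
                capture noisily xtile q_test = xvar if year == 1991 & sic2 == 42, nq(5)
                display "return code: " _rc
                ```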

                You have to decide whether this gap in the data is acceptable and expected or not. And I would strongly advise looking closely at the output of -tab sic2 year- to see if there are any other gaps (or partial gaps where there are fewer than 4 observations for a given year and sic2). If such gaps represent an error in your data, the solution is to fix the data. If you expect there to be some gaps like that, then the code needs to be modified to work around them:

                Code:
                levelsof year, local(years)
                levelsof sic2, local(sic2s)

                tempvar temp_quintile
                gen int quintile = .

                foreach y of local years {
                    foreach s of local sic2s {
                        quietly count if year == `y' & sic2 == `s'
                        if `r(N)' >= 4 {
                            xtile `temp_quintile' = xvar if year == `y' & sic2 == `s', nq(5)
                            replace quintile = `temp_quintile' if year == `y' & sic2 == `s'
                            drop `temp_quintile'
                        }
                        else {
                            display "Insufficient data to compute quintiles for year `y' and sic2 `s'"
                        }
                    }
                }
                The above code will compute the quintiles for all combinations of year and sic2 for which at least 4 observations are present in the data. For combinations of year and sic2 lacking at least 4 observations, the quintile variable will be missing. You will then need to decide what to do about those combinations.
                Last edited by Clyde Schechter; 19 Sep 2014, 21:25. Reason: The minimum N for computing quintiles is 4, not 5. Corrected throughout.
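
                For part (2) of the original question, one possible sketch follows. It assumes that quintile was created as above, that FirmID and year uniquely identify observations, and that "moved up" means a higher quintile number than in the previous year; uprank and downrank are hypothetical names chosen to match the question. Both dummies are left missing when there is no previous-year quintile to compare with.

                ```stata
                * Declare the panel structure so that the lag operator L. can be used
                xtset FirmID year

                * 1 if the firm's quintile is higher than in the previous year, 0 otherwise;
                * missing if either this year's or last year's quintile is missing
                gen byte uprank = quintile > L.quintile if !missing(quintile, L.quintile)

                * 1 if the firm's quintile is lower than in the previous year, 0 otherwise
                gen byte downrank = quintile < L.quintile if !missing(quintile, L.quintile)

                * Quick check of how often ranks moved in each direction
                tabulate uprank downrank, missing
                ```

                Because L. refers to the previous calendar year, a firm with a gap in its years gets missing values for both dummies in the year after the gap.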
