
  • dividing panelset into quantiles

    Dear all,

    I am currently working with a data set of patent data. I declared it as a panel data set, linking all patents to their inventor. I now want to divide the data set into quantiles based on one characteristic: the average number of times an inventor is cited per patent. In short, I want to create a variable that divides the inventors into 10 groups based on their average number of citations. However, I cannot figure out how to do this. The goal is to estimate the effect of each quantile on my dependent variable. Could someone help?


  • #2
    We have no way to know how your dataset is structured, and thus no way to be very helpful to you. I suspect your question will have a quick and easy answer if you post an example of your data set using the -dataex- command, as described in the FAQ.



    • #3
      I agree with Mike Lacy. There is no data example here and I may be wasting my time making guesses.

      But panels presumably aren't guaranteed to be of equal length.


      Let's suppose that panels are identified by id but we wish each panel to be entered just once into a classification into quantile-based bins according to a variable citations. Here's how to do it. Tag each panel once, run xtile and then spread the bin identifiers to each observation in the panel.


      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float(id citations)
      1 1
      1 1
      1 1
      1 1
      1 1
      1 1
      2 2
      2 2
      2 2
      2 2
      2 2
      3 3
      3 3
      3 3
      3 3
      4 4
      4 4
      4 4
      5 5
      5 5
      6 6
      end
      
      egen tag = tag(id)
      
      xtile group=citations if tag, nq(3)
      
      bysort id (tag) : replace group = group[_N]
      
      list, sepby(id)
      
           +-----------------------------+
           | id   citati~s   tag   group |
           |-----------------------------|
        1. |  1          1     0       1 |
        2. |  1          1     0       1 |
        3. |  1          1     0       1 |
        4. |  1          1     0       1 |
        5. |  1          1     0       1 |
        6. |  1          1     1       1 |
           |-----------------------------|
        7. |  2          2     0       1 |
        8. |  2          2     0       1 |
        9. |  2          2     0       1 |
       10. |  2          2     0       1 |
       11. |  2          2     1       1 |
           |-----------------------------|
       12. |  3          3     0       2 |
       13. |  3          3     0       2 |
       14. |  3          3     0       2 |
       15. |  3          3     1       2 |
           |-----------------------------|
       16. |  4          4     0       2 |
       17. |  4          4     0       2 |
       18. |  4          4     1       2 |
           |-----------------------------|
       19. |  5          5     0       3 |
       20. |  5          5     1       3 |
           |-----------------------------|
       21. |  6          6     1       3 |
           +-----------------------------+
      .

      It was just easier to invent a toy example in which the number of citations was the same as the identifier, but absolutely nothing here hinges on that. Even more obviously, I classified 6 panels into 3 bins, but in your case the 3 in nq(3) just needs to be 10.
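
      Applied to the original question, the same recipe could look like the sketch below. It is only a guess at the setup: the variable names inventor and cites are hypothetical, the data are invented (20 inventors with 3 patents each), and the per-inventor mean is computed first because the question asks for average citations per patent.

```stata
* Invented data: 20 inventors, 3 patents each (names are assumptions)
clear
set obs 20
generate inventor = _n
expand 3
bysort inventor : generate cites = inventor + _n - 2

* Mean citations per patent, constant within each inventor
egen meancites = mean(cites), by(inventor)

* Tag exactly one observation per inventor
egen tag = tag(inventor)

* Bin the tagged observations only, so each inventor counts once
xtile decile = meancites if tag, nq(10)

* Spread the bin identifier to every observation of the same inventor
bysort inventor (tag) : replace decile = decile[_N]

list inventor cites meancites decile in 1/9, sepby(inventor)
```

      Sorting on tag within inventor puts the tagged observation last, which is why decile[_N] picks up the only non-missing value in each panel.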

      There are slightly grumpy comments here and there on calling bins by the term quantiles. Historically, there's no argument that quantiles are values, or estimated values, not the bins or intervals they delimit. More at e.g. https://journals.sagepub.com/doi/abs...867X1801800311



      • #4
        Nick Cox Thank you! That actually helped me perfectly. Next time I will make sure to add the data.



        • #5
          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input float inventorID int cites5yr
           1  0
           2  2
           3  1
           3  1
           4  2
           5  0
           6  0
           6  7
           7  7
           8  2
           9  3
          10  1
          11  0
          12  2
          12  0
          12  3
          13  0
          14  0
          15  0
          16  1
          16  2
          16  1
          16  3
          16  3
          16  0
          17  1
          17  0
          17  2
          18  0
          19  0
          20  2
          21 15
          22  0
          23  8
          23  0
          23  2
          24  9
          25  4
          25  1
          26  2
          27  5
          28  6
          28  0
          28  2
          28  0
          28  0
          28  0
          29  0
          30  5
          30  0
          30  1
          30  0
          31  6
          31 12
          31  7
          31  0
          31  0
          32  3
          33  0
          34  1
          34  1
          34  2
          35  0
          36  8
          37  0
          37  0
          37  0
          38  4
          39  0
          39  0
          40  1
          41  1
          42  0
          42  2
          43  0
          44 11
          45  2
          45  0
          46  0
          47  0
          47  0
          48  0
          48  1
          48  5
          48  0
          48  1
          48  4
          48  4
          48  8
          48  0
          48  1
          48  6
          48  5
          48  2
          48  0
          48  4
          48  0
          48  3
          48  2
          48  0
          end

          So this is the data. I thought it worked perfectly, but nobody is assigned to bins 2 to 4: each person is sorted either into group 1 or into one of groups 5 to 10. Is there an explanation for that?



          • #6
            See the reference cited in #3 and also https://www.stata-journal.com/articl...article=pr0054

            In one word: ties!

            About 40% of your panels have 0 cites, and they must all belong in the same bin.

            With your data, this graph shows one symbol per panel:


            Code:
            egen tag = tag(inventorID)
            
            xtile decile=cites if tag, nq(10)
            
             
            quantile cites if tag, mla(decile) mlabpos(0) ms(none) rlopts(lc(none)) yla(, ang(h)) xla(0 "0" 1 "1" 0.1(0.1)0.9, format(%02.1f))
            [Figure: xtitleprob.png, quantile plot of cites with one marker per panel, each labelled with its decile bin]
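
            The effect of ties can also be checked numerically rather than graphically. The following is a small sketch on invented data (10 inventors, 4 of them with 0 cites, one row each), not the data posted in #5: when the first few decile cut points all equal 0, the bins between them should come back empty.

```stata
* Invented data: 4 of 10 inventors have 0 cites
clear
input float inventorID int cites5yr
 1  0
 2  0
 3  0
 4  0
 5  1
 6  2
 7  2
 8  3
 9  7
10 15
end

egen tag = tag(inventorID)
xtile decile = cites5yr if tag, nq(10)

* Which bins are actually used, and by which values?
tabulate cites5yr decile if tag

* The nine decile cut points themselves: repeated values signal ties
_pctile cites5yr if tag, nq(10)
return list
```

            All the zeros have to land in bin 1, so no observation can fall strictly between two cut points that are both 0.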
