dividing panelset into quantiles

Frederick den Hartog

Join Date: Jun 2019

Posts: 5
#1

03 Jun 2019, 10:07

Dear all,

I am currently working with a data set of patents, which I have declared as panel data, linking each patent to its inventor. I now want to divide the data set into quantiles based on one characteristic: the average number of times an inventor is cited per patent. In short, I want to create a variable that divides the inventors into 10 groups based on their average number of citations, so that I can estimate the effect of each quantile on my dependent variable. However, I cannot figure out how to do this. Could someone help?
Mike Lacy

Join Date: Apr 2014

Posts: 2407
#2

03 Jun 2019, 10:30

We have no way to know how your dataset is structured, and thus no way to be very helpful to you. I suspect your question will have a quick and easy answer if you post an example of your data set using the -dataex- command, as described in the FAQ.
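For anyone unsure what such a data example looks like, here is a minimal sketch. It uses the auto dataset shipped with Stata rather than the poster's data, so the variable names are only stand-ins. On older Stata versions dataex may first need to be installed with ssc install dataex.

```stata
* Sketch only: the auto dataset stands in for your own data
sysuse auto, clear

* Write out the first 5 observations of three variables as
* -input- code that can be pasted straight into a forum post
dataex make price mpg in 1/5
```

The output is a block beginning with clear and input and ending with end, which others can copy and run to recreate the observations exactly.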

Nick Cox

Join Date: Mar 2014
Posts: 35486
#3

03 Jun 2019, 10:33

I agree with Mike Lacy. There is no data example here and I may be wasting my time making guesses.

But panels presumably aren't guaranteed to be of equal length.

Let's suppose that panels are identified by id but we wish each panel to be entered just once into a classification into quantile-based bins according to a variable citations. Here's how to do it. Tag each panel once, run xtile and then spread the bin identifiers to each observation in the panel.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(id citations)
1 1
1 1
1 1
1 1
1 1
1 1
2 2
2 2
2 2
2 2
2 2
3 3
3 3
3 3
3 3
4 4
4 4
4 4
5 5
5 5
6 6
end

egen tag = tag(id)

xtile group=citations if tag, nq(3)

bysort id (tag) : replace group = group[_N]

list, sepby(id)

     +-----------------------------+
     | id   citati~s   tag   group |
     |-----------------------------|
  1. |  1          1     0       1 |
  2. |  1          1     0       1 |
  3. |  1          1     0       1 |
  4. |  1          1     0       1 |
  5. |  1          1     0       1 |
  6. |  1          1     1       1 |
     |-----------------------------|
  7. |  2          2     0       1 |
  8. |  2          2     0       1 |
  9. |  2          2     0       1 |
 10. |  2          2     0       1 |
 11. |  2          2     1       1 |
     |-----------------------------|
 12. |  3          3     0       2 |
 13. |  3          3     0       2 |
 14. |  3          3     0       2 |
 15. |  3          3     1       2 |
     |-----------------------------|
 16. |  4          4     0       2 |
 17. |  4          4     0       2 |
 18. |  4          4     1       2 |
     |-----------------------------|
 19. |  5          5     0       3 |
 20. |  5          5     1       3 |
     |-----------------------------|
 21. |  6          6     1       3 |
     +-----------------------------+

It was just easier to invent a toy example in which the number of citations was the same as the identifier, but absolutely nothing here hinges on that. Even more obviously, I classified 6 panels into 3 bins; in your case the 3 in nq(3) just needs to be 10.
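If citations vary within a panel, as they would with one observation per patent, the same recipe can be applied to the panel mean. What follows is a minimal sketch under that assumption, not a tested solution for the real data: the toy values are invented, the names inventorID and cites5yr are borrowed from the data example posted later in the thread, and mean_cites and bin are hypothetical names.

```stata
* Toy data (invented): one observation per patent
clear
input float inventorID int cites5yr
1 0
1 2
2 1
3 4
3 6
3 2
4 0
5 9
6 3
6 5
end

* Mean citations per patent, constant within each inventor
egen mean_cites = mean(cites5yr), by(inventorID)

* Tag one observation per inventor so each inventor counts once
egen tag = tag(inventorID)

* Bin the inventors; 3 bins for 6 toy inventors, use nq(10) for ten bins
xtile bin = mean_cites if tag, nq(3)

* Spread the bin from the tagged observation to the whole panel
bysort inventorID (tag) : replace bin = bin[_N]

list, sepby(inventorID)
```

The resulting bin variable could then enter a model through factor-variable notation, i.bin, alongside whatever outcome and covariates apply.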

There are slightly grumpy comments here and there about calling bins quantiles. Historically, there's no argument that quantiles are values, or estimated values, not the bins or intervals they delimit. More at e.g. https://journals.sagepub.com/doi/abs...867X1801800311

Frederick den Hartog

Join Date: Jun 2019

Posts: 5
#4

04 Jun 2019, 04:39

Nick Cox, thank you! That helped me perfectly. Next time I will make sure to include a data example.

Frederick den Hartog

Join Date: Jun 2019
Posts: 5
#5

04 Jun 2019, 05:58

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float inventorID int cites5yr
 1  0
 2  2
 3  1
 3  1
 4  2
 5  0
 6  0
 6  7
 7  7
 8  2
 9  3
10  1
11  0
12  2
12  0
12  3
13  0
14  0
15  0
16  1
16  2
16  1
16  3
16  3
16  0
17  1
17  0
17  2
18  0
19  0
20  2
21 15
22  0
23  8
23  0
23  2
24  9
25  4
25  1
26  2
27  5
28  6
28  0
28  2
28  0
28  0
28  0
29  0
30  5
30  0
30  1
30  0
31  6
31 12
31  7
31  0
31  0
32  3
33  0
34  1
34  1
34  2
35  0
36  8
37  0
37  0
37  0
38  4
39  0
39  0
40  1
41  1
42  0
42  2
43  0
44 11
45  2
45  0
46  0
47  0
47  0
48  0
48  1
48  5
48  0
48  1
48  4
48  4
48  8
48  0
48  1
48  6
48  5
48  2
48  0
48  4
48  0
48  3
48  2
48  0
end

So this is the data. I thought it worked perfectly, but no inventor is assigned to bins 2 to 4: each inventor ends up either in group 1 or in one of groups 5 to 10. Is there an explanation for that?

Nick Cox

Join Date: Mar 2014

Posts: 35486
#6

04 Jun 2019, 06:13

See the reference cited in #3 and also https://www.stata-journal.com/articl...article=pr0054

In one word: ties!

About 40% of your panels have 0 cites, and they must all belong in the same bin.

With your data, this graph shows one symbol per panel:

Code:

egen tag = tag(inventorID)
xtile decile=cites if tag, nq(10)
quantile cites if tag, mla(decile) mlabpos(0) ms(none) rlopts(lc(none)) yla(, ang(h)) xla(0 "0" 1 "1" 0.1(0.1)0.9, format(%02.1f))
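The effect of ties can also be seen without a graph. Below is a small sketch on invented values, not the data above: half of the 10 values are 0, so the 20% and 40% cutpoints coincide at 0 and no value can fall strictly between them. The names x and bin are hypothetical.

```stata
* Invented toy values: 5 of the 10 are tied at 0
clear
input float x
0
0
0
0
0
1
2
3
5
7
end

* Ask for 5 bins; the tied zeros must all land in the same bin
xtile bin = x, nq(5)

* Bin 2 never appears in the table
tabulate bin

* The cutpoints themselves: the first two are both 0
_pctile x, nq(5)
return list
```

The same mechanism is at work with 10 bins when about 40% of panels share the value 0: several low cutpoints coincide, so the bins between them stay empty.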