
  • Regression by quantile group

    Hello Stata users,

    I am working with cross-country, industry-level panel data, and I am trying to run a regression of the variables "jcr" and "jdr" on growth across different quantile groups.
    My dataset looks like this:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    
    clear
    input str3 country int year byte ind double growth byte quantile float(jcr jdr)
    "A" 2002 3     .672191858291626 1    -.1229408   -.2389851
    "A" 2002 3   .13869287073612213 2  -.033738595  -.11796902
    "A" 2002 3  .022968925535678864 3   .004111842  -.05286625
    "A" 2002 3 .0017980386037379503 4  -.011298022  -.06083192
    "A" 2002 3  -.16667737066745758 5   .068663485 -.013689965
    "A" 2002 3  -.17840731143951416 6  -.006880734  -.15333694
    "A" 2003 3    .7250598669052124 1    -.2283749   -.3857791
    "A" 2003 3    .1564776450395584 2  -.018166976  -.09158104
    "A" 2003 3  .051283758133649826 3   -.05724212  -.09636812
    "A" 2003 3  .023065701127052307 4    -.0402614  -.06571764
    "A" 2003 3  -.20438718795776367 5  .0044542346  -.05645589
    "A" 2003 3  -.36091873049736023 6   .016714025   -.0307614
    "A" 2004 3    .5813043117523193 1   -.29487726   -.3380778
    "A" 2004 3    .1335824579000473 2     .0070048  -.06939671
    "A" 2004 3   .07415185123682022 3  -.029579455   -.0826437
    "A" 2004 3   .04221270605921745 4   -.04617803  -.08350244
    "A" 2004 3  -.12049635499715805 5      .044223 -.028452955
    "A" 2004 3    .7893650531768799 6    .23076923  -.12546878
    "A" 2005 3    .7710216641426086 1   -.10427842   -.2658622
    "A" 2005 3    .1161515936255455 2   -.10758134  -.16785954
    "A" 2005 3  .027848316356539726 3   .031212064  -.03418614
    "A" 2005 3  -.01820971444249153 4  -.001233272  -.04062689
    "A" 2005 3  -.18122969567775726 5    .01013001  -.04751295
    "A" 2005 3                    . 6    .01120852  -.04151956
    "A" 2006 3   .39854076504707336 1    .02969432  -.16627036
    "A" 2006 3   .08353615552186966 2   -.04770102  -.10662634
    "A" 2006 3  .027338284999132156 3 -.0021104466  -.07062218
    "A" 2006 3 -.025064121931791306 4   .008168118  -.03237979
    "A" 2006 3  -.19563138484954834 5   .007862739   -.0444931
    "A" 2006 3                    . 6   -.02727273  -.06576048
    "A" 2007 3    .2896402180194855 1  .0016892842   -.0936298
    "A" 2007 3 -.010519917123019695 2   -.06810414  -.11413156
    "A" 2007 3  -.09132310748100281 3   -.04691949   -.1015541
    "A" 2007 3    -.152311772108078 4   .016302703 -.035093993
    "A" 2007 3   -.1548743098974228 5   -.03119042   -.0533259
    "A" 2007 3                    . 6   -.27761194   -.2806777
    "A" 2008 3    .4056224822998047 1    -.0411319   -.1085815
    "A" 2008 3   .02111530490219593 2  -.010045745  -.07682251
    "A" 2008 3  -.03370849788188934 3   -.04762908   -.0966561
    "A" 2008 3  -.05902251601219177 4  -.034295693  -.06143871
    "A" 2008 3  -.23769980669021606 5  -.035204045  -.04672369
    "A" 2008 3                    . 6  -.072829135   -.1020152
    "A" 2009 3    .5678138732910156 1     -.191908   -.2166389
    "A" 2009 3   .02235862798988819 2    -.0951769  -.13724859
    "A" 2009 3  .004819469526410103 3    -.1038592  -.13945746
    "A" 2009 3  -.07165567576885223 4   -.01909801  -.05106484
    "A" 2009 3   -.1486140936613083 5   -.04328514  -.05875266
    "A" 2009 3                    . 6   -.17534094  -.22955263
    "A" 2010 3    .3793466091156006 1   -.19918746  -.22900696
    "A" 2010 3  -.06447786837816238 2   -.08834474  -.14494775
    "A" 2010 3  -.13184867799282074 3   -.06881586  -.12275226
    "A" 2010 3   -.1692247837781906 4  -.015930763  -.05490859
    "A" 2010 3   -.2831023931503296 5  -.015777078  -.03964461
    "A" 2010 3                    . 6     .0885202  -.14853851
    "A" 2011 3    .5959907174110413 1    -.3571327   -.3936089
    "A" 2011 3  .011601963080465794 2   -.14843367  -.18707436
    "A" 2011 3 -.051778193563222885 3   .028758563  -.06055511
    "A" 2011 3  -.10236292332410812 4   -.00695848  -.04402136
    "A" 2011 3  -.20277179777622223 5 -.0022399058  -.04195277
    "A" 2011 3                    . 6  -.034061827   -.1164549
    "A" 2012 3    .7737371325492859 1    -.1421047   -.2277383
    "A" 2012 3   .20508399605751038 2   -.06518236  -.12053429
    "A" 2012 3   .01851370744407177 3   -.06133205  -.12030085
    "A" 2012 3 -.005175860598683357 4  -.033680763    -.067842
    "A" 2012 3    -.166717529296875 5 -.0021038372  -.06958006
    "A" 2012 3                    . 6     -.305692   -.3566578
    "A" 2013 3    .7758398652076721 1     -.214703  -.27317035
    "A" 2013 3   .12453848123550415 2   -.05056649   -.0967647
    "A" 2013 3 .0033996901474893093 3    .03965649 -.066506475
    "A" 2013 3  .004305284004658461 4   -.01898581  -.06434934
    "A" 2013 3  -.07516103982925415 5  -.022770027  -.04234344
    "A" 2013 3                    . 6   .010594276  -.06073594
    "A" 2014 3    .8845993280410767 1     -.127823   -.1997209
    "A" 2014 3   .10411844402551651 2    -.0556562  -.12020043
    "A" 2014 3   .08039402961730957 3   -.05245333  -.12238398
    "A" 2014 3  .030378490686416626 4   .006042691  -.05860734
    "A" 2014 3   -.1739516258239746 5   -.01096147   -.0391375
    "A" 2014 3                    . 6    .10652217   -.1263277
    "A" 2015 3    .6916394233703613 1    -.1605144  -.24799043
    "A" 2015 3    .1250390261411667 2   -.04474998  -.10783525
    "A" 2015 3  .031132899224758148 3  .0011184321  -.07011509
    "A" 2015 3  .031201032921671867 4     .0431358  -.02584259
    "A" 2015 3   -.1113152801990509 5    .01262757 -.019299036
    "A" 2015 3                    . 6    -.5246286   -.5481781
    "A" 2016 3    .6162927150726318 1   -.19741857  -.28648126
    "A" 2016 3   .09826809167861938 2   -.05692242  -.10655385
    "A" 2016 3  .010408439673483372 3 -.0040609976  -.06685121
    "A" 2016 3 -.034870076924562454 4  -.016038928  -.05093071
    "A" 2016 3  -.17934803664684296 5   .014396247  -.02446498
    "A" 2016 3                    . 6   -.10151955   -.1657626
    "A" 2017 3    .5765115022659302 1    -.2072452   -.3230486
    "A" 2017 3   .10422269999980927 2   -.06889203  -.11480728
    "A" 2017 3 -.010332309640944004 3   -.01343173 -.062518075
    "A" 2017 3  .010408302769064903 4  .0039381958  -.03178757
    "A" 2017 3  -.19977127015590668 5   .021094946 -.017537108
    "A" 2017 3                    . 6  -.016029337  -.06462012
    "A" 2018 3    .6575387120246887 1   -.23188405   -.3012422
    "A" 2018 3   .10164050757884979 2   -.11843801   -.1689077
    "A" 2018 3   .05479102581739426 3  -.008364083  -.06457564
    "A" 2018 3  .004252022132277489 4   -.06862622  -.11158565
    end
    label values ind ind_labels
    label def ind_labels 3 "M", modify
    label values quantile quantilelabels
    label def quantilelabels 1 "0-10", modify
    label def quantilelabels 2 "10-40", modify
    label def quantilelabels 3 "40-60", modify
    label def quantilelabels 4 "60-90", modify
    label def quantilelabels 5 "90-100", modify
    label def quantilelabels 6 "Unknown", modify

    I would like to regress jcr and jdr on growth across quantile groups (in one regression).
    I would preferably use -reghdfe- with weights (w = LP_sum_w), i.e. something like -reghdfe ... [aw=w]- with absorb() and vce(cluster ...) options for country*industry*year fixed effects.
    The problem is that I am unsure how to set this up, since -xtset country year- does not work: there are multiple observations for each country-industry-year combination.
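    For concreteness, something like the sketch below is what I have in mind (w = LP_sum_w is my weight variable, which is not included in the data example, and the clustering level here is just a placeholder):
    Code:
    * rough sketch only; w = LP_sum_w is my weight variable (not shown in the
    * -dataex- example above) and the clustering level is just a placeholder
    egen cty_ind_year = group(country ind year)
    reghdfe growth jcr jdr [aw=w], absorb(cty_ind_year) vce(cluster cty_ind_year)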
    My question may be unclear, but I hope someone can help me with this issue!

    Thanks!





  • #2
    Your data are incompatible with -xtset country year-. But before we get into that, why do you want to -xtset- this data in the first place? If you are using -reghdfe-, you do not need to -xtset- the data. So just run your -reghdfe-.

    To use -xtset country year-, you would need country and year to jointly and uniquely identify observations, which, as you have observed, they do not. So you need to omit the time variable from your -xtset- command. Assuming you really needed -xtset- (say, if you were going to use -xtreg, fe- instead of -reghdfe-), you could do that by running -xtset country- with no time variable. Or, if you needed the fixed effect to also incorporate the industry, you could do that with two commands:
    Code:
    egen country_ind = group(country ind)
    xtset country_ind
    again with no time variable.

    What are the consequences of using -xtset- without a time variable? It means that you cannot use time-series operators like lags and leads, and you cannot model autoregressive structure. This is not just some arbitrary rule that Stata imposes. If you think about it, you cannot even make sense of the concept of lags and leads when the panel and time variables do not uniquely identify observations. Consider an observation with country == "A" and year == 2003. What would the first lag of that observation be? In your example data there are 6 different observations with country == "A" and year == 2002, and, I suspect, in the real data, which presumably contains more than one industry, there are many more such observations. Which one would be "the lag"? There is no answer to that question.
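    To see this concretely, here is a minimal sketch (assuming the -xtset country_ind- setup above, with no time variable declared):
    Code:
    * with only a panel variable declared, time-series operators are not
    * available; Stata stops with an error because no time variable is set
    gen lag_growth = L.growth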

    So if you have no need of time-series operators or autoregressive error structures, you can proceed with -xtset country- (or -xtset country_ind-, or, if you are just using -reghdfe-, you can skip -xtset- altogether). If you think you do need time-series operators or autoregressive error structures, then you need to rethink what you are doing. My best guess in that case is that the combination of country, ind, quantile, and year would uniquely identify observations, and if that is true, it probably makes sense to consider the lag of an observation with a given country and year to be the unique observation in the preceding year that has not just the same country, but also the same industry and quantile. So in that case you could
    Code:
    egen compound_fe = group(country ind quantile)
    xtset compound_fe year
    and proceed.
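    If you go that route, lags and leads then refer to the previous and next year within the same country-industry-quantile cell; as a minimal sketch, something like this would then be legal:
    Code:
    * one-year lag of growth within each country-industry-quantile cell
    gen lag_growth = L.growth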



    • #3
      Clyde Schechter, thanks for your reply.

      I tried
      Code:
      local quant quantile
      egen id = group(country ind_a7 `quant')
      xtset id year
      and there is an error:

      Code:
      . egen id = group(country ind_a7 `quant')
      
      . xtset id year
      repeated time values within panel
      And for the regression, if I want to run it across quantile groups, should it be something like this?

      Code:
      reghdfe growth jcr jdr [aw=w],a(i.cty#i.ind#i.year) vce(cluster i.cty#i.ind#i.`quant')
      I want to run a regression of jcr/jdr on growth for each quantile group.



      • #4
        With regard to your -xtset- error, two things. There is no variable ind_a7 in your example data. You do have a variable there called ind. Is ind_a7 supposed to be the same thing? On the assumption that ind_a7 and ind are supposed to be the same thing (and for some reason you renamed it to ind when you posted your example), I cannot reproduce the problem you show:
        Code:
        . local quant quantile
        
        . egen id = group(country ind `quant')
        
        . xtset id year
        
        Panel variable: id (unbalanced)
         Time variable: year, 2002 to 2018
                 Delta: 1 unit
        So, to look into this further, you will have to post a new data example that reproduces the problem you are having. It still does not seem that you need to -xtset- your data in the first place. However, the contrast between your result in #3 and mine here suggests that there is something going on in the data that neither you nor I understand, and this may be Stata's way of telling you that your data are in need of repair.

        I'm not 100% certain I understand what you want to do with your regressions. It sounds like you want to run 6 separate regressions of growth on jdr and jcr, one regression for each of the 6 quantiles. And it also sounds like you want to do it with a single command, perhaps to make it possible to compare the results of the different regressions easily using the -margins- command. That suggests setting up an interaction between quantile and the jdr and jcr variables. But there are issues about the appropriate selection of the absorbed effects and the clustering. I don't think quantile should be part of the clustering in this situation where you are, in effect, doing separate regressions on each quantile: you would be setting up a single cluster that way. Moreover, including country, ind, and year in the absorbed effects would, at least according to the data example I have, reduce each "panel" to a single observation within each quantile, which will preclude getting standard errors, test statistics, and confidence intervals. My best guess, but it is only a guess, is that you want something like this:
        Code:
        encode country, gen(n_country)
        reghdfe growth i.quantile##c.(jdr jcr), absorb(n_country ind year) vce(cluster n_country ind)
        This does not produce usable results in your data example, but that is because your data example contains only one country and one ind. A better data example is needed for better advice, as well as to figure out what is going on with your -xtset- attempt.
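        If that interaction specification turns out to be what you want, one way to compare the quantile-specific slopes afterwards might be -margins- (only a sketch; I have not checked how -margins- behaves with your installed version of -reghdfe- or your full data):
        Code:
        * quantile-specific marginal effects of jdr and jcr, after the -reghdfe- above
        margins quantile, dydx(jdr jcr)
        * optionally, graph the results with -marginsplot-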



        • #5

          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input str3 country int year byte(ind quantile) double growth float(jcr jdr)
          "G" 2004 3 1    1.140887975692749  -.09449836  -.2254811
          "F" 2001 3 1     .275103539228439  -.09505257  -.1898155
          "C" 2021 3 1   .12128254771232605   -.1671006 -.27766332
          "C" 2014 3 1     .427598237991333  -.10552166 -.17084736
          "R" 2000 3 1    .4636540710926056  -.22107357 -.25503013
          "S" 2020 3 1    .5561659932136536   -.1391513  -.2022949
          "H" 2001 3 1   1.0553784370422363   .04528788 -.13052186
          "P" 2014 3 1    .5561138987541199  -.04665263 -.08069968
          "R" 2002 3 1   .36146026849746704  -.17984806 -.25376314
          "B" 2012 3 1                    .           .          .
          "E" 2022 3 1                    .           .          .
          "B" 2022 3 1                    .           .          .
          "E" 2000 3 1   .24927306175231934  -.20075756 -.28940502
          "S" 2015 3 1    .4388723075389862   -.1844301 -.36970705
          "C" 2000 3 1    .3916475772857666  -.25427234  -.3280953
          "F" 2005 3 1   .08298581093549728   -.3419062  -.3724806
          "R" 2016 3 1    .9854686260223389  -.25202978  -.3305247
          "I" 2020 3 1   1.1019179821014404  -.05275162 -.10910322
          "S" 2016 3 1    .3959568440914154  -.24525705 -.28809455
          "C" 2008 3 1   .20407089591026306   -.2016781  -.2362006
          "S" 2020 3 1 -.003734334371984005  .026068375 -.03981481
          "F" 2003 3 1      .58949214220047  -.26517788 -.27176937
          "R" 2010 3 1    .3289223611354828  -.19918746 -.22900696
          "F" 2014 3 1   .17456720769405365  -.06475763 -.09883963
          "F" 2006 3 1    .1594994068145752   -.2708372  -.3080709
          "C" 2013 3 1   .20407089591026306   -.2016781  -.2362006
          "C" 2015 3 1   .20407089591026306   -.2016781  -.2362006
          "C" 2010 3 1   .20407089591026306   -.2016781  -.2362006
          end
          label values ind ind_labels
          label def ind_labels 3 "M", modify
          label values quantile quantilelabels
          label def quantilelabels 1 "0-10", modify

          Clyde Schechter, maybe this can help?



          • #6
            So, look at the following two observations:
            Code:
            country  year  ind  quantile       growth         jcr         jdr
            S        2020  M    0-10       -.00373433    .0260684   -.0398148
            S        2020  M    0-10        .55616599   -.1391513   -.2022949
            They have the same country year, industry and quantile. That is why the code shown in #3 fails.

            The question now becomes: is the data wrong, and, if so, how should it be corrected?
            If the data set is supposed to be, for example, firm-level data, then there is nothing wrong with there being two observations for the same country, year, industry and quantile, provided they are for different firms. In that case the data is correct and the code is wrong. To -xtset- this kind of data, one would have to instead do something like:
            Code:
            egen id = group(country ind firm quantile)
            xtset id year
            I want to emphasize once more, however, that you have not so far shown any reason for needing to -xtset- this data. So, in my view, the issue here is whether this is important in its own right or only because it is telling us that there may be something wrong with the data.

            However, if the data set is supposed to be country-year-industry-quantile level data (so each observation is about all the firms in that country, industry, and quantile in that year, not just about one of them), then there is a problem because these two observations contradict each other. There may also be other such contradictions in the full data set. To find them all, run
            Code:
            duplicates tag country ind year quantile, gen(flag)
            browse if flag
            Once you have found all the conflicting observations, you have to figure out which of them, if any, is correct. For each country-year-industry-quantile group, you will have to eliminate all but one of the existing observations and retain the correct one if it is present, or eliminate all of them and supply a correct one if it is not.

            However, I recommend against simply patching up this data set in that way. Evidently something went wrong in the data management that produced this data set. We have found problems in it, and that raises the suspicion that there may be other problems we have yet to uncover. So you should review the entire stream of data management from the original source file(s) down to the current data set. At a minimum, find the point at which the spurious observations were mistakenly put into the data set (or failed to be eliminated from it at the appropriate point), and fix the code that did that. While you are doing that, be alert to other possible errors in the data management code, and fix any that you find along the way. Then re-generate the data set using the corrected code.

            If it turns out that your data management code is correct and the error is in the source files you were provided, then you have to contact the purveyor of those files, inform them of the error, and ask them to provide corrected files.
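            Either way, once you have a rebuilt data set in hand, a quick check (a minimal sketch) is -isid-, which exits with an error if any country-industry-year-quantile cell still contains more than one observation:
            Code:
            * verify that country, ind, year, and quantile uniquely identify observations
            isid country ind year quantile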

