
  • Regression by quantile group

    Hello Stata users,

    I am working with cross-country, industry-level panel data, and I am trying to run a regression of the variables "jcr" and "jdr" on growth across different quantile groups.
    My dataset looks like this:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    
    clear
    input str3 country int year byte ind double growth byte quantile float(jcr jdr)
    "A" 2002 3     .672191858291626 1    -.1229408   -.2389851
    "A" 2002 3   .13869287073612213 2  -.033738595  -.11796902
    "A" 2002 3  .022968925535678864 3   .004111842  -.05286625
    "A" 2002 3 .0017980386037379503 4  -.011298022  -.06083192
    "A" 2002 3  -.16667737066745758 5   .068663485 -.013689965
    "A" 2002 3  -.17840731143951416 6  -.006880734  -.15333694
    "A" 2003 3    .7250598669052124 1    -.2283749   -.3857791
    "A" 2003 3    .1564776450395584 2  -.018166976  -.09158104
    "A" 2003 3  .051283758133649826 3   -.05724212  -.09636812
    "A" 2003 3  .023065701127052307 4    -.0402614  -.06571764
    "A" 2003 3  -.20438718795776367 5  .0044542346  -.05645589
    "A" 2003 3  -.36091873049736023 6   .016714025   -.0307614
    "A" 2004 3    .5813043117523193 1   -.29487726   -.3380778
    "A" 2004 3    .1335824579000473 2     .0070048  -.06939671
    "A" 2004 3   .07415185123682022 3  -.029579455   -.0826437
    "A" 2004 3   .04221270605921745 4   -.04617803  -.08350244
    "A" 2004 3  -.12049635499715805 5      .044223 -.028452955
    "A" 2004 3    .7893650531768799 6    .23076923  -.12546878
    "A" 2005 3    .7710216641426086 1   -.10427842   -.2658622
    "A" 2005 3    .1161515936255455 2   -.10758134  -.16785954
    "A" 2005 3  .027848316356539726 3   .031212064  -.03418614
    "A" 2005 3  -.01820971444249153 4  -.001233272  -.04062689
    "A" 2005 3  -.18122969567775726 5    .01013001  -.04751295
    "A" 2005 3                    . 6    .01120852  -.04151956
    "A" 2006 3   .39854076504707336 1    .02969432  -.16627036
    "A" 2006 3   .08353615552186966 2   -.04770102  -.10662634
    "A" 2006 3  .027338284999132156 3 -.0021104466  -.07062218
    "A" 2006 3 -.025064121931791306 4   .008168118  -.03237979
    "A" 2006 3  -.19563138484954834 5   .007862739   -.0444931
    "A" 2006 3                    . 6   -.02727273  -.06576048
    "A" 2007 3    .2896402180194855 1  .0016892842   -.0936298
    "A" 2007 3 -.010519917123019695 2   -.06810414  -.11413156
    "A" 2007 3  -.09132310748100281 3   -.04691949   -.1015541
    "A" 2007 3    -.152311772108078 4   .016302703 -.035093993
    "A" 2007 3   -.1548743098974228 5   -.03119042   -.0533259
    "A" 2007 3                    . 6   -.27761194   -.2806777
    "A" 2008 3    .4056224822998047 1    -.0411319   -.1085815
    "A" 2008 3   .02111530490219593 2  -.010045745  -.07682251
    "A" 2008 3  -.03370849788188934 3   -.04762908   -.0966561
    "A" 2008 3  -.05902251601219177 4  -.034295693  -.06143871
    "A" 2008 3  -.23769980669021606 5  -.035204045  -.04672369
    "A" 2008 3                    . 6  -.072829135   -.1020152
    "A" 2009 3    .5678138732910156 1     -.191908   -.2166389
    "A" 2009 3   .02235862798988819 2    -.0951769  -.13724859
    "A" 2009 3  .004819469526410103 3    -.1038592  -.13945746
    "A" 2009 3  -.07165567576885223 4   -.01909801  -.05106484
    "A" 2009 3   -.1486140936613083 5   -.04328514  -.05875266
    "A" 2009 3                    . 6   -.17534094  -.22955263
    "A" 2010 3    .3793466091156006 1   -.19918746  -.22900696
    "A" 2010 3  -.06447786837816238 2   -.08834474  -.14494775
    "A" 2010 3  -.13184867799282074 3   -.06881586  -.12275226
    "A" 2010 3   -.1692247837781906 4  -.015930763  -.05490859
    "A" 2010 3   -.2831023931503296 5  -.015777078  -.03964461
    "A" 2010 3                    . 6     .0885202  -.14853851
    "A" 2011 3    .5959907174110413 1    -.3571327   -.3936089
    "A" 2011 3  .011601963080465794 2   -.14843367  -.18707436
    "A" 2011 3 -.051778193563222885 3   .028758563  -.06055511
    "A" 2011 3  -.10236292332410812 4   -.00695848  -.04402136
    "A" 2011 3  -.20277179777622223 5 -.0022399058  -.04195277
    "A" 2011 3                    . 6  -.034061827   -.1164549
    "A" 2012 3    .7737371325492859 1    -.1421047   -.2277383
    "A" 2012 3   .20508399605751038 2   -.06518236  -.12053429
    "A" 2012 3   .01851370744407177 3   -.06133205  -.12030085
    "A" 2012 3 -.005175860598683357 4  -.033680763    -.067842
    "A" 2012 3    -.166717529296875 5 -.0021038372  -.06958006
    "A" 2012 3                    . 6     -.305692   -.3566578
    "A" 2013 3    .7758398652076721 1     -.214703  -.27317035
    "A" 2013 3   .12453848123550415 2   -.05056649   -.0967647
    "A" 2013 3 .0033996901474893093 3    .03965649 -.066506475
    "A" 2013 3  .004305284004658461 4   -.01898581  -.06434934
    "A" 2013 3  -.07516103982925415 5  -.022770027  -.04234344
    "A" 2013 3                    . 6   .010594276  -.06073594
    "A" 2014 3    .8845993280410767 1     -.127823   -.1997209
    "A" 2014 3   .10411844402551651 2    -.0556562  -.12020043
    "A" 2014 3   .08039402961730957 3   -.05245333  -.12238398
    "A" 2014 3  .030378490686416626 4   .006042691  -.05860734
    "A" 2014 3   -.1739516258239746 5   -.01096147   -.0391375
    "A" 2014 3                    . 6    .10652217   -.1263277
    "A" 2015 3    .6916394233703613 1    -.1605144  -.24799043
    "A" 2015 3    .1250390261411667 2   -.04474998  -.10783525
    "A" 2015 3  .031132899224758148 3  .0011184321  -.07011509
    "A" 2015 3  .031201032921671867 4     .0431358  -.02584259
    "A" 2015 3   -.1113152801990509 5    .01262757 -.019299036
    "A" 2015 3                    . 6    -.5246286   -.5481781
    "A" 2016 3    .6162927150726318 1   -.19741857  -.28648126
    "A" 2016 3   .09826809167861938 2   -.05692242  -.10655385
    "A" 2016 3  .010408439673483372 3 -.0040609976  -.06685121
    "A" 2016 3 -.034870076924562454 4  -.016038928  -.05093071
    "A" 2016 3  -.17934803664684296 5   .014396247  -.02446498
    "A" 2016 3                    . 6   -.10151955   -.1657626
    "A" 2017 3    .5765115022659302 1    -.2072452   -.3230486
    "A" 2017 3   .10422269999980927 2   -.06889203  -.11480728
    "A" 2017 3 -.010332309640944004 3   -.01343173 -.062518075
    "A" 2017 3  .010408302769064903 4  .0039381958  -.03178757
    "A" 2017 3  -.19977127015590668 5   .021094946 -.017537108
    "A" 2017 3                    . 6  -.016029337  -.06462012
    "A" 2018 3    .6575387120246887 1   -.23188405   -.3012422
    "A" 2018 3   .10164050757884979 2   -.11843801   -.1689077
    "A" 2018 3   .05479102581739426 3  -.008364083  -.06457564
    "A" 2018 3  .004252022132277489 4   -.06862622  -.11158565
    end
    label values ind ind_labels
    label def ind_labels 3 "M", modify
    label values quantile quantilelabels
    label def quantilelabels 1 "0-10", modify
    label def quantilelabels 2 "10-40", modify
    label def quantilelabels 3 "40-60", modify
    label def quantilelabels 4 "60-90", modify
    label def quantilelabels 5 "90-100", modify
    label def quantilelabels 6 "Unknown", modify

    I would like to regress jcr and jdr on growth across quantile groups (in one regression).
    I would preferably use -reghdfe- with weights (w = LP_sum_w), i.e. something like -reghdfe ... [aw=w]- with absorb() and vce(cluster ...) options for country*industry*year fixed effects.
    The problem is that I am unsure how to set this up, since -xtset country year- does not work: there are multiple observations for each country-industry-year combination.
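    For concreteness, something like the sketch below is what I have in mind (w = LP_sum_w is my weight variable, which is not included in the data example, and the clustering level here is just a placeholder):
    Code:
    * rough sketch only; w = LP_sum_w is my weight variable (not shown in the
    * -dataex- example above) and the clustering level is just a placeholder
    egen cty_ind_year = group(country ind year)
    reghdfe growth jcr jdr [aw=w], absorb(cty_ind_year) vce(cluster cty_ind_year)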
    My question may be unclear, but I hope someone can help me with this issue!

    Thanks!





  • #2
    Your data are incompatible with -xtset country year-. But before we get into that, why do you want to -xtset- this data in the first place? If you are using -reghdfe-, you do not need to -xtset- the data. So just run your -reghdfe-.

    To use -xtset country year-, you would need country and year to jointly and uniquely identify observations, which, as you have observed, they do not. So you need to omit the time variable from your -xtset- command. Assuming you really needed -xtset- (say, if you were going to use -xtreg, fe- instead of -reghdfe-), you could do that by running -xtset country- with no time variable. Or, if you needed the fixed effect to also incorporate the industry, you could do that with two commands:
    Code:
    egen country_ind = group(country ind)
    xtset country_ind
    again with no time variable.

    What are the consequences of using -xtset- without a time variable? It means that you cannot use time-series operators like lags and leads, and you cannot model autoregressive structure. This is not just some arbitrary rule that Stata imposes. If you think about it, you cannot even make sense of the concept of lags and leads when the panel and time variables do not uniquely identify observations. Consider an observation with country == "A" and year == 2003. What would the first lag of that observation be? In your example data there are 6 different observations with country == "A" and year == 2002, and, I suspect, in the real data, which presumably contains more than one industry, there are many more such observations. Which one would be "the lag"? There is no answer to that question.
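    To see this concretely, here is a minimal sketch (assuming the -xtset country_ind- setup above, with no time variable declared):
    Code:
    * with only a panel variable declared, time-series operators are not
    * available; Stata stops with an error because no time variable is set
    gen lag_growth = L.growth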

    So if you have no need of time-series operators or autoregressive error structures, you can proceed with -xtset country- (or -xtset country_ind-, or, if you are just using -reghdfe-, you can skip -xtset- altogether). If you think you do need time-series operators or autoregressive error structures, then you need to rethink what you are doing. My best guess in that case is that the combination of country, ind, quantile, and year would uniquely identify observations, and if that is true, it probably makes sense to consider the lag of an observation with a given country and year to be the unique observation in the preceding year that has not just the same country, but also the same industry and quantile. So in that case you could
    Code:
    egen compound_fe = group(country ind quantile)
    xtset compound_fe year
    and proceed.
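    If you go that route, lags and leads then refer to the previous and next year within the same country-industry-quantile cell; as a minimal sketch, something like this would then be legal:
    Code:
    * one-year lag of growth within each country-industry-quantile cell
    gen lag_growth = L.growth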



    • #3
      Clyde Schechter, thanks for your reply.

      I tried
      Code:
      local quant quantile
      egen id = group(country ind_a7 `quant')
      xtset id year
      and there is an error:

      Code:
      . egen id = group(country ind_a7 `quant')
      
      . xtset id year
      repeated time values within panel
      And for the regression, if I want to run it across quantile groups, should it be something like this?

      Code:
      reghdfe growth jcr jdr [aw=w],a(i.cty#i.ind#i.year) vce(cluster i.cty#i.ind#i.`quant')
      I want to run a regression of jcr/jdr on growth for each quantile group.



      • #4
        With regard to your -xtset- error, two things. There is no variable ind_a7 in your example data. You do have a variable there called ind. Is ind_a7 supposed to be the same thing? On the assumption that ind_a7 and ind are supposed to be the same thing (and for some reason you renamed it to ind when you posted your example), I cannot reproduce the problem you show:
        Code:
        . local quant quantile
        
        . egen id = group(country ind `quant')
        
        . xtset id year
        
        Panel variable: id (unbalanced)
         Time variable: year, 2002 to 2018
                 Delta: 1 unit
        So, to look into this further, you will have to post a new data example that reproduces the problem you are having. It still does not seem that you need to -xtset- your data in the first place. However, the contrast between your result in #3 and mine here suggests that there is something going on in the data that neither you nor I understand, and this may be Stata's way of telling you that your data are in need of repair.

        I'm not 100% certain I understand what you want to do with your regressions. It sounds like you want to run 6 separate regressions of growth on jdr and jcr, one regression for each of the 6 quantiles. And it also sounds like you want to do it with a single command, perhaps to make it possible to compare the results of the different regressions easily using the -margins- command. That suggests setting up an interaction between quantile and the jdr and jcr variables. But there are issues about the appropriate selection of the absorbed effects and the clustering. I don't think quantile should be part of the clustering in this situation where you are, in effect, doing separate regressions on each quantile: you would be setting up a single cluster that way. Moreover, including country, ind, and year in the absorbed effects would, at least according to the data example I have, reduce each "panel" to a single observation within each quantile, which will preclude getting standard errors, test statistics, and confidence intervals. My best guess, but it is only a guess, is that you want something like this:
        Code:
        encode country, gen(n_country)
        reghdfe growth i.quantile##c.(jdr jcr), absorb(n_country ind year) vce(cluster n_country ind)
        This does not produce usable results in your data example, but that is because your data example contains only one country and one ind. A better data example is needed for better advice, as well as to figure out what is going on with your -xtset- attempt.
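        If that interaction specification turns out to be what you want, one way to compare the quantile-specific slopes afterwards might be -margins- (only a sketch; I have not checked how -margins- behaves with your installed version of -reghdfe- or your full data):
        Code:
        * quantile-specific marginal effects of jdr and jcr, after the -reghdfe- above
        margins quantile, dydx(jdr jcr)
        * optionally, graph the results with -marginsplot-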



        • #5

          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input str3 country int year byte(ind quantile) double growth float(jcr jdr)
          "G" 2004 3 1    1.140887975692749  -.09449836  -.2254811
          "F" 2001 3 1     .275103539228439  -.09505257  -.1898155
          "C" 2021 3 1   .12128254771232605   -.1671006 -.27766332
          "C" 2014 3 1     .427598237991333  -.10552166 -.17084736
          "R" 2000 3 1    .4636540710926056  -.22107357 -.25503013
          "S" 2020 3 1    .5561659932136536   -.1391513  -.2022949
          "H" 2001 3 1   1.0553784370422363   .04528788 -.13052186
          "P" 2014 3 1    .5561138987541199  -.04665263 -.08069968
          "R" 2002 3 1   .36146026849746704  -.17984806 -.25376314
          "B" 2012 3 1                    .           .          .
          "E" 2022 3 1                    .           .          .
          "B" 2022 3 1                    .           .          .
          "E" 2000 3 1   .24927306175231934  -.20075756 -.28940502
          "S" 2015 3 1    .4388723075389862   -.1844301 -.36970705
          "C" 2000 3 1    .3916475772857666  -.25427234  -.3280953
          "F" 2005 3 1   .08298581093549728   -.3419062  -.3724806
          "R" 2016 3 1    .9854686260223389  -.25202978  -.3305247
          "I" 2020 3 1   1.1019179821014404  -.05275162 -.10910322
          "S" 2016 3 1    .3959568440914154  -.24525705 -.28809455
          "C" 2008 3 1   .20407089591026306   -.2016781  -.2362006
          "S" 2020 3 1 -.003734334371984005  .026068375 -.03981481
          "F" 2003 3 1      .58949214220047  -.26517788 -.27176937
          "R" 2010 3 1    .3289223611354828  -.19918746 -.22900696
          "F" 2014 3 1   .17456720769405365  -.06475763 -.09883963
          "F" 2006 3 1    .1594994068145752   -.2708372  -.3080709
          "C" 2013 3 1   .20407089591026306   -.2016781  -.2362006
          "C" 2015 3 1   .20407089591026306   -.2016781  -.2362006
          "C" 2010 3 1   .20407089591026306   -.2016781  -.2362006
          end
          label values ind ind_labels
          label def ind_labels 3 "M", modify
          label values quantile quantilelabels
          label def quantilelabels 1 "0-10", modify

          Clyde Schechter, maybe this can help?



          • #6
            So, look at the following two observations:
            Code:
            country  year  ind  quantile       growth         jcr         jdr
            S        2020  M    0-10       -.00373433    .0260684   -.0398148
            S        2020  M    0-10        .55616599   -.1391513   -.2022949
            They have the same country year, industry and quantile. That is why the code shown in #3 fails.

            The question now becomes: is the data wrong, and, if so, how should it be corrected?
            If the data set is supposed to be, for example, firm-level data, then there is nothing wrong with there being two observations for the same country, year, industry and quantile, provided they are for different firms. In that case the data is correct and the code is wrong. To -xtset- this kind of data, one would have to instead do something like:
            Code:
            egen id = group(country ind firm quantile)
            xtset id year
            I want to emphasize once more, however, that you have not so far shown any reason for needing to -xtset- this data. So, in my view, the issue here is whether this is important in its own right or only because it is telling us that there may be something wrong with the data.

            However, if the data set is supposed to be country-year-industry-quantile level data (so each observation is about all the firms in that country, industry, and quantile in that year, not just about one of them), then there is a problem because these two observations contradict each other. There may also be other such contradictions in the full data set. To find them all, run
            Code:
            duplicates tag country ind year quantile, gen(flag)
            browse if flag
            Once you have found all the conflicting observations, you have to figure out which of them, if any, is correct. For each country-year-industry-quantile group, you will have to eliminate all but one of the existing observations and retain the correct one if it is present, or eliminate all of them and supply a correct one if it is not.

            However, I recommend against simply patching up this data set in that way. Evidently something went wrong in the data management that produced this data set. We have found problems in it, and that raises the suspicion that there may be other problems we have yet to uncover. So you should review the entire stream of data management from the original source file(s) down to the current data set. At a minimum, find the point at which the spurious observations were mistakenly put into the data set (or failed to be eliminated from it at the appropriate point), and fix the code that did that. While you are doing that, be alert to other possible errors in the data management code, and fix any that you find along the way. Then re-generate the data set using the corrected code.

            If it turns out that your data management code is correct and the error is in the source files you were provided, then you have to contact the purveyor of those files, inform them of the error, and ask them to provide corrected files.
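            Either way, once you have a rebuilt data set in hand, a quick check (a minimal sketch) is -isid-, which exits with an error if any country-industry-year-quantile cell still contains more than one observation:
            Code:
            * verify that country, ind, year, and quantile uniquely identify observations
            isid country ind year quantile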

