3 dimensional panel clustering

david parsley

Join Date: Sep 2020
Posts: 14

25 Sep 2021, 17:07

Hi, I have a 3-dimensional panel: country, firm, date. I use xtset countryfirm date, and I want to cluster by country date. I get the error message "panels are not nested within clusters". I have attached a subset of the data with 2 countries, 2 firms per country, and 12 months for each firm. The original data has 52 countries, varying numbers of firms (from the low teens to thousands), and 15 years of monthly data. I can't figure out what I've done wrong. I'm using the command: xtreg mktrf climateresid i.country, vce(cluster country date). Thanks.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input long country float(date countryfirm mktrf) double climateresid
1 540  1 -2.76                    .
1 541  1  1.89   -.5187058035931387
1 542  1 -1.97    .3266362390933121
1 543  1 -2.61   -.3461968642854406
1 544  1  3.65    .0954452596125687
1 545  1   .57    .2886939441208527
1 546  1  3.92  -.09544542329737962
1 547  1 -1.22     .327815105554067
1 548  1   .49  -.21045129388480227
1 549  1 -2.02 -.057502935293711324
1 550  1  3.61   .23119086027145386
1 551  1  -.25  -.01838169912672885
1 540  2 -2.76                    .
1 541  2  1.89   -.5187058035931387
1 542  2 -1.97    .3266362390933121
1 543  2 -2.61   -.3461968642854406
1 544  2  3.65    .0954452596125687
1 545  2   .57    .2886939441208527
1 546  2  3.92  -.09544542329737962
1 547  2 -1.22     .327815105554067
1 548  2   .49  -.21045129388480227
1 549  2 -2.02 -.057502935293711324
1 550  2  3.61   .23119086027145386
1 551  2  -.25  -.01838169912672885
2 540 51 -2.76                    .
2 541 51  1.89   -.5187058035931387
2 542 51 -1.97    .3266362390933121
2 543 51 -2.61   -.3461968642854406
2 544 51  3.65    .0954452596125687
2 545 51   .57    .2886939441208527
2 546 51  3.92  -.09544542329737962
2 547 51 -1.22     .327815105554067
2 548 51   .49  -.21045129388480227
2 549 51 -2.02 -.057502935293711324
2 550 51  3.61   .23119086027145386
2 551 51  -.25  -.01838169912672885
2 540 52 -2.76                    .
2 541 52  1.89   -.5187058035931387
2 542 52 -1.97    .3266362390933121
2 543 52 -2.61   -.3461968642854406
2 544 52  3.65    .0954452596125687
2 545 52   .57    .2886939441208527
2 546 52  3.92  -.09544542329737962
2 547 52 -1.22     .327815105554067
2 548 52   .49  -.21045129388480227
2 549 52 -2.02 -.057502935293711324
2 550 52  3.61   .23119086027145386
2 551 52  -.25  -.01838169912672885
end
format %tm date
label values country country
label def country 1 "Argentina", modify
label def country 2 "Australia", modify


Clyde Schechter

Join Date: Apr 2014

Posts: 30061
#2

25 Sep 2021, 17:22

I use xtset countryfirm date, and I want to cluster by country date.

You can't do that. Each cluster must include all observations from any panel that it touches. So if you think about a cluster defined by a particular country and date, it will split up every countryfirm-defined panel into separate clusters, one for each date that countryfirm appears in the data. That's not allowed. You can use countryfirm as the cluster variable if you want.
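
A minimal sketch of what this nesting requirement implies, using the variable names from the posted example. The nesting check and the choice of the fixed-effects estimator (fe) are illustrative assumptions, not part of the original command. With only 2 countries in the posted excerpt, country-level clustering is meaningful only in the full 52-country data.

```stata
* Declare the panel: one panel per firm (countryfirm), observed monthly
xtset countryfirm date

* Verify that every firm belongs to exactly one country, i.e. that
* panels are nested within countries (stops with an error otherwise)
bysort countryfirm (country): assert country[1] == country[_N]

* Allowed: the cluster variable is the panel variable itself
xtreg mktrf climateresid, fe vce(cluster countryfirm)

* Also allowed: each cluster (country) contains whole panels (firms)
xtreg mktrf climateresid, fe vce(cluster country)
```

A cluster defined by country and date together fails this kind of check, because each firm's observations would be spread over many such clusters.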
david parsley

Join Date: Sep 2020

Posts: 14
#3

25 Sep 2021, 20:04

Thanks, I really believe that there may be some correlation across firms on dates - i.e., time. Or possibly correlation within a country on dates. Clustering by countryfirm ignores time? Reading the full documentation for xtreg, it's unclear to me whether cluster(countryfirm) addresses within country correlation of all firms i and j at each time t? Thanks so much.

Last edited by david parsley; 25 Sep 2021, 20:51.
Hong Il Yoo

Join Date: Jan 2015

Posts: 292
#4

26 Sep 2021, 10:37

You can download -vcemway- and try:

Code:

vcemway xtreg mktrf climateresid i.country, cluster(country date) nonest
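
A hedged sketch of how one might set this up, and of one alternative. It assumes the user-written packages are obtained from SSC, and it swaps in -reghdfe- (not mentioned above) as another user-written command that accepts several cluster variables. Note that two-way clustering allows correlation within country and, separately, within date; it is not the same as clustering on country-date cells.

```stata
* Install the user-written commands (assumed here to be available from SSC)
ssc install vcemway
ssc install ftools      // needed by reghdfe
ssc install reghdfe

* Alternative: firm fixed effects with two-way clustering by country and by date
reghdfe mktrf climateresid, absorb(countryfirm) vce(cluster country date)
```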
cengiz hokka

Join Date: Apr 2023

Posts: 19
#5

09 Apr 2023, 14:27

Hello everyone,
I am researching the effect of corporate governance practices on the profitability of banks. I have 12 years of data on 146 banks in 17 countries (7 developed and 10 developing countries). Can a multidimensional panel model be applied to such a data structure? And if so, should banks and countries be treated as intertwined (nested), or analyzed as non-nested?
Thanks a lot for your help, Clyde Schechter and Carlo Lazzaro.
Zvonimir Kulis

Join Date: Apr 2023

Posts: 12
#6

09 Apr 2023, 14:48

Originally posted by cengiz hokka View Post

Hello everyone,
I am researching the effect of corporate governance practices on the profitability of banks. I have 12 years of data on 146 banks in 17 countries (7 developed and 10 developing countries). Can a multidimensional panel model be applied to such a data structure? And if so, should banks and countries be treated as intertwined (nested), or analyzed as non-nested?
Thanks a lot for your help, Clyde Schechter and Carlo Lazzaro.

Hey, I know very little about the topic (how to apply this), but the solution to this approach can be multilevel panel analysis. It is available in Stata:
https://www.stata.com/features/overv...ession-models/

I suggest first reading about multilevel regression (generally) and then see its possibilities for panel data. Hope it works out for you.

Also look at: https://www.stata.com/manuals/me.pdf

Last edited by Zvonimir Kulis; 09 Apr 2023, 14:53.
cengiz hokka

Join Date: Apr 2023

Posts: 19
#7

09 Apr 2023, 15:13

Originally posted by Zvonimir Kulis View Post

Hey, I know very little about the topic (how to apply this), but the solution to this approach can be multilevel panel analysis. It is available in Stata:
https://www.stata.com/features/overv...ession-models/

I suggest first reading about multilevel regression (generally) and then see its possibilities for panel data. Hope it works out for you.

Also look at: https://www.stata.com/manuals/me.pdf

Thank you very much for your help, Zvonimir Kulis.
cengiz hokka

Join Date: Apr 2023

Posts: 19
#8

10 Apr 2023, 02:12

Originally posted by Clyde Schechter View Post

You can't do that. Each cluster must include all observations from any panel that it touches. So if you think about a cluster defined by a particular country and date, it will split up every countryfirm-defined panel into separate clusters, one for each date that countryfirm appears in the data. That's not allowed. You can use countryfirm as the cluster variable if you want.

Dear Clyde Schechter, I would be grateful if you could reply to my post.
Clyde Schechter

Join Date: Apr 2014

Posts: 30061
#9

10 Apr 2023, 08:48

I assume you are referring to the post in #5. How you would approach this depends on the relationships among the banks and countries.

If each bank operates only in a single country, then this would be nested data. To reflect that nesting structure, you would use a random-effects model, with observations nested within banks, which are nested within countries. Now, as you probably know, with random effects models of observational data, the estimates may not be consistent because there may be a correlation between the observed variables and the random effects. (This problem does not arise with data from randomized experiments.) In this situation, in the finance and economics industries, many people strongly prefer to use a fixed effects model, which guarantees consistency of estimates. The problem is that there is no way, in this situation, to capture the three-dimensional structure of the data. Usually the bank would be chosen as the absorbed fixed effect and the VCE would be clustered at the country level.
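
A minimal sketch of the fixed-effects setup described in this paragraph, assuming each bank operates in a single country. The variable names (roa as the outcome, topvar and gdp as example regressors, plus bank, country, and year) are borrowed from later posts in this thread and serve only as placeholders here.

```stata
* Bank is the panel variable (the absorbed fixed effect); year is the time variable
xtset bank year

* Bank fixed effects plus year indicators, with standard errors clustered by country.
* This runs only if banks are nested within countries.
xtreg roa topvar gdp i.year, fe vce(cluster country)
```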

If some banks operate in multiple countries (or, at the extreme, if all banks operate in all countries) then you have a multiple-membership model (or, at the extreme, a crossed random effects model). Again, to truly capture this structure of the data requires using a random-effects model (with crossed random effects). Again, in finance and economics, many would simply use a fixed effects model, though here the choice of model is a bit less clear. One might simply incorporate separate fixed effects at both the bank and country level, but this time clustering the VCE at the country level will not be possible. Another issue that arises in this situation is whether the country level and bank level effects (whether fixed or random) actually interact: perhaps one needs the effects to be at the country#bank level. All of this requires careful thought based on an understanding of how the bank and country effects operate on the outcome variable in the real world. In other words, it is a substantive question rather than a statistical one.
cengiz hokka

Join Date: Apr 2023

Posts: 19
#10

10 Apr 2023, 15:06

Thank you very much for your answer, Mr. Clyde.
By random-effects model, do you mean xtmixed?
In my previous Stata training, one of our trainers said that when working with multidimensional panel data, we first check which dimensions have an effect using the random-effects xtmixed command. For example:

Code:

xtmixed roa topvar syo lr1 ykb ts ioyo tecrube ly byo gdp ||_all:R.country ||_all:R.bank ||_all:R.year, mle

Later, we run the model again according to which dimensions have an effect. For example, let's assume that all 3 dimensions have an effect.

Code:

xtmixed roa topvar syo lr1 ykb ts ioyo tecrube ly byo gdp ||_all:R.country ||_all:R.bank ||_all:R.year, mle

We then estimate this three-dimensional model with a fixed-effects estimator. However, we cannot estimate this model directly with fixed effects; we first need to transform the variables. For the roa variable, the transformation is:

Code:

egen meanroa = mean(roa)
egen meanroac = mean(roa), by(country)
egen meanroab = mean(roa), by(bank)
egen meanroay = mean(roa), by(year)
gen dfroa = roa - meanroac - meanroab - meanroay + 2*meanroa

After making these transformations for each variable, we can estimate the model with the -reg- command. For example:

Code:

reg roa topvar syo lr1 ykb ts ioyo deneyim ly byo gdp,nocons

Is such an approach correct? What are your suggestions?
Also, as you said, is there a way to understand whether country and bank effects are intertwined? Would it be wrong to use country#bank? And finally, is there no way to cluster country and bank influence?

Last edited by cengiz hokka; 10 Apr 2023, 15:16.
Clyde Schechter

Join Date: Apr 2014

Posts: 30061
#11

10 Apr 2023, 15:29

More or less. Since version 13, which by now is pretty old, -xtmixed- was renamed as -mixed-. And maximum likelihood estimation has been the default since version 14 or 15, so the -mle- option is not needed in modern Stata. The use of the -_all: R.variable- notation is appropriate here only for crossed random effects (or multiple membership model). If your effects are nested, it is incorrect. For nested effects the random effects part of the model would just be || country: || bank:. By the way, in either case, nested or crossed, there would not be a year level unless you have multiple observations of the same bank and country within a year.
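
A minimal sketch of the two random-effects specifications contrasted above, using the variable names from the -xtmixed- command in #10. Which one applies depends on whether banks are nested within countries, which has to be established from the data.

```stata
* Nested case: each bank operates in exactly one country
mixed roa topvar syo lr1 ykb ts ioyo tecrube ly byo gdp || country: || bank:

* Crossed or multiple-membership case: a bank can appear in several countries
mixed roa topvar syo lr1 ykb ts ioyo tecrube ly byo gdp || _all: R.country || _all: R.bank
```

Neither specification includes a year level, in line with the point that a year level requires multiple observations of the same bank and country within a year.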

The transformed model you are describing is not one I have seen before, and I can't really make sense of it. Moreover, the regression you show afterwards doesn't even use the transformed variables! So I think you have the details wrong. What I have seen, and it is called the correlated random effects model or the Mundlak model, is somewhat like what you showed, but the transformations are carried out on some or all of the independent variables. Then you also incorporate all of the transformed variables as predictors in the regression. Also, the final regression is again done with -mixed-, not -regress- in order to capture the level structure in the data. It is not always necessary to do this whole procedure, however. It depends on whether you need to separately estimate the within- and between- effects of the variables at the various levels. Those effects certainly can be different, but sometimes they are not, in which case the extra work accomplishes little. Sometimes the only way to know is to try it and see what happens, but sometimes theory can tell you in advance. By the way, if you end up having only two levels that are effective, there is a user-written command, -xthybrid- that does this for you automatically.
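
A rough sketch of the correlated random effects (Mundlak-type) idea described above, simplified to a single bank level and three of the regressors from #10. The prefixes mean_ and dev_ are hypothetical names introduced for illustration, and the choice of which variables to transform is an assumption.

```stata
* Split each regressor into its bank-level mean (between part)
* and the deviation from that mean (within part)
foreach v of varlist topvar syo lr1 {
    egen double mean_`v' = mean(`v'), by(bank)
    generate double dev_`v' = `v' - mean_`v'
}

* Include both parts as predictors and keep the level structure with -mixed-
mixed roa dev_topvar dev_syo dev_lr1 mean_topvar mean_syo mean_lr1 || bank:
```

The coefficients on the dev_ variables then estimate within-bank effects, and those on the mean_ variables estimate between-bank effects.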

Also, as you said, is there a way to understand that country and bank effects are intertwined? Would it be a wrong way to make country#bank?

I'm not sure what you are asking here. What is "intertwined?" If you mean, do they interact, you might have some theoretical basis, or results from previous studies that would give you guidance on that. If there is no prior information, you could try just creating a variable that indicates combinations of country and bank and using that as a single level and see if it gives better results.

and finally, is it not possible in any way to clustering country and bank influence?

I don't know what you mean by clustering country and bank influence.

Let me add that you still have never said whether you have banks that work in multiple countries, or just one country for each bank. Surely you already know that from your data. If you specify which of these situations we have, we can stop wasting time discussing contingencies for the inapplicable situation and focus on what we actually have to deal with.

Last edited by Clyde Schechter; 10 Apr 2023, 15:33.
cengiz hokka

Join Date: Apr 2023

Posts: 19
#12

10 Apr 2023, 16:31

Mr. Clyde Schechter, thank you for the information you provided.
I am new to panel data analysis. So I apologize for not understanding some of the things you are saying.
By the way, you're right: the transformed variables should be used in the -reg- command; I mistyped it. All variables are transformed the same way. For example:

Code:

reg dfroa dftopvar dfsyo dflr1 dfykb dfts dfioyo dfdeneyim dfly dfbyo dfgdp,nocons

By the way, in either case, nested or crossed, there would not be a year level unless you have multiple observations of the same bank and country within a year.

I don't understand what you mean here. My data are annual, from 2011 to 2021. I have data on a total of 146 banks from 17 countries, and some of the banks operate in more than one country. In this case, is it a multiple membership model?

Also, as you said, is there a way to understand that country and bank effects are intertwined? Would it be a wrong way to make country#bank?

Yes, I meant country-bank interaction. Is it possible to analyze a three-dimensional panel of data without doing this interaction?

and returning to the previous question, do you think it is correct to estimate the 3D model with the transformed data after the -reg- command?

Finally, there are those who estimate models of this structure as follows:

Code:

xtset country
xtreg roa topvar syo lr1 ykb ts ioyo deneyim ly byo gdp i.year i.bank, fe

or

Code:

egen cou_bank = group(country bank)
xtset cou_bank year
xtreg roa topvar syo lr1 ykb ts ioyo deneyim ly byo gdp i.year, fe

What are your thoughts on these approaches? What path should I follow?
Thank you for your help.

Last edited by cengiz hokka; 10 Apr 2023, 16:34.
Clyde Schechter

Join Date: Apr 2014

Posts: 30061
#13

10 Apr 2023, 16:45

and some of the banks operate in more than one country. In this case, is it a multiple membership model?

Yes, this is a multiple membership model.

Is it possible to analyze a three-dimensional panel of data without doing this interaction?

Yes. If it is reasonable to assume that the effect of each bank is, at least within reason, constant no matter which country it is operating in, and that the effect of each country is the same (again within reason) on all of the banks operating within it, then you can avoid using this interaction. Whether that is, in fact, reasonable to assume requires a knowledge of banking operations and regulation that is well beyond anything I know about. You will have to make that decision yourself, or if you are not comfortable making that decision, you need to consult people with experience in that area. It's not a statistical question. It's a substantive question and I can't help you with it.

do you think it is correct to estimate the 3D model with the transformed data after the -reg- command?

As I indicated earlier, I am not familiar with a model like that using -regress-. I have seen it done, still using a mixed-effects model, but not with regress. And, again, whether that is necessary or appropriate depends on your specific research questions and whether they require the separate identification of within-level and between-level effects of the variables. So again, that's a substantive question that I can't answer for you.

Finally, there are those who estimate models of this structure as follows:

Code:

xtset country
xtreg roa topvar syo lr1 ykb ts ioyo deneyim ly byo gdp i.year i.bank, fe
or
Code:

egen cou_bank = group(country bank)
xtset cou_bank year
xtreg roa topvar syo lr1 ykb ts ioyo deneyim ly byo gdp i.year, fe
What are your thoughts on these approaches? What path should I follow?

These approaches are widely used ways of (over)simplifying high-dimensional models. Personally, I dislike them and do not use them myself. But you see them used often in economics and finance, so presumably others think highly of them. I think that comes largely from the strong preference in those disciplines for fixed-effects models over random effects models, a preference that I, an outsider to those fields, do not share. Since your subject matter appears to fall within those disciplines, and I presume you ultimately want to publish this work (or perhaps it is part of a thesis or dissertation for a degree) I do not want to press you too hard into doing what I think is right but might be rejected by your target audience.
cengiz hokka

Join Date: Apr 2023

Posts: 19
#14

10 Apr 2023, 17:30

Dear Clyde Schechter, thank you for your patience and responses.
Yes, you are right, I will do this analysis for my PhD thesis. As you said, I could have done this analysis in a simple way, as others have, but I want to do my work the right way and be satisfied with my analysis.
Otherwise I would have finished my analysis by now. And although I have very little time left, I do not want to finish the thesis without fully learning the analysis part.

After estimating the multi-member model with the mixed command, is the analysis finished?
ie, are the results after running the -mixed- command the final results for us? Could you please explain this with an example.
Also, is there a robust option in the -mixed- command? can hypothetical deviation tests be done after this command? such as heteroscedasticity test, autocorrelation test.
I have asked a lot of questions; I beg your forgiveness.

So, Mr. Clyde Schechter, I am open to your suggestions.

Actually, some people suggested dynamic panel data analysis to get rid of the multidimensional part. The problem was that I thought of using two-step system GMM (with the xtabond2 command), but I had a hard time specifying the endogenous and exogenous variables.
My model was:

Code:

xtabond2 roa l.roa topvar syo lr1 ykb ts ioyo experience ly byo gdp, gmm(l.roa) iv(topvar syo lr1 ykb ts ioyo experience ly byo gdp) ort twostep robust

I couldn't find out how to decide which variables go in the gmm() option and which go in the iv() option of this command. In addition, when I use the robust option, almost all the variables become insignificant.

Do you have any advice for dynamic panel data analysis, in addition to the three-dimensional model?
Clyde Schechter

Join Date: Apr 2014

Posts: 30061
#15

10 Apr 2023, 22:38

After estimating the multi-member model with the mixed command, is the analysis finished?
ie, are the results after running the -mixed- command the final results for us? Could you please explain this with an example.
Also, is there a robust option in the -mixed- command? can hypothetical deviation tests be done after this command? such as heteroscedasticity test, autocorrelation test.

The analysis may be finished if you do not, to answer your research questions, need to separate out within-level from between-level effects. The usual commands that test for heteroscedasticity and autocorrelation are not available after -mixed-, but they are not really needed because you can deal with them with cluster robust standard errors. The option is not specified as -robust- but as -vce(robust)-, or for clustered standard errors, -vce(cluster bank)-.
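
A minimal sketch of these options, assuming for illustration that banks are nested within countries and using three of the regressors from #10. As I understand the -mixed- documentation, vce(robust) clusters at the highest level of the model, and an explicit cluster variable has to contain whole top-level groups, so the second command uses a two-level model with banks as the only grouping level.

```stata
* Robust standard errors, clustered at the highest level of the model (country here)
mixed roa topvar syo lr1 || country: || bank:, vce(robust)

* Explicitly clustered standard errors in a two-level model with banks as groups
mixed roa topvar syo lr1 || bank:, vce(cluster bank)
```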

Do you have any advice for dynamic panel data analysis?

No. I don't know anything about this.

Last edited by Clyde Schechter; 10 Apr 2023, 22:42.
