Panel regression on subsamples

Jaap Bovenkamer

#1

01 Jun 2015, 03:18

Hi guys,

I have a very simple question. For my final thesis I'm running regressions on macroeconomic data, in which the dependent variable is the level of mergers and acquisitions between two countries, and the independent variables are a set of control variables. I have a data set spanning the years 1990 through 2013 for 27 countries.

Now, I am trying to run regressions on different subsamples (e.g. the countries in the Eurozone during the period 1999-2013, the countries in the periphery, and the countries in the center of the Eurozone during that period).

I've thought of using dummy variables, but I'm not quite sure how I should run these regressions. Do you have any suggestions as to the best way to run the regressions for the subsamples?

Kind regards,
Jaap Bovenkamer
Tags: fixed effects, panel data, regression, subsample
William Lisowski

#2

01 Jun 2015, 07:51

I thought I had a suggestion, but I realize your description doesn't provide enough information to formulate the problem. Please review the FAQ linked at the top of this page, especially sections 9-12 on effectively presenting your problem, and then provide a more complete description of your data and your problem. For example, does each observation include two country codes identifying the two countries between which the M&A activity in that observation occurs? And if so, how do you mean to define your subsamples: both countries are in the subsample, or just one of the two? Or have I completely misunderstood the statement of the problem? You see why I ask for a more complete presentation of the problem.
Jaap Bovenkamer

#3

01 Jun 2015, 12:27

Hi William,

I have included a screenshot of my dataset to try to make the situation a bit clearer. The dataset is declared as panel data using xtset IDCross Year, yearly. It consists of data on macro-level cross-border mergers and acquisitions over the period 1990-2013 for 27 developed countries. What I am trying to do is determine the effect of the Economic and Monetary Union (the adoption of the Euro) on cross-border mergers and acquisitions.

In the third column of the first attachment you see the natural log of the total value of mergers and acquisitions by acquiring firms in Australia of target firms in Austria in a particular year. Thus, I have an observation for each year in which at least one merger or acquisition took place between a firm in the acquiring country and a firm in the target country. Hence, the panel is unbalanced (but that should not influence my results).

I have run several regressions using xtreg on the entire dataset, which consists of developed countries. The sample can then be further divided into countries that have adopted the Euro and those that have not, and the Euro area can in turn be divided into center and periphery.

What I am interested in doing is dividing the sample to find specific results per group in a specific time period. To illustrate this point, I have attached a screenshot of a table that specifies this exactly.

In short, my question is how I can find the effect of the adoption of the euro on cross-border M&A levels in each of the groups in a particular time period, and how I should run these regressions.

Kind regards,
Jaap

William Lisowski

#4

01 Jun 2015, 14:02

As the FAQ indicates, screenshots are not really helpful, but I think I understood enough of your description to give you some broad guidance.

One approach is to create a categorical variable called (for example) ez, coded (for example) 0 for countries that have not adopted the Euro, 1 for countries in the periphery of the Euro zone, and 2 for countries in the center of the Euro zone. Then run regressions using commands such as

Code:

regress y x1 x2 ... if ez==2 & inrange(Year,1990,1999)

to restrict the observations to those in the center of the Euro zone in years 1990-1999. See help if and help language and the Stata User's Guide for further details.
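A minimal, self-contained sketch of this approach follows, assuming a toy panel. The country identifier cid, the simulated x1 and y, and the assignment of countries to groups are hypothetical placeholders; only ez, Year, and the -if ... inrange()- restriction come from the suggestion above.

```stata
* Hypothetical toy panel: 6 countries (cid) observed 1990-2013
clear
set seed 12345
set obs 144
generate int cid   = ceil(_n/24)
generate int Year  = 1990 + mod(_n - 1, 24)
generate double x1 = rnormal()
generate double y  = 1 + 0.5*x1 + rnormal()

* Hypothetical grouping: 0 = not in the Euro zone, 1 = periphery, 2 = center
recode cid (1 2 = 0) (3 4 = 1) (5 6 = 2), generate(ez)
tabulate ez

* Center of the Euro zone in 1990-1999 only
regress y x1 if ez == 2 & inrange(Year, 1990, 1999)
display "observations used: " e(N)
```

Changing `ez == 2` to `ez == 1` or `inlist(ez, 1, 2)`, or moving the year window, gives the other subsamples without creating separate datasets.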

Jaap Bovenkamer


02 Jun 2015, 15:20

William,

Thanks a lot for the explanation; it helps a great deal. I have continued working on the code, which now works fine except for one problem. I have defined the dummy variable as you said (0 = Developed, 1 = center, 2 = periphery, so that 1 and 2 = Eurozone).

Using the code below:

Code:

* 1990-2013, fixed effects
* Right-hand-side variables shared by all four regressions
local rhs EMUxEMU lnGDPCOO lnGDPTAR lnBilTrade lnDistCap Border CommonLanguage ///
    Colonialhistory CivilLibertiesCoC CivilLibertiesTarget lnMcapGDPCoO lnMcapGDPTar ///
    FinDepthTar FinDepthCoO TaxRateCoC TaxRateTarget nonEMUxEMU EUxEU nonEUxEU i.Year
* Developed
quietly xtreg LnValueTrans `rhs', fe robust
estimates store developed
* Eurozone (center and periphery)
quietly xtreg LnValueTrans `rhs' if inlist(Region, 1, 2), fe robust
estimates store eurozone
* Center
quietly xtreg LnValueTrans `rhs' if Region == 1, fe robust
estimates store center
* Periphery
quietly xtreg LnValueTrans `rhs' if Region == 2, fe robust
estimates store periphery
estimates table developed eurozone center periphery, p(%9.2f)

I have run the regressions, but I get a collinearity problem with the dummy for the Center region. This does not happen when I run a random effects regression (using exactly the same specification, just substituting re for fe), and I am puzzled as to why. Do you have any suggestions?

Kind regards,
Jaap

Last edited by Jaap Bovenkamer; 02 Jun 2015, 15:22.


William Lisowski

#6

02 Jun 2015, 19:03

Without understanding your variables, and without seeing the results from the xtreg run(s) with the collinearity problem, I can only guess.

When you created the Region variable, I am guessing that it was based on the country of the target firm.

I see variables EMUxEMU and nonEMUxEMU, and I am guessing these are 1/0 dummy variables representing "EMU acquiring firm, EMU target firm" and "non-EMU acquiring firm, EMU target firm", respectively. When you restrict your data to target firms in EMU countries (Region 1 or 2 only), then for every observation selected, either nonEMUxEMU==1 or EMUxEMU==1. In general this is not a good thing, because the two dummies then sum to a constant.
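If that guess is right, the pattern can be checked directly with -assert-. The sketch below builds a hypothetical toy dataset: acqEMU and the random assignments are invented for illustration, while EMUxEMU, nonEMUxEMU, and Region follow the coding described in the thread. On real data only the final -assert- line would be needed.

```stata
* Hypothetical toy data: target Region (0 = non-EMU, 1 = center,
* 2 = periphery) and whether the acquiring country is in the EMU
clear
set seed 2015
set obs 600
generate byte Region     = floor(3*runiform())
generate byte acqEMU     = runiform() < 0.5
generate byte EMUxEMU    = acqEMU == 1 & Region > 0
generate byte nonEMUxEMU = acqEMU == 0 & Region > 0

* Two-way table of the dummies within the Euro-zone subsample
tabulate EMUxEMU nonEMUxEMU if inlist(Region, 1, 2)

* Stops with an error if any selected observation breaks the pattern
assert EMUxEMU + nonEMUxEMU == 1 if inlist(Region, 1, 2)
```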
Carlo Lazzaro

#7

03 Jun 2015, 00:24

Jaap:
another possible drawback is that by typing -xtreg, fe- with -if- you may end up with too few observations relative to the number of predictors.

Kind regards,
Carlo
(Stata 19.0)
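One way to check this point for each subsample is to look at the sample size, number of panels, and model degrees of freedom that -xtreg- stores in e(). The sketch below assumes a hypothetical toy panel (id, x, y, and the Region assignment are invented); on real data the loop would use the thread's own dependent variable, regressors, and Region.

```stata
* Hypothetical toy panel: 30 pairs observed 1990-2013, Region 0/1/2
clear
set seed 7
set obs 30
generate int id      = _n
generate byte Region = mod(id, 3)
expand 24
bysort id: generate int Year = 1989 + _n
xtset id Year
generate double x = rnormal()
generate double y = x + rnormal()

* Observations, panels, and estimated regressors in each subsample
forvalues r = 1/2 {
    quietly xtreg y x i.Year if Region == `r', fe
    display "Region `r': N = " e(N) ", panels = " e(N_g) ", model df = " e(df_m)
}
```

Because the fixed effects use up one parameter per panel, the number of panels matters as well as the raw number of observations.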
Jaap Bovenkamer

#8

03 Jun 2015, 08:53

Gentlemen,

William's interpretation of the variables EMUxEMU and nonEMUxEMU is indeed correct, as is the assumption that the Region variable is based on the country of the target firm. I have looked into Carlo's suggestion, but each region has at least 776 observations, so I do not suspect that is the problem.

I have attached screenshots of the output below (terribly sorry, I couldn't find in the FAQ how to present output in a "code-like" way here). I do not understand why the random effects model produces output but the fixed effects model does not.

I had set up the Region variable as follows:
0 = Developed (which are all countries in the dataset)
1 or 2 = Eurozone
1 = Center of Eurozone
2 = Periphery of Eurozone

I do indeed get results for both models if I drop the nonEMUxEMU variable (I'm interested in the EMUxEMU coefficient for each of the regions), but I would rather not drop the variable.

Is there any other way that would allow me to estimate the effect in the specific regions for EMUxEMU without dropping the nonEMUxEMU variable?

Kind regards,
Jaap

Last edited by Jaap Bovenkamer; 03 Jun 2015, 09:14.
William Lisowski

#9

03 Jun 2015, 10:12

For future reference, I have attached a screen capture of the top left corner of this page showing the link to the FAQ.

Your collinearity is unavoidable in your fixed effects model, as it would be if you were to run a pooled model with an intercept term. It is not a Stata problem, it is a statistical problem due to the formulation of your model to include two variables that sum to a constant.
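A small simulation illustrates the pooled-model case. The data are hypothetical (acqEMU and y are invented for illustration; Region, EMUxEMU, and nonEMUxEMU follow the thread's coding). On the full sample the two dummies do not sum to a constant, because both are 0 when the target is outside the EMU; in the Euro-zone subsample they sum to 1, duplicating the intercept, so -regress- omits one of them.

```stata
* Hypothetical toy data with the thread's coding of Region and the dummies
clear
set seed 2015
set obs 600
generate byte Region     = floor(3*runiform())
generate byte acqEMU     = runiform() < 0.5
generate byte EMUxEMU    = acqEMU == 1 & Region > 0
generate byte nonEMUxEMU = acqEMU == 0 & Region > 0
generate double y        = rnormal()

* Full sample: both dummies are estimated alongside the constant
regress y EMUxEMU nonEMUxEMU
scalar df_full = e(df_m)

* Euro-zone subsample: EMUxEMU + nonEMUxEMU == 1, the same as the
* constant, so Stata omits one dummy because of collinearity
regress y EMUxEMU nonEMUxEMU if inlist(Region, 1, 2)
scalar df_sub = e(df_m)

display "model df: full sample = " df_full ", Euro-zone subsample = " df_sub
```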
William Lisowski

#10

03 Jun 2015, 12:12

Further thoughts: I believe you mislabelled your screen captures, because the "Random Effects" screen capture is the one that is missing nonEMUxEMU, and the "Fixed Effects" includes it. Note that a number of other variables were omitted from the regression which was missing nonEMUxEMU. From their names, I'm now guessing that those variables were constant within each country, and Stata identified that fact and omitted them from the model. You had the same sort of problem with nonEMUxEMU and EMUxEMU, except that it was their sum that is a constant, and Stata cannot detect that level of complication until the regression is underway, at which point it presumably gave you some sort of diagnostic message which you did not include in your screenshot.
Jaap Bovenkamer

#11

03 Jun 2015, 14:10

Indeed, I did mislabel the screenshots, but what puzzles me is that the random effects model produced an EMUxEMU estimate for all regions when both the EMUxEMU and nonEMUxEMU variables were included. Could there be any other explanation as to why the fixed effects model cannot estimate EMUxEMU in one instance whereas the random effects model can (when both the EMUxEMU and nonEMUxEMU variables are included)? Your conclusion regarding the other variables is right, as there is no variation in them within each cross-section.
William Lisowski

#12

03 Jun 2015, 16:08

Because the random effects model specifies the cross-sectional effect differently than does the fixed effects model, lack of variation within a cross-section does not cause multicollinearity issues for your random effects model, in the way that it does for your fixed effects model. This is true for individual variables that are constant within each cross section, and it is true for linear combinations of variables where the linear combination is constant within each cross section, as the linear combination nonEMUxEMU+EMUxEMU is a constant 1.
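The distinction can be reproduced with simulated data. In the hypothetical panel below (40 invented country pairs, 1990-2013), z1 and z2 stand in for EMUxEMU and nonEMUxEMU: each varies over time within some pairs, but their sum c is constant within every pair while differing across pairs. The within transformation used by -xtreg, fe- turns that sum into zero, so one dummy is omitted; -xtreg, re- only partially demeans the data, so the sum still varies across pairs and both coefficients can be estimated. All names and the data-generating process are assumptions for illustration only.

```stata
* Hypothetical panel: 40 country pairs observed 1990-2013
clear
set seed 2015
set obs 40
generate int id   = _n
generate byte c   = mod(id, 2)       // pair-level constant: z1 + z2 == c
generate double u = rnormal()        // unobserved pair effect
expand 24
bysort id: generate int Year = 1989 + _n
xtset id Year

generate double x = rnormal()
generate byte z1  = c == 1 & Year >= 1999   // varies over time when c == 1
generate byte z2  = c == 1 & Year <  1999
generate double y = 1 + 0.5*x + 0.3*z1 + u + rnormal()

* Fixed effects: demeaned z1 + demeaned z2 == 0, so one is omitted
xtreg y x z1 z2, fe
scalar rank_fe = e(rank)

* Random effects: z1 + z2 still varies across pairs, so both are kept
xtreg y x z1 z2, re
scalar rank_re = e(rank)

display "rank of e(V): FE = " rank_fe ", RE = " rank_re
```

If the sum were the same constant in every pair, it would also duplicate the intercept under random effects, and one dummy would be omitted there as well.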
