
  • Margins command for FE Panel Data with interaction between a quadratic variable and a factor variable

    Hi, I am running a FE Panel regression. My code looks as follows. I would like to know how I can run marginsplot to graph the interaction between Crisis and the quadratic variable GeoSegLag1Sqd. Thank you so much.

    xtreg ROS_NI_win05 LnRev TDTE c.GeoSegLag1##i.Crisis c.GeoSegLag1Sqd##i.Crisis i.Year if Year !=8 & Year !=9, fe
    note: 19.Year omitted because of collinearity

    Fixed-effects (within) regression               Number of obs     =      3,606
    Group variable: Company                         Number of groups  =        481

    R-sq:                                           Obs per group:
         within  = 0.0931                                         min =          1
         between = 0.1441                                         avg =        7.5
         overall = 0.0994                                         max =         17

                                                    F(22,3103)        =      14.48
    corr(u_i, Xb)  = -0.1214                        Prob > F          =     0.0000

    ----------------------------------------------------------------------------------------
    ROS_NI_win05 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
    -----------------------+----------------------------------------------------------------
    LnRev | .0633686 .0055241 11.47 0.000 .0525373 .0741999
    TDTE | -.0000258 6.04e-06 -4.27 0.000 -.0000376 -.000014
    GeoSegLag1 | .0632204 .0640665 0.99 0.324 -.0623966 .1888375
    1.Crisis | -.0336558 .0336082 -1.00 0.317 -.0995523 .0322408
    |
    Crisis#c.GeoSegLag1 |
    1 | -.1149112 .0664864 -1.73 0.084 -.245273 .0154506
    |
    GeoSegLag1Sqd | -.0834435 .0411033 -2.03 0.042 -.1640359 -.002851
    |
    Crisis#c.GeoSegLag1Sqd |
    1 | .0761059 .0440249 1.73 0.084 -.0102149 .1624267
    |
    Year |
    2 | -.0061536 .0670958 -0.09 0.927 -.1377102 .1254031
    3 | .040696 .0318427 1.28 0.201 -.0217389 .1031309
    4 | .0674057 .0304286 2.22 0.027 .0077435 .127068
    5 | .0566512 .0303746 1.87 0.062 -.0029052 .1162076
    6 | .06247 .0301485 2.07 0.038 .0033569 .121583
    7 | .1562648 .021781 7.17 0.000 .1135582 .1989714
    10 | .116791 .0203401 5.74 0.000 .0769097 .1566723
    11 | .104613 .0201204 5.20 0.000 .0651623 .1440637
    12 | .0848195 .0198493 4.27 0.000 .0459005 .1237386
    13 | .0787627 .0198663 3.96 0.000 .0398103 .1177151
    14 | .0439599 .0193564 2.27 0.023 .0060073 .0819124
    15 | .0128736 .0192094 0.67 0.503 -.0247908 .050538
    16 | -.0005675 .0187859 -0.03 0.976 -.0374015 .0362666
    17 | .0041892 .0185708 0.23 0.822 -.032223 .0406014
    18 | .0114753 .0182558 0.63 0.530 -.0243195 .04727
    19 | 0 (omitted)
    |
    _cons | -.1962245 .0348483 -5.63 0.000 -.2645526 -.1278963
    -----------------------+----------------------------------------------------------------
    sigma_u | .25069615
    sigma_e | .22260268
    rho | .55914835 (fraction of variance due to u_i)
    ----------------------------------------------------------------------------------------
    F test that all u_i=0: F(480, 3103) = 6.33 Prob > F = 0.0000


  • #2
    Stata does not know that GeoSegLag1Sqd is actually the square of GeoSegLag1, so the better term to include is i.Crisis##c.GeoSegLag1##c.GeoSegLag1. Since GeoSegLag1 is a continuous variable, you can use at() values in the -margins- command for GeoSegLag1. Also, the variable name 'GeoSegLag1' suggests you have 'manually' created the lagged variable. To avoid potential errors, you should use L1.GeoSeg after declaring your data a panel with -tsset-.

    Comment


    • #3
      Well, your regression is not properly set up to do that, so you have to fix that first. In order to use a quadratic term with -margins- you have to use proper factor variable notation in the regression. Creating a separate square variable produces incorrect results. Then to use -margins- you need to identify a set of interesting values of GeoSegLag1. For the purposes of illustrating the code, I will assume these values are 10, 20, 30, 40, and 50.

      Code:
      xtreg ROS_NI_win05 LnRev TDTE c.GeoSegLag1##c.GeoSegLag1##i.Crisis i.Year if Year !=8 & Year !=9, fe
      
      // PREDICTED OUTCOMES
      margins Crisis, at(GeoSegLag1 = (10 20 30 40 50))
      marginsplot, name(predicted_outcomes, replace)
      
      // MARGINAL EFFECTS
      margins Crisis, dydx(GeoSegLag1) at(GeoSegLag1 = (10 20 30 40 50))
      marginsplot, name(marginal_effects, replace)
      Added: Crossed with #2, who makes the same point and additionally notes that a homebrew variable GeoSegLag1, if intended to be the first lag of a variable GeoSeg, may be problematic as well; the use of L1.GeoSeg is safer.
      Last edited by Clyde Schechter; 27 Dec 2020, 15:41.

      Comment


      • #4
        Thanks Manish and Clyde. I am very new to Stata and your responses were most helpful. I changed the regression equation as per your suggestions and it works very well (results are pasted below). However, I am still struggling to get a marginsplot (graph) of Crisis and L1.GeoSeg. (I presume it should be an inverted U.) I have pasted this piece of code below the xtreg output. Any help would be most welcome. Thanks so much!

        . xtreg ROS_NI_win05 LnRev TDTE c.l1.GeoSeg##c.l1.GeoSeg##i.Crisis i.Year if Year !=8 & Year !=9, fe
        note: 19.Year omitted because of collinearity

        Fixed-effects (within) regression               Number of obs     =      3,498
        Group variable: Company                         Number of groups  =        465

        R-sq:                                           Obs per group:
             within  = 0.1001                                         min =          1
             between = 0.1492                                         avg =        7.5
             overall = 0.1009                                         max =         16

                                                        F(21,3012)        =      15.95
        corr(u_i, Xb)  = -0.1693                        Prob > F          =     0.0000

        --------------------------------------------------------------------------------------------
        ROS_NI_win05 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
        ---------------------------+----------------------------------------------------------------
        LnRev | .0706095 .0057165 12.35 0.000 .0594008 .0818181
        TDTE | -.0000241 6.06e-06 -3.98 0.000 -.000036 -.0000122
        |
        GeoSeg |
        L1. | .0643521 .0716051 0.90 0.369 -.0760477 .2047518
        |
        cL.GeoSeg#cL.GeoSeg | -.0890383 .0461474 -1.93 0.054 -.179522 .0014453
        |
        1.Crisis | -.0321321 .0672928 -0.48 0.633 -.1640765 .0998124
        |
        Crisis#cL.GeoSeg |
        1 | -.1292482 .0721018 -1.79 0.073 -.2706219 .0121255
        |
        Crisis#cL.GeoSeg#cL.GeoSeg |
        1 | .0857151 .0484196 1.77 0.077 -.0092237 .1806539
        |
        Year |
        3 | .0471469 .0654212 0.72 0.471 -.0811279 .1754217
        4 | .0723122 .0648559 1.11 0.265 -.0548541 .1994785
        5 | .0593705 .0647764 0.92 0.359 -.06764 .186381
        6 | .0646863 .0647209 1.00 0.318 -.0622152 .1915878
        7 | .1596848 .0216993 7.36 0.000 .1171378 .2022318
        10 | .117295 .0202614 5.79 0.000 .0775675 .1570225
        11 | .1048125 .0200422 5.23 0.000 .0655147 .1441104
        12 | .0852402 .0197685 4.31 0.000 .046479 .1240014
        13 | .0795861 .0197836 4.02 0.000 .0407954 .1183768
        14 | .0451949 .0192737 2.34 0.019 .0074039 .0829859
        15 | .0143948 .019124 0.75 0.452 -.0231026 .0518922
        16 | .0021967 .0187029 0.12 0.907 -.0344751 .0388685
        17 | .0044567 .0184845 0.24 0.809 -.0317869 .0407002
        18 | .0122765 .0181715 0.68 0.499 -.0233533 .0479064
        19 | 0 (omitted)
        |
        _cons | -.2247597 .0682938 -3.29 0.001 -.3586668 -.0908525
        ---------------------------+----------------------------------------------------------------
        sigma_u | .24808724
        sigma_e | .22151085
        rho | .55641321 (fraction of variance due to u_i)
        --------------------------------------------------------------------------------------------
        F test that all u_i=0: F(464, 3012) = 6.61 Prob > F = 0.0000




        . margins Crisis, at(l1.GeoSeg = (0 1 2))

        Predictive margins Number of obs = 3,498
        Model VCE : Conventional

        Expression : Linear prediction, predict()

        1._at : L.GeoSeg = 0

        2._at : L.GeoSeg = 1

        3._at : L.GeoSeg = 2

        ------------------------------------------------------------------------------
        | Delta-method
        | Margin Std. Err. z P>|z| [95% Conf. Interval]
        -------------+----------------------------------------------------------------
        _at#Crisis |
        1 0 | . (not estimable)
        1 1 | . (not estimable)
        2 0 | . (not estimable)
        2 1 | . (not estimable)
        3 0 | . (not estimable)
        3 1 | . (not estimable)
        ------------------------------------------------------------------------------

        Last edited by Deepika Deshpande; 27 Dec 2020, 19:44.

        Comment


        • #5
          Did you try the code I suggested in #3? How did its results differ from what you are looking for?

          Your regression output does suggest an inverted-U shape. But it also says that the vertex of the inverted U is at L.GeoSeg of approximately 0.36. If that value is outside the range of values of L.GeoSeg for which you calculate the margins, you will not see an inverted-U shape: you will see a somewhat curvilinear graph, but one that does not reach an apex. And if 0.36 is far outside the range of values of L.GeoSeg, the graph may even appear very close to a straight line.
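          For readers wondering where 0.36 comes from: it is the vertex of the fitted quadratic for the non-crisis group, -b1/(2*b2), using the coefficients from the output in #4. A quick arithmetic check (sketched in Python purely for illustration; the coefficient values are copied from that output):

```python
# Vertex of the fitted inverted-U for the non-crisis (Crisis = 0) group,
# using coefficients from the xtreg output in #4:
#   L1.GeoSeg           :  0.0643521
#   cL.GeoSeg#cL.GeoSeg : -0.0890383
b1 = 0.0643521   # linear coefficient on L1.GeoSeg
b2 = -0.0890383  # quadratic coefficient

# For y = b1*x + b2*x^2 (plus terms constant in x), the apex is at x = -b1/(2*b2)
vertex = -b1 / (2 * b2)
print(round(vertex, 3))  # 0.361
```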
          Last edited by Clyde Schechter; 27 Dec 2020, 19:50.

          Comment


          • #6
            Hi Clyde,

            Thanks so much. Yes, I did try the code you suggested in #3, i.e. margins Crisis, at(GeoSegLag1 = (10 20 30 40 50)). The only things I changed were to use l1.GeoSeg and to keep the range 0 to 2:

            . margins Crisis, at(l1.GeoSeg = (0 1 2))

            The code and the full output are in my post above (#4); I am not sure why the results show "not estimable".
            Last edited by Deepika Deshpande; 27 Dec 2020, 21:09.

            Comment


            • #7
              Ah, yes, I should have foreseen that. I take it that your variable Crisis is an indicator for some year, perhaps 2009? If so, it is collinear with the i.Year variables. That is why you got the warning "note: 19.Year omitted because of collinearity", and also why the margins you want are not estimable. Because of the collinear relationship among these, Crisis is, in effect, just another year variable, and none of those effects is separately estimable because they are artifacts of the particular years that were omitted to break the collinearity. So you cannot simultaneously estimate an effect of Crisis and include the i.Year term in your model.

              One solution is to omit i.Year altogether. If that doesn't make sense in real-world terms, then consider coarse-graining time: for example, create a new time variable that indicates two-year periods. If my guess that Crisis corresponds to a single year is correct, Crisis will not be collinear with these two-year period indicators, and you will be able to include both Crisis and the two-year period indicators and get results.

              Comment


              • #8
                Clyde has offered very good suggestions. Also, you will not see an inverted-U-shaped graph with just three values of GeoSeg (0, 1, 2). My guess is that it captures the number of geographic segments, and it probably has three unique values. If that's not the case and it is a continuous variable, you should increment it in smaller steps: at(l1.GeoSeg = (0(.1)2)).

                Comment


                • #9
                  Clyde and Manish - Thank you both for your responses. Your suggestions worked and I managed to get the graph I was looking for. Thank you so much. Your explanations are also very helpful, although I have a few doubts.

                  While I removed 2019 and managed to obtain the marginsplot, I am still confused why only one of the years in my panel data (i.e. 2019) ends up being collinear with Crisis. By the way, my panel data contains the years 2001-2019. By way of further background, I would like to provide these details:

                  1. My panel data contains a number of companies (i.e. the groups) across the years 2001-2019 (i.e. the time series).
                  2. Crisis is an indicator for all years starting 2007, i.e. post the Global Financial Crisis.
                  3. By using FE estimation, company-specific time-invariant variables could be removed.
                  4. I also wanted to remove any time-specific trends, and hence I included the i.Year dummies.
                  5. By doing steps 3 and 4 above, I was hoping to get the exact impact of the independent variables I am interested in (i.e. GeoSeg and Crisis).
                  6. While I am not specifically interested in the i.Year dummies, I am wondering whether, by removing i.Year, my FE model is truly a model of the effects of GeoSeg and Crisis, or whether it is impacted by time-related factors as well.

                  In case you have any guidance for me, I would greatly appreciate it. Thanks so much for all your help so far. It has been invaluable.

                  Regards,
                  Deepika

                  Comment


                  • #10
                    I am still confused why only one of the years in my panel data (i.e. 2019) ends up being collinear with Crisis.
                    That is not what happened. First let's just consider the years 2001-2019--we'll bring in Crisis later.

                    It is standard material from introductory regression courses that if you introduce an indicator ("dummy") variable for all 19 of these years, they will be collinear. In fact, the exact collinearity equation will be

                    2001.year + 2002.year + ... + 2018.year + 2019.year - 1*_cons = 0

                    because each of these year variables must be either 0 or 1; in any year exactly one of them is 1 and the others are 0, and _cons is always 1. So that is the collinearity equation: ALL 19 OF THE VARIABLES and the constant term ARE COLLINEAR, which is to say that they participate in a linear combination, with at least one non-zero coefficient, that adds up to zero.

                    Notice, however, that you can break this relationship by omitting any one of the variables. You could omit the constant term, though for various reasons this is not desirable in most contexts. Or you could omit 2001.year and leave everything else. Or you could leave 2001.year in and remove, say, 2012.year. It does not matter which year indicator you remove: as long as you remove one of them, the collinearity is broken. For example, if you remove 2012.year, then in 2012 all of the remaining year indicators are zero and _cons is, as always, 1, so the total 2001.year + ... + 2011.year + 2013.year + ... + 2019.year - 1*_cons = -1, not zero. A similar calculation for any other year yields the same conclusion: you can remove any one of the variables and the collinearity is eliminated. Conventionally, when we are dealing with year indicators like this, we remove the earliest year; if the indicators were for some other variable without a natural ordering, some arbitrary choice is usually made.

                    What is important to understand (and it can be proved, though I won't attempt a proof here) is that which indicator you remove does not matter for the regression results for the variables that do not participate in the collinearity. Even more important, the model's predictions of outcomes are also the same regardless of which way you break the collinearity. But the coefficients of the variables that do participate in the collinearity change if you change which variable(s) you omit. Consequently, the effects of those variables cannot be calculated in the regression, because the coefficients you get for them depend on which variable(s) were left out to overcome the collinearity. Or, as statisticians and the -margins- command put it, these effects are "not estimable."

                    Now let's look at what happens when we throw the variable Crisis into the mix. Crisis was not itself a single-year indicator, as I had imagined; it was instead an indicator for a subset of years, namely those >= 2007. The details differ, but the end result and conclusion are the same. Consider now the variables 2001.year through 2019.year, and Crisis. The following equation will be true in every year:

                    1*Crisis - (2007.year + 2008.year + ... + 2019.year) = 0.

                    That's because in any year 2007 or later, Crisis is 1 and exactly one of 2007.year through 2019.year is 1, so the total is zero; and in any year before 2007, all of those variables are 0, so the total is again zero. Notice that this is a new collinear relationship, separate from the one involving just the i.Year variables and the constant term. So, to eliminate all collinearity from the model, two collinear relationships must be broken, and that requires the removal of two variables. If you look carefully at the output of your original regression, the one that led to your "not estimable" problem, you will notice that in addition to 2001 being removed (as would normally happen with just i.Year), Stata also removed 2019 to break the second collinearity, the one in which Crisis is involved. Again, the choice of 2019 was arbitrary: Stata could have removed any of the years 2007 through 2019, or it could have removed Crisis itself. Any of these would have eliminated the second collinearity.

                    So the important thing to remember is that a collinear relationship can involve many variables, and to the extent it does, they are all in the same situation. It is not the case that Crisis was collinear only with 2019: it was collinear with all of the years 2007 through 2019, and the removal of any one of those variables is sufficient to disrupt the collinearity. Stata's algorithms for dealing with collinearity happened to omit 2001 (for the collinearity involving only the years and _cons) and 2019 (for the collinearity of the later years with Crisis), but clearly many other possibilities exist.
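                    The double rank deficiency described above can be checked numerically. A minimal sketch (in Python/NumPy rather than Stata, purely to count the collinear relationships; one observation per year suffices):

```python
import numpy as np

years = np.arange(1, 20)                      # Year = 1, ..., 19 (2001-2019)
dummies = (years[:, None] == np.arange(1, 20)[None, :]).astype(float)  # 19 year indicators
cons = np.ones((19, 1))                       # the constant term
crisis = (years >= 7).astype(float)[:, None]  # Crisis = 1 from year 7 (2007) onward

X = np.hstack([dummies, cons, crisis])        # 21 columns in total
rank = np.linalg.matrix_rank(X)

# 21 columns but rank 19: two separate collinear relationships, so two
# columns must be dropped (e.g. the base year and 19.Year, as Stata did).
print(X.shape[1] - rank)  # 2
```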

                    Comment


                    • #11
                      Hi Clyde,

                      Thanks again. Your explanations are clear, particularly the collinearity equation. (I am returning to academia and statistics after a long corporate career, so I apologise for my patchy knowledge of statistics. I am trying to get up to speed quickly.)

                      While I understand why the i.Year variables are collinear, I am now wondering whether there is any other method I can employ (i.e. other than introducing the i.Year dummies) to exclude the effects of time and get the pure effect of GeoSeg and Crisis.

                      Apologies in advance for my many questions. In case there is any guidance you can share, I would greatly appreciate. Many thanks.

                      Comment


                      • #12
                        There is simply no possibility of including all of the years and the Crisis variable, or even a reduced set of years that still leaves Crisis in a collinear relationship. It is mathematically impossible. The best you can do, I think, is to coarse-grain time in such a way that some value of time is partly in the pre-Crisis epoch and partly post-Crisis. For example, given that your years run from 1 through 19:

                        Code:
                        gen two_year_block = 2*floor(Year/2)

                        This will give you 10 values. Value 0 corresponds to year 1 only (and a non-existent year 0), value 2 corresponds to years 2 and 3, value 4 to years 4 and 5, and, crucially, value 6 to years 6 and 7. The latter is crucial because it means two_year_block = 6 is partly pre-Crisis (2006) and partly post-Crisis (2007). Crisis is not collinear with these two-year block indicators, and as each one represents only two years, this is a reasonable way to capture time shocks, except those occurring at very high frequency. I think this is your best bet.
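                        To see the grouping concretely, here is the same mapping sketched in Python (for illustration only; it mirrors the -gen- line above):

```python
import math

years = range(1, 20)                   # Year = 1..19 (2001-2019)
block = {y: 2 * math.floor(y / 2) for y in years}

# Ten distinct block values: 0, 2, 4, ..., 18
print(sorted(set(block.values())))

# Years 6 (pre-Crisis) and 7 (post-Crisis) share block 6, which is why
# Crisis is not collinear with the two-year block indicators.
print(block[6], block[7])  # 6 6
```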

                        Comment


                        • #13
                          Clyde .... Thanks a lot. Your feedback has been most helpful! Really appreciate it!

                          Comment


                          • #14
                            Hello, I need a clarification regarding regression. As quick background:
                            1. I am running a multiple regression model (i.e. 2+ IVs).
                            2. I am interested in the goodness of fit of my regression model, and hence I believe I should use either R-squared or the standard error of the regression.
                            3. While R-squared is clear to me, I have a doubt about the standard error of the regression: the Stata output gives me a standard error for each of the coefficients, and I am not sure which one to use.
                            4. Would it be correct to assume that in the case of multiple regression I should use R-squared to measure goodness of fit (and not the many standard errors of the coefficients)?

                            Thanks very much.

                            Comment


                            • #15
                              If you look at some Stata output from -regress- you will see:
                              Code:
                              . regress price mpg headroom
                              
                                    Source |       SS           df       MS      Number of obs   =        74
                              -------------+----------------------------------   F(2, 71)        =     10.44
                                     Model |   144280501         2  72140250.4   Prob > F        =    0.0001
                                  Residual |   490784895        71  6912463.32   R-squared       =    0.2272
                              -------------+----------------------------------   Adj R-squared   =    0.2054
                                     Total |   635065396        73  8699525.97   Root MSE        =    2629.2
                              
                              ------------------------------------------------------------------------------
                                     price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                              -------------+----------------------------------------------------------------
                                       mpg |  -259.1057   58.42485    -4.43   0.000    -375.6015   -142.6098
                                  headroom |  -334.0215   399.5499    -0.84   0.406    -1130.701    462.6585
                                     _cons |   12683.31   2074.497     6.11   0.000     8546.885    16819.74
                              ------------------------------------------------------------------------------
                              I am not familiar with the term "Standard Error of Regression"; I think you may be referring to what is called the Root Mean Squared Error. That (Root MSE) and the R-squared both appear in the header of the example output above.

                              The root mean squared error is a measure of fit of the model, but it is in the units of the outcome variable. In the example above, it would be denominated in dollars. So there is no sensible comparison of Root MSE from one regression to another having a different outcome variable. Also, Root MSE is an inverse measure of fit: a smaller value indicates a better fit. A perfect fit would have Root MSE = 0.

                              By contrast, R-squared is dimensionless, and it is sometimes sensible to compare the R-squared values from two different regressions. Also, as I imagine you already know, R-squared ranges from 0 (no explained variance at all) to 1 (perfect fit).
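                              Both header quantities are simple functions of the ANOVA table. As a check (sketched in Python purely for the arithmetic; the sums of squares are copied from the -regress- output above):

```python
import math

# Sums of squares and residual degrees of freedom from the ANOVA table above
ss_model    = 144280501.0
ss_residual = 490784895.0
ss_total    = 635065396.0   # = ss_model + ss_residual
df_residual = 71            # 74 observations - 3 estimated parameters

r_squared = 1 - ss_residual / ss_total          # equivalently ss_model / ss_total
root_mse  = math.sqrt(ss_residual / df_residual)

print(round(r_squared, 4))  # 0.2272, matching "R-squared" in the header
print(round(root_mse, 1))   # 2629.2, matching "Root MSE" in the header
```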

                              By the way, this question is off the topic of the thread. These threads are not simply dialogs between a questioner and a responder. Other people read along on topics of interest, or come and specifically search threads by title for specific questions. When a thread goes off topic, these people's time is wasted with extraneous material, or they may be unable to find what they are looking for because it is in a thread with an unrelated title. In the future, when you have a new question, please start a New Topic.

                              Comment
