  • nonlinearity in panel data regression

    Hi Statalist,

    I am working on my master's thesis on the effect of dividend announcements across different market quotations and market conditions, using panel data regression with 29,000 observations, 2 dependent variables, and around 15 independent variables. My two main variables of interest are dummy variables: a market quotation dummy (which varies across firms) and a market condition dummy (which varies over time).

    My first question: do I have to run two regression models, one using -xtreg, be- (between effects) to estimate the market quotation dummy and another using -xtreg, fe- (fixed effects) to account for the market condition dummy, while keeping all the other independent variables in both?

    My second problem is that when I checked the linearity of the relationships between the dependent and independent variables, they all appeared to be nonlinear. If I opt to transform my data, should I transform both the dependent and independent variables? What kind of transformation should I use, and on what basis should I choose it (keeping in mind that my dependent variables contain negative values)?

    Another problem is that my adjusted R-squared is very small, even though I included all the explanatory variables used in the previous literature. I think the reason is the nonlinearity; am I right? I have also tried nonlinear models, but found that they are more applicable to small samples, which is not my case. So am I using the right model or not?

    Hope you can help.
    Thanks in advance,
    Radwa


  • #2
    There is no "one size fits all" transformation that will linearize the relationships in your data. I would start by doing some scatter plots to see what the non-linearities look like. Then you can choose an appropriate transformation based on what you see.

    I'm not sure I completely understand your explanation of the data. If one of your predictors of interest is constant over time within every firm, then you will not be able to estimate its effects with -xtreg, fe-; you would need to use -xtreg, be-, or possibly a random effects model, instead. If another predictor of interest is constant across all firms for any given year, then its effect will be estimable in -xtreg, fe-, provided you do not also include time indicators in the model.
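    A small numeric sketch (in Python rather than Stata, with invented variable names) of why a predictor that is constant over time within every firm cannot be estimated by fixed effects: the FE "within" transformation subtracts each firm's time mean, which zeroes out anything that never changes within a firm.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n_firms, n_years = 5, 4
    firm = np.repeat(np.arange(n_firms), n_years)  # panel ids: firm x year

    # A dummy that varies across firms but is constant over time within each firm
    quotation = np.repeat(rng.integers(0, 2, n_firms), n_years).astype(float)

    # The FE "within" transformation: demean each variable within firm
    firm_means = np.array([quotation[firm == f].mean() for f in range(n_firms)])
    quotation_within = quotation - firm_means[firm]

    # Nothing is left for -xtreg, fe- to use: every demeaned value is zero
    print(np.allclose(quotation_within, 0))  # True
    ```

    A between-effects or random-effects estimator keeps the cross-firm variation, which is why it can identify such a coefficient while FE cannot.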



    • #3
      First of all, thank you Clyde for your quick reply. And sorry if my explanation was not clear; let me try to make it clearer.
      My dependent variable is simply the cumulative average abnormal return over a 2-day event window (day 0 and day 1). Among my independent variables, the two dummies are the main variables of interest: one varies across firms but is constant over time, and the other varies over time but is not related to the firms. The rest of the variables are financial ratios (liquidity, profitability, and leverage ratios) that vary both across firms and over time, plus macroeconomic variables (GDP, interest rate, and inflation rate) that vary over time but not across firms, since my study covers only a single country (the UK). My question is: when using -xtreg, be-, do I include only the quotation dummy along with all the other financial and economic variables, or do I have to remove the macroeconomic variables because they vary over time rather than across firms? Likewise, for -xtreg, fe- I would include the condition dummy along with all the other variables; however, you mentioned "provided you do not also include time indicators in the model", and I don't completely understand what you mean by that. Should I remove the macroeconomic variables?


      My second question is how to increase my R-squared (it is 2%). I have added all the variables mentioned in the previous literature, yet the earlier studies report an R-squared of around 20% and I don't get anything close. Is this because of the nonlinearity of my data? I have checked the scatter plot for each and every variable, and they are all nonlinear without any definite curve; they are just a shapeless cluster. Below are three examples of the scatter plots I got.

      [Attached scatter plots: Graph.png, Graph 2.png, Graph 1.png]



      I'd appreciate your help.
      Thanks in advance.
      Best regards,
      Radwa



      • #4
        Since your macroeconomic variables are calculated at the country level, and there is just one country, they vary only with time, and they are the same across all firms. Consequently, if you were to include i.year indicators, those would be collinear with all of the macroeconomic variables, and many of them would have to be omitted. Either you can choose which ones to omit by excluding them from your -xtreg- command, or Stata will choose them for you (arbitrarily, or, I believe, based on the order in which they appear in the -xtreg- command). So what to choose? Well, if you include the year indicators, you will automatically be adjusting your analysis for all year-specific effects (observed, like your macroeconomic variables, or unobserved, or even unobservable in principle). Whereas if you include the macroeconomic variables and omit the year indicators, you adjust only for the particular macroeconomic variables you have. So it seems to me that including i.year and omitting the macroeconomic variables makes much more sense than the other way around.

        Now, let's return to the question of linearizing the relationships between your firm-level variables such as debt-to-equity or beta. (I presume these vary over time within firms--I know nothing about finance and little about economics.) I agree that the graphs don't provide any sense of any kind of relationship. I haven't seen the code that was used to generate these graphs, but I'm guessing it was something along the lines of -graph twoway scatter car01 debttoequity- etc. The limitation of this approach is that it smashes together different firms that are at very different levels of these variables. So to get a better sense of what is going on in this panel data, and, in particular, to see what is going on within firm (which is what -xtreg, fe- picks up), you need to do something like -graph twoway scatter car01 debttoequity, by(firm)-. (If the number of firms is large, pick a representative sample of perhaps a dozen and just work with those.) This will give you a better picture of what the within-firm relationships look like. They may be more linear looking than the graphs in #3 suggest.
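        The pooled-versus-within distinction can be made concrete with a small simulation (a Python sketch with made-up variable names, not the poster's data): when firms sit at very different levels of a regressor, a pooled scatter can look like a shapeless cloud even though the within-firm relationship, which is what -xtreg, fe- uses, is cleanly linear.

        ```python
        import numpy as np

        rng = np.random.default_rng(1)
        n_firms, n_years = 50, 10
        firm = np.repeat(np.arange(n_firms), n_years)

        # Each firm sits at its own level of x and has its own intercept
        firm_level = rng.normal(0, 10, n_firms)
        firm_intercept = rng.normal(0, 1, n_firms)
        x = firm_level[firm] + rng.normal(0, 1, n_firms * n_years)
        # y responds linearly to x *within* each firm (slope 2), plus noise
        y = firm_intercept[firm] + 2.0 * (x - firm_level[firm]) \
            + rng.normal(0, 0.5, x.size)

        pooled_r = np.corrcoef(x, y)[0, 1]  # pooled scatter: weak, cloud-like

        def within(v):
            # Demean within firm, as the FE estimator does
            means = np.array([v[firm == f].mean() for f in range(n_firms)])
            return v - means[firm]

        within_r = np.corrcoef(within(x), within(y))[0, 1]  # strong and linear
        print(f"pooled r = {pooled_r:.2f}, within-firm r = {within_r:.2f}")
        ```

        This is the motivation for plotting -by(firm)- (or demeaned variables) rather than relying on the pooled scatter.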

        As for boosting up your R2, I have no advice to offer. The data are what they are. Yes, it is possible that some variable transformations will linearize some curvilinear relationships and increase R2 a bit. It is worth your while to pursue that, not just to make R2 look better, but to get a better specified model. But, ultimately, when you've done that, you'll have to settle for whatever is there. You can throw in more variables to boost R2, but if the variables aren't truly relevant you're just making an overfitted model. In fact, if you increase the number of variables to equal the number of observations you will get R2 = 1, but that won't shed any light on your research question: the relationship of your two key predictors to car01. Variables should be included based on their being meaningful contributors to the outcome variable.
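        The point about mechanically inflating R2 is easy to demonstrate (a purely illustrative Python sketch, nothing to do with the poster's data): regressing an outcome on enough random, irrelevant predictors drives R2 toward 1 even though none of them has any real relationship to the outcome.

        ```python
        import numpy as np

        rng = np.random.default_rng(2)
        n = 30
        y = rng.normal(size=n)  # outcome: pure noise

        def r_squared(n_predictors):
            # Intercept plus random predictors that are truly irrelevant to y
            X = np.column_stack([np.ones(n), rng.normal(size=(n, n_predictors))])
            beta, *_ = np.linalg.lstsq(X, y, rcond=None)
            resid = y - X @ beta
            return 1 - resid.var() / y.var()

        r2_small = r_squared(2)   # modest, as expected for junk predictors
        r2_full = r_squared(29)   # 29 predictors + intercept = n columns
        print(f"R^2 with  2 random predictors: {r2_small:.3f}")
        print(f"R^2 with 29 random predictors: {r2_full:.3f}")
        ```

        With as many parameters as observations the fit is exact and R2 reaches 1, which is precisely why a high R2 from piling on variables says nothing about the research question.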



        • #5
          Thank you Clyde for your continued help. I understand what you said about boosting the R-squared and how to deal with the nonlinearity problem, and I will try it now. However, I still don't understand what you meant by "including i.year and omitting the macro-economic variables makes much more sense than the other way around." Do you mean that if I include my condition dummy, which varies over time, when using -xtreg, fe-, I should remove all the other variables that vary over years but not within firms, like the macroeconomic variables? I have tried using -xtreg, fe- with both the macroeconomic variables and the condition dummy included, and Stata did not omit any variables. Another question: when running -xtreg, be-, can I include time-varying variables like the macroeconomic ones, given that they do not vary across firms?

          One more question: I want to log-transform my dependent variable, bearing in mind that it includes negative values. I tried the following (copied and pasted into Stata 12), but each gives me "unrecognized command":

          sign(x) * log(abs(x) + 1)

          asinh(x)

          sign(x) * abs(x)^(1/3)



          • #6
            Because your variable GDP (and your other macroeconomic indicators) is the same for all firms in any year, you cannot include both i.year and those variables in an -xtreg, fe-. So you must choose: i.year, or the macroeconomic variables, but not both. In my view, i.year is better because it adjusts for everything that is firm-invariant in a given year, not just the particular variables you happen to have data on.
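            A quick numeric sketch of that collinearity (Python, with made-up GDP figures): a macro variable takes one value per year, so its column is an exact linear combination of the year indicator columns, and appending it to a design matrix that already contains i.year does not raise the rank.

            ```python
            import numpy as np

            years = np.array([2015, 2016, 2017])
            gdp_by_year = {2015: 1.9, 2016: 2.3, 2017: 2.1}  # hypothetical values

            n_firms = 4
            year = np.tile(years, n_firms)  # a balanced firm x year panel
            gdp = np.array([gdp_by_year[t] for t in year])

            # Year indicator (dummy) columns, one per year
            D = (year[:, None] == years[None, :]).astype(float)

            # gdp is exactly D @ [1.9, 2.3, 2.1], so it adds no new information
            rank_dummies = np.linalg.matrix_rank(D)
            rank_with_gdp = np.linalg.matrix_rank(np.column_stack([D, gdp]))
            print(rank_dummies, rank_with_gdp)  # 3 3
            ```

            This is why Stata must drop something when both i.year and the macro variables are included.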

            If you need to do some kind of transformation on your outcome variable, you definitely can't use log because you have 0 and negative values. The reason you are getting "unrecognized command" is that what you show are not commands. They are expressions. So you need to do something like:

            Code:
            gen x_transformed = asinh(x)
            etc.
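            For reference, the three expressions from #5, written as plain functions (a Python sketch rather than Stata syntax), showing that, unlike log(), each is defined for zero and negative values and preserves sign:

            ```python
            import math

            def signed_log(x):
                # sign(x) * log(abs(x) + 1): symmetric around 0, handles negatives
                return math.copysign(math.log(abs(x) + 1), x)

            def inverse_hyperbolic_sine(x):
                # asinh(x) = log(x + sqrt(x^2 + 1)), defined for all real x
                return math.asinh(x)

            def signed_cube_root(x):
                # sign(x) * abs(x)^(1/3): odd function, handles negatives
                return math.copysign(abs(x) ** (1 / 3), x)

            for v in (-5.0, 0.0, 2.5):
                print(v, signed_log(v), inverse_hyperbolic_sine(v),
                      signed_cube_root(v))
            ```

            In Stata these would each go inside a -generate- statement, as in the code above, not be typed as bare expressions.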



            • #7
              I picked a sample of firms to get a more representative picture, as you suggested, and this is what I get. Even when I transformed the dependent variable using the commands mentioned above, the graph did not get much better, and when I tried transforming both the dependent and independent variables, nothing improved. Now my question is: can I run the model on this nonlinear data and just interpret the findings? I have tried almost everything and nothing has changed, so I am thinking of giving up. Is the problem in what I am doing, or in the data?
              [Attached scatter plot: Graph4.png]


              Thanks again; much appreciated.



              • #8
                So, I would interpret that graph as showing that car01 is essentially unrelated to debttoequity. (And I really wonder whether that one value of 3000 is a data error!) So I would not include debttoequity in the model: it won't help you at all, and no amount of transformation will change that.



                • #9
                  What if all my variables are like this? Should I run another test, bearing in mind that I have a large number of observations?
                  Another question: I did not remove any outliers. Do you think this is the cause of the problem?



                  • #10
                    What if all my variables are like this?
                    If all your variables are like this, then you have data on a bunch of variables that are irrelevant to your study. You may need to rethink how you approach the problem and start over with a better set of variables.

                    Should I run another test, bearing in mind that I have a large number of observations?
                    I don't know what kind of other test you have in mind here, so I can't comment.

                    I did not remove any outliers. Do you think this is the cause of the problem?
                    No, and removing outliers is not appropriate unless you can verify, or have convincing reasons to believe, that the outliers are data errors. And judging from the graphs you have shown, if anything, removing the outliers would make your variables even less related to each other! But, speaking generally, the appropriate response to outliers is to understand where they come from and what effect they are having on your analysis. Removing outliers on your outcome variable is an absolute no-no. For predictor variables, if outliers don't fit, or distort the model, then it may be appropriate to restrict the model to non-outlying cases (and be very clear when reporting your results that they only apply within a restricted range of the variables) and, if possible, develop a separate model that applies to the outlying range. Better still is to attempt to identify which observations are outliers and why, and then perhaps include non-linear terms, interaction terms, or new variables that predict outlier status in the model. There are numerous possibilities that can arise along the way, and it isn't possible to cover them here in any detail.



                    • #11
                      Thank you again, but is there any chance that I am doing something wrong? The choice of these variables is based on the previous literature, so there is no way there is no relationship between them: they were tested before and a significant relationship was found. The only contribution of my study is the dummy variables. And even when I remove the dummy variables and run the regression without them, I still do not get similar results.



                      • #12
                        You have shown some graphs, but not the code that generated them, nor an example of the data. So I have no way to know if you are doing anything wrong: I don't know what you are doing. I'm simply taking the graphs you show at face value and assuming they represent what you describe.

                        So if you want reassurance that what you are doing is correct, you need to show example data (use -dataex-; see FAQ #12 if you are not familiar with it) and you need to show the code that is generating the results that trouble you.

                        Another possibility to consider is that your data is not correct. The graphs you show include a few points that look rather out of whack. While just a few erroneous points usually won't distort analytic results in a major way, occasionally they do. More important, where the data contain a few obvious errors, it is likely they will contain many more errors that are not obvious. So a review of your data set is in order. This would entail first vetting the sources and pedigrees of any data you received from others, and then a careful review of whatever data management you have carried out to get from those sources to your working data set.
