Negative values in dependent variable and zeros in sample (FDI-data)

Emmy Lundblad

Join Date: May 2016

Posts: 21
#1

04 May 2016, 06:50

Hello!

I have two questions:

1) My aim is to use bilateral FDI inflows as my dependent variable in a PPML regression (FDIijt = FDI flow from country i to country j in time t). As I understand it, this model will work well with my sample, which includes a high number of zeros. Is there any instance where the number of zeros relative to non-zero values is "too many" for this model (PPML)?

2) The other question I have is how I should treat my dependent variable, since it includes negative values and is therefore not suitable for PPML in its current form.

Does anyone have a solution for a transformation of these values? It is my impression that many researchers take log(FDIijt), which implies turning the negative values into zeros. That does not seem statistically optimal. And as I understand it, one should not log the dependent variable when using PPML.

I've seen the transformation below used. But I am not that sure how the interpretation of the regression is affected by this.
Tags: FDI, negative values, panel data, PPML
Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#2

04 May 2016, 11:22

Your analysis raises many questions. Is there a reason for PPML instead of poisson or OLS regression? Are the imposed distributional assumptions from PPML or poisson what you really want? Are you getting duplicate observations (i.e., i to j and j to i at time t)? You probably don't want both in the data - it would be double counting a single real observation. If PPML is really the right procedure then you might simply use the ij or ji pair that was positive.

With meaningful negative numbers and lots of zeros, you do not want to take a log transform. That gives an undefined value for all the zeros and negative numbers. Converting such undefined values into zeros in your logged variable is like replacing the original zeros and negative numbers with the value 1, since log(1) = 0. Moreover, the log of a very small positive number is a negative number, so coding the zeros and negative numbers as 0 makes them bigger than the small positive number in the log. Sometimes people add a small value to a variable to make the zeros into a positive number allowing a log, but that doesn't work for negative numbers. You're also estimating a non-linear relation between the regressors and the dv, so a non-linear transformation of the dv makes it even more complicated.
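The point about recoding undefined logs to zero can be checked with a few numbers. The sketch below uses made-up flows (the values and function names are illustrative assumptions, not data from this thread) and shows that the recoded zeros and negatives are treated as if the flow had been 1, and that a small positive flow ends up ranked below a large negative one.

```python
import math

# Hypothetical FDI flows (arbitrary units): small positive, zero, negative, large.
flows = [0.05, 0.0, -12.0, 250.0]

def log_or_none(x):
    """Natural log, or None where it is undefined (x <= 0)."""
    return math.log(x) if x > 0 else None

def log_recoded_to_zero(x):
    """The questionable practice: undefined logs are set to 0."""
    return math.log(x) if x > 0 else 0.0

logged = [log_or_none(x) for x in flows]
recoded = [log_recoded_to_zero(x) for x in flows]

# Back-transforming the recoded values shows which flow each one implies.
implied_flows = [math.exp(v) for v in recoded]

for x, lg, rc, imp in zip(flows, logged, recoded, implied_flows):
    print(x, lg, rc, imp)
```

In the printed rows, the zero and the negative flow both come back as an implied flow of 1, while the flow of 0.05 gets a negative log and so sits below them.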
Joao Santos Silva

Join Date: Apr 2014

Posts: 3015
#3

04 May 2016, 13:33

Dear Emmy,

The answer to your first question is simple: no; ppml should work fine with any proportion of zeros.

Your second question is more challenging. If you do not have many negative observations and these are of small magnitude, then ppml may still be fine. The critical assumption in ppml is that the conditional mean is always positive; strictly speaking there is no need for the dependent variable to be non-negative. If, however, the negative observations are such that it is likely that the conditional mean may be negative for some values of the regressors, then ppml is simply not suitable.
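A minimal way to see why the conditional mean matters more than the sign of individual observations is the intercept-only case. There the Poisson pseudo-likelihood first-order condition, sum(y_i - exp(b0)) = 0, gives exp(b0) = mean(y). The sketch below is only that special case with made-up numbers; the function name is hypothetical and this is not how the ppml command is implemented.

```python
import math

def ppml_intercept_only(y):
    """Solve sum(y_i - exp(b0)) = 0 for b0, i.e. b0 = log(mean(y)).

    Returns None when the sample mean is not positive, because no
    exponential mean can match it.
    """
    mean_y = sum(y) / len(y)
    if mean_y <= 0:
        return None
    return math.log(mean_y)

# Hypothetical samples: both contain zeros and negative values.
mostly_positive = [0.0, 0.0, 5.0, -1.0, 8.0, 0.0]   # mean = 2.0
mostly_negative = [0.0, -6.0, 1.0, -3.0, 0.0, 2.0]  # mean = -1.0

b0 = ppml_intercept_only(mostly_positive)
print(b0, math.exp(b0))
print(ppml_intercept_only(mostly_negative))
```

The first sample has a negative observation but a positive mean, so a solution exists; the second has a negative mean, so it does not.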

Best wishes,

Joao
Emmy Lundblad

Join Date: May 2016

Posts: 21
#4

05 May 2016, 02:06

No, as you write, Mr Bromiley, it is one observation, i to j at t (and not the other way around). I was recommended to use PPML for the reasons Mr Silva and his co-author have pointed out in "The Log of Gravity". Since I'm using a gravity model with FDI and not trade, I have negative values.

If I use OLS, I'm guessing I should log the FDI data since it is skewed. But then I'm back at square one. The number of negative values is quite substantial.

The transformation I mentioned in my post did not show, I see now, so I have attached it below:

This is used in one paper to handle the negative values. As I mentioned, I'm not sure how this type of transformation would affect the interpretation of the regression.

Thank you both for your answers!
Joao Santos Silva

Join Date: Apr 2014

Posts: 3015
#5

05 May 2016, 13:45

Dear Emmy,

That is the inverse hyperbolic sine transformation. If you use that transformation you will not be able to interpret the coefficients as elasticities.
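A short derivation shows why the elasticity reading is lost. It assumes, purely for illustration, a model that is linear in the log of a single regressor x with slope β; the symbols α, β and x are not from the thread.

```latex
\begin{align*}
\operatorname{asinh}(y) &= \ln\!\left(y + \sqrt{y^{2} + 1}\right), \\
\operatorname{asinh}(y) &= \alpha + \beta \ln x
  \;\Longrightarrow\;
  \frac{1}{\sqrt{1 + y^{2}}}\,\frac{dy}{dx} = \frac{\beta}{x}, \\
\frac{dy}{dx}\,\frac{x}{y} &= \beta\,\frac{\sqrt{1 + y^{2}}}{y}.
\end{align*}
```

The elasticity equals β only in the limit of large positive y, where the ratio tends to 1. For values of y near zero it can be far from β, and it changes with the units in which y is measured.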

I see from your profile that you are a student; you should discuss with your adviser what is the most appropriate way to deal with this problem. I do not believe the profession has reached a consensus on how to deal with FDI data and so, unless you are doing a PhD on this topic, you can just use the solution your supervisor suggests.

Best of luck,

Joao
Emmy Lundblad

Join Date: May 2016

Posts: 21
#6

09 May 2016, 02:33

That was my concern with the transformation.

My advisor has been unavailable for some time, which is the reason for my posts in this forum. Thank you so much for your help and guidance!
Nick Cox

Join Date: Mar 2014

Posts: 35734
#7

09 May 2016, 03:45

A transformation closer to the logarithm which is defined for zero and negative values is

Code:

sign(x) * log(1 + abs(x))

which behaves like log(x) for x >> 0 and like -log(-x) for x << 0.
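The limiting behaviour can be checked numerically. The sketch below is a Python illustration rather than Stata code; the function name neglog follows the name used later in this thread, and the comparison with asinh is added here only to show that asinh(x) differs from log(x) by about log(2) for large x, while neglog(x) converges to log(x).

```python
import math

def neglog(x):
    """sign(x) * log(1 + |x|); defined for every real x, with neglog(0) = 0."""
    return math.copysign(math.log1p(abs(x)), x)

# Both transformations are odd functions that pass through zero.
for x in [-1000.0, -1.0, 0.0, 1.0, 1000.0]:
    print(x, neglog(x), math.asinh(x))

big = 1.0e6
gap_neglog = neglog(big) - math.log(big)     # equals log(1 + 1/big), close to 0
gap_asinh = math.asinh(big) - math.log(big)  # close to log(2)
print(gap_neglog, gap_asinh, math.log(2))
```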

There is some information in transint from SSC:

Code:

ssc inst transint
h transint
William Nolan

Join Date: Aug 2022

Posts: 9
#8

21 Aug 2022, 03:37

I am currently at the analysis stage of my dissertation, looking at determinants of Chinese OFDI to Africa post-2013.

My dv is Chinese OFDI to host countries; however, I am running into issues due to the non-positive values of my dv.

I have run fixed effects and random effects regressions on my panel data using the log of FDI, but this excludes non-positive FDI values. The results of my random effects model are solid, with a high model R2 and coefficients and significance levels that support my literature-based hypotheses. However, I question the significance of these results in light of the omission of non-positive values (which reduces my no. of observations from 360 to 310).

I have also run an OLS with averages of each iv and my dv across the 7 year period of my study.

Any suggestions to deal with this issue?
Nick Cox

Join Date: Mar 2014

Posts: 35734
#9

21 Aug 2022, 04:09

William Nolan Please post the commands you have used. As this thread already documents, neglog(y) = sign(y) * log(1 + abs(y)) and asinh(y) are alternative transformations that can cope with negative arguments. One implication is that any associated elasticity is no longer constant, but variable, which doesn't to me seem outrageous or insuperable.
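To make the remark about variable elasticity concrete, here is a small numerical sketch. It assumes a hypothetical relation neglog(y) = A + B*log(x) on the branch y > 0, where neglog(y) = log(1 + y); A, B and the x values are made up. On that branch the implied elasticity is B*(1 + y)/y, which the code compares with a finite-difference estimate.

```python
import math

A, B = 0.0, 0.8  # hypothetical intercept and slope on log(x)

def y_from_x(x):
    """Invert log(1 + y) = A + B*log(x); y > 0 requires x > 1 when A = 0."""
    return math.exp(A + B * math.log(x)) - 1.0

def elasticity(x, h=1e-6):
    """Numerical d log(y) / d log(x) by a central difference in log(x)."""
    lx = math.log(x)
    up = math.log(y_from_x(math.exp(lx + h)))
    down = math.log(y_from_x(math.exp(lx - h)))
    return (up - down) / (2 * h)

for x in [2.0, 10.0, 1000.0, 1.0e6]:
    y = y_from_x(x)
    # Columns: x, y, numerical elasticity, closed form B*(1 + y)/y
    print(x, y, elasticity(x), B * (1 + y) / y)
```

The elasticity is well above B for small y and approaches B as y grows, so it varies with the level of y rather than being constant.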

Another pointer is how far your examiners would expect to see a comparison between a model omitting negative values and one for the entire dataset.

I am no economist, but it seems to me that your negative values are real and should not be omitted just because they are awkward for one rather conventional analysis.

What is acceptable at your institution we can't tell. But having advised students at all levels from first degree to Ph.D. I imagine that you may still have scope to get advice from appropriate teachers.
William Nolan

Join Date: Aug 2022

Posts: 9
#10

21 Aug 2022, 04:39

Nick,

Thank you for the detailed response.

What are your views on the statistical soundness and representativeness of averaging each variable across the 8 years for each host country and then running an OLS regression on these averaged variables?
Nick Cox

Join Date: Mar 2014

Posts: 35734
#11

21 Aug 2022, 05:16

People don't seem to want to believe me when I say "I am no economist" but whether averaging across time is a good idea seems to me to boil down to what economics question you want to answer. I don't know what you mean by statistical soundness or representativeness here, except that it is my view that, vacuously, replacing a variable by its mean is what you might do if you don't much care about variations around the mean. It's hard for me to imagine that you will improve your analysis by averaging over time, however -- again, except in so far as some panel methods boil down to comparing between-panel and within-panel variation.
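The between-panel and within-panel split can be illustrated with a toy panel. The country labels and values below are invented for the sketch. Averaging each country over time keeps only the between part of the variation and discards the within part.

```python
from statistics import mean

# Hypothetical panel: FDI by host country over four years.
panel = {
    "A": [4.0, -2.0, 6.0, 0.0],
    "B": [10.0, 12.0, 8.0, 10.0],
    "C": [0.0, 0.0, -3.0, 3.0],
}

country_means = {c: mean(v) for c, v in panel.items()}
grand_mean = mean(x for v in panel.values() for x in v)

# Between: variation of country means around the grand mean (weighted by T).
between_ss = sum(
    len(v) * (country_means[c] - grand_mean) ** 2 for c, v in panel.items()
)
# Within: variation of each observation around its own country mean.
within_ss = sum(
    (x - country_means[c]) ** 2 for c, v in panel.items() for x in v
)
total_ss = sum((x - grand_mean) ** 2 for v in panel.values() for x in v)

print(country_means, grand_mean)
print(between_ss, within_ss, total_ss)
```

The two sums of squares add up to the total, so a regression on the country means uses only the between share of the information in the data.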

Economists who have posted earlier in this thread and who have expertise you want to tap will surely echo my request that you show them the exact commands you have used.
William Nolan

Join Date: Aug 2022

Posts: 9
#12

21 Aug 2022, 05:37

generate LFDI=log(FDI)
Nick Cox

Join Date: Mar 2014

Posts: 35734
#13

21 Aug 2022, 06:48

Sorry, no. I didn't mean that. These commands:

i have run fixed effects and random effects regressions on my panel data
William Nolan

Join Date: Aug 2022

Posts: 9
#14

21 Aug 2022, 13:28

xtreg (Log of DV) (Independent Variables), re
xtreg (Log of DV) (Independent Variables), fe
William Nolan

Join Date: Aug 2022

Posts: 9
#15

21 Aug 2022, 13:29

The Hausman test implied random effects to be the correct model.
The random effects model returned an R2 of 0.67.