panel data analysis-no observations

elena mengo

Join Date: Jun 2015

Posts: 30
#1

panel data analysis-no observations

09 Jun 2015, 08:02

Hello,
I would like to ask a question regarding a panel data analysis I am trying to run for my dissertation.
My aim is to analyse the relationship between CO2 emissions, GDP, FDI and trade. I gathered all the data and I am trying to analyse them in Stata; I used the commands xtset and xtreg since I am dealing with panel data. However, when I run the regression, this is what I get:
no observations
r(2000)

I suppose the problem is the missing data (there are indeed several missing values), but I do not know how to deal with that. If anyone can suggest how to proceed, I would really appreciate it, because at the moment I am stuck and cannot come up with a solution.

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#2

09 Jun 2015, 08:12

Elena:
as per FAQ, it's better to post what you typed and what Stata gave you back (and attach a dataset in .dta format, if deemed necessary), instead of describing what happened.
That said, a possible approach for dealing with missing values in panel data (but take a step back: what is the reason why those data are missing? Is their missingness ignorable or not?) is via -ipolate-.
But this probably won't help without knowing the whole story.

Kind regards,
Carlo
(Stata 19.0)
Clyde Schechter

Join Date: Apr 2014

Posts: 30097
#3

09 Jun 2015, 08:40

Elena, for future reference, attaching screen shots or photographs often results in unreadable attachments. In your case, it was possible to read it, but you can't count on that. This forum, for whatever reason, just doesn't deal well with pictures. The best way to show data is to either attach a .dta as Carlo suggested, or to -list- some representative variables and observations and copy Stata's output into a code block here.

That said, I don't think that the missing values you note are the whole problem. Certainly, any observation with a missing value on any of the regression variables will be omitted. But the first several observations in your screen shot have no variables with missing values and they should not be omitted.

I think the more likely problem is that one of your regression variables is, unbeknownst to you, actually a string variable that looks like a number. Try running -des- and see if that is the problem. If it is, you need to -destring- any string variable that should be numeric.
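
A minimal sketch of that check, assuming the regression variables are named co2, gdp, fdi and trade as in the thread. Both commands only report information and do not change the data:

```stata
* Show the storage type of each regression variable;
* a type such as str10 indicates a string variable
describe co2 gdp fdi trade

* List only the variables that are stored as strings
ds, has(type string)
```

Any regression variable that appears in the -ds- list would need to be converted with -destring- before -xtreg- can use it.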
elena mengo

Join Date: Jun 2015

Posts: 30
#4

09 Jun 2015, 08:43

Hello Carlo and thanks for your reply.
Here is what I typed in Stata:

xtset cnt year,yearly
panel variable: cnt (strongly balanced)
time variable: year, 1990 to 2010
delta: 1 year

. xtreg co2 gdp fdi trade
no observations
r(2000);

end of do-file

r(2000);

I gathered the data on CO2 emissions, FDI and trade from the World Bank database, and since I am dealing with Eastern European countries from 1990 to 2010, some data are missing, especially for the early 1990s.

I attended the econometrics class at university last semester, but we did not cover panel data analysis, so these issues are quite new to me.

thanks,

Elena
Attached Files

diss.dta (11.6 KB, 1 view)
Joshua D Merfeld

Join Date: Jun 2015

Posts: 86
#5

09 Jun 2015, 08:54

The problem is that "fdi" is a string variable, not a numeric variable. You will have to destring it (and remove the spaces first).
Nick Cain

Join Date: May 2014

Posts: 84
#6

09 Jun 2015, 08:57

Hi Elena,

Forgive me if this is too obvious a suggestion (I could not download the .dta file), but have you checked variable types to ensure they are numeric format?

You can use the describe command to check that your numeric variables are saved in the proper format. It sometimes happens that numbers get saved as string variables when you download and import data. This page has a nice tutorial on fixing the problem: http://www.ats.ucla.edu/stat/stata/faq/destring.htm

Best,

-nick
Joshua D Merfeld

Join Date: Jun 2015

Posts: 86
#7

09 Jun 2015, 09:07

This should fix the problem (I think--in fact, I'd love for someone to confirm this):

Code:

replace fdi = trim(fdi) // Removes leading and trailing blanks
replace fdi = subinstr(fdi," ","",.) // Removes blanks within the string
destring fdi, replace

EDIT: Now that I look at this, I think the first command is actually superfluous, as the second command should replace ALL blanks.
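
As a possible one-step variant (a sketch, not what was run in this thread), -destring- has an ignore() option that drops the listed characters during conversion. This assumes blanks are the only non-numeric characters in fdi:

```stata
* Convert fdi from string to numeric, removing blanks on the way.
* Assumption: spaces are the only stray characters in fdi.
destring fdi, replace ignore(" ")
```

If other characters are present, -destring- leaves the variable as a string and reports that it contains nonnumeric characters, so the result is worth checking with -describe fdi- afterwards.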

Last edited by Joshua D Merfeld; 09 Jun 2015, 09:18.
elena mengo

Join Date: Jun 2015

Posts: 30
#8

09 Jun 2015, 09:19

First of all thanks everyone for your help.

I tried the code that Joshua posted above and I confirm the problem was the variable fdi. I then ran describe again, and here is what I obtained:
. describe

Contains data from E:\ECONOMIC DISSERTATION\Data Dissertation\diss.dta
  obs:           210
 vars:             8                          9 Jun 2015 15:34
 size:        10,290 (99.9% of memory free)
-------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
-------------------------------------------------------------------------------
country         str15   %15s
year            int     %ty                   Year
co2             float   %8.0g                 CO2
gdp             float   %8.0g                 GDP
fdi             double  %10.0g                FDI
trade           float   %8.0g                 Trade
cnt             float   %9.0g                 group(country)
yr              float   %9.0g                 group(year)
-------------------------------------------------------------------------------
Sorted by:  cnt  year

Afterwards I ran the regression and got this:

. xtset cnt year,yearly
panel variable: cnt (strongly balanced)
time variable: year, 1990 to 2010
delta: 1 year

. xtreg co2 gdp fdi trade

Random-effects GLS regression                   Number of obs      =       191
Group variable: cnt                             Number of groups   =        10

R-sq:  within  = 0.0482                         Obs per group: min =        16
       between = 0.3999                                        avg =      19.1
       overall = 0.0620                                        max =        21

                                                Wald chi2(3)       =      8.57
corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0356

------------------------------------------------------------------------------
         co2 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gdp |  -.0000154   .0000143    -1.08   0.279    -.0000434    .0000125
         fdi |  -.0000271   .0000179    -1.52   0.129    -.0000622    7.90e-06
       trade |  -.0006801    .003093    -0.22   0.826    -.0067422     .005382
       _cons |   7.493848   .8998836     8.33   0.000     5.730108    9.257587
-------------+----------------------------------------------------------------
     sigma_u |  2.7057659
     sigma_e |  .64531866
         rho |  .94618023   (fraction of variance due to u_i)
------------------------------------------------------------------------------

The number of observations is now 191 instead of 210, but what concerns me is the p-values: none of the variables appears to be significant, which is odd given what I have studied. What do you think I should do?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#9

09 Jun 2015, 09:43

Elena:
the decreased number of observations is due to missing values.
The lack of statistical significance (something that scared me when I was younger, so much younger than today) may well be due to a limited sample size (you have on average 19.1 observations for each of the 10 groups).
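
To see where the dropped observations come from, something like the following sketch could be used. It assumes fdi has already been converted to numeric; nmiss is a hypothetical helper variable introduced only for this illustration:

```stata
* Count missing values in each regression variable
misstable summarize co2 gdp fdi trade

* nmiss (hypothetical name) = number of regression variables missing per row
egen nmiss = rowmiss(co2 gdp fdi trade)

* Show which countries lose observations, and how many
tabulate cnt if nmiss > 0
```

The rows with nmiss > 0 should correspond to the 19 observations (210 - 191) that -xtreg- excluded.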
As an aside, you can drop -country-, keep -cnt- only and assign each value the name of a nation:

Code:

label define cnt 1 Bulgaria 2 Czech_Republic 3 Estonia 4 Hungary 5 Latvia 6 Lithuania 7 Poland 8 Romania 9 Slovak_Republic 10 Slovenia
label val cnt cnt

Finally, I would also consider clustering (-cluster-) the standard errors of your coefficients on -cnt- (once -label-led).

Kind regards,
Carlo
(Stata 19.0)
elena mengo

Join Date: Jun 2015

Posts: 30
#10

09 Jun 2015, 09:57

Thanks Carlo,

I used this code at the beginning of my data analysis:
Code:
egen cnt = group(country)
list country cnt in 1/10, sepby(country)

Am I wrong, or is it the same thing?

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17707

#11

09 Jun 2015, 10:14

Elena:
it would have been the same thing had the values been -label-led:

Code:

label define cnt 1 Bulgaria 2 Czech_Republic 3 Estonia 4 Hungary 5 Latvia 6 Lithuania 7 Poland 8 Romania 9 Slovak_Republic 10 Slovenia
label val cnt cnt
egen cnt_2 = group(country) // I have changed your original cnt to cnt_2
list country cnt cnt_2 in 1/10, sepby(country)

     +-----------------------------+
     |  country        cnt   cnt_2 |
     |-----------------------------|
  1. | Bulgaria   Bulgaria       1 |
  2. | Bulgaria   Bulgaria       1 |
  3. | Bulgaria   Bulgaria       1 |
  4. | Bulgaria   Bulgaria       1 |
  5. | Bulgaria   Bulgaria       1 |
  6. | Bulgaria   Bulgaria       1 |
  7. | Bulgaria   Bulgaria       1 |
  8. | Bulgaria   Bulgaria       1 |
  9. | Bulgaria   Bulgaria       1 |
 10. | Bulgaria   Bulgaria       1 |
     +-----------------------------+

Kind regards,
Carlo
(Stata 19.0)


Eric de Souza

Join Date: Mar 2014

Posts: 587
#12

09 Jun 2015, 10:42

Carlo, too few countries to cluster on.
elena mengo

Join Date: Jun 2015

Posts: 30
#13

09 Jun 2015, 10:49

I created a new variable called gdp2 to see if there is a turning point (a theory says that at some point, as income rises, people will start to pollute less) and I got this new table:

. xtreg co2 gdp gdp2 fdi trade

Random-effects GLS regression                   Number of obs      =       191
Group variable: cnt_2                           Number of groups   =        10

R-sq:  within  = 0.0705                         Obs per group: min =        16
       between = 0.3375                                        avg =      19.1
       overall = 0.0528                                        max =        21

                                                Wald chi2(4)       =     13.01
corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0112

------------------------------------------------------------------------------
         co2 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gdp |  -.0000824   .0000357    -2.31   0.021    -.0001522   -.0000125
        gdp2 |   2.89e-09   1.42e-09     2.04   0.041     1.12e-10    5.67e-09
         fdi |  -.0000119   .0000192    -0.62   0.534    -.0000495    .0000257
       trade |  -.0006117   .0030543    -0.20   0.841     -.006598    .0053746
       _cons |   7.691307    .967794     7.95   0.000     5.794466    9.588149
-------------+----------------------------------------------------------------
     sigma_u |  2.9274344
     sigma_e |  .63952089
         rho |  .95445001   (fraction of variance due to u_i)

The gdp2 coefficient looks a bit strange to me. Can anyone tell me whether it is plausible, or whether it is likely that I made a mistake?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#14

09 Jun 2015, 10:52

Eric:
yes, with only 10 countries, default standard errors do not differ that much from -cluster-ed ones.

Kind regards,
Carlo
(Stata 19.0)
Clyde Schechter

Join Date: Apr 2014

Posts: 30097
#15

09 Jun 2015, 11:06

So if you have a simple quadratic relationship co2 = a*gdp^2 + b*gdp + c (we ignore the other variables for present purposes), a basic fact of parabolas is that the vertex (turning point) will occur at gdp = -b/(2a), which from your output works out to about 14256. That seems like a reasonable result to me, assuming your gdp is denominated in US dollars.
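
For reference, the arithmetic behind that figure, using the rounded coefficients shown in the output (a = 2.89e-09 on gdp2, b = -.0000824 on gdp):

```latex
\[
\mathrm{gdp}^{*} \;=\; -\frac{b}{2a}
\;=\; -\frac{-8.24\times 10^{-5}}{2 \times 2.89\times 10^{-9}}
\;=\; \frac{8.24\times 10^{-5}}{5.78\times 10^{-9}}
\;\approx\; 14{,}256
\]
```

Because a is positive, the fitted parabola opens upward, so this vertex is a minimum of the fitted co2~gdp curve.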

Remember that these gdp numbers are large numbers, so when you square them you get gigantic numbers. In order to then include them in an estimate of co2 (which may be moderate size numbers), they need a very small coefficient.

As an aside, rather than calculating a separate gdp2 variable and adding it to your model, you could have done:

Code:

xtreg co2 c.gdp##c.gdp fdi trade

The advantage of that is that if, following the regression, you want to look at marginal effects of any of your variables, the -margins- command will understand that you have a quadratic term in gdp and will handle it correctly. With your code, if you try to get the marginal effect of anything, -margins- may get it wrong.
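
A hedged sketch of what that could look like after the factor-variable version of the model. The gdp values in at() are illustrative assumptions, not taken from the data:

```stata
* Fit the model with the quadratic term expressed in factor-variable notation
xtreg co2 c.gdp##c.gdp fdi trade

* Marginal effect of gdp on the linear prediction at several gdp levels;
* the grid 5000, 10000, ..., 25000 is an assumption for illustration
margins, dydx(gdp) at(gdp=(5000(5000)25000))
```

With a quadratic term, the marginal effect of gdp changes with gdp, so the estimates from -margins- should differ across the at() values and change sign near the turning point.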

Two other suggestions about including quadratic models. Although the results you got strike me as plausible as they stand, if you are uncomfortable with micro-coefficients, you could rescale gdp to some other unit. If it is currently in US dollars you could rescale it to thousands of US dollars--then the coefficients will look like more "normal" numbers.
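
A minimal sketch of the rescaling, assuming gdp is currently in US dollars (the thread does not confirm the unit); gdp_k is a hypothetical variable name:

```stata
* gdp_k (hypothetical name) = gdp expressed in thousands
generate gdp_k = gdp/1000
label variable gdp_k "GDP (thousands)"

* Same model, with the quadratic term in the rescaled variable
xtreg co2 c.gdp_k##c.gdp_k fdi trade
```

Since this is only a change of units, the linear coefficient should be multiplied by 1,000 and the squared coefficient by 1,000,000 (roughly -.0824 and .00289 given the output above), while the fit and the test statistics stay the same.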

Another approach often used is to center the gdp variable around some value you think may be near the turning point. (In this case 15,000 might be a good choice.) So, you can run:

Code:

gen gdp_c = gdp - 15000
xtreg co2 c.gdp_c##c.gdp_c fdi trade

With this algebraic transformation of your model, the coefficient of gdp_c will be close to zero, and the regression will show you that the co2~gdp relationship looks parabolic with an axis of symmetry near 15000. This transformation, in addition to simplifying things by nearly eliminating the linear term, also has the indirect effect of rescaling to somewhat smaller numbers.
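
The algebra behind this can be sketched as follows, writing g_c = gdp - 15000 and plugging in the rounded coefficients from the earlier output (a = 2.89e-09, b = -.0000824):

```latex
\[
a\,\mathrm{gdp}^{2} + b\,\mathrm{gdp} + c
\;=\; a\,(g_c + 15000)^{2} + b\,(g_c + 15000) + c
\;=\; a\,g_c^{2} + (b + 30000\,a)\,g_c + \bigl(c + 15000\,b + 15000^{2}\,a\bigr)
\]
\[
b + 30000\,a \;=\; -8.24\times 10^{-5} + 8.67\times 10^{-5} \;=\; 4.3\times 10^{-6}
\]
```

So the squared coefficient is unchanged, while the linear coefficient shrinks from about -8.2e-05 to about 4.3e-06, which is what "close to zero" means here.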