  • panel data analysis-no observations

    Hello,
    I would like to ask a question regarding a panel data analysis I am trying to run for my dissertation.

    My aim is to analyse the relationship between CO2 emissions, GDP, FDI and trade. I gathered all the data and am trying to analyse them in Stata; I used the commands xtset and xtreg since I am dealing with panel data. However, when I run the regression, this is what I get:
    no observations
    r(2000)

    I suppose the problem is the missing data (there are indeed several missing values), but I do not know how to deal with that. If anyone can suggest how to proceed, I would really appreciate it, because at the moment I am stuck and cannot come up with a solution.



  • #2
    Elena:
    as per the FAQ, it's better to post what you typed and what Stata gave you back (and attach a dataset in .dta format, if deemed necessary) instead of describing what happened.
    That said, a possible approach for dealing with missing values in panel data is -ipolate- (but take a step back first: why are those data missing? Is their missingness ignorable or not?).
    But this probably won't help without knowing the whole story.
    Kind regards,
    Carlo
    (Stata 19.0)

    • #3
      Elena, for future reference, attaching screen shots or photographs often results in unreadable attachments. In your case, it was possible to read it, but you can't count on that. This forum, for whatever reason, just doesn't deal well with pictures. The best way to show data is to either attach a .dta as Carlo suggested, or to -list- some representative variables and observations and copy Stata's output into a code block here.

      That said, I don't think that the missing values you note are the whole problem. Certainly, any observation with a missing value on any of the regression variables will be omitted. But the first several observations in your screen shot have no variables with missing values and they should not be omitted.

      I think the more likely problem is that one of your regression variables is, unbeknownst to you, actually a string variable that looks like a number. Try running -des- and see if that is the problem. If it is, you need to -destring- any string variable that should be numeric.
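A minimal sketch of that check, using a made-up three-row dataset (the values are hypothetical and not taken from the attachment). The storage type column of -describe- shows str# for a string variable, and -ds, has(type string)- lists only the string variables:

```stata
* Toy data (hypothetical values): fdi arrived as text, co2 and gdp as numbers
clear
input str10 fdi co2 gdp
"1 200" 7.1 5200
"850"   6.8 5400
"."     6.5 5600
end

describe co2 gdp fdi          // storage type of fdi is str10, not float or double

ds, has(type string)          // prints only the string variables
local strvars `r(varlist)'    // keep the list for later use
display "String variables: `strvars'"
```

If a variable that should be numeric shows up in that list, it is a candidate for -destring-.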

      • #4
        Hello Carlo, and thanks for your reply.
        Here is what I typed in Stata:

        . xtset cnt year, yearly
        panel variable: cnt (strongly balanced)
        time variable: year, 1990 to 2010
        delta: 1 year

        . xtreg co2 gdp fdi trade
        no observations
        r(2000);

        end of do-file

        r(2000);

        I gathered the data on CO2 emissions, FDI and trade from the World Bank database, and since I am dealing with Eastern European countries from 1990 to 2010, some data are missing, especially for the early 1990s.

        I attended an econometrics class at university last semester, but we did not cover panel data analysis, so these issues are quite new to me.

        thanks,

        Elena
        • #5
          The problem is that "fdi" is a string variable, not a numeric variable. You will have to destring it (and remove the spaces first).

          • #6
            Hi Elena,

            Forgive me if this is too obvious a suggestion (I could not download the .dta file), but have you checked the variable types to ensure they are numeric?

            You can use the describe command to check that your numeric variables are stored in the proper format. Numbers sometimes get saved as string variables when you download and import data. This page has a nice tutorial on fixing the problem: http://www.ats.ucla.edu/stat/stata/faq/destring.htm

            Best,

            -nick

            • #7
              This should fix the problem (I think--in fact, I'd love for someone to confirm this):
              Code:
              replace fdi = trim(fdi)                  // Removes leading and trailing blanks
              replace fdi = subinstr(fdi, " ", "", .)  // Removes blanks within the string
              destring fdi, replace
              EDIT: Now that I look at this, I think the first command is actually superfluous, as the second command should replace ALL blanks.
              Last edited by Joshua D Merfeld; 09 Jun 2015, 09:18.
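For what it is worth, a small check on toy data (hypothetical values, not Elena's file) suggests that the -subinstr()- step on its own is enough, and that -destring- with its ignore() option may do the same job in one step. The variable fdi_alt is only an illustrative copy:

```stata
* Toy data (hypothetical): numbers stored as text with stray blanks
clear
input str12 fdi
"1 234.5"
" 87.25 "
end
gen fdi_alt = fdi                          // untouched copy for the one-step route

* Route 1: strip every blank (leading, trailing, inner), then convert
replace fdi = subinstr(fdi, " ", "", .)
destring fdi, replace

* Route 2: let destring discard the blanks itself
destring fdi_alt, replace ignore(" ")

list fdi fdi_alt
```

Both routes should leave a numeric variable with the same values; ignore() is worth using with care, since it silently drops whatever characters it is given.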

              • #8
                First of all thanks everyone for your help.

                I tried the code that Joshua posted above, and I confirm the problem was the variable fdi. I then ran describe again, and here is what I obtained:
                . describe

                Contains data from E:\ECONOMIC DISSERTATION\Data Dissertation\diss.dta
                  obs:           210
                 vars:             8                          9 Jun 2015 15:34
                 size:        10,290 (99.9% of memory free)
                -------------------------------------------------------------------------------
                              storage  display     value
                variable name   type   format      label      variable label
                -------------------------------------------------------------------------------
                country         str15  %15s
                year            int    %ty                    Year
                co2             float  %8.0g                  CO2
                gdp             float  %8.0g                  GDP
                fdi             double %10.0g                 FDI
                trade           float  %8.0g                  Trade
                cnt             float  %9.0g                  group(country)
                yr              float  %9.0g                  group(year)
                -------------------------------------------------------------------------------
                Sorted by:  cnt  year

                Afterwards I ran the regression and got this:

                . xtset cnt year,yearly
                panel variable: cnt (strongly balanced)
                time variable: year, 1990 to 2010
                delta: 1 year

                . xtreg co2 gdp fdi trade

                Random-effects GLS regression                   Number of obs      =       191
                Group variable: cnt                             Number of groups   =        10

                R-sq:  within  = 0.0482                         Obs per group: min =        16
                       between = 0.3999                                        avg =      19.1
                       overall = 0.0620                                        max =        21

                                                                Wald chi2(3)       =      8.57
                corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0356

                ------------------------------------------------------------------------------
                         co2 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                         gdp |  -.0000154   .0000143    -1.08   0.279    -.0000434    .0000125
                         fdi |  -.0000271   .0000179    -1.52   0.129    -.0000622    7.90e-06
                       trade |  -.0006801    .003093    -0.22   0.826    -.0067422     .005382
                       _cons |   7.493848   .8998836     8.33   0.000     5.730108    9.257587
                -------------+----------------------------------------------------------------
                     sigma_u |  2.7057659
                     sigma_e |  .64531866
                         rho |  .94618023   (fraction of variance due to u_i)
                ------------------------------------------------------------------------------

                The number of observations is now 191 instead of 210, but what concerns me is the p-values: none of the variables appears to be significant, which is odd given what I have studied. What do you think I should do?

                • #9
                  Elena:
                  the decreased number of observations is due to missing values.
                  The lack of statistical significance (something that scared me when I was younger, so much younger than today) may well be due to a limited sample size (you have on average 19.1 observations for each of the 10 groups).
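To see how many observations survive, one option is to count the rows that are complete on every regression variable. The sketch below uses five made-up rows (hypothetical values); nmiss is just an illustrative name:

```stata
* Toy data (hypothetical): three of the five rows have a missing regressor
clear
input co2 gdp fdi trade
7.1 5200 120 80
6.8 .    95  82
6.5 5600 .   85
6.9 5900 140 .
7.3 6100 150 90
end

misstable summarize co2 gdp fdi trade     // missing values per variable

egen nmiss = rowmiss(co2 gdp fdi trade)   // number of missing values in each row
count if nmiss == 0                       // rows an estimation command can use
```

The count of complete rows is what an estimation command reports as its number of observations, since any row with a missing value on a model variable is dropped.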
                  As an aside, you can drop -country-, keep -cnt- only and assign each value the name of a nation:
                  Code:
                  label define cnt 1 Bulgaria 2 Czech_Republic 3 Estonia 4 Hungary 5 Latvia 6 Lithuania 7 Poland 8 Romania 9 Slovak_Republic 10 Slovenia
                  label val cnt cnt
                  You might also consider clustering the standard errors of your coefficients on -cnt- (once -label-led).
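A rough sketch of what clustering would look like, on a simulated panel of 10 groups over 21 years. The data-generating values are assumptions chosen only so that the example runs; they are not estimates from the real data. Bear in mind that with so few clusters the cluster-robust standard errors may themselves be unreliable:

```stata
* Simulated panel (all numbers hypothetical): 10 countries, 1990 to 2010
clear
set seed 12345
set obs 10
gen cnt = _n
gen u = rnormal()                       // country-level effect
expand 21                               // 21 yearly rows per country
bysort cnt: gen year = 1989 + _n
gen gdp = 5000 + 300*(year - 1990) + rnormal(0, 500)
gen co2 = 7 + u - 0.00005*gdp + rnormal(0, 0.5)

xtset cnt year
xtreg co2 gdp                           // default standard errors
xtreg co2 gdp, vce(cluster cnt)         // standard errors clustered on cnt
```

Comparing the Std. Err. column across the two outputs shows how much clustering changes the inference.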
                  Kind regards,
                  Carlo
                  (Stata 19.0)

                  • #10
                    Thanks Carlo,

                    I used this code at the beginning of my data analysis:
                    Code:
                    egen cnt = group(country)
                    list country cnt in 1/10, sepby(country)

                    Am I wrong, or is it the same thing?


                    • #11
                      Elena:
                      it would have been the same thing had the values been -label-led:
                      Code:
                      label define cnt 1 Bulgaria 2 Czech_Republic 3 Estonia 4 Hungary 5 Latvia 6 Lithuania 7 Poland 8 Romania 9 Slovak_Republic 10 Slovenia
                      label val cnt cnt
                      egen cnt_2 = group(country) // I have renamed your original cnt to cnt_2
                      list country cnt cnt_2 in 1/10, sepby(country)
                      
                           +-----------------------------+
                           |  country        cnt   cnt_2 |
                           |-----------------------------|
                        1. | Bulgaria   Bulgaria       1 |
                        2. | Bulgaria   Bulgaria       1 |
                        3. | Bulgaria   Bulgaria       1 |
                        4. | Bulgaria   Bulgaria       1 |
                        5. | Bulgaria   Bulgaria       1 |
                        6. | Bulgaria   Bulgaria       1 |
                        7. | Bulgaria   Bulgaria       1 |
                        8. | Bulgaria   Bulgaria       1 |
                        9. | Bulgaria   Bulgaria       1 |
                       10. | Bulgaria   Bulgaria       1 |
                           +-----------------------------+
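As a possible shortcut, which is a different command from the one used above, -encode- creates the numeric identifier and attaches the country names as value labels in one step. A minimal sketch on a made-up four-row dataset; cnt_enc is an illustrative name:

```stata
* Toy data (hypothetical rows)
clear
input str15 country
"Bulgaria"
"Bulgaria"
"Estonia"
"Hungary"
end

encode country, generate(cnt_enc)   // codes 1, 2, 3 in alphabetical order, labelled
list country cnt_enc                // shows the labels
list country cnt_enc, nolabel       // shows the underlying integers
```

The labelled variable can then be passed to -xtset- just like the -egen group()- version.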
                      Kind regards,
                      Carlo
                      (Stata 19.0)

                      • #12
                        Carlo, too few countries to cluster on.

                        • #13
                          I created a new variable called gdp2 to see if there is a turning point (one theory says that, at some point, as income rises people start to pollute less), and I got this new table:

                          . xtreg co2 gdp gdp2 fdi trade

                          Random-effects GLS regression                   Number of obs      =       191
                          Group variable: cnt_2                           Number of groups   =        10

                          R-sq:  within  = 0.0705                         Obs per group: min =        16
                                 between = 0.3375                                        avg =      19.1
                                 overall = 0.0528                                        max =        21

                                                                          Wald chi2(4)       =     13.01
                          corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0112

                          ------------------------------------------------------------------------------
                                   co2 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                          -------------+----------------------------------------------------------------
                                   gdp |  -.0000824   .0000357    -2.31   0.021    -.0001522   -.0000125
                                  gdp2 |   2.89e-09   1.42e-09     2.04   0.041     1.12e-10    5.67e-09
                                   fdi |  -.0000119   .0000192    -0.62   0.534    -.0000495    .0000257
                                 trade |  -.0006117   .0030543    -0.20   0.841     -.006598    .0053746
                                 _cons |   7.691307    .967794     7.95   0.000     5.794466    9.588149
                          -------------+----------------------------------------------------------------
                               sigma_u |  2.9274344
                               sigma_e |  .63952089
                                   rho |  .95445001   (fraction of variance due to u_i)
                          ------------------------------------------------------------------------------

                          The gdp2 coefficient looks a bit strange to me... can anyone tell me whether it is plausible, or whether I have likely made a mistake?

                          • #14
                            Eric:
                            yes, with only 10 countries, default standard errors do not differ that much from -cluster-ed ones.
                            Kind regards,
                            Carlo
                            (Stata 19.0)

                            • #15
                              So if you have a simple quadratic relationship co2 = a*gdp^2 + b*gdp + c (we ignore the other variables for present purposes), a basic fact of parabolas is that the vertex (turning point) occurs at gdp = -b/(2a), which from your output works out to about 14256. That seems like a reasonable result to me, assuming your gdp is denominated in US dollars.
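The arithmetic can be reproduced directly from the two coefficients reported in #13. The scalar names below are illustrative only:

```stata
* Coefficients copied from the xtreg output in #13
scalar b_gdp  = -.0000824      // coefficient on gdp  (b)
scalar b_gdp2 = 2.89e-09       // coefficient on gdp2 (a)

* Vertex of a*gdp^2 + b*gdp + c is at gdp = -b/(2a)
scalar turn_pt = -scalar(b_gdp) / (2*scalar(b_gdp2))
display %9.1f scalar(turn_pt)  // about 14256

* With the model still in memory, the same quantity (plus a standard error)
* should be available from:  nlcom -_b[gdp]/(2*_b[gdp2])
```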

                              Remember that these gdp numbers are large numbers, so when you square them you get gigantic numbers. In order to then include them in an estimate of co2 (which may be moderate size numbers), they need a very small coefficient.

                              As an aside, rather than calculating a separate gdp2 variable and adding it to your model, you could have done:

                              Code:
                              xtreg co2 c.gdp##c.gdp fdi trade
                              The advantage of that is that if, following the regression, you want to look at marginal effects of any of your variables, the -margins- command will understand that you have a quadratic term in gdp and will handle it correctly. With a separately generated gdp2, -margins- has no way of knowing that gdp2 depends on gdp, so the marginal effect of gdp it reports may be wrong.
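A sketch of how that plays out, on a simulated panel (the data-generating numbers are assumptions chosen to resemble the output in #13, not real data). After the factor-variable specification, -margins- evaluates the slope of co2 with respect to gdp at whatever gdp levels are requested:

```stata
* Simulated panel (hypothetical): 10 countries, 21 years, U-shaped co2~gdp
clear
set seed 2015
set obs 10
gen cnt = _n
gen u = rnormal(0, 2)                   // country-level effect
expand 21
bysort cnt: gen year = 1989 + _n
gen gdp = runiform(2000, 25000)
gen co2 = 8 - 0.00008*gdp + 2.9e-09*gdp^2 + u + rnormal(0, 0.6)

xtset cnt year
xtreg co2 c.gdp##c.gdp                  // linear and squared gdp in one term

* Slope d(co2)/d(gdp) at three income levels; it should change sign
* somewhere near the turning point if the relationship is U-shaped
margins, dydx(gdp) at(gdp = (5000 15000 25000))
```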

                              Two other suggestions about including quadratic models. Although the results you got strike me as plausible as they stand, if you are uncomfortable with micro-coefficients, you could rescale gdp to some other unit. If it is currently in US dollars you could rescale it to thousands of US dollars--then the coefficients will look like more "normal" numbers.
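To illustrate with the estimates from #13: if gdp_k = gdp/1000, then gdp = 1000*gdp_k, so the linear coefficient is multiplied by 1,000 and the quadratic coefficient by 1,000,000. The scalar names are illustrative only:

```stata
* Coefficients copied from the xtreg output in #13 (gdp in US dollars)
scalar b_gdp  = -.0000824
scalar b_gdp2 = 2.89e-09

* Implied coefficients if gdp is measured in thousands of US dollars
scalar b_lin_k  = scalar(b_gdp)  * 1000
scalar b_quad_k = scalar(b_gdp2) * 1000^2
display scalar(b_lin_k)      // about -.0824
display scalar(b_quad_k)     // about .00289
```

The fit and the turning point are unchanged; only the units in which the coefficients are expressed differ.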

                              Another approach often used is to center the gdp variable around some value you think may be near the turning point. (In this case 15,000 might be a good choice.) So, you can run:
                              Code:
                              gen gdp_c = gdp - 15000
                              xtreg co2 c.gdp_c##c.gdp_c fdi trade
                              With this algebraic transformation of your model, the coefficient of gdp_c will be close to zero, and the regression will show you that the co2~gdp relationship looks parabolic with an axis of symmetry near 15000. This transformation, in addition to simplifying things by nearly eliminating the linear term, also has the indirect effect of rescaling to somewhat smaller numbers.
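As a rough check using the estimates from #13: substituting gdp = gdp_c + 15000 into a*gdp^2 + b*gdp gives a linear coefficient of b + 2*a*15000 on gdp_c, while the quadratic coefficient stays at a. The scalar names are illustrative only:

```stata
* Coefficients copied from the xtreg output in #13
scalar b_gdp  = -.0000824      // b
scalar b_gdp2 = 2.89e-09       // a

* Linear coefficient after centering gdp at 15000
scalar b_centered = scalar(b_gdp) + 2*scalar(b_gdp2)*15000
display scalar(b_centered)     // about 4.3e-06, much closer to zero than b
```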
