
  • Importance of misspecification test vs. R-sq. and consequences of xtsktest

    Dear Statalist Members,

    I am analyzing a balanced panel of around 2,400 firms over 12 years (Stata 13). The output I present here is based on test data, as I am not allowed (or able) to extract the original files. The only differences are the number of firms, which is higher in the original dataset, and that most of my explanatory variables turn out significant there, unlike in this sample data. In the original, the F-statistic is F(11, 13432) with Prob > F = 0.0000, and the overall R-sq. is 0.9639.

    My goal is to analyze the effect of investments in computers (investict, dummy 0-1) and of product and process innovations (dummies 0-1) on the demand for high-skilled workers. Controls include the size of the firm in terms of employees (total), the industry, a dummy for West Germany (west), a dummy for a collective bargaining agreement (collective), the state of the art of the production equipment (tech), whether the firm engages in R&D (rnd, dummy), and some more.

    I have used -xtserial- and -xttest3-, which led me to include cluster-robust standard errors. -xtoverid- made me decide to use fixed effects, and -testparm- led me to include year fixed effects. So my regression is now:

    Code:
      xtreg highskill investict product_inno process_inno total west industry collective exportshare investment turnover rnd tech i.year, fe vce(cluster idnum)
    note: west omitted because of collinearity
    
    Fixed-effects (within) regression               Number of obs      =      4344
    Group variable: idnum                           Number of groups   =       498
    
    R-sq:  within  = 0.1005                         Obs per group: min =         1
           between = 0.5034                                        avg =       8.7
           overall = 0.4393                                        max =        11
    
                                                    F(21,497)          =      2.60
    corr(u_i, Xb)  = 0.3892                         Prob > F           =    0.0001
    
                                    (Std. Err. adjusted for 498 clusters in idnum)
    ------------------------------------------------------------------------------
                 |               Robust
       highskill |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
       investict |   .7032893   .2711382     2.59   0.010      .170571    1.236008
    product_inno |   .2723859   .6988765     0.39   0.697    -1.100731    1.645503
    process_inno |  -.3938082   .4501978    -0.87   0.382    -1.278334    .4907173
           total |    .101938   .0245108     4.16   0.000     .0537805    .1500954
            west |          0  (omitted)
        industry |   .1624997   .1911486     0.85   0.396    -.2130592    .5380586
      collective |  -.2838042   .5861356    -0.48   0.628    -1.435413    .8678049
     exportshare |   .8483747   2.351452     0.36   0.718    -3.771638    5.468387
      investment |   1.44e-06   5.98e-07     2.41   0.016     2.68e-07    2.62e-06
        turnover |  -1.99e-07   1.39e-07    -1.43   0.153    -4.73e-07    7.46e-08
             rnd |  -1.103514   .9824249    -1.12   0.262    -3.033732    .8267042
            tech |  -.6756037   .2828397    -2.39   0.017    -1.231313   -.1198947
                 |
            year |
           2008  |   .0310991   .3815399     0.08   0.935    -.7185309    .7807291
           2009  |   .4981931   .3197414     1.56   0.120    -.1300184    1.126405
           2010  |   .7890588   .4913133     1.61   0.109    -.1762483    1.754366
           2011  |   1.109093   .5630923     1.97   0.049     .0027585    2.215428
           2012  |   1.189345   .5407669     2.20   0.028      .126874    2.251816
           2013  |   .0965383   .7094676     0.14   0.892    -1.297387    1.490464
           2014  |   .4120097   .6609871     0.62   0.533    -.8866637    1.710683
           2015  |  -.1867301   .7267681    -0.26   0.797    -1.614647    1.241187
           2016  |   .1137137   .5447759     0.21   0.835     -.956634    1.184061
           2017  |  -.4267298   .7349041    -0.58   0.562    -1.870632    1.017172
                 |
           _cons |   4.706464   2.350515     2.00   0.046     .0882924    9.324636
    -------------+----------------------------------------------------------------
         sigma_u |  22.632204
         sigma_e |  7.5596268
             rho |  .89962854   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
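
    (For readers who want to retrace the diagnostics mentioned above, here is a sketch of that sequence. -xtserial-, -xttest3- and -xtoverid- are user-written commands installable from SSC, and the regressor list is abbreviated for readability, so adapt before running:)

    ```
    * Wooldridge test for serial correlation in the idiosyncratic errors
    xtserial highskill investict product_inno process_inno total tech

    * modified Wald test for groupwise heteroskedasticity after -xtreg, fe-
    xtreg highskill investict product_inno process_inno total tech, fe
    xttest3

    * cluster-robust alternative to -hausman- (FE vs. RE)
    xtreg highskill investict product_inno process_inno total tech, re vce(cluster idnum)
    xtoverid

    * joint significance of the year dummies
    xtreg highskill investict product_inno process_inno total tech i.year, fe vce(cluster idnum)
    testparm i.year
    ```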

    I originally intended to use the share of high-skilled employees as my dependent variable, but after reading Kronmal (1993) and several posts in this forum on the problems with ratio variables, I switched to the absolute number of high-skilled employees (highskill) and include the total number of employees as a control. This increased my R-squared by a lot (it was only 0.016 before).

    On the other hand, I tested my model specification using:

    Code:
     predict fitted, xb
    g sq_fitted=fitted^2
    xtreg highskill fitted sq_fitted
    test sq_fitted
    The p-value was 0.8 before, when using the share; now it is significant (0.0000), telling me my model is misspecified. My question is whether this is the right test for misspecification here, and if so, what else I can do about my specification. Or is a high R-sq. enough to argue that my model fits?
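
    (For reference, a variant of this RESET-style check that keeps the options of the original model. Whether the auxiliary regression should itself use -fe- with clustered errors is a judgment call, so treat this as a sketch rather than the one right way:)

    ```
    xtreg highskill investict product_inno process_inno total west industry collective ///
        exportshare investment turnover rnd tech i.year, fe vce(cluster idnum)
    predict fitted, xb              // linear prediction x*b from the model above
    gen sq_fitted = fitted^2        // squared prediction, as in -linktest-
    xtreg highskill fitted sq_fitted, fe vce(cluster idnum)
    test sq_fitted                  // H0: no neglected nonlinearity
    drop fitted sq_fitted
    ```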

    Also, I do not understand why the dummy for West Germany (west) is omitted; none of the regressors are highly correlated.

    I have read many posts in this forum and run several tests that led me to this fixed-effects regression model, so I am confused by the result of the specification test. I have also tried -areg, absorb(idnum) vce(cluster idnum)-, which gives slightly different coefficients and a higher R-sq. (as is normal) than -xtreg, fe-, but the misspecification test gives the same result.

    Testing for normality using
    Code:
     xtreg highskill investict product_inno process_inno total west industry collective exportshare investment turnover rnd tech, re vce(cluster idnum)
    (re, because -xtsktest- is not possible after -fe-) and then running -xtsktest- has given me the following:

    Code:
       xtsktest
    (running _xtsktest_calculations on estimation sample)
    
    Bootstrap replications (50)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
    ..................................................    50
    
    Tests for skewness and kurtosis                 Number of obs      =      4344
                                                    Replications       =        50
    
                                     (Replications based on 498 clusters in idnum)
    ------------------------------------------------------------------------------
                 |   Observed   Bootstrap                         Normal-based
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
      Skewness_e |  -1805.438   1230.613    -1.47   0.142    -4217.396    606.5195
      Kurtosis_e |   456552.4   194447.7     2.35   0.019     75441.97    837662.8
      Skewness_u |    12182.3   2960.393     4.12   0.000     6380.038    17984.56
      Kurtosis_u |    1510700   274557.2     5.50   0.000     972577.4     2048822
    ------------------------------------------------------------------------------
    Joint test for Normality on e:        chi2(2) =   7.67    Prob > chi2 = 0.0217
    Joint test for Normality on u:        chi2(2) =  47.21    Prob > chi2 = 0.0000
    ------------------------------------------------------------------------------
    Could this mean I should transform my data using logs, as there are issues with normality? Or what are the consequences?

    I appreciate any input on my issues, thanks in advance,

    Helen

    Last edited by Helen Hickmann; 30 Jun 2020, 04:30.

  • #2
    Helen:
    the first thing I would look at (via -estat vce, corr-) is a possible quasi-extreme multicollinearity issue in your original code (where most of the coefficients do not reach statistical significance despite a respectable within R-sq).
    The misspecification test you ran actually tests the correctness of the regressand's functional form (see the -linktest- entry in the Stata .pdf manual for more details, as -linktest- uses the same machinery), although it is oftentimes (also) interpreted as a test of misspecification of the predictors: that said, it may be that some of your regressors have a non-linear relationship with the regressand.
    Finally, I do not think that striving for normality is relevant, as normality is a (weak) requirement for the (idiosyncratic) residual distribution only, and with a large sample even minimal departures from normality set off the alarm (I would visually inspect the whole matter instead).
    Last edited by Carlo Lazzaro; 30 Jun 2020, 06:46.
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Dear Carlo,

      Thank you for your reply. I was maybe a little unclear above: in my original data, most of my regressors actually are significant, and the R-sq, as stated above, is much higher than here in the sample data. I have run -estat vce, corr- on my original data, and the only high correlation is between turnover and total (almost 1), which is logical considering that bigger firms (in terms of employees) have a higher turnover (or sales volume). I was afraid to exclude one of the two since both are highly significant and Stata did not throw them out... or is it better to drop one? All other correlations between the independent variables are lower than 0.1.

      When it comes to normality, I did inspect my variables graphically, and they do not seem to be normally distributed, not least because they have no negative values. For variables like the total number of employees, the number of high-skilled workers, the turnover and investment amounts, and the export shares, the distributions are all concentrated at the lower (left) end, as I have more small and medium-sized firms in the panel (which, for Germany, is more or less representative).

      I have therefore tried
      Code:
       xtreg lnhighskill investict product_inno process_inno lntotal west collective lnexportshare lninvestment lnturnover rnd tech i.year, fe vce(cluster idnum)
      and now my model seems to be correctly specified according to the same test as above (0.5594), but my R-sq. is only half the size (from 0.43 above to 0.21 with the logs). Is this log transformation therefore a better fit for my data? -linktest- does not seem to be applicable after -xtreg-.

      Considering linearity, it is indeed the case that, judging from scatter plots, the relationship between highskill and the regressors is not very linear. Most of them are dummies anyway, but even the continuous ones do not seem linear.

      As I am new to data analysis, what I would think of doing next is to create squared versions of each non-dummy, non-categorical regressor and include them in the regression to see if any squared term is significant. Or is there a more efficient way to do that? Or might the above log transformation do the job, as the misspecification test went better this time, regardless of the lower R-sq.?

      Best,

      Helen



      • #4
        Helen:
        1) thanks for clarifying the difference between your original dataset and the excerpt that you shared with the list;
        2) as far as the R-sq is concerned, you should look at the within one, as you're using the -fe- estimator (the overall R-sq is less meaningful in this respect);
        3) a correlation of about 1 between two variables seems really high. I would try including -turnover- only;
        4) squaring continuous predictors (e.g., investment), looking for possible turning points, is worth exploring.
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          Dear Carlo,

          Thanks for the information on the appropriate R-sq and on dropping one of the strongly correlated regressors. I have included investment squared and it is significant. As turnover was no longer significant but total was, I have decided to keep only total employees.

          Code:
           xtreg highskill investict product_inno process_inno total west industry collective exportshare investment investment_sq rnd tech i.year, fe vce(cluster idnum)
          note: west omitted because of collinearity
          
          Fixed-effects (within) regression               Number of obs      =      6402
          Group variable: idnum                           Number of groups   =       657
          
          R-sq:  within  = 0.1029                         Obs per group: min =         1
                 between = 0.5260                                        avg =       9.7
                 overall = 0.4988                                        max =        11
          
                                                          F(20,656)          =         .
          corr(u_i, Xb)  = 0.4542                         Prob > F           =         .
          
                                           (Std. Err. adjusted for 657 clusters in idnum)
          -------------------------------------------------------------------------------
                        |               Robust
              highskill |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          --------------+----------------------------------------------------------------
              investict |   .6781545   .2696692     2.51   0.012     .1486355    1.207673
           product_inno |   .2201725   .6099916     0.36   0.718     -.977599    1.417944
           process_inno |  -.2346403   .3503019    -0.67   0.503    -.9224886    .4532079
                  total |   .0968363   .0205078     4.72   0.000     .0565675    .1371052
                   west |          0  (omitted)
               industry |   .0944955   .1731603     0.55   0.585    -.2455199    .4345109
             collective |  -.4380177   .5091838    -0.86   0.390    -1.437844    .5618089
            exportshare |   .5127848   2.028985     0.25   0.801    -3.471304    4.496873
             investment |   1.31e-07   4.31e-07     0.30   0.762    -7.15e-07    9.77e-07
          investment_sq |   4.96e-14   1.31e-14     3.78   0.000     2.39e-14    7.54e-14
                    rnd |  -1.466807   .8043435    -1.82   0.069    -3.046206     .112591
                   tech |  -.5374448   .2554219    -2.10   0.036    -1.038988   -.0359017
                        |
                   year |
                  2008  |   .0244317   .2850558     0.09   0.932    -.5353002    .5841636
                  2009  |   .5002077   .2657605     1.88   0.060    -.0216361    1.022051
                  2010  |   .5927328   .4292332     1.38   0.168     -.250104     1.43557
                  2011  |   .9605033   .4526084     2.12   0.034     .0717674    1.849239
                  2012  |   1.033362   .4275999     2.42   0.016     .1937323    1.872992
                  2013  |   .2971908   .5621296     0.53   0.597    -.8065995    1.400981
                  2014  |  -.0327852   .5648572    -0.06   0.954    -1.141931    1.076361
                  2015  |  -.0607577     .55997    -0.11   0.914    -1.160307    1.038792
                  2016  |   .1996527   .4519628     0.44   0.659    -.6878155    1.087121
                  2017  |  -.1020679   .5694883    -0.18   0.858    -1.220308    1.016172
                        |
                  _cons |   4.985016   2.240526     2.22   0.026     .5855491    9.384482
          --------------+----------------------------------------------------------------
                sigma_u |   24.22636
                sigma_e |  7.4350321
                    rho |  .91392085   (fraction of variance due to u_i)
          -------------------------------------------------------------------------------
          But the test
          Code:
            predict fitted, xb
          g sq_fitted=fitted^2
          xtreg highskill fitted sq_fitted
          again shows misspecification
          Code:
          . test sq_fitted  
          
           ( 1)  sq_fitted = 0
          
                     chi2(  1) =   92.14
                   Prob > chi2 =    0.0000
          If I take a logarithmic transformation of my dependent variable as well as of my continuous variables, I get the following:

          Code:
           xtreg lnhighskill investict product_inno process_inno lntotal west collective lnexportshare lninvestment rnd tech industry i.year, fe vce(cluster idnum)
          note: west omitted because of collinearity
          
          Fixed-effects (within) regression               Number of obs      =      1043
          Group variable: idnum                           Number of groups   =       198
          
          R-sq:  within  = 0.1016                         Obs per group: min =         1
                 between = 0.4145                                        avg =       5.3
                 overall = 0.4122                                        max =        11
          
                                                          F(20,197)          =      2.85
          corr(u_i, Xb)  = 0.3630                         Prob > F           =    0.0001
          
                                           (Std. Err. adjusted for 198 clusters in idnum)
          -------------------------------------------------------------------------------
                        |               Robust
            lnhighskill |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          --------------+----------------------------------------------------------------
              investict |    .112618   .0413907     2.72   0.007     .0309923    .1942437
           product_inno |   .0483549   .0393228     1.23   0.220    -.0291928    .1259026
           process_inno |  -.0030902    .045346    -0.07   0.946    -.0925162    .0863357
                lntotal |   .5126123   .1167062     4.39   0.000     .2824584    .7427663
                   west |          0  (omitted)
             collective |  -.0049241   .0681019    -0.07   0.942    -.1392264    .1293782
          lnexportshare |   .0548722   .0338597     1.62   0.107    -.0119019    .1216462
           lninvestment |    .005109   .0188699     0.27   0.787    -.0321039    .0423219
                    rnd |   -.057603   .0670661    -0.86   0.391    -.1898627    .0746566
                   tech |  -.0533592   .0265694    -2.01   0.046    -.1057562   -.0009623
               industry |   .0341748   .0256811     1.33   0.185    -.0164704      .08482
                        |
                   year |
                  2008  |   .0244506    .053138     0.46   0.646    -.0803417    .1292429
                  2009  |  -.0230317   .0615113    -0.37   0.708    -.1443369    .0982734
                  2010  |  -.0086223   .0552691    -0.16   0.876    -.1176173    .1003727
                  2011  |   .0435392   .0711816     0.61   0.541    -.0968365    .1839149
                  2012  |   .0858316   .0760463     1.13   0.260    -.0641377    .2358009
                  2013  |   .0200317   .0731886     0.27   0.785     -.124302    .1643654
                  2014  |   .0327455   .0719557     0.46   0.650    -.1091569    .1746478
                  2015  |   .1280943   .0697003     1.84   0.068    -.0093602    .2655487
                  2016  |   .0727282   .0732099     0.99   0.322    -.0716475     .217104
                  2017  |    .054778   .0704262     0.78   0.438     -.084108    .1936641
                        |
                  _cons |  -.3458275   .6061856    -0.57   0.569    -1.541273    .8496184
          --------------+----------------------------------------------------------------
                sigma_u |  1.2328732
                sigma_e |  .38180985
                    rho |  .91248488   (fraction of variance due to u_i)
          -------------------------------------------------------------------------------
          and further
          Code:
            test sq_fitted
          
           ( 1)  sq_fitted = 0
          
                     chi2(  1) =    0.20
                   Prob > chi2 =    0.6529

          So to me it seems like I should go for the log-transformed model, even though it costs me many observations due to zero values for companies that have no high-skilled employees or made no investment in the previous year. But this is not entirely convincing, having just seen that the squared term of investment was significant. Or is it possible to mix the two, i.e., take logs of some variables and a squared term of investment?

          Best,

          Helen



          Last edited by Helen Hickmann; 30 Jun 2020, 08:58.



          • #6
            Helen:
            as per your data, I would go logged, even though it means working with about 20% of the original sample.
            It might be that the (seemingly substantial) number of companies with 0 values causes you problems in your original code.
            As an aside, why create interactions (and/or categorical variables) by hand when -fvvarlist- notation can do it for you?
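
            (A sketch of what -fvvarlist- notation does, using variable names from this thread; these shortened specifications are illustrative only, not taken from the posts:)

            ```
            * squared term without creating investment_sq by hand
            xtreg highskill c.investment##c.investment total i.year, fe vce(cluster idnum)

            * tech entered as a set of category indicators rather than a score
            xtreg highskill i.tech total i.year, fe vce(cluster idnum)
            ```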
            Kind regards,
            Carlo
            (Stata 19.0)



            • #7
              Dear Carlo,

              Thank you very much, I will go for the log-transformed model. I will research whether there is a way to do a log transformation that does not make me lose all firms that did not invest in the previous year and have a zero value for 'investment', as that would surely bias my results.

              As for -fvvarlist-, I am not sure what you refer to? I have no interaction terms in my model so far, and my categorical variable is 'tech', which is like that due to the survey design. Or do you mean I should use -fvvarlist- to build categories out of my continuous variables like investment or total employees? But then I cannot log them, right? I am a little confused, sorry.

              best
              Last edited by Helen Hickmann; 30 Jun 2020, 16:41.



              • #8
                Helen:
                1) unfortunately, I do not think that you can avoid getting rid of those firms with a 0 value for investment, if you log. On second thought, it may well be that the data collection (i.e., mixing together firms that invested in the previous year and firms that did not) affects the feasibility of the data analysis, as it seems that, in order to work with a model that is not misspecified, you should limit the -e(sample)- to those firms that did invest (and I do not think you can do anything about that, if logging is actually the way to go). Conversely, you may want to investigate what happens if you omit the variables with 0 values and then log (provided that this second approach gives a fair and true view of the data-generating process).
                2) in one of your previous posts, you stated
                I have included investment squared...
                Doesn't a squared term represent an interaction of the linear term with itself?:
                Code:
                c.investment##c.investment   // expands to investment plus c.investment#c.investment
                Last edited by Carlo Lazzaro; 01 Jul 2020, 01:07.
                Kind regards,
                Carlo
                (Stata 19.0)



                • #9
                  Dear Carlo,

                  yes, I had investment squared to test its significance in the original model (before logging), and it was significant. But then I went for the log model instead. When I use the log model, can I still keep investment un-logged, entering it in levels and/or as a squared term? If that were possible, I would not have to cut my sample, as I could avoid logging investment...

                  Helen



                  • #10
                    Helen:
                    yes, the squared investment term was in your first code. I took it as an example to switch to -fvvarlist- notation.
                    You may want to investigate what happens when you do not log investment in your logged model. Actually, it is not mandatory to log all the continuous predictors (in addition to the logged regressand). Obviously, the more you mix logged and non-logged predictors, the more difficult interpreting your results can become.
                    As an aside, you can also square a logged term (although, taking a look at the 95% CI of -lninvestment-, I doubt that a squared term would be helpful).
                    Kind regards,
                    Carlo
                    (Stata 19.0)



                    • #11
                      Carlo,

                      That’s good to know. Thank you so much for all the advice! I will try the different versions with the original data set and see what fits best.

                      have a great day,

                      Helen



                      • #12
                        Thanks, you too.
                        Kind regards,
                        Carlo
                        (Stata 19.0)



                        • #13
                          I'll try to respond more fully later, but you should not drop the zeros. That is selecting your sample on the basis of y. You need to have a very good reason to do that.

                          You can always use a linear model to start with, even if your outcome variable is a fraction. On top of that, you might try the fractional correlated random effects approach in Papke and Wooldridge (2008, Journal of Econometrics). It is better to use a fraction in a linear model, or a CRE model, than to drop any observation with y = 0. And R-squared has nothing to do with this: you can't compare R-squareds across different dependent variables.

                          You might even try xtpoisson with the fe and vce(robust) options to allow the zeros, with lntotal as an explanatory variable. This is fully robust, as has been discussed on this site many times. The exponential mean is not ideal because it doesn't impose that skilled workers <= total workers, but if the difference is usually large, it shouldn't matter much. And your linear model doesn't impose that, either.
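
                          (A sketch of this suggestion with the variable names used earlier in the thread; the regressor list is abbreviated, so adapt before running:)

                          ```
                          gen lntotal = ln(total)    // total > 0 for every firm, so the log exists
                          * conditional (fixed-effects) Poisson with fully robust standard errors
                          xtpoisson highskill investict product_inno process_inno lntotal i.year, fe vce(robust)
                          ```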

                          BTW, the reason "west" drops out is because it doesn't vary over time. You don't need it in the model when you do FE, whether you use a linear model or xtpoisson.

                          As Carlo said, you shouldn't be testing for normality or heteroskedasticity or even serial correlation. Cluster your standard errors. Your N is plenty large for the asymptotics.

                          JW



                          • #14
                            Dear Jeff,

                            thank you very much for the extensive and very helpful reply! I will research the literature and approaches you suggest and apply them to my data.

                            best,

                            Helen



                            • #15
                              Dear Jeff,

                              I have now done some further research. Many people facing zero values in a variable to be log-transformed seem to simply add a small constant (mostly 1) so as to end up with log(1) = 0, but I have also read that this can bias the coefficients, which is why I am not eager to do it. People are also undecided between adding one only in the cases where the value is zero and adding one to all values. As the zero-valued variables I am using are my DV highskill (number of high-skilled employees) as well as two of my IVs, investment (in EUR) and exportshare (between zero and 1), I do not see big issues with adding 1 EUR etc. to my observations, but I am not an expert and would therefore prefer a more accepted solution.
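
                              (The two variants described above can be sketched as follows; the variable name suffixes are made up for illustration, and the arbitrariness of the added constant is one source of the bias concern:)

                              ```
                              * variant 1: add 1 to every observation before logging
                              gen lninvestment1 = ln(investment + 1)

                              * variant 2: log where positive, set ln(1) = 0 only where investment is zero
                              gen lninvestment2 = cond(investment == 0, 0, ln(investment))
                              ```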

                              I have followed your advice, trying -xtpoisson, fe vce(robust)-, and have received the following result (I have replaced i.year with i.industry due to the results of -testparm- for both):

                              Code:
                               xtpoisson highskill investict product_inno process_inno total i.industry collective exportshare investment rnd tech, fe vce(robust)
                              note: 12 groups (12 obs) dropped because of only one obs per group
                              note: 111 groups (1143 obs) dropped because of all zero outcomes
                              
                              Iteration 0:   log pseudolikelihood = -11853.823  
                              Iteration 1:   log pseudolikelihood = -11548.262  
                              Iteration 2:   log pseudolikelihood = -11547.692  
                              Iteration 3:   log pseudolikelihood = -11547.691  
                              
                              Conditional fixed-effects Poisson regression    Number of obs      =      5247
                              Group variable: idnum                           Number of groups   =       534
                              
                                                                              Obs per group: min =         2
                                                                                             avg =       9.8
                                                                                             max =        11
                              
                                                                              Wald chi2(18)      =     65.87
                              Log pseudolikelihood  = -11547.691              Prob > chi2        =    0.0000
                              
                                                                                                         (Std. Err. adjusted for clustering on idnum)
                              -----------------------------------------------------------------------------------------------------------------------
                                                                                    |               Robust
                                                                          highskill |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                              ------------------------------------------------------+----------------------------------------------------------------
                                                                          investict |    .060396   .0261166     2.31   0.021     .0092085    .1115835
                                                                       product_inno |   .0268025   .0478654     0.56   0.576    -.0670118    .1206169
                                                                       process_inno |  -.0248983   .0242651    -1.03   0.305    -.0724571    .0226604
                                                                              total |   .0013654   .0003804     3.59   0.000     .0006198     .002111
                                                                                    |
                                                                           industry |
                              Mining & quarrying, electricity, gas and water sup..  |  -.7953135   .3598805    -2.21   0.027    -1.500666   -.0899607
                                                                     Manufacturing  |  -.7077856   .4969981    -1.42   0.154    -1.681884    .2663128
                                                                      Construction  |  -.1095389   .6873729    -0.16   0.873    -1.456765    1.237687
                                                                             Trade  |  -.8627884   .2321539    -3.72   0.000    -1.317802   -.4077751
                                                                         Transport  |  -.2272573    .286954    -0.79   0.428    -.7896769    .3351622
                                                     Information and Communication  |  -.5411883   .2737929    -1.98   0.048    -1.077812    -.004564
                                                                Financial services  |          0  (omitted)
                                                                    Other services  |  -.2763747   .2497819    -1.11   0.269    -.7659382    .2131888
                                                 Education, Health and Social Work  |  -.2910454   .3001761    -0.97   0.332    -.8793797    .2972889
                                                                     Public sector  |  -.3389178    .290392    -1.17   0.243    -.9080756    .2302399
                                                                                    |
                                                                         collective |  -.0291069   .0453624    -0.64   0.521    -.1180156    .0598018
                                                                        exportshare |  -.0539176   .0734882    -0.73   0.463    -.1979519    .0901167
                                                                         investment |   9.57e-09   4.03e-09     2.37   0.018     1.67e-09    1.75e-08
                                                                                rnd |   -.072425   .0405402    -1.79   0.074    -.1518824    .0070324
                                                                               tech |  -.0569154   .0234294    -2.43   0.015    -.1028362   -.0109946
                              -----------------------------------------------------------------------------------------------------------------------
                              The results look okay, but I do not understand why so many of my observations get dropped here as well? It is half of my total observations. I would really appreciate advice on how to handle this, or on where my mistake in the application lies. I am also unsure about the interpretation of the results (or is it the same as for a linear regression, comparing units?), but I guess I will be able to find that out using Google. Also, the previously discussed misspecification test using fitted values again turns out 0.000, indicating misspecification, but I am not sure it is applicable to the xtpoisson regression...

                              Helen
                              Last edited by Helen Hickmann; 03 Jul 2020, 04:12.

