r(2000) no observations when running a regression

Clyde Schechter

Join Date: Apr 2014

Posts: 30091
#16

13 Dec 2016, 13:43

Well, first of all, your -describe- command does not tell you anything about the number of non-missing observations. It only gives information about storage type, labeling, and formatting. To get a handle on missing values in your data, a first step might be -summarize-.

In any case, evidently your data is not what you think it is. When Stata says "no observations," I have never known it to be wrong about that. But since it appears that your -count if !missing- command says you have 234 observations for which whatever variable you counted has non-missing values, let me suggest another possibility. It may be that one of your regression variables is stored as a string variable. If so, for the purposes of a regression command, it counts as having all values missing. Here the -describe- command is helpful. Run it with all your regression variables and see if one of them is a string. If that is the case, you will need to convert it to numeric.
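A minimal sketch of that check, using made-up data. The variable names y, x1, and x2 are hypothetical, and the sketch assumes the string variable contains only numerals, so that -destring- can convert it cleanly:

```stata
* Made-up example: x2 was accidentally read in as a string
clear
input float(y x1) str4 x2
1.2 3 "10"
2.3 4 "12"
3.1 5 "15"
4.8 6 "11"
5.0 7 "19"
end

* The storage type column shows x2 as str4
describe y x1 x2

* The regression cannot use x2 while it is a string
capture noisily regress y x1 x2

* Convert to numeric, then count the observations the regression can use
destring x2, replace
count if !missing(y, x1, x2)
regress y x1 x2
```

If the string variable holds non-numeric text (category names, say), -encode- is usually the better tool than -destring-; which one applies depends on what the variable actually contains.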

Last edited by Clyde Schechter; 13 Dec 2016, 13:44. Reason: Correct grammatical error.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35693
#17

13 Dec 2016, 13:44

Tell us about the results of typing

Code:

tsset

and show us the exact regression command you used and give us an example of your data using dataex (SSC). See also FAQ Advice #12.
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#18

13 Dec 2016, 22:53

Randa:
welcome to the list.
Please act on Friedrich's helpful advice.

Kind regards,
Carlo
(Stata 19.0)
Comment
Pete Yeager

Join Date: Mar 2018

Posts: 6
#19

09 Mar 2018, 15:00

Good afternoon,

I am posting here due to an r(2000) error. Please advise if you recommend I post this question elsewhere. Thank you.

I am new to Stata, but I think my problem has more to do with being new to statistics.

As a class assignment I am attempting to repeat the regression results from an article. In this case, the article is: Erik Gartzke, "The Relevance of Power in International Relations," unpublished, 20 November 2009, accessed 28 February 2018, https://bc.sas.upenn.edu/system/file...e_03.04.10.pdf. Gartzke argues that geography matters when determining whether states go to war, suggesting that "weak states are less likely to fight in distant dyads, while capable countries do the opposite, increasing conflict behavior as distance increases." He uses the software program EUGene to draw on the Correlates of War and Polity III datasets to create a unique dataset of country dyads for all years 1816 to 2000. For each dyad in each year, the dataset reports the existence of a militarized interstate dispute and the associated level, or its absence. In his first regression, Gartzke uses a dichotomous dependent variable to indicate the presence or absence of a MID (mzmid). His independent variables include a measure of power for each state in each dyad for each year (cap_1 and cap_2), the logarithm of the distance between the capitals of the country dyads (logdistance), a measure of contiguity (contig--whether the states border one another and the extent to which they do). Because his DV is binary, he uses a probit regression.

Owing to concerns over temporal dependence, he attempts to control for this by creating a new variable as a cubic spline. The new variable measures the number of years between MIDs for each dyad. To do this, he relies on the advice of Neal Beck, Jonathan Katz, and Richard Tucker, "Taking Time Seriously: Time-Series-Cross-Section Analysis with a Binary Dependent Variable," American Journal of Political Science 42, No. 4, 1998. I adapted code suggested by Nick Cox to create the cubic spline variable, peaceyears:
egen countrypair = group(ccode*), label

tsset countrypair year

tsspell, cond(mzmid < 1)

generate peaceyears = _seq

mkspline knot = peaceyears, cubic

However, when I introduce "peaceyears" and its associated knots into the regression I receive the r(2000) error. This is my probit regression code:

probit mzmid cap_1 cap_2 logdistance contig peaceyears knot1 knot2 knot3 knot4

This is my output error:

outcome = peaceyears > 0 predicts data perfectly
r(2000);

I checked to see if r(2000) was the result of a string variable, but that is not the case:

. describe mzmid cap_1 cap_2 logdistance contig peaceyears knot1 knot2 knot3 knot4

storage display value
variable name type format label variable label
------------------------------------------------------------------------------------------------------------------
mzmid byte %8.0g
cap_1 float %9.0g
cap_2 float %9.0g
logdistance float %9.0g
contig byte %8.0g
peaceyears float %9.0g
knot1 float %9.0g
knot2 float %9.0g
knot3 float %9.0g
knot4 float %9.0g

I also eliminated peaceyears, which resulted in the same error for knot1. Dropping knot1 and rerunning the probit regression produces this:

. probit mzmid cap_1 cap_2 logdistance contig knot2 knot3 knot4

note: knot2 != 0 predicts failure perfectly
knot2 dropped and 596157 obs not used

The dataset has 656,415 observations, so dropping 596,157 is problematic.

I would be most grateful for someone pointing out where I have gone astray in this analysis. Please advise if additional information would be useful. Many, many thanks.

Pete Yeager
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30091
#20

09 Mar 2018, 15:58

The way you defined peaceyears, it is always 0 when mzmid = 1, and it is always > 0 when mzmid = 0. Consequently, you can predict with 100% certainty that if peaceyears > 0, mzmid = 0, and when peaceyears = 0, mzmid = 1. So peaceyears is a perfect predictor of mzmid. Now, in a probit (or logistic) regression, the maximum likelihood estimate of the coefficient of a variable that predicts perfectly like this is infinite. This is a well-known limitation of these regression models. So, Stata (like most other software packages) resolves this problem by eliminating all the observations that can be perfectly predicted and dropping the variable from the model. The problem is that in this case every observation can be perfectly predicted! That's why there are no observations left in the estimation sample.

Bottom line is that you cannot use peaceyears as a predictor for your model. The same logic applies to the cubic spline variables built on peaceyears: as the spline variables themselves are just transforms of peaceyears, they, too, are perfect predictors of mzmid.
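Here is a small sketch of how one might confirm the perfect prediction directly. The data are made up, typed in by hand to mimic what -tsspell, cond(mzmid < 1)- followed by _seq produces; the indicator anypeace is a name introduced only for this illustration:

```stata
* Made-up dyad-years: peaceyears is 0 in MID years and positive otherwise
clear
input byte mzmid float peaceyears
0 1
0 2
0 3
1 0
0 1
0 2
1 0
1 0
0 1
end

* Cross-tabulate the outcome against an indicator for peaceyears > 0;
* perfect prediction shows up as empty off-diagonal cells
generate byte anypeace = peaceyears > 0
tabulate mzmid anypeace

* If this assertion passes, the predictor separates the outcome perfectly
assert (peaceyears > 0) == (mzmid == 0)

* With every observation perfectly predicted, nothing is left to estimate
capture noisily probit mzmid peaceyears
display _rc
```

Running the same -tabulate- on the real data before fitting the model is a cheap way to spot this situation in advance.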

I think your definition of peaceyears isn't appropriate for your purposes in any case. I think what you want is to set peaceyears to equal the number of years without mzid = 1 that preceded the current observation since the last observation with mzid = 1. Otherwise put, you want your spells to begin not in a year of war but in the non-war year immediately following a war. I've made up some data to illustrate the technique here:

Code:

clear
input float(mzid year countrypair)
0 1991 1
0 1992 1
0 1993 1
0 1994 1
1 1995 1
1 1996 1
1 1997 1
0 1998 1
0 1999 1
0 2000 1
0 2001 1
0 2002 1
0 2003 1
0 2004 1
0 2005 1
1 2006 1
1 2007 1
0 2008 1
0 2009 1
0 2010 1
end

tsset countrypair year

by countrypair (year), sort: gen spell_of_peace = sum(mzid == 0 & mzid[_n-1] == 1)
by countrypair spell_of_peace (year), sort: gen years_of_peace = _n-1
replace years_of_peace = 0 if mzid == 1 & mzid[_n-1] == 1

You may need to play around with the definition of years_of_peace to get just what you need. It isn't obvious how to define this variable during war time. Here I simply decided that it should be zero for all but the first year of war. But you should think about that and, if necessary, change the code.
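As one illustration of such a change, here is a sketch of a variant in which years_of_peace is zero in every war year, including the first. This is only an assumption about what might be wanted, not a recommendation, and the six-row dataset is made up for the purpose:

```stata
* Made-up data for a single country pair
clear
input float(mzid year countrypair)
0 1991 1
0 1992 1
1 1993 1
1 1994 1
0 1995 1
0 1996 1
end

* Same spell construction as in the demonstration code
by countrypair (year), sort: gen spell_of_peace = sum(mzid == 0 & mzid[_n-1] == 1)
by countrypair spell_of_peace (year), sort: gen years_of_peace = _n-1

* Variant (an assumption): zero in every war year, including the first
replace years_of_peace = 0 if mzid == 1

list, sepby(spell_of_peace)
```

Whether the first year of a war should carry the count of preceding peace years or a zero is a substantive modeling decision; the code only makes either choice easy to implement.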

In the future, when requesting help with a problem it is best to show example data so that those who want to help you can replicate your problem and then try to fix your code. The way to do that is with the -dataex- command, as I have done in this response. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code.

When asking for help with code, always show example data. When showing example data, always use -dataex-.
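For anyone who has not used it before, a minimal sketch of a -dataex- call. The auto dataset that ships with Stata and the choice of variables are used here purely for illustration:

```stata
* Load a dataset that ships with Stata, purely for illustration
sysuse auto, clear

* Produce a copy-and-paste-ready listing of three variables, first five rows
dataex make price mpg in 1/5
```

The output can be pasted into a post as is, between the Forum's code delimiters.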
Comment
Pete Yeager

Join Date: Mar 2018

Posts: 6
#21

10 Mar 2018, 05:23

Thank you, Dr. Schechter. I am grateful for the insight and the advice. I will use -dataex- to post my data as soon as I work through your recommendation.

To explain a bit--I am using EUGene to try to reproduce Gartzke's original dataset. I assume you are familiar with it, but in the event you are not, it is a software program that offers a number of options to customize a dataset that is drawn from databases such as COW, Polity III, etc. In EUGene, you can suppress the years of a MID that follow the first year, so I had EUGene drop all dyad years with an ongoing conflict (MID or Crisis). I did this thinking that I was interested in the years of peace between MIDs, not the years of war. This seems to be what you are driving at in your example as well, though presumably with the expectation that the dataset is complete (no suppressed dyad years). So do you think it is better to generate a new dataset with all years, and then try your approach? Or should it work with the years of war already suppressed?

That is why I tried the mzmid < 1, since I figured that any value greater than zero was a MID year (including any missing value, which I understand is greater than any positive number), therefore I thought Stata would tally the number of zeroes in each dyad, adding one for each year, then reset when it arrived at a MID ("1"), starting over with the next observation, which should be a zero.

One question as relates to your code: what distinction are you making between years_of_peace and spell_of_peace? peaceyears was meant to be the running tally of years between MIDs.

One question relating to stata, do you think tsspell is inappropriate for what I am trying to do, or do you just not prefer it?

Thank you again for your kind reply.
Regards, Pete
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#22

10 Mar 2018, 06:18

Randa:
what's the format of your observations? Are they all numerical? Do you have -string-s?

Kind regards,
Carlo
(Stata 19.0)
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#23

10 Mar 2018, 06:28

Pete:
I was wondering whether a (tweaked) gravity model (https://en.wikipedia.org/wiki/Gravity_model_of_trade) was feasible with your data

Kind regards,
Carlo
(Stata 19.0)
Comment

Pete Yeager

Join Date: Mar 2018
Posts: 6

#24

10 Mar 2018, 07:53

Dr. Schechter:

Thanks again. The probit regression worked without error after adjusting the code. Here's my adaptation of your recommended code:

tsset countrypair year
by countrypair (year), sort: gen peaceyears = sum(mzmid == 0 & mzmid[_n-1] == 1)
by countrypair peaceyears (year), sort: gen years_of_peace = _n-1
replace years_of_peace = 0 if mzmid == 1 & mzmid[_n-1] == 1
mkspline knot = years_of_peace, cubic
probit mzmid cap_1 cap_2 logdistance contig years_of_peace knot1 knot2 knot3 knot4
probit mzmid cap_1 cap_2 logdistance contig knot1 knot2 knot3 knot4

The new spline variable knot1 is perfectly collinear with years_of_peace, so stata dropped knot1 from the first regression (line 6). I dropped years_of_peace since Gartzke mentions four spline variables. This makes sense now. Here's the dataex output (is 100 lines overkill?):

Code:

clear
input byte mzmid float(cap_1 cap_2 logdistance) byte contig float(knot1 knot2 knot3 knot4)
0 .289566 .010128        . 1  0           0           0           0
0 .253229 .010542        . 1  1           0           0           0
0 .255595 .008408        . 1  2 .0001524158           0           0
0 .272078 .009861        . 1  3 .0012193263           0           0
0 .253591 .008895        . 1  4  .004115226           0           0
0 .254046 .008704        . 1  5   .00975461           0           0
0 .263268  .00924        . 1  6  .019051975           0           0
0 .238627  .00937        . 1  7   .03292181           0           0
0 .240072 .009702        . 1  8   .05227862           0           0
0 .239793 .009803        . 1  9   .07803688           0           0
0  .22811 .009788        . 1 10   .11111111 .0001524158           0
0 .216016 .009185        . 1 11    .1524158 .0012193263           0
0 .197177 .008633        . 1 12    .2028654  .004115226           0
0  .20514 .008431        . 1 13   .26337448   .00975461           0
0 .197443 .008908        . 1 14    .3348575  .019051975           0
0 .193246 .008736        . 1 15    .4182289   .03292181           0
0 .205985 .008648        . 1 16    .5144033   .05227862           0
0   .2009 .008926        . 1 17    .6242951   .07803688           0
0 .170771 .008717        . 1 18    .7488188   .11111111           0
0 .181971 .009091        . 1 19    .8888889    .1524158           0
0 .201907 .012059        . 1 20   1.0454199    .2028654 .0001524158
0 .244495 .015079        . 1 21   1.2193264   .26337448 .0012193263
0 .285455 .017144        . 1 22   1.4115226    .3348575  .004115226
0 .345632 .018521        . 1 23   1.6229234    .4182289   .00975461
0 .350642 .016975        . 1 24    1.854443    .5144033  .019051975
0 .383864 .017042        . 1 25   2.1069958    .6242951   .03292181
0 .363988 .015644        . 1 26   2.3814967    .7488188   .05227862
0 .309942 .013188        . 1 27     2.67886    .8888889   .07803688
0  .29466 .012593        . 1 28           3   1.0454199   .11111111
0 .273166 .012813        . 1 29   3.3458314   1.2193264    .1524158
0 .284443 .012605        . 1 30    3.717269   1.4115226    .2028654
0 .319499 .013535        . 1 31   4.1152263   1.6229234   .26337448
0 .311367 .014204        . 1 32    4.540619    1.854443    .3348575
0 .311158  .01422        . 1 33    4.994114   2.1067734    .4180369
0  .28094 .013341        . 1 34    5.475391   2.3797164    .5128669
0 .266422 .013189        . 1 35    5.983883    2.672852    .6191099
0 .260614 .013736        . 1 36    6.519024    2.985758     .736528
0 .255049 .013208        . 1 37    7.080247   3.3180156    .8648834
0  .23447 .012597        . 1 38    7.666984    3.669203   1.0039384
0 .228965 .012828        . 1 39    8.278667   4.0388994   1.1534553
0 .215444 .012445        . 1 40    8.914733    4.426685   1.3131962
0 .210655 .012568        . 1 41    9.574611    4.832139   1.4829234
0 .209911  .01263        . 1 42   10.257735    5.254839    1.662399
0 .207966 .012449        . 1 43    10.96354    5.694367   1.8513855
0 .202983 .012759        . 1 44   11.691456      6.1503    2.049645
0 .201539 .012747        . 1 45   12.440918    6.622219   2.2569394
0 .208669 .012185        . 1 46   13.211358    7.109703   2.4730315
0 .208548 .012023        . 1 47    14.00221     7.61233    2.697683
0 .203926 .012102        . 1 48   14.812906    8.129682    2.930657
0 .197547 .011406        . 1 49    15.64288    8.661335    3.171715
0 .179844 .012009        . 1 50   16.491566    9.206871    3.420619
0 .168882 .012175        . 1 51   17.358393    9.765868    3.677132
0 .163286 .012475        . 1 52   18.242798   10.337906    3.941015
0 .157198 .012432        . 1 53    19.14421   10.922564    4.212032
1 .151321  .01258        . 1 54    20.06207    11.51942   4.4899435
1 .142502  .01246        . 1  0           0           0           0
0 .140885 .012246        . 1  0           0           0           0
0 .140172 .012242        . 1  1           0           0           0
0 .138162 .012156        . 1  2 .0001524158           0           0
1 .135674 .012248        . 1  3 .0012193263           0           0
0 .131705 .012263        . 1  0           0           0           0
0 .136528 .012239        . 1  1           0           0           0
0 .128161 .011648        . 1  2 .0001524158           0           0
0 .131504 .011828        . 1  3 .0012193263           0           0
0  .13134 .011837        . 1  4  .004115226           0           0
0 .133253  .01141        . 1  5   .00975461           0           0
0 .132095 .011508        . 1  6  .019051975           0           0
0 .131279 .011733        . 1  7   .03292181           0           0
0 .132756 .011867        . 1  8   .05227862           0           0
1 .146693 .012388        . 1  9   .07803688           0           0
0 .139376 .011606        . 1  0           0           0           0
1 .135599 .012365        . 1  1           0           0           0
0 .146081 .012441        . 1  0           0           0           0
0  .15271 .012425        . 1  1           0           0           0
0 .144825  .01202        . 1  2 .0001524158           0           0
0 .140641 .011727        . 1  3 .0012193263           0           0
0 .138339 .011687        . 1  4  .004115226           0           0
1  .13966   .0116        . 1  5   .00975461           0           0
0 .141734 .011851        . 1  0           0           0           0
0 .142888 .011898        . 1  1           0           0           0
0 .142951  .01182        . 1  2 .0001524158           0           0
0 .157198 .000321 6.863803 4  0           0           0           0
0 .151321  .00027 6.863803 4  1           0           0           0
0 .142502 .000261 6.863803 4  2 .0001524158           0           0
0 .140885 .000212 6.863803 4  3 .0012193263           0           0
0 .140172 .000316 6.863803 4  4  .004115226           0           0
0 .138162 .000202 6.863803 4  5   .00975461           0           0
0 .135674 .000179 6.863803 4  6  .019051975           0           0
0 .131705 .000233 6.863803 4  7   .03292181           0           0
0 .136528 .000192 6.863803 4  8   .05227862           0           0
0 .128161 .000207 6.863803 4  9   .07803688           0           0
0 .131504 .000186 6.863803 4 10   .11111111 .0001524158           0
0  .13134 .000122 6.863803 4 11    .1524158 .0012193263           0
0 .133253 .000053 6.863803 4 12    .2028654  .004115226           0
0 .132095 .000034 6.863803 4 13   .26337448   .00975461           0
0 .131279 .000034 6.863803 4 14    .3348575  .019051975           0
0 .132756 .000046 6.863803 4 15    .4182289   .03292181           0
0 .146693 .000058 6.863803 4 16    .5144033   .05227862           0
0 .139376 .000051 6.863803 4 17    .6242951   .07803688           0
0 .135599 .000043 6.863803 4 18    .7488188   .11111111           0
end
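The collinearity between knot1 and years_of_peace mentioned above can be checked directly. A minimal sketch on made-up data, assuming the default number of knots for -mkspline, cubic- (which yields four spline variables, as in the listing above):

```stata
* Made-up running count of peace years, 0 through 59
clear
set obs 60
generate years_of_peace = _n - 1

* Restricted cubic spline with the default knots
mkspline knot = years_of_peace, cubic
describe knot*

* The first spline variable is just a copy of the original variable,
* so including both in a regression is redundant
assert knot1 == years_of_peace
```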

Here's the regression output:
Probit regression                               Number of obs     =    637,602
                                                LR chi2(8)        =    3555.77
                                                Prob > chi2       =     0.0000
Log likelihood = -6396.0719                     Pseudo R2         =     0.2175

------------------------------------------------------------------------------
       mzmid |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       cap_1 |   4.372506   .1248869    35.01   0.000     4.127732     4.61728
       cap_2 |   5.866176   .1964357    29.86   0.000     5.481169    6.251183
 logdistance |  -.1284588   .0125706   -10.22   0.000    -.1530967   -.1038208
      contig |  -.3083733   .0145139   -21.25   0.000      -.33682   -.2799266
       knot1 |  -.0819593   .0057221   -14.32   0.000    -.0931745   -.0707441
       knot2 |   1.449203   .1758871     8.24   0.000     1.104471    1.793936
       knot3 |  -2.726909   .3932674    -6.93   0.000    -3.497699   -1.956119
       knot4 |   1.369469   .2883075     4.75   0.000     .8043966    1.934541
       _cons |   .1521329   .0931749     1.63   0.103    -.0304866    .3347524
------------------------------------------------------------------------------

Now I need to figure out why my results are so different than Gartzke's! My coefficients are greater than his except for contig by about 17.5%.

Regards,
Pete

Comment

Pete Yeager

Join Date: Mar 2018

Posts: 6
#25

10 Mar 2018, 08:06

Hi Carlo:

From what I read it sounds very similar to Gartzke's view. What Gartzke discovered is that the likelihood of a MID decreases with distance, except in the case of a great power. Great powers, it seems, have global interests and the capacity to act on those interests. Weaker powers "suffer what they must," to borrow from Thucydides' Melian dialogue. The gravity model assumes that trade diminishes with distance owing to the costs to move goods. Today, though, both models may suffer from the extent to which goods and services, or power, are expressed through digital means, which do not rely so much on proximity for their effectiveness.

I think the gravity model could be applied to this dataset, though with the exchange of economic mass for power. Gartzke looks beyond distance and contiguity, as well, though, adding explanatory variables in a series of iterative regressions. The one above is just his first. Some of the additional factors he evaluates include contiguity (of borders, if any), whether the states share an alliance, and the level of democracy in each state. But EUGene could easily produce a dyadic dataset with these and other variables to explore your insight.

Regards,
Pete
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#26

10 Mar 2018, 08:20

Pete:
if you eventually decide to go gravity, I would recommend that you take a look at Joao Santos Silva's replies on gravity model-related topics.

Kind regards,
Carlo
(Stata 19.0)
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30091
#27

10 Mar 2018, 10:31

One question as relates to your code: what distinction are you making between years_of_peace and spell_of_peace? peaceyears was meant to be the running tally of years between MIDs.

spell_of_peace is a variable that counts off consecutive eras of continuous peace. years_of_peace counts up the running years of peace within each spell. So, in the demonstration code shown in #20, the years 1991 through 1997 make up the first spell in the demonstration data, so they are spell #0: peace runs from 1991 through 1994, and the war years 1995 through 1997 that follow keep the same spell number. Then within that spell, years_of_peace counts up from 0 in 1991 through 1995, the year a war begins, so from 1996 through the end of that war, years_of_peace is set to 0.

One question relating to stata, do you think tsspell is inappropriate for what I am trying to do, or do you just not prefer it?

There is nothing wrong with -tsspell-. In fact, it's a great program. It's just that I personally find it easier to do what it does from "first principles" than to remember all of its options, their syntax, and how they work. Your mileage may vary.

More generally, I'm not qualified to advise you about how to structure your model. I'm an epidemiologist. I've never heard of many of the things you mention in describing your problem, and I have no ideas worth sharing about the determinants of armed conflict. Those are content areas. Hopefully you have colleagues in your discipline you can turn to for advice about those aspects. I'm offering advice only on statistics and using Stata. There may be other people in your discipline who are active on this Forum and can offer you guidance about the content areas. But if you would like that to happen, then I think you need to divert this to a new thread. The title of this thread involves the -r(2000) no observations- error message. That topic is not likely to draw the interest of somebody who wants to engage in a discussion of the determinants of war and peace. So if there are Forum users who could be helpful about those aspects, they are probably skipping over this post, unaware of the opportunity.
Comment
ayse demir

Join Date: Dec 2018

Posts: 2
#28

10 Dec 2018, 09:25

Hi, I have 7 regional dummies. I am running 3sls and trying to do the analysis at the regional level. When I run the regression for region 1 (if regionn==region1) there is no problem; however, for the other 6 regions I get this error: no observations. I have checked the data and there are no missing values. How can I sort out this issue?
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30091
#30

10 Dec 2018, 09:49

I doubt anybody will be able to help you without seeing the exact code you are running and an example of your data that illustrates the problem. I have never once known Stata to be wrong when it says "no observations," so if you are not seeing missing observations directly, then there is something in your code that is leading to the exclusion of all observations in the later regions. Without seeing the actual code and data, one can only guess what that might be.

Before reposting, please be sure to review FAQ #12 for guidance on the best ways to show Stata code and results here (code delimiters) and the best way to show example data (-dataex-).
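In the meantime, a generic sketch of how one might count the observations that actually survive both the -if- condition and the requirement that every model variable be non-missing. The data and the names y, x, region, and usable are made up; nothing here is based on the actual dataset or code in question:

```stata
* Made-up data; y, x, and region are hypothetical names
clear
input float(y x) byte region
1.0 2 1
2.0 3 1
1.5 . 2
2.5 . 2
3.0 4 3
end

* Flag observations with every model variable non-missing
generate byte usable = !missing(y, x)
tabulate region usable

* Region 2 has rows, but none of them survive the missing-value filter
count if region == 2
count if region == 2 & usable
```

If the second count is zero for a region, the estimation command will report no observations for it, whatever the raw row count is.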
Comment
