statsby problem

Euslaner

Join Date: Apr 2014

Posts: 219
#1

03 Dec 2015, 09:05

I am trying to use statsby to collect regression coefficients for a data set organized by country over time.

My command is:
statsby, by (year): reg Algeria-Zimbabwe

The result is a data set in which every value is missing. I attach the data set. Any help on what I am doing wrong would be appreciated.

year int %8.0g
_stat_1 float %9.0g _b[o.Angola]
_b_Argentina float %9.0g _b[Argentina]
_b_Australia float %9.0g _b[Australia]
_b_Austria float %9.0g _b[Austria]
_stat_5 float %9.0g _b[o.Bangladesh]
_stat_6 float %9.0g _b[o.Belgium]
_stat_7 float %9.0g _b[o.Benin]
_stat_8 float %9.0g _b[o.Brazil]
_stat_9 float %9.0g _b[o.Bulgaria]
_stat_10 float %9.0g _b[o.Cameroon]
_stat_11 float %9.0g _b[o.Canada]
_b_Chile float %9.0g _b[Chile]
_stat_13 float %9.0g _b[o.China]
_stat_14 float %9.0g _b[o.Costa_Rica]
_stat_15 float %9.0g _b[o.Cote_d_Ivoire]
_b_Cuba float %9.0g _b[Cuba]
_stat_17 float %9.0g _b[o.Denmark]
_stat_18 float %9.0g _b[o.Dominican_Republic]
_stat_19 float %9.0g _b[o.Egypt]
_stat_20 float %9.0g _b[o.El_Salvador]
_stat_21 float %9.0g _b[o.Ethiopia]
_stat_22 float %9.0g _b[o.Finland]
_b_France float %9.0g _b[France]
_stat_24 float %9.0g _b[o.Ghana]
_stat_25 float %9.0g _b[o.Greece]
_stat_26 float %9.0g _b[o.Guatemala]
_stat_27 float %9.0g _b[o.Honduras]
_stat_28 float %9.0g _b[o.Hungary]
_stat_29 float %9.0g _b[o.India]
_stat_30 float %9.0g _b[o.Indonesia]
_stat_31 float %9.0g _b[o.Iran]
_stat_32 float %9.0g _b[o.Iraq]
_stat_33 float %9.0g _b[o.Ireland]
_stat_34 float %9.0g _b[o.Italy]
_b_Jamaica float %9.0g _b[Jamaica]
_b_Japan float %9.0g _b[Japan]
_stat_37 float %9.0g _b[o.Kenya]
_stat_38 float %9.0g _b[o.Madagascar]
_stat_39 float %9.0g _b[o.Malawi]
_stat_40 float %9.0g _b[o.Malaysia]
_stat_41 float %9.0g _b[o.Mali]
_stat_42 float %9.0g _b[o.Mexico]
_stat_43 float %9.0g _b[o.Morocco]
_stat_44 float %9.0g _b[o.Mozambique]
_stat_45 float %9.0g _b[o.Myanmar]
_stat_46 float %9.0g _b[o.Netherlands]
_b_New_Zealand float %9.0g _b[New_Zealand]
_stat_48 float %9.0g _b[o.Nicaragua]
_stat_49 float %9.0g _b[o.Niger]
_stat_50 float %9.0g _b[o.Nigeria]
_stat_51 float %9.0g _b[o.Norway]
_b_Pakistan float %9.0g _b[Pakistan]
_stat_53 float %9.0g _b[o.Panama]
_stat_54 float %9.0g _b[o.Paraguay]
_stat_55 float %9.0g _b[o.Peru]
_stat_56 float %9.0g _b[o.Philippines]
_stat_57 float %9.0g _b[o.Portugal]
_b_Russia float %9.0g _b[Russia]
_stat_59 float %9.0g _b[o.Senegal]
_stat_60 float %9.0g _b[o.Sierra_Leone]
_b_South_Africa float %9.0g _b[South_Africa]
_b_South_Korea float %9.0g _b[South_Korea]
_stat_63 float %9.0g _b[o.Spain]
_stat_64 float %9.0g _b[o.Sudan]
_stat_65 float %9.0g _b[o.Sweden]
_stat_66 float %9.0g _b[o.Switzerland]
_stat_67 float %9.0g _b[o.Syria]
_b_Thailand float %9.0g _b[Thailand]
_stat_69 float %9.0g _b[o.Tunisia]
_stat_70 float %9.0g _b[o.Turkey]
_stat_71 float %9.0g _b[o.UK]
_stat_72 float %9.0g _b[o.USA]
_stat_73 float %9.0g _b[o.Uganda]
_stat_74 float %9.0g _b[o.Uruguay]
_stat_75 float %9.0g _b[o.Venezuela]
_stat_76 float %9.0g _b[o.West_Germany]
_stat_77 float %9.0g _b[o.Zimbabwe]
_b_cons float %9.0g _b[_cons]

Typical distribution:

. clist _b_Russia

_b_Russia
1. .
2. .
3. .
4. .
5. .
6. .
7. .
8. .
9. .
10. .
11. .
12. .
13. .
14. .
15. .

Attached Files

morrisoneducationlong.dta (81.3 KB, 1 view)
Clyde Schechter

Join Date: Apr 2014

Posts: 30355
#2

03 Dec 2015, 09:16

So, I see a couple of different problems here.

First, your data set contains only a single observation per year, so each of the yearly regressions you have asked -statsby- to do has only 1 observation to work with. If you were to run any one of these regressions, say, -regress Algeria-Zimbabwe if year == 2010-, you would see that you get an error message, "insufficient observations", and no regression output. In order to run a regression you need to have more observations than variables!

Next, although I don't know what your data are about, -regress Algeria-Zimbabwe- seems like a strange analysis to do (even if you have enough observations). You are regressing the value of the Algeria variable (as outcome) against the values of the Angola through Zimbabwe variables. I suppose that is possibly meaningful, but it does seem odd. More likely, you intended to have some other outcome variable regressed against all of Algeria through Zimbabwe, no? (I must confess, however, that looking at the names of the variables in your data, none of them immediately suggests itself as a candidate for outcome variable.)
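
A quick way to check the first point is to count how many rows each by-group would hand to -regress-. The sketch below uses made-up data rather than the attached file: the year values, the two country columns, and the variable name n_in_year are all assumptions for illustration.

```stata
* Toy data (hypothetical): 15 years, one row per year, two "country" columns
clear
set obs 15
generate year    = 1999 + _n
generate Algeria = runiform()
generate Angola  = runiform()

* Number of observations available to each per-year regression
bysort year: generate n_in_year = _N
tabulate n_in_year

* With a single row per year, a per-year regression cannot be fit.
* As described in #2, this should stop with "insufficient observations";
* -capture noisily- shows the message and lets the do-file continue.
capture noisily regress Algeria Angola if year == 2000
display "return code from regress: " _rc
```

If n_in_year is 1 everywhere, -statsby, by(year)- has nothing to estimate from in any group, which is consistent with the all-missing results shown in #1.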
Nick Cox

Join Date: Mar 2014

Posts: 36053
#3

03 Dec 2015, 09:18

The usual rules about variable lists imply that you are regressing the first named variable Algeria against all the others up to Zimbabwe, namely Angola to Zimbabwe.

As you also want to do that separately for each year, that implies regressions each with 1 observation, dozens of predictors.

I guess that you don't want to predict Algeria from everywhere else even in principle.

It's thus clear that you can't want what you are asking Stata to do.

The dataset needs reshaping and (if needed) you need to tell us what you really want.
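
To see how Stata reads a variable range such as Algeria-Zimbabwe, one can expand it and split off the first name. This is a minimal sketch on made-up data with only three country variables; the local macro names vars, depvar and indepvars are assumptions chosen for illustration.

```stata
* Toy data (hypothetical): three variables created in dataset order
clear
set obs 5
generate Algeria  = runiform()
generate Angola   = runiform()
generate Zimbabwe = runiform()

* Expand the range into the full list of variable names
unab vars : Algeria-Zimbabwe

* -regress- treats the first name as the outcome and the rest as predictors
gettoken depvar indepvars : vars
display "outcome:    `depvar'"
display "predictors:`indepvars'"
```

The same split applies to the 78 country variables in the posted data: the first is the outcome and the remaining 77 are predictors.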

Last edited by Nick Cox; 03 Dec 2015, 09:21.
Euslaner

Join Date: Apr 2014

Posts: 219
#4

03 Dec 2015, 09:47

I am not going to get into a discussion with Clyde about my theoretical argument. Suffice it to say that I know what I want to do. The data set has one observation per year for each country. What I am trying to do is to run a set of regressions, one for each country, with year as the predictor (there are 15 years) and then to collect the coefficients for each country, so the outcome would be a regression coefficient for each country over time (the 15 years in the data set). I do not want to regress Algeria against Zimbabwe (or any other country), but to regress each country against year and then to collect the coefficients.
Clyde Schechter

Join Date: Apr 2014

Posts: 30355
#5

03 Dec 2015, 10:00

So, then what you want to do requires rather different code. Your data set needs to be reorganized into long layout first.

Code:

rename Algeria-Zimbabwe value=
reshape long value, i(year) j(country) string
statsby, by(country): regress value year

By the way, I have no issues at all with what you want to do. I was just pointing out that the code you gave in #1 does something that looks rather strange. As it turns out that is because that code did not reflect your intent, not because your intent requires some exotic theoretical justification.
Euslaner

Join Date: Apr 2014

Posts: 219
#6

03 Dec 2015, 10:05

When I used Clyde's statsby command after the rename and reshape commands I got the error message:

no; data in memory would be lost
Nick Cox

Join Date: Mar 2014

Posts: 36053
#7

03 Dec 2015, 10:12

You should therefore save the current version of the dataset first.
Clyde Schechter

Join Date: Apr 2014

Posts: 30355
#8

03 Dec 2015, 10:15

Right, sorry about that. If you want to replace the data currently in memory with the dataset of regression coefficients, make that last command -statsby, by(country) clear: regress value year-. If you want to keep the original data in memory, then you need to save the coefficient data in another file. For that, it's -statsby, by(country) saving(filename, replace): regress value year-. (Substitute your desired name for filename.) Then when you want to finally bring those into memory, -use filename, clear-.
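
The two variants can be sketched side by side as follows. The data here are made up (three hypothetical countries named C1 to C3, 15 years each), and a temporary file stands in for "filename"; neither comes from the attached data set.

```stata
* Toy long-layout data (hypothetical): 3 countries x 15 years
clear
set obs 45
generate country = "C" + string(ceil(_n / 15))
bysort country: generate year = 1999 + _n
generate value = runiform()

* Variant 1: keep the original data in memory, write coefficients to a file
tempfile coefs
statsby, by(country) saving(`coefs', replace): regress value year
count                       // the 45 original rows are still in memory

* Variant 2: replace the data in memory with the coefficients
statsby, by(country) clear: regress value year
list                        // one row per country: _b_year and _b_cons
```

Either way the result has one observation per country. The saving() form is the safer choice when the original data are still needed later in the same session.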
Euslaner

Join Date: Apr 2014

Posts: 219
#9

03 Dec 2015, 12:15

Thanks much, both Nick and esp. Clyde. One more question: I have three variables now: country, yearcoeff, and constantcoeff (the latter two renamed). I want to integrate these data into the original data set, organized by country. I can't do a simple reshape, since reshape wide (----),i(yearcoeff constantcoeff) j(year) doesn't have anything for me to put in the (---). I know that this should be elementary but I can't figure it out.

Nick Cox

Join Date: Mar 2014
Posts: 36053

#10

03 Dec 2015, 12:21

As I understand it, starting with the dataset you posted, the sequence should be something like

Code:

rename Algeria-Zimbabwe value=
reshape long value, i(year) j(country) string
save maindata 
statsby, by(country): regress value year  
rename (_b_y _b_c) (yearcoeff constantcoeff) 
merge 1:m country using maindata

Euslaner

Join Date: Apr 2014

Posts: 219
#11

03 Dec 2015, 12:36

Thanks, Nick, but that gives me data in long format. I need the data in wide format to add to the data set originally posted. I need, as I posted, to reshape wide, not long.
Nick Cox

Join Date: Mar 2014

Posts: 36053
#12

03 Dec 2015, 13:02

Indeed. You did say that. My prejudice that the original data structure was too awkward to be useful was obscuring your request.

So, if I understand you correctly:

1. The original data are organized with a variable for each country.

2. You want the new information to be added consistently with that structure.

3. That means adding the two coefficients in two new observations aligned with the countries, spreadsheet style.

As you say, the names in the original dataset are a long way from the results dataset. But the problem is just to add a 78 x 2 matrix of coefficients in the right places. Here's a way to do all that:

Code:

use morrisoneducationlong, clear
rename Algeria-Zimbabwe value=
reshape long value, i(year) j(country) string
save maindata, replace
statsby, by(country): regress value year
rename (_b_y _b_c) (yearcoeff constantcoeff)
mkmat *coeff in 1/78 , matrix(coeff)
use morrisoneducationlong.dta , clear
local N = _N + 2
local Nm1 = `N' - 1
set obs `N'
unab myvars: Alge-Zim
tokenize "`myvars'"
quietly forval j = 1/78 {
    replace ``j'' = coeff[`j', 1] in `Nm1'
    replace ``j'' = coeff[`j', 2] in L
}
Euslaner

Join Date: Apr 2014

Posts: 219
#13

03 Dec 2015, 14:33

Thanks much, Nick.