Extract regression results using loop

Sonnen Blume

Join Date: Aug 2018
Posts: 342

Extract regression results using loop

20 Oct 2021, 10:52

Hi,

I am running the following analysis for 18 countries, and I want to make a chart that shows the ORs and 95%CIs for each country. How can I extract the ORs and 95%CIs by country as a list?

Code:

levelsof country, local(country)
foreach i of local country {
  ologit happiness i.age if country == `i'
 
}

Code:

* Example generated by -dataex-. To install: ssc install    dataex
clear
input long(happiness age country)
. 2 22
3 3  8
1 3 14
3 2 15
3 4 15
2 6  4
. 1 17
3 6 23
. . 16
. 7 17
. . 17
1 5 14
. 6  7
1 1  3
2 6  2
. 5  7
. 5  7
1 3 16
. 3 17
. 4 22
4 5 14
2 6 14
. 2 22
2 5 23
2 3 21
3 4  9
1 6 14
2 4  2
2 5  2
3 4 15
. 1 12
2 4 15
. 6 22
3 1 19
. 2 17
1 3  4
1 1  4
2 1 18
. 5  7
2 7  2
2 3  2
2 5 19
. 1  7
. 1 12
2 1 14
. 4 17
2 1 19
2 5 14
1 4 14
1 7 11
4 5 15
3 5  2
. . 14
5 7 19
3 2  8
2 5 14
. 5  6
2 2 16
2 4 18
2 4  4
5 7  4
2 2 14
3 3  8
1 1 16
2 3 13
. 2 22
2 1  4
2 3 14
. 5 12
. . 23
. 7 17
1 1 19
. . 14
1 3 21
. 7  7
4 2 20
1 1  9
. 3  7
. .  8
3 5 23
. 7  6
. 7  7
3 7  9
1 7  8
. 5  7
2 6 14
2 3 15
1 1 14
2 1 18
2 7 23
5 4  2
1 6 14
4 7 19
1 6  5
2 1 14
1 4 11
. 7  7
2 2 14
. 5 17
2 5 14
end
label values happiness LS1
label def LS1 1 "Very Happy", modify
label def LS1 2 "Somewhat Happy", modify
label def LS1 3 "Neither Happy Nor Unhappy", modify
label def LS1 4 "Somewhat Unhappy", modify
label def LS1 5 "Very Unhappy", modify
label values Age WAGE
label def WAGE 1 "15-19", modify
label def WAGE 2 "20-24", modify
label def WAGE 3 "25-29", modify
label def WAGE 4 "30-34", modify
label def WAGE 5 "35-39", modify
label def WAGE 6 "40-44", modify
label def WAGE 7 "45-49", modify
label values country cn
label def cn 2 "Chad", modify
label def cn 3 "Gambia", modify
label def cn 4 "Ghana", modify
label def cn 5 "Guinea Bissau", modify
label def cn 6 "Guyana", modify
label def cn 7 "Honduras", modify
label def cn 8 "Lesotho", modify
label def cn 9 "Nepal", modify
label def cn 11 "Tonga", modify
label def cn 12 "Turkemenistan", modify
label def cn 13 "Tuvalu", modify
label def cn 14 "Bangladesh", modify
label def cn 15 "Congo", modify
label def cn 16 "Iraq", modify
label def cn 17 "Madagascar", modify
label def cn 18 "Palestine", modify
label def cn 19 "Sierra Leon", modify
label def cn 20 "Suriname", modify
label def cn 21 "Togo", modify
label def cn 22 "Tunisia", modify
label def cn 23 "Zimbabwe", modify

copy up to and including the previous line - ----------------

Listed 100 out of 17320 observations
Use the count() option to list more

.

Tags: None

Clyde Schechter

Join Date: Apr 2014

Posts: 30122
#2

20 Oct 2021, 13:12

Code:

capture program drop one_country program define one_country capture noisily ologit happines i.age, or if c(rc) == 0 { // SUCCESSFUL REGRESSION matrix M = r(table) levelsof age, local(ages) foreach a of local ages { gen or`a' = M["b", "happiness:`a'.age"] gen ll`a' = M["ll", "happiness:`a'.age"] gen ul`a' = M["ul", "happiness:`a'.age"] } } else { gen error_code = c(rc) } exit end runby one_country, by(country) status levelsof age, local(ages) foreach a of local ages { order or`a' ll`a' ul`a', last }

This code works poorly with the example data because for many of the countries there are either too few observations to do the regression at all, or many/most of the levels of the age variable are not instantiated in the country. There are also some where -ologit- fails to converge. So the results look ratty. But they are nonetheless correct given the data.

If in your real data there have similar problems, the results here contain a variable error_code which will give Stata's error code associated with ologit. That variable has missing values when the regression proceeds successfully.

-runby- is written by Robert Picard and me, and is available from SSC.
1 like
Comment
Sonnen Blume

Join Date: Aug 2018

Posts: 342
#3

20 Oct 2021, 17:03

Originally posted by Clyde Schechter View Post

Code:

capture program drop one_country program define one_country capture noisily ologit happines i.age, or if c(rc) == 0 { // SUCCESSFUL REGRESSION matrix M = r(table) levelsof age, local(ages) foreach a of local ages { gen or`a' = M["b", "happiness:`a'.age"] gen ll`a' = M["ll", "happiness:`a'.age"] gen ul`a' = M["ul", "happiness:`a'.age"] } } else { gen error_code = c(rc) } exit end runby one_country, by(country) status levelsof age, local(ages) foreach a of local ages { order or`a' ll`a' ul`a', last }

This code works poorly with the example data because for many of the countries there are either too few observations to do the regression at all, or many/most of the levels of the age variable are not instantiated in the country. There are also some where -ologit- fails to converge. So the results look ratty. But they are nonetheless correct given the data.

If in your real data there have similar problems, the results here contain a variable error_code which will give Stata's error code associated with ologit. That variable has missing values when the regression proceeds successfully.

-runby- is written by Robert Picard and me, and is available from SSC.

Thanks so much Clyde! As always, worked like magic!

This code is beyond my comprehension, still tried (unsuccessfully) to limit the decimal places this way:

Code:

capture noisily ologit happines i.age, or cformat(%9.2f)

I have two other points to mention. Since age has 7 categories, there should 6 sets of ORs since one is a base category. So I am not sure why there are still 7 ORs.
Also, the ORs and LLs are getting ranked, so there is no way to know which country a certain value corresponds to. So, is it possible to avoid the ordering?
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30122
#4

20 Oct 2021, 17:48

Well, the first problem is simple. The -cformat()- option doesn't help you because it only affects the way the regression results are displayed on the Results window--but you don't even get to see that in this code. The way to do this is to use the -format- command on the end results.

The remaining questions are harder, and your data may not even make it possible to resolve this. There are not, in fact, 7 age groups for each country. In your example data, every country exhibits only some of the values of the age variable--no country has them all:

Code:

. foreach c of local countries { 2. quietly levelsof age if country == `c', local(ages) 3. display `"`:label (country) `c'': `ages'"' 4. } Chad: 3 4 5 6 7 Gambia: 1 Ghana: 1 3 4 6 7 Guinea Bissau: 6 Guyana: 5 7 Honduras: 1 3 5 6 7 Lesotho: 2 3 7 Nepal: 1 4 7 Tonga: 4 7 Turkemenistan: 1 5 Tuvalu: 3 Bangladesh: 1 2 3 4 5 6 Congo: 2 3 4 5 Iraq: 1 2 3 Madagascar: 1 2 3 4 5 7 Palestine: 1 4 Sierra Leon: 1 5 7 Suriname: 2 Togo: 3 Tunisia: 2 4 6 Zimbabwe: 5 6 7

If this is true also of your real data, then it is impossible to do what you want. The reason is that if there are no observations with some age group in a given country, then the indicator ("dummy") for that age group cannot be included in the regression for that country's data. Now, it is possible to assure that the variables are named consistently across the countries (which was not the case in the earlier code, but is accomplished in the code shown below.) But that only creates the appearance, not the reality, of a solution. Because even if we assure that the OR for age = 2 is consistently called or2 in the results, the fact is that because different countries' regressions involve different sets of age variables, the OR for age = 2 means different things from one country to the next. So the comparison of the OR for, say, age5 in Turkmenistan, where it is contrasted only with age 1 as the reference category, with the OR for age 5 in Chad, where it is contrasted with age 3 as the reference category. This type of inconsistency in the data makes it impossible to meaningfully do what you are asking.

If you can get more data that overcomes this patchiness and provides data on all ages in every country, then you can run this code instead:

Code:

levelsof age, local(ages) foreach a of local ages { gen age_`a' = `a'.age } capture program drop one_country program define one_country capture ologit happines age_*, or if c(rc) == 0 { // SUCCESSFUL REGRESSION matrix M = r(table) levelsof age, local(ages) foreach a of local ages { gen or`a' = M["b", "happiness:age_`a'"] gen ll`a' = M["ll", "happiness:age_`a'"] gen ul`a' = M["ul", "happiness:age_`a'"] } } else { gen error_code = c(rc) } exit end runby one_country, by(country) levelsof age, local(ages) foreach a of local ages { order or`a' ll`a' ul`a', last } format or* ll* ul* %9.2f
1 like
Comment
Sonnen Blume

Join Date: Aug 2018

Posts: 342
#5

20 Oct 2021, 20:28

Originally posted by Clyde Schechter View Post

Well, the first problem is simple. The -cformat()- option doesn't help you because it only affects the way the regression results are displayed on the Results window--but you don't even get to see that in this code. The way to do this is to use the -format- command on the end results.

The remaining questions are harder, and your data may not even make it possible to resolve this. There are not, in fact, 7 age groups for each country. In your example data, every country exhibits only some of the values of the age variable--no country has them all:

Code:

. foreach c of local countries { 2. quietly levelsof age if country == `c', local(ages) 3. display `"`:label (country) `c'': `ages'"' 4. } Chad: 3 4 5 6 7 Gambia: 1 Ghana: 1 3 4 6 7 Guinea Bissau: 6 Guyana: 5 7 Honduras: 1 3 5 6 7 Lesotho: 2 3 7 Nepal: 1 4 7 Tonga: 4 7 Turkemenistan: 1 5 Tuvalu: 3 Bangladesh: 1 2 3 4 5 6 Congo: 2 3 4 5 Iraq: 1 2 3 Madagascar: 1 2 3 4 5 7 Palestine: 1 4 Sierra Leon: 1 5 7 Suriname: 2 Togo: 3 Tunisia: 2 4 6 Zimbabwe: 5 6 7

If this is true also of your real data, then it is impossible to do what you want. The reason is that if there are no observations with some age group in a given country, then the indicator ("dummy") for that age group cannot be included in the regression for that country's data. Now, it is possible to assure that the variables are named consistently across the countries (which was not the case in the earlier code, but is accomplished in the code shown below.) But that only creates the appearance, not the reality, of a solution. Because even if we assure that the OR for age = 2 is consistently called or2 in the results, the fact is that because different countries' regressions involve different sets of age variables, the OR for age = 2 means different things from one country to the next. So the comparison of the OR for, say, age5 in Turkmenistan, where it is contrasted only with age 1 as the reference category, with the OR for age 5 in Chad, where it is contrasted with age 3 as the reference category. This type of inconsistency in the data makes it impossible to meaningfully do what you are asking.

If you can get more data that overcomes this patchiness and provides data on all ages in every country, then you can run this code instead:

Code:

levelsof age, local(ages) foreach a of local ages { gen age_`a' = `a'.age } capture program drop one_country program define one_country capture ologit happines age_*, or if c(rc) == 0 { // SUCCESSFUL REGRESSION matrix M = r(table) levelsof age, local(ages) foreach a of local ages { gen or`a' = M["b", "happiness:age_`a'"] gen ll`a' = M["ll", "happiness:age_`a'"] gen ul`a' = M["ul", "happiness:age_`a'"] } } else { gen error_code = c(rc) } exit end runby one_country, by(country) levelsof age, local(ages) foreach a of local ages { order or`a' ll`a' ul`a', last } format or* ll* ul* %9.2f

Thanks so much, Clyde! This time or7 ( must be the base category) is all 1.

So if the ORs must get ordered, then I will just have to double-check them with the values for individual countries. Maybe, you can consider updating the -runby- package later, as it might be very useful in certain circumstances.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30122
#6

21 Oct 2021, 09:29

So if the ORs must get ordered, then I will just have to double-check them with the values for individual countries. Maybe, you can consider updating the -runby- package later, as it might be very useful in certain circumstances.

The code in #4 does not scramble the naming of the odds ratios. In that code, or1 always refers to the odds ratio associated with age = 1, or 2 always refers to the odds ratio associated with age = 2, etc.

Also, the renaming of the odds ratios in the code in #2 didn't arise from -runby-: it arose because I didn't take into account the variation among the countries in which variables would be included.

Yes, OR7 is always 1. This actually relates to your earlier remark that you expected to have only 6 OR's with 7 levels of age. The code in #4 does not use factor variable notation: it uses 7 pre-created indicators ("dummies") for age instead. Now, in all of the regressions, those indicators that are included will be colinear and one will be dropped. Stata designates the one that is listed last as the base category, and when the wildcard age_* is expanded, age_7 is last. So age_7 is always the base category in any country where there are observations with age = 7. The omitted reference category in any regression always has a coefficient of 0 (or, in odds ratio metric, an odds ratio of 1). That's why age_7 is always 1, except in those countries where it is missing because there are no age = 7 observations. In the latter countries, some other OR will be 1. There is always some base category, and its OR is always 1.
1 like
Comment
Sonnen Blume

Join Date: Aug 2018

Posts: 342
#7

22 Oct 2021, 01:33

Originally posted by Clyde Schechter View Post

The code in #4 does not scramble the naming of the odds ratios. In that code, or1 always refers to the odds ratio associated with age = 1, or 2 always refers to the odds ratio associated with age = 2, etc.

Also, the renaming of the odds ratios in the code in #2 didn't arise from -runby-: it arose because I didn't take into account the variation among the countries in which variables would be included.

Yes, OR7 is always 1. This actually relates to your earlier remark that you expected to have only 6 OR's with 7 levels of age. The code in #4 does not use factor variable notation: it uses 7 pre-created indicators ("dummies") for age instead. Now, in all of the regressions, those indicators that are included will be colinear and one will be dropped. Stata designates the one that is listed last as the base category, and when the wildcard age_* is expanded, age_7 is last. So age_7 is always the base category in any country where there are observations with age = 7. The omitted reference category in any regression always has a coefficient of 0 (or, in odds ratio metric, an odds ratio of 1). That's why age_7 is always 1, except in those countries where it is missing because there are no age = 7 observations. In the latter countries, some other OR will be 1. There is always some base category, and its OR is always 1.

Thank you Clyde! This was a very informative thread.
Comment

Announcement

Extract regression results using loop

Comment

Comment

Comment

Comment

Comment

Comment