Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Extract regression results using loop

    Hi,

    I am running the following analysis for 18 countries, and I want to make a chart that shows the ORs and 95%CIs for each country. How can I extract the ORs and 95%CIs by country as a list?

    Code:
    levelsof country, local(country)
    foreach i of local country {
      ologit happiness i.age if country == `i'
     
    }


    Code:
    * Example generated by -dataex-. To install: ssc install    dataex
    clear
    input long(happiness age country)
    . 2 22
    3 3  8
    1 3 14
    3 2 15
    3 4 15
    2 6  4
    . 1 17
    3 6 23
    . . 16
    . 7 17
    . . 17
    1 5 14
    . 6  7
    1 1  3
    2 6  2
    . 5  7
    . 5  7
    1 3 16
    . 3 17
    . 4 22
    4 5 14
    2 6 14
    . 2 22
    2 5 23
    2 3 21
    3 4  9
    1 6 14
    2 4  2
    2 5  2
    3 4 15
    . 1 12
    2 4 15
    . 6 22
    3 1 19
    . 2 17
    1 3  4
    1 1  4
    2 1 18
    . 5  7
    2 7  2
    2 3  2
    2 5 19
    . 1  7
    . 1 12
    2 1 14
    . 4 17
    2 1 19
    2 5 14
    1 4 14
    1 7 11
    4 5 15
    3 5  2
    . . 14
    5 7 19
    3 2  8
    2 5 14
    . 5  6
    2 2 16
    2 4 18
    2 4  4
    5 7  4
    2 2 14
    3 3  8
    1 1 16
    2 3 13
    . 2 22
    2 1  4
    2 3 14
    . 5 12
    . . 23
    . 7 17
    1 1 19
    . . 14
    1 3 21
    . 7  7
    4 2 20
    1 1  9
    . 3  7
    . .  8
    3 5 23
    . 7  6
    . 7  7
    3 7  9
    1 7  8
    . 5  7
    2 6 14
    2 3 15
    1 1 14
    2 1 18
    2 7 23
    5 4  2
    1 6 14
    4 7 19
    1 6  5
    2 1 14
    1 4 11
    . 7  7
    2 2 14
    . 5 17
    2 5 14
    end
    label values happiness LS1
    label def LS1 1 "Very Happy", modify
    label def LS1 2 "Somewhat Happy", modify
    label def LS1 3 "Neither Happy Nor Unhappy", modify
    label def LS1 4 "Somewhat Unhappy", modify
    label def LS1 5 "Very Unhappy", modify
    label values Age WAGE
    label def WAGE 1 "15-19", modify
    label def WAGE 2 "20-24", modify
    label def WAGE 3 "25-29", modify
    label def WAGE 4 "30-34", modify
    label def WAGE 5 "35-39", modify
    label def WAGE 6 "40-44", modify
    label def WAGE 7 "45-49", modify
    label values country cn
    label def cn 2 "Chad", modify
    label def cn 3 "Gambia", modify
    label def cn 4 "Ghana", modify
    label def cn 5 "Guinea Bissau", modify
    label def cn 6 "Guyana", modify
    label def cn 7 "Honduras", modify
    label def cn 8 "Lesotho", modify
    label def cn 9 "Nepal", modify
    label def cn 11 "Tonga", modify
    label def cn 12 "Turkemenistan", modify
    label def cn 13 "Tuvalu", modify
    label def cn 14 "Bangladesh", modify
    label def cn 15 "Congo", modify
    label def cn 16 "Iraq", modify
    label def cn 17 "Madagascar", modify
    label def cn 18 "Palestine", modify
    label def cn 19 "Sierra Leon", modify
    label def cn 20 "Suriname", modify
    label def cn 21 "Togo", modify
    label def cn 22 "Tunisia", modify
    label def cn 23 "Zimbabwe", modify
    copy up to and including the previous line - ----------------

    Listed 100 out of 17320 observations
    Use the count() option to list more

    .


  • #2
    Code:
    capture program drop one_country
    program define one_country
        capture noisily ologit happines i.age, or
        if c(rc) == 0 { //  SUCCESSFUL REGRESSION
            matrix M = r(table)
            levelsof age, local(ages)
            foreach a of local ages {
                gen or`a' = M["b", "happiness:`a'.age"]
                gen ll`a' = M["ll", "happiness:`a'.age"]
                gen ul`a' = M["ul", "happiness:`a'.age"]
            }
        }
        else {
            gen error_code = c(rc)
        }
        exit
    end
    
    runby one_country, by(country) status
    levelsof age, local(ages)
    foreach a of local ages {
        order or`a' ll`a' ul`a', last
    }
    This code works poorly with the example data because for many of the countries there are either too few observations to do the regression at all, or many/most of the levels of the age variable are not instantiated in the country. There are also some where -ologit- fails to converge. So the results look ratty. But they are nonetheless correct given the data.

    If in your real data there have similar problems, the results here contain a variable error_code which will give Stata's error code associated with ologit. That variable has missing values when the regression proceeds successfully.

    -runby- is written by Robert Picard and me, and is available from SSC.

    Comment


    • #3
      Originally posted by Clyde Schechter View Post
      Code:
      capture program drop one_country
      program define one_country
      capture noisily ologit happines i.age, or
      if c(rc) == 0 { // SUCCESSFUL REGRESSION
      matrix M = r(table)
      levelsof age, local(ages)
      foreach a of local ages {
      gen or`a' = M["b", "happiness:`a'.age"]
      gen ll`a' = M["ll", "happiness:`a'.age"]
      gen ul`a' = M["ul", "happiness:`a'.age"]
      }
      }
      else {
      gen error_code = c(rc)
      }
      exit
      end
      
      runby one_country, by(country) status
      levelsof age, local(ages)
      foreach a of local ages {
      order or`a' ll`a' ul`a', last
      }
      This code works poorly with the example data because for many of the countries there are either too few observations to do the regression at all, or many/most of the levels of the age variable are not instantiated in the country. There are also some where -ologit- fails to converge. So the results look ratty. But they are nonetheless correct given the data.

      If in your real data there have similar problems, the results here contain a variable error_code which will give Stata's error code associated with ologit. That variable has missing values when the regression proceeds successfully.

      -runby- is written by Robert Picard and me, and is available from SSC.

      Thanks so much Clyde! As always, worked like magic!

      This code is beyond my comprehension, still tried (unsuccessfully) to limit the decimal places this way:
      Code:
       
         capture noisily ologit happines i.age, or cformat(%9.2f)
      I have two other points to mention. Since age has 7 categories, there should 6 sets of ORs since one is a base category. So I am not sure why there are still 7 ORs.
      Also, the ORs and LLs are getting ranked, so there is no way to know which country a certain value corresponds to. So, is it possible to avoid the ordering?


      Comment


      • #4
        Well, the first problem is simple. The -cformat()- option doesn't help you because it only affects the way the regression results are displayed on the Results window--but you don't even get to see that in this code. The way to do this is to use the -format- command on the end results.

        The remaining questions are harder, and your data may not even make it possible to resolve this. There are not, in fact, 7 age groups for each country. In your example data, every country exhibits only some of the values of the age variable--no country has them all:
        Code:
        . foreach c of local countries {
          2.     quietly levelsof age if country == `c', local(ages)
          3.     display `"`:label (country) `c'': `ages'"'
          4. }
        Chad: 3 4 5 6 7
        Gambia: 1
        Ghana: 1 3 4 6 7
        Guinea Bissau: 6
        Guyana: 5 7
        Honduras: 1 3 5 6 7
        Lesotho: 2 3 7
        Nepal: 1 4 7
        Tonga: 4 7
        Turkemenistan: 1 5
        Tuvalu: 3
        Bangladesh: 1 2 3 4 5 6
        Congo: 2 3 4 5
        Iraq: 1 2 3
        Madagascar: 1 2 3 4 5 7
        Palestine: 1 4
        Sierra Leon: 1 5 7
        Suriname: 2
        Togo: 3
        Tunisia: 2 4 6
        Zimbabwe: 5 6 7
        If this is true also of your real data, then it is impossible to do what you want. The reason is that if there are no observations with some age group in a given country, then the indicator ("dummy") for that age group cannot be included in the regression for that country's data. Now, it is possible to assure that the variables are named consistently across the countries (which was not the case in the earlier code, but is accomplished in the code shown below.) But that only creates the appearance, not the reality, of a solution. Because even if we assure that the OR for age = 2 is consistently called or2 in the results, the fact is that because different countries' regressions involve different sets of age variables, the OR for age = 2 means different things from one country to the next. So the comparison of the OR for, say, age5 in Turkmenistan, where it is contrasted only with age 1 as the reference category, with the OR for age 5 in Chad, where it is contrasted with age 3 as the reference category. This type of inconsistency in the data makes it impossible to meaningfully do what you are asking.

        If you can get more data that overcomes this patchiness and provides data on all ages in every country, then you can run this code instead:
        Code:
        levelsof age, local(ages)
        foreach a of local ages {
            gen age_`a' = `a'.age
        }
        
        capture program drop one_country
        program define one_country
            capture ologit happines age_*, or
            if c(rc) == 0 { //  SUCCESSFUL REGRESSION
                matrix M = r(table)
                levelsof age, local(ages)
                foreach a of local ages {
                    gen or`a' = M["b", "happiness:age_`a'"]
                    gen ll`a' = M["ll", "happiness:age_`a'"]
                    gen ul`a' = M["ul", "happiness:age_`a'"]
                }
            }
            else {
                gen error_code = c(rc)
            }
            exit
        end
        
        runby one_country, by(country)
        levelsof age, local(ages)
        foreach a of local ages {
            order or`a' ll`a' ul`a', last
        }
        format or* ll* ul* %9.2f

        Comment


        • #5
          Originally posted by Clyde Schechter View Post
          Well, the first problem is simple. The -cformat()- option doesn't help you because it only affects the way the regression results are displayed on the Results window--but you don't even get to see that in this code. The way to do this is to use the -format- command on the end results.

          The remaining questions are harder, and your data may not even make it possible to resolve this. There are not, in fact, 7 age groups for each country. In your example data, every country exhibits only some of the values of the age variable--no country has them all:
          Code:
          . foreach c of local countries {
          2. quietly levelsof age if country == `c', local(ages)
          3. display `"`:label (country) `c'': `ages'"'
          4. }
          Chad: 3 4 5 6 7
          Gambia: 1
          Ghana: 1 3 4 6 7
          Guinea Bissau: 6
          Guyana: 5 7
          Honduras: 1 3 5 6 7
          Lesotho: 2 3 7
          Nepal: 1 4 7
          Tonga: 4 7
          Turkemenistan: 1 5
          Tuvalu: 3
          Bangladesh: 1 2 3 4 5 6
          Congo: 2 3 4 5
          Iraq: 1 2 3
          Madagascar: 1 2 3 4 5 7
          Palestine: 1 4
          Sierra Leon: 1 5 7
          Suriname: 2
          Togo: 3
          Tunisia: 2 4 6
          Zimbabwe: 5 6 7
          If this is true also of your real data, then it is impossible to do what you want. The reason is that if there are no observations with some age group in a given country, then the indicator ("dummy") for that age group cannot be included in the regression for that country's data. Now, it is possible to assure that the variables are named consistently across the countries (which was not the case in the earlier code, but is accomplished in the code shown below.) But that only creates the appearance, not the reality, of a solution. Because even if we assure that the OR for age = 2 is consistently called or2 in the results, the fact is that because different countries' regressions involve different sets of age variables, the OR for age = 2 means different things from one country to the next. So the comparison of the OR for, say, age5 in Turkmenistan, where it is contrasted only with age 1 as the reference category, with the OR for age 5 in Chad, where it is contrasted with age 3 as the reference category. This type of inconsistency in the data makes it impossible to meaningfully do what you are asking.

          If you can get more data that overcomes this patchiness and provides data on all ages in every country, then you can run this code instead:
          Code:
          levelsof age, local(ages)
          foreach a of local ages {
          gen age_`a' = `a'.age
          }
          
          capture program drop one_country
          program define one_country
          capture ologit happines age_*, or
          if c(rc) == 0 { // SUCCESSFUL REGRESSION
          matrix M = r(table)
          levelsof age, local(ages)
          foreach a of local ages {
           gen or`a' = M["b", "happiness:age_`a'"]
          gen ll`a' = M["ll", "happiness:age_`a'"]
          gen ul`a' = M["ul", "happiness:age_`a'"]
          }
          }
          else {
          gen error_code = c(rc)
          }
          exit
          end
          
          runby one_country, by(country)
          levelsof age, local(ages)
          foreach a of local ages {
          order or`a' ll`a' ul`a', last
          }
          format or* ll* ul* %9.2f
          Thanks so much, Clyde! This time or7 ( must be the base category) is all 1.

          So if the ORs must get ordered, then I will just have to double-check them with the values for individual countries. Maybe, you can consider updating the -runby- package later, as it might be very useful in certain circumstances.

          Comment


          • #6
            So if the ORs must get ordered, then I will just have to double-check them with the values for individual countries. Maybe, you can consider updating the -runby- package later, as it might be very useful in certain circumstances.
            The code in #4 does not scramble the naming of the odds ratios. In that code, or1 always refers to the odds ratio associated with age = 1, or 2 always refers to the odds ratio associated with age = 2, etc.

            Also, the renaming of the odds ratios in the code in #2 didn't arise from -runby-: it arose because I didn't take into account the variation among the countries in which variables would be included.

            Yes, OR7 is always 1. This actually relates to your earlier remark that you expected to have only 6 OR's with 7 levels of age. The code in #4 does not use factor variable notation: it uses 7 pre-created indicators ("dummies") for age instead. Now, in all of the regressions, those indicators that are included will be colinear and one will be dropped. Stata designates the one that is listed last as the base category, and when the wildcard age_* is expanded, age_7 is last. So age_7 is always the base category in any country where there are observations with age = 7. The omitted reference category in any regression always has a coefficient of 0 (or, in odds ratio metric, an odds ratio of 1). That's why age_7 is always 1, except in those countries where it is missing because there are no age = 7 observations. In the latter countries, some other OR will be 1. There is always some base category, and its OR is always 1.

            Comment


            • #7
              Originally posted by Clyde Schechter View Post
              The code in #4 does not scramble the naming of the odds ratios. In that code, or1 always refers to the odds ratio associated with age = 1, or 2 always refers to the odds ratio associated with age = 2, etc.

              Also, the renaming of the odds ratios in the code in #2 didn't arise from -runby-: it arose because I didn't take into account the variation among the countries in which variables would be included.

              Yes, OR7 is always 1. This actually relates to your earlier remark that you expected to have only 6 OR's with 7 levels of age. The code in #4 does not use factor variable notation: it uses 7 pre-created indicators ("dummies") for age instead. Now, in all of the regressions, those indicators that are included will be colinear and one will be dropped. Stata designates the one that is listed last as the base category, and when the wildcard age_* is expanded, age_7 is last. So age_7 is always the base category in any country where there are observations with age = 7. The omitted reference category in any regression always has a coefficient of 0 (or, in odds ratio metric, an odds ratio of 1). That's why age_7 is always 1, except in those countries where it is missing because there are no age = 7 observations. In the latter countries, some other OR will be 1. There is always some base category, and its OR is always 1.
              Thank you Clyde! This was a very informative thread.

              Comment

              Working...
              X