
  • Margins with categorical variables, consistency

    Hi Folks,

    In Stata 14, following an -xtreg- regression, I am generating -margins- in two different ways, but getting different results. I have a categorical variable, "race", and I want to understand the margins over a range of a continuous variable called "count". If I consider margins separately for each race category, I get different results than when I consider them as part of the i.race command. Why is this? Am I not considering something? A selection of my data is here also, generated with -dataex-. Thanks for any input.


    Code:
    tsset id mdate, delta(4)
    
    xtreg quartergpa count_oncampus  i.race financial year 
     
    margins, at(race==0) at(count=(0(2)10)) saving(file_race0, replace)
    margins, at(race==1) at(count=(0(2)10)) saving(file_race1, replace)
    margins, at(race==2) at(count=(0(2)10)) saving(file_race2, replace)
    
     margins i.race, at(count=(0(2)10)) saving(file_race_count, replace)

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(quartergpa count race financial year)
      3.7 11 2 0 2013
     3.07 11 2 0 2013
      3.7 11 2 0 2013
        4 11 2 0 2014
      3.3 11 2 0 2014
        4 11 2 0 2014
      3.3 11 2 0 2015
      2.7 11 2 0 2015
        4 11 2 0 2015
        4 11 2 0 2016
        4 11 2 0 2016
        4 10 0 0 2012
        4 10 0 0 2012
      3.7 10 0 0 2014
        3 10 0 0 2014
     3.85 10 0 0 2014
      3.3 10 0 0 2015
     3.15 10 0 0 2015
     3.16 10 0 0 2015
     3.53 10 0 0 2016
     3.93 10 0 0 2016
      3.3 16 0 0 2012
      3.7 16 0 0 2012
      2.7 16 0 0 2012
        3 16 0 0 2013
      3.2 16 0 0 2013
     2.65 16 0 0 2013
     3.57 16 0 0 2014
     2.01 16 0 0 2014
     3.66 16 0 0 2014
      3.7 16 0 0 2015
     3.35 16 0 0 2015
     2.98 16 0 0 2015
        4 16 0 0 2016
     3.12 16 0 0 2016
     2.96 16 0 0 2016
     2.77 16 0 0 2017
     3.85  3 0 0 2012
      3.7  3 0 0 2012
     3.65  3 0 0 2012
      3.7  3 1 1 2012
        3  3 1 1 2012
     3.42  3 1 1 2012
     3.76  6 2 1 2013
        0 16 0 0 2012
        0 16 0 0 2012
        0 16 0 0 2012
        2 16 0 0 2013
        2 16 0 0 2013
        3 16 0 0 2013
        4 16 0 0 2014
        4 16 0 0 2014
        4 16 0 0 2014
        4 16 0 0 2015
     3.55 16 0 0 2015
     3.85 16 0 0 2015
     3.27 16 0 0 2016
     3.86 16 0 0 2016
     3.85 16 0 0 2016
      2.7 16 0 0 2017
        3  8 0 0 2012
        2  8 0 0 2012
      3.3  8 0 0 2012
        4  8 0 0 2013
        4  8 0 0 2013
      3.8  8 0 0 2013
      3.7  8 0 0 2014
      1.3  8 0 0 2014
     3.35  8 2 1 2012
      3.5  8 2 1 2012
      3.9  8 2 1 2012
     3.13  8 2 1 2013
      3.5  8 2 1 2013
     3.85  8 2 1 2013
        3  8 2 1 2014
     3.47  8 2 1 2014
    3.783  3 2 1 2012
     3.95  3 2 1 2012
        4  3 2 1 2012
        4 12 0 0 2012
        4 12 0 0 2012
        4 12 0 0 2013
        4 12 0 0 2013
        4 12 0 0 2014
        4 12 0 0 2014
     3.85 12 0 0 2014
        4 12 0 0 2015
        4 12 0 0 2015
        4 12 0 0 2015
        3 12 0 0 2016
      3.9 12 0 0 2016
        4  2 2 1 2016
     3.85  2 2 1 2017
        2  4 0 1 2012
    1.185  4 0 1 2012
        2  4 0 1 2012
        0  4 0 1 2013
    3.086  5 0 0 2012
    2.666  5 0 0 2012
     2.53  5 0 0 2012
    end
  • #2
    Unfortunately, your data example is not useful because it does not contain all the variables needed to run the code that is giving you problems.

    Nevertheless, you are misapplying the at() option. If you look closely at the output for
    Code:
    margins, at(race==0) at(count=(0(2)10)) saving(file_race0, replace)
    you will see that the first row of output is an overall margin for race = 0, having nothing to do with count (or, more accurately, averaged over count). The remaining rows of output are for values of count ranging from 0 by 2 to 10, but not specific to race == 0. These rows are averaged over all values of race.

    By contrast, if you scrutinize the output for
    Code:
     margins i.race, at(count=(0(2)10)) saving(file_race_count, replace)
    you will find that each row in that long table represents a combination of a specific value of race and a specific value of count. So there is no reason that these results should look like the results from the earlier -margins- commands (which are averaged over race, not specific to race).

    If you want to get -margins- to give you just the results for count = 0 by 2 to 10 but specific to race == 0, the code for that is:

    Code:
    margins, at(race == 0 count = (0(2)10))
    Note that both race and count are specified in the same at() option here.
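    To see the difference on data anyone can load, here is a minimal sketch using Stata's nlswork example panel. It is an assumption that this stands in for the poster's data: tenure plays the role of count, and race is coded 1, 2, 3 in nlswork rather than 0, 1, 2. The model is a random-effects -xtreg- (the default when neither fe nor re is given, as in the original command), and the three forms of -margins- are run side by side. -webuse- needs an internet connection.

```stata
* Sketch on Stata's nlswork example data (a stand-in for the poster's data):
* tenure plays the role of count; race is coded 1, 2, 3 here.
webuse nlswork, clear
xtset idcode year

* Random-effects model (the -xtreg- default) with race as a factor variable
xtreg ln_wage tenure i.race, re

* (a) Two at() options give two unrelated sets of results:
*     _at 1 fixes race only (averaged over observed tenure);
*     _at 2-6 fix tenure only (averaged over observed race).
margins, at(race=1) at(tenure=(0(5)20))

* (b) One at() option: every row fixes race AND tenure together.
margins, at(race=1 tenure=(0(5)20))

* (c) Factor-variable margins over race at each tenure value.
*     The rows with race == 1 should reproduce the results of (b).
margins race, at(tenure=(0(5)20))
```

    Forms (b) and (c) compute the same quantity: both set race and tenure to fixed values for every observation in the estimation sample and average the linear prediction. Form (a) never fixes both at once, which is why its rows differ.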


    • #3
      Thanks, Clyde, this is helpful. I am reposting the -dataex- sample, now including the panel variables id and yq, along with code that should reproduce the original problem, in case anyone wants it.

      -Steve

      Code:
      xtset id yq
      xtreg quartergpa count ///
       i.race financial year
       
      margins, at(race==0) at(count=(0(2)10)) saving(file_race0, replace)
      margins, at(race==1) at(count=(0(2)10)) saving(file_race1, replace)
      margins, at(race==2) at(count=(0(2)10)) saving(file_race2, replace)
      
         
      margins i.race, at(count=(0(2)10)) saving(file_race_count, replace)
      est store margins_race
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float(quartergpa count race financial year id yq)
        3.7 11 2 0 2013  1 212
       3.07 11 2 0 2013  1 213
        3.7 11 2 0 2013  1 214
          4 11 2 0 2014  1 216
        3.3 11 2 0 2014  1 217
          4 11 2 0 2014  1 218
        3.3 11 2 0 2015  1 220
        2.7 11 2 0 2015  1 221
          4 11 2 0 2015  1 222
          4 11 2 0 2016  1 224
          4 11 2 0 2016  1 225
          4 10 0 0 2012  3 208
          4 10 0 0 2012  3 209
        3.7 10 0 0 2014  3 216
          3 10 0 0 2014  3 217
       3.85 10 0 0 2014  3 218
        3.3 10 0 0 2015  3 220
       3.15 10 0 0 2015  3 221
       3.16 10 0 0 2015  3 222
       3.53 10 0 0 2016  3 224
       3.93 10 0 0 2016  3 225
        3.3 16 0 0 2012  6 208
        3.7 16 0 0 2012  6 209
        2.7 16 0 0 2012  6 210
          3 16 0 0 2013  6 212
        3.2 16 0 0 2013  6 213
       2.65 16 0 0 2013  6 214
       3.57 16 0 0 2014  6 216
       2.01 16 0 0 2014  6 217
       3.66 16 0 0 2014  6 218
        3.7 16 0 0 2015  6 220
       3.35 16 0 0 2015  6 221
       2.98 16 0 0 2015  6 222
          4 16 0 0 2016  6 224
       3.12 16 0 0 2016  6 225
       2.96 16 0 0 2016  6 226
       2.77 16 0 0 2017  6 228
       3.85  3 0 0 2012  8 208
        3.7  3 0 0 2012  8 209
       3.65  3 0 0 2012  8 210
        3.7  3 1 1 2012 10 208
          3  3 1 1 2012 10 209
       3.42  3 1 1 2012 10 210
       3.76  6 2 1 2013 12 214
          0 16 0 0 2012 13 208
          0 16 0 0 2012 13 209
          0 16 0 0 2012 13 210
          2 16 0 0 2013 13 212
          2 16 0 0 2013 13 213
          3 16 0 0 2013 13 214
          4 16 0 0 2014 13 216
          4 16 0 0 2014 13 217
          4 16 0 0 2014 13 218
          4 16 0 0 2015 13 220
       3.55 16 0 0 2015 13 221
       3.85 16 0 0 2015 13 222
       3.27 16 0 0 2016 13 224
       3.86 16 0 0 2016 13 225
       3.85 16 0 0 2016 13 226
        2.7 16 0 0 2017 13 228
          3  8 0 0 2012 14 208
          2  8 0 0 2012 14 209
        3.3  8 0 0 2012 14 210
          4  8 0 0 2013 14 212
          4  8 0 0 2013 14 213
        3.8  8 0 0 2013 14 214
        3.7  8 0 0 2014 14 216
        1.3  8 0 0 2014 14 217
       3.35  8 2 1 2012 17 208
        3.5  8 2 1 2012 17 209
        3.9  8 2 1 2012 17 210
       3.13  8 2 1 2013 17 212
        3.5  8 2 1 2013 17 213
       3.85  8 2 1 2013 17 214
          3  8 2 1 2014 17 216
       3.47  8 2 1 2014 17 217
      3.783  3 2 1 2012 19 208
       3.95  3 2 1 2012 19 209
          4  3 2 1 2012 19 210
          4 12 0 0 2012 20 209
          4 12 0 0 2012 20 210
          4 12 0 0 2013 20 213
          4 12 0 0 2013 20 214
          4 12 0 0 2014 20 216
          4 12 0 0 2014 20 217
       3.85 12 0 0 2014 20 218
          4 12 0 0 2015 20 220
          4 12 0 0 2015 20 221
          4 12 0 0 2015 20 222
          3 12 0 0 2016 20 224
        3.9 12 0 0 2016 20 225
          4  2 2 1 2016 21 226
       3.85  2 2 1 2017 21 228
          2  4 0 1 2012 22 208
      1.185  4 0 1 2012 22 209
          2  4 0 1 2012 22 210
          0  4 0 1 2013 22 212
      3.086  5 0 0 2012 23 208
      2.666  5 0 0 2012 23 209
       2.53  5 0 0 2012 23 210
      end



      • #4
        Thanks for reposting that.



        • #5
          Hi Clyde, although your code works on my truncated data, I get an error when I use it with the full data set (some 300,000 observations). The error is shown below the code I am using (Stata 14, -margins- following -xtreg-).

          Code:
          margins, at(race==0 count=(0(5)20))
          default prediction is a function of possibly stochastic quantities other than e(b)



          • #6
            I don't understand how that can happen. That error message is seen when the -margins- command has been asked to calculate something that simply cannot be calculated for that model. But if it works for a subset of the data, it should work for the whole thing; it shouldn't depend on the particular data set. Moreover, -margins- after -xtreg- has xb as its default prediction, and that quite clearly does not depend on stochastic quantities other than e(b). I have no idea what's going on here. Are you sure you didn't use a different regression command?
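            One way to check which estimation results -margins- is working from, and to avoid relying on the default prediction, is sketched below. It again assumes Stata's nlswork example data as a stand-in for the poster's data (tenure in place of count). e(cmd) holds the name of the command that produced the current estimation results, and predict(xb) asks -margins- for the linear prediction explicitly, which is the -xtreg- default described above.

```stata
* Sketch on nlswork (a stand-in for the poster's data).
webuse nlswork, clear
xtset idcode year
quietly xtreg ln_wage tenure i.race, re

* Confirm which command produced the current estimation results
display "`e(cmd)'"

* Request the linear prediction explicitly instead of the default
margins, at(race=1 tenure=(0(5)20)) predict(xb)
```

            If e(cmd) names some other estimator, the last model fit was not -xtreg-, and the default prediction that -margins- uses may be different.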



            • #7
              Hi Clyde, I think it was a mistake on my end. I was able to reproduce the error, but after I retyped the code it worked, so it must have been a typo. Thanks.
