  • Marginal effects of dummy variables

    Hello everyone,

    I have a somewhat complicated question, and I hope it will be understandable. Please excuse my English; I am not a native speaker.

    ---------------------------------------------
    I have a dynamic linear panel regression model. I use annual panel data for the 34 OECD countries over the period 1995–2014. I divide the OECD countries into three groups (A, B, C) according to certain characteristics.

    The empirical model has the following form:
    Y_it = α Y_i,t-1 + β X_it + γ GROWTH_it + δ_a ele_a_it + δ_b ele_b_it + δ_c ele_c_it + μ_i + ε_it
    • the dependent variable (Y_it) is the government budget deficit (def)
    • the independent variables are GDP growth (GROWTH) and the controls contained in the row vector X_it
    • however, the most important independent variables are the dummy variables (ele_a, ele_b, ele_c)
      • ele_a takes the value 1 in election years in group A countries; otherwise it takes the value 0
      • ele_b takes the value 1 in election years in group B countries; otherwise it takes the value 0
      • ele_c takes the value 1 in election years in group C countries; otherwise it takes the value 0
    • we can also consider a dummy variable ELE, which takes the value 1 in election years for all countries (regardless of group) and 0 otherwise. I do not want to use this variable, but it is useful for the explanation that follows.
    I estimate the model on the whole sample, i.e. I do not estimate three separate models (one for each group of countries). If I use the dummy variable ELE, its regression coefficient shows how the budget deficit differs, on average, in election years compared with non-election years. The variable ELE is often used in this type of model, and its interpretation is clear.
    However, if I use the dummy variable ele_a (and analogously ele_b and ele_c), its coefficient shows how the budget deficit in group A countries in election years differs, on average, from the deficit in the non-election years of countries in groups A, B and C.
    • ele_a thus does not measure the marginal effect of election years in group A relative to non-election years in group A, but the marginal effect of election years in group A relative to non-election years in all countries.
    • I cannot interpret the coefficient of ele_a as the difference in budget deficits between election and non-election periods in the countries in group A.
    This is my question: what should I do if I want ele_a to show the difference in budget deficits between election and non-election periods only for the countries in group A?
    • I do not want to estimate three separate models, one for each of the groups A, B and C, for econometric reasons: too few individual units (about 34/3) relative to the number of periods (20).
    • In my opinion, if I capture fixed effects in my model (OLS FE or, e.g., difference GMM), the variables are "demeaned", so the coefficient of ele_a should not be driven by, e.g., the fact that countries in group A have on average higher budget deficits (in both election and non-election periods). Moreover, I include other independent variables, which ensures that their effects are not erroneously attributed to elections.

    I attach a subsample of my data.

    Thank you very much for any reply.

  • #2
    This will be much easier if you use Stata's factor variable notation to create interaction terms. See -help fvvarlist- and the associated manual section.

    I assume your data is panel data, identified by country and year. I also assume there is a variable, I'll call it election_year, which is 1 if it is an election year in that particular country, and 0 otherwise. Also, your group variable needs to be numeric (1, 2, 3), not string. If you want to assign value labels to it, that's fine, but the underlying stored variable needs to be integers.

    Code:
    xtset country year
    // CODE TO DEFINE COUNTRY GROUPS GOES HERE
    gen group = 1 if whatever
    replace group = 2 if whatever else
    replace group = 3 if whatever else
    
    label define group 1 "A" 2 "B" 3 "C"
    label values group group
    
    xtreg y L.y i.group##i.election_year growth, fe // OR re--you don't say which
    margins i.group, dydx(election_year)
    The output of the -margins- command will give you the correct marginal effect of election_year in each group.

    If you are not familiar with the -margins- command, I suggest you start learning about it by reading https://www3.nd.edu/~rwilliam/stats/Margins01.pdf. After you have done that, (much) more information is available in the -margins- section of the online user manual.

    Added: The reason I show code for a hypothetical data set rather than adapting it specifically to your example is that you posted an Excel spreadsheet. That's a bad idea for several reasons. First, some of the most frequent responders on this forum do not use Microsoft Office. Second, because Office documents can contain active malware, some who, like me, do use Office, will not download Office documents from strangers. Third, spreadsheets are sometimes missing important metadata, such as the difference between value-labeled numeric variables and strings. The helpful way to give example data is to run the -dataex- command on your Stata data set. You can install the -dataex- command by running -ssc install dataex-. Read the simple instructions by running -help dataex-. In the future, please be helpful to those who want to help you by always using -dataex- to show example data.



    Last edited by Clyde Schechter; 22 Feb 2017, 09:13.



    • #3
      Clyde, thank you for your answer. I have read some of the material you recommended, and I now have at least a basic orientation in factor variables.
      However, I still have some trouble with my model.
      1. After the -margins- command I received an error message: “default prediction is a function of possibly stochastic quantities other than e(b)”. From what I have read, the problem arises when both fixed and random effects are included in mixed-effects models. I do not understand this, because the model I have specified includes only fixed effects.
      2. Factor variables are not allowed with dynamic panel data (DPD) estimators such as system GMM (xtdpdsys) or difference GMM (xtabond), which are crucial for my estimation. I think this second problem is fatal.



      • #4
        Regarding 1. It is unclear to me why this happened. Please post an example of your data (using the -dataex- command!), and also show the exact code you used for your regression command and Stata's output for that, as well as the exact code you used for your -margins- command and Stata's output from that (including error messages). Read FAQ #12 for instructions on how to install the -dataex- command, and how to post code and Stata output in a clean, easily readable way.
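
        One way to obtain the group-specific election effects without -margins- is -lincom-, which works directly on the stored coefficients and their covariance matrix, so it should not run into the prediction-related error r(498). The sketch below uses simulated data; the variable names (country, group, election_year, growth, y), the group assignment, and the data-generating process are all hypothetical. In a linear model where election_year enters only through i.group##i.election_year, the effect in group g is the coefficient on 1.election_year plus the coefficient on g.group#1.election_year.

```stata
* Simulated panel: 30 hypothetical countries, 20 years, three groups
clear
set seed 12345
set obs 30
gen country = _n
gen group = 1 + mod(_n, 3)               // hypothetical groups 1, 2, 3
expand 20
bysort country: gen year = 1994 + _n     // years 1995-2014
xtset country year

* Hypothetical election cycle: one election every fourth year
gen byte election_year = mod(year + country, 4) == 0
gen growth = rnormal(2, 1)
gen y = 0.3*growth - 0.5*election_year*(group == 3) + rnormal()

* Same specification as in #2; the fixed effects absorb the group main effects
xtreg y L.y i.group##i.election_year growth, fe

* Election-year effect within each group, as a linear combination of coefficients
lincom 1.election_year                                // group 1 (base group)
lincom 1.election_year + 2.group#1.election_year      // group 2
lincom 1.election_year + 3.group#1.election_year      // group 3
```

        With variables named as in a model of the form i.group##i.ELE, the corresponding commands would be -lincom 1.ELE + 2.group#1.ELE- and -lincom 1.ELE + 3.group#1.ELE-.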

        Regarding 2. This does seem to preclude using this approach. There is an older command, -mfx- to calculate marginal effects post-estimation. It has been largely superseded by -margins-, but one of its remaining advantages is that you do not need to use factor-variable notation in the regression in order to use it. I haven't used -mfx- in a long time and don't remember its syntax (which is rather different from that of -margins-) well enough to guide you here. The other thought I have to offer is that there is an -xtabond2- command; I do not know if it supports factor-variable notation, but it might be worth a quick try.
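
        If an estimator does not accept factor-variable notation, as reported in #3 for xtabond and xtdpdsys, the interaction terms can be built by hand as ordinary 0/1 variables. A minimal sketch, using four rows that match the AUS and AUT observations for 1995 and 1996 in the example data posted in this thread, plus a hypothetical numeric variable group (1 = A, 2 = B, 3 = C). The commented-out xtabond call is only an illustration, not a recommended specification.

```stata
clear
input str3 country int year byte(group ELE)
"AUS" 1995 1 0
"AUS" 1996 1 1
"AUT" 1995 2 1
"AUT" 1996 2 0
end

* Group-specific election dummies: equal to 1 only in election years of that group
gen byte ele_a = ELE * (group == 1)
gen byte ele_b = ELE * (group == 2)
gen byte ele_c = ELE * (group == 3)

list country year group ELE ele_a ele_b ele_c, sepby(country)

* The dummies can then enter an estimator as plain regressors, for example
* (illustrative only; requires the full panel and a prior xtset):
* xtabond def_imf growth ele_a ele_b ele_c, lags(1)
```

        Because ELE = ele_a + ele_b + ele_c, these dummies carry the same information as ELE together with the group#ELE interactions, so the two parameterizations describe the same model.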



        • #5

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input int year str3 country double(def_imf growth) byte exchange double(trade NAIRU pr_15_74) byte(ele_a ele_b ele_c ELE)
          1995 "AUS" -2.031 3.549 13 31.089616711801337 7.987 67.002 0 0 0 0
          1996 "AUS"  -.887 3.893 13  31.32741681790684 7.692 67.249 1 0 0 1
          1997 "AUS"   .063  3.81 13 29.570819209699362 7.466 66.978 0 0 0 0
          1998 "AUS"   .342 5.141 13 30.184032197513417 7.142  66.97 1 0 0 1
          1999 "AUS"  1.281 4.028 13   32.2220162742667 6.747 66.883 0 0 0 0
          2000 "AUS"  1.853 3.383 13  32.63004257425143 6.409 67.491 0 0 0 0
          2001 "AUS"  -.037 2.613 13  33.62977492162032 6.536  67.91 1 0 0 1
          2002 "AUS"   .143 3.967 13  34.93754108108969 6.302 68.154 0 0 0 0
          2003 "AUS"    .95 3.006 13  34.19362870916176 5.925 68.503 0 0 0 0
          2004 "AUS"  1.327 3.957 13  31.98139760467279 5.516 68.332 1 0 0 1
          2005 "AUS"  1.712 3.242 13  33.38426701769876 5.279 69.267 0 0 0 0
          2006 "AUS"  1.775 2.688 13  35.17166368087704 5.135 69.475 0 0 0 0
          2007 "AUS"  1.476 4.525 13  35.95254372406202 4.968 69.889 1 0 0 1
          2008 "AUS" -1.101 2.515 13  36.74807579012871 4.969 70.327 0 0 0 0
          2009 "AUS" -4.559 1.633 13 34.514843564677115  5.33 70.393 0 0 0 0
          2010 "AUS" -5.108 2.317 13  36.26815778605286 5.277 70.188 1 0 0 1
          2011 "AUS" -4.476 2.638 13  36.98688393174185  5.21 70.123 0 0 0 0
          2012 "AUS" -3.482  3.67 13  33.65012709408026 5.276 69.869 0 0 0 0
          2013 "AUS" -2.808 2.049 13 31.636239951775586 5.513 69.698 1 0 0 1
          2014 "AUS" -2.809 2.736 13  32.84777385359695 5.711 69.481 0 0 0 0
          1995 "AUT" -6.148 2.668  4  51.55794135993593 3.846  64.03 0 1 0 1
          1996 "AUT" -4.357 2.398  4   53.5344433112661 3.915  63.52 0 0 0 0
          1997 "AUT" -2.389 2.451  4  59.11876049069809 3.985 63.657 0 0 0 0
          1998 "AUT" -2.724 3.539  4 61.368604253963646 4.045 63.728 0 0 0 0
          1999 "AUT" -2.597  3.42  1 63.389775687867655 4.073 63.358 0 1 0 1
          end
          format %ty year
          Code:
          egen countryid = group(country)
          xtset countryid year, yearly
          
          gen group = 1 if dumm_a==1
          replace group = 2 if dumm_b==1
          replace group = 3 if dumm_c==1
          
          
          
          . xtreg def_imf L1.def_imf growth exchange D1.trade D1.NAIRU D1.pr_15_74 i.group##i.ELE, fe
          note: 2.group omitted because of collinearity
          note: 3.group omitted because of collinearity
          
          Fixed-effects (within) regression               Number of obs      =       613
          Group variable: countryid                       Number of groups   =        34
          
          R-sq:  within  = 0.5854                         Obs per group: min =        13
                 between = 0.9490                                        avg =      18.0
                 overall = 0.7480                                        max =        19
          
                                                          F(9,570)           =     89.43
          corr(u_i, Xb)  = 0.5676                         Prob > F           =    0.0000
          
          ------------------------------------------------------------------------------
               def_imf |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
               def_imf |
                   L1. |   .5605367   .0284029    19.74   0.000     .5047496    .6163238
                       |
                growth |   .2941955   .0361518     8.14   0.000     .2231885    .3652025
              exchange |   .0063628   .0510776     0.12   0.901    -.0939604     .106686
                       |
                 trade |
                   D1. |   .0272501   .0152501     1.79   0.074    -.0027032    .0572033
                       |
                 NAIRU |
                   D1. |  -1.098206   .3485213    -3.15   0.002    -1.782748   -.4136629
                       |
              pr_15_74 |
                   D1. |     .51238   .1479419     3.46   0.001     .2218021    .8029578
                       |
                 group |
                    2  |          0  (omitted)
                    3  |          0  (omitted)
                       |
                 1.ELE |  -.0373249   .3108095    -0.12   0.904    -.6477965    .5731468
                       |
             group#ELE |
                  2 1  |  -.1231955   .4362761    -0.28   0.778    -.9801005    .7337095
                  3 1  |  -.8229748   .4555678    -1.81   0.071    -1.717771    .0718218
                       |
                 _cons |  -1.630928     .35872    -4.55   0.000    -2.335502   -.9263532
          -------------+----------------------------------------------------------------
               sigma_u |  1.4351218
               sigma_e |  1.9860636
                   rho |  .34303241   (fraction of variance due to u_i)
          ------------------------------------------------------------------------------
          F test that all u_i=0:     F(33, 570) =     4.46             Prob > F = 0.0000
          
          . margins i.group, dydx(ELE)
          default prediction is a function of possibly stochastic quantities other than e(b)
          r(498);


          Formerly, I did it this way:
          Code:
           xtreg def_imf L1.def_imf growth exchange D1.trade D1.NAIRU D1.pr_15_74 ele_a ele_b ele_c, fe
          
          Fixed-effects (within) regression               Number of obs      =       613
          Group variable: countryid                       Number of groups   =        34
          
          R-sq:  within  = 0.5854                         Obs per group: min =        13
                 between = 0.9490                                        avg =      18.0
                 overall = 0.7480                                        max =        19
          
                                                          F(9,570)           =     89.43
          corr(u_i, Xb)  = 0.5676                         Prob > F           =    0.0000
          
          ------------------------------------------------------------------------------
               def_imf |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
               def_imf |
                   L1. |   .5605367   .0284029    19.74   0.000     .5047496    .6163238
                       |
                growth |   .2941955   .0361518     8.14   0.000     .2231885    .3652025
              exchange |   .0063628   .0510776     0.12   0.901    -.0939604     .106686
                       |
                 trade |
                   D1. |   .0272501   .0152501     1.79   0.074    -.0027032    .0572033
                       |
                 NAIRU |
                   D1. |  -1.098206   .3485213    -3.15   0.002    -1.782748   -.4136629
                       |
              pr_15_74 |
                   D1. |     .51238   .1479419     3.46   0.001     .2218021    .8029578
                       |
                 ele_a |  -.0373249   .3108095    -0.12   0.904    -.6477965    .5731468
                 ele_b |  -.1605203   .3068916    -0.52   0.601    -.7632967     .442256
                 ele_c |  -.8602996   .3334425    -2.58   0.010    -1.515225   -.2053738
                 _cons |  -1.630928     .35872    -4.55   0.000    -2.335502   -.9263532
          -------------+----------------------------------------------------------------
               sigma_u |  1.4351218
               sigma_e |  1.9860636
                   rho |  .34303241   (fraction of variance due to u_i)
          ------------------------------------------------------------------------------
          F test that all u_i=0:     F(33, 570) =     4.46             Prob > F = 0.0000



          • #6
            Originally posted by Jan Janku
            • however, the most important independent variables are the dummy variables (ele_a, ele_b, ele_c)
              • ele_a takes the value 1 in election years in group A countries; otherwise it takes the value 0
              • ele_b takes the value 1 in election years in group B countries; otherwise it takes the value 0
              • ele_c takes the value 1 in election years in group C countries; otherwise it takes the value 0
            • ele_a thus does not measure the marginal effect of election years in group A relative to non-election years in group A, but the marginal effect of election years in group A relative to non-election years in all countries.
            • I cannot interpret the coefficient of ele_a as the difference in budget deficits between election and non-election periods in the countries in group A.
            I do not understand the initial problem. If the dummy variable ele_a takes on the value 1 in election years in the country group A and 0 otherwise, then its coefficient DOES measure the marginal effect of election years in the group A relative to non-election years in group A (irrespective of whether there has been an election in groups B or C). You CAN interpret it as the differences in the budget deficits in election and non-election periods in group A, conditional on everything else that is included in the model.
            https://www.kripfganz.de/stata/



            • #7
              Sebastian, thank you for your response, but I am not sure about that.

              I use panel data and I have three groups of countries.
              ele_a takes the value 1 in election years in group A countries and 0 otherwise (which also means that it takes the value 0 for countries in groups B and C in all years, with or without elections). You can see this in my post above (ele_a takes the values 0 and 1 for AUS but is always zero for AUT). Simply put, the ones in the election years of some countries (which I denote "A") stand against zeros in the non-election years of group A, in the election years of groups B and C, and in the non-election years of groups B and C as well.

              Originally, I thought that there was no problem (as you say), but my supervisor told me that there is a problem with my interpretation.



              • #8
                Based on your initial model, the coefficient of ele_a has the following formal interpretation:
                \[\delta = E[Y_{it} | ele_{a,it} = 1; Y_{i,t-1}, \mathbf{X}_{it}, Growth_{it}, ele_{b,it}, ele_{c,it}] - E[Y_{it} | ele_{a,it} = 0; Y_{i,t-1}, \mathbf{X}_{it}, Growth_{it}, ele_{b,it}, ele_{c,it}]\]

                Because the value of your dummy variable ele_a never changes for countries in groups B and C and you are conditioning on the dummies ele_b and ele_c, the coefficient \(\delta\) measures the effect of a change from 0 to 1 in the dummy ele_a on your outcome variable, and this change just happens in election years in group A. The dummy is 0 for countries in groups B and C both when there are elections in group A and when there are no elections in group A.
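
                The two regressions in #5 appear to illustrate this numerically: the specification with i.group##i.ELE and the one with ele_a, ele_b, ele_c report identical fit statistics and control coefficients, which is what one would expect from two parameterizations of the same model. As notation introduced here only for illustration, write \(\beta_{ELE}\), \(\beta_{2,ELE}\), \(\beta_{3,ELE}\) for the coefficients on 1.ELE, 2.group#1.ELE and 3.group#1.ELE, and \(\delta_a, \delta_b, \delta_c\) for the coefficients on ele_a, ele_b, ele_c. The reported estimates then line up as follows:

```latex
\[
\begin{aligned}
\delta_a &= \beta_{ELE} = -0.0373249 \\
\delta_b &= \beta_{ELE} + \beta_{2,ELE} = -0.0373249 - 0.1231955 = -0.1605204 \\
\delta_c &= \beta_{ELE} + \beta_{3,ELE} = -0.0373249 - 0.8229748 = -0.8602997
\end{aligned}
\]
```

                The output in #5 reports -.1605203 for ele_b and -.8602996 for ele_c; the difference in the last digit is consistent with rounding in the displayed coefficients. Each group-specific dummy coefficient is therefore the within-group election effect, with the interaction terms measuring only how groups B and C differ from group A.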
                https://www.kripfganz.de/stata/



                • #9
                  Yes, thank you, now I understand.
