  • Question about using areg (fixed effects model) and two sets of dummy variables--please help!

    Hi Everyone,

    I am trying to estimate a model that has a dummy variable income (1 = high income, 0 = low income) and many country dummy variables. I am trying to do something like this:

    var1 = B1*var2 + B2*var3 + B3*income + B4*var2*income + B5*var3*income + C1*country1 + C2*country2 + ... + D1*country1*income + D2*country2*income + ...

    Of course, there are more than two country dummy variables. I was thinking that if I created a new variable country_inc=country*income, I could do something like this:

    Code:
    areg var1 var2 var3 income var2*income var3*income, absorb(country country_inc)
    However, absorb only takes one variable. Are there options for getting around this?

    Thank you!!
    Krista

  • #2
    So, if I understand your question, your units of analysis are the high and low income strata within countries, and you want to do a fixed effects model using those strata.

    Code:
    egen stratum = group(income country)
    areg var1 (var2 var3)##income, absorb(stratum)
    Note: This assumes var2 and var3 are discrete variables. If they are continuous, replace -(var2 var3)- with -c.(var2 var3)-.


    • #3
      Each country is specified either as a high-income country or a low-income country. Will the method you suggested still work?

      I'm not familiar with using ## to create interaction variables. If var2 is binary and var3 is continuous (there are actually about 10 variables in my model and some are continuous while others are not), would I do something like this?

      Code:
      areg var1 var2##income c.var3##income, absorb(stratum)
      Is that equivalent to:

      Code:
      gen var2_inc = var2*income
      gen var3_inc = var3*income
      areg var1 var2 var3 var2_inc var3_inc, absorb(stratum)
      Thanks again for your help. I really appreciate it.


      • #4
        If each country is either high or low income then the interaction of country and income makes no sense. Your unit of analysis here is just the country.

        Regarding the ## notation, read -help fvvarlist- and the corresponding manual section on factor variable notation and interactions. These are among the workhorses of Stata and your life will be easier if you learn to use them.

        Code:
        areg var1 var2##income c.var3##income, absorb(country)
        And, yes that will be equivalent to the model you outline in your second code block. (The main effect of income that comes with the use of the ## notation will automatically be dropped because it is collinear with the absorbed country effect.)
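
        As a minimal sketch of that equivalence on simulated data (the seed, sample size, and coefficients below are made up for illustration; only the variable names come from this thread):

        Code:
        * Simulated data: 10 hypothetical countries, 20 observations each
        clear
        set seed 12345
        set obs 200
        gen country = ceil(_n/20)
        * income is constant within country, as described in #3
        gen income = country > 5
        gen var2 = runiform() > 0.5
        gen var3 = rnormal()
        gen var1 = 1 + 0.5*var2 + 0.3*var3 + 0.4*var2*income - 0.2*var3*income + country/10 + rnormal()
        * Factor-variable version
        areg var1 i.var2##i.income c.var3##i.income, absorb(country)
        * Hand-built interaction version
        gen var2_inc = var2*income
        gen var3_inc = var3*income
        areg var1 var2 var3 var2_inc var3_inc, absorb(country)
        The interaction coefficients in the first regression should match those on var2_inc and var3_inc in the second, and the first should report the income main effect as omitted.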


        • #5
          Yes, I had major collinearity problems when I tried to run the model with all of the interaction terms. I was trying to interact income with all of my variables (including the country dummies) because I ran a Chow test comparing the low and high income countries and rejected the null that the coefficients in the regressions for low income and high income countries were the same. I was then trying to get at the statistical significance of the differences between the coefficients of each variable (between low and high income countries) by looking at the p value of each interaction term.

          Thanks for the tip about -help fvvarlist-. It looks like it could be a huge time saver.


          • #6
            I have a quick follow-up question. I've decided to add year*month fixed effects. I've created a year*month variable. Would it be appropriate for me to combine country and year*month into a single categorical variable, as you did with stratum?

            Thanks again,
            Krista


            • #7
              First, don't create a year*month variable. If you want an interaction term between year and month, use factor variable notation in your regression command. That is, include a year##month term. But I'd be surprised if that's really what you want.

              Are you clear on how you are modeling time? There are several things to consider. One that jumps out immediately is whether your data contains more than one observation for each country in each year and month. If country, year, and month jointly single out unique observations, then your "fixed effects" will soak up all the degrees of freedom and you will not be able to estimate the effect of anything else. If you have multiple observations within each country-year-month stratum, then you may be able to proceed.
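
              A quick way to check this, sketched here on made-up data (the variable names country, year, and month are assumptions about your dataset), is to count how many observations share each combination:

              Code:
              * Made-up panel: 10 hypothetical countries, 2 years, 12 months each
              clear
              set obs 240
              gen country = ceil(_n/24)
              gen year = 2000 + mod(ceil(_n/12) - 1, 2)
              gen month = mod(_n - 1, 12) + 1
              * Report how many copies exist of each country-year-month combination
              duplicates report country year month
              If every observation is reported as having one copy, then country, year, and month jointly identify observations, and a combined stratum would leave nothing to estimate. On your own data, run only the last command.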

              If you do proceed with modeling time, and you have data at monthly intervals, then the simplest approach is just to have a variable that ticks off the months. See -help datetimes- for creating and using monthly dates in Stata. If you then want to adjust your analysis for a linear time trend, you would include that monthly date variable as a single continuous covariate. If you really want to treat it as a "fixed effect" with each month being, in effect, an idiosyncratic shock to the system you are modeling, then you would include it as a factor variable with the i. prefix. Note that, so far, I have mentioned nothing about year: the Stata monthly variable incorporates both month and year in a single variable.
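
              As a rough sketch of both options, again on simulated data (the data-generating lines and the name mdate are placeholders; ym() and the %tm format are standard Stata):

              Code:
              * Simulated panel: 10 hypothetical countries observed for 24 months
              clear
              set seed 2024
              set obs 240
              gen country = ceil(_n/24)
              gen year = 2000 + mod(ceil(_n/12) - 1, 2)
              gen month = mod(_n - 1, 12) + 1
              gen var3 = rnormal()
              gen var1 = 0.3*var3 + country/10 + rnormal()
              * Build a Stata monthly date from year and month
              gen mdate = ym(year, month)
              format mdate %tm
              * Option 1: linear time trend
              areg var1 c.var3 c.mdate, absorb(country)
              * Option 2: each calendar month as its own shock
              areg var1 c.var3 i.mdate, absorb(country)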

              On the other hand, in some contexts, it is important to control for seasonal effects that repeat themselves over the years. That is when it is appropriate to include both a month variable (which would just run from 1 to 12 and would not be a Stata monthly date variable) and a year variable in your model. Here year represents the longer time trends and might be continuous or represented as a factor variable with an i. prefix, depending on how the system you are modeling works. Month would definitely be entered as a factor variable with the i. prefix to capture the cyclic effects that occur within the calendar year.
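
              A sketch of that seasonal specification, once more on simulated data with placeholder names and values:

              Code:
              * Simulated panel: 10 hypothetical countries, 2 years, 12 months each
              clear
              set seed 2024
              set obs 240
              gen country = ceil(_n/24)
              gen year = 2000 + mod(ceil(_n/12) - 1, 2)
              gen month = mod(_n - 1, 12) + 1
              gen var3 = rnormal()
              gen var1 = 0.3*var3 + country/10 + rnormal()
              * Seasonal month effects plus a linear trend in year
              areg var1 c.var3 i.month c.year, absorb(country)
              * Seasonal month effects plus year as a factor variable
              areg var1 c.var3 i.month i.year, absorb(country)
              Which treatment of year is appropriate depends on your setting; the two commands are shown only to contrast the syntax.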

              No role, really, for year##month interactions here. If you were to include this, it would, in effect, be a kludgy way of modeling time with every single calendar month as an idiosyncratic shock to the system. As indicated above, it is simpler to do that with just a Stata monthly date variable. If you were not planning to absorb this as a fixed effect, there might conceivably be some virtue in a kludgy year##month interaction representation as you might observe some kind of interplay between seasonal effects and longer-term trends. But if you're going to absorb this in an -areg- model, you won't see any of that output anyway, so you might as well stick to the simpler Stata monthly date approach.
