  • Getting error while creating interaction dummy with year and another variable

    Hello,

    I'm trying to create distance-by-time interaction dummy variables with the sample dataset given below. Here, wanted is the distance between two counties in miles.

    I want to create an interaction of this wanted variable with the year-month dummies of my data, which are represented by ym.

    I tried creating the variable with the code below, but it returns an error. May I know why? Also, wanted has a lot of missing observations. I don't know whether that's relevant; I'm just letting you know.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(wanted ym year)
    10 440 1996
    10.96 438 1996
    15 442 1996
    20 436 1996
    21.34 441 1996
    12 438 1996
    10 441 1997
    15 440 1997
    16 432 1997
    20 436 1998
    21.22 437 1998
    31 438 1998
    41 442 1999
    50 443 1999
    end
    Code:
    g want_year = c.wanted##i.ym
    invalid matrix stripe;
    c.wanted##i.ym
    r(198);
    
    end of do-file
    
    r(198);

  • #2
    Tariq, try

    Code:
    xi i.ym*wanted



    • #3
      The advice in #2 should work. But why do you want to create this variable? If your purpose is to include an interaction between them in a regression model, then not only is there no need to do this, it is actually counter-productive to do so. You can just do:

      Code:
      regression_command outcome_variable i.ym##c.wanted perhaps_other_variables
      Not only does this save you a step, but when it comes time to interpret the results of this interaction model, you will be able to use the -margins- command, which will save you time and, more importantly, large amounts of pain and confusion. If you are not familiar with this approach, I recommend you read Richard Williams' excellent notes at https://www3.nd.edu/~rwilliam/stats2/l53.pdf.

      You will not get an error message from this syntax. The reason you got an error message in what you tried in #1 is that you were trying to generate a single variable, but i.ym#c.wanted is not a single variable: it is a family of variables: one for each value of ym that appears in the data.
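
      For cases where separate variables really are needed, here is a minimal sketch of building that family by hand, one variable per level of ym. The want_ym_ prefix is an assumed name chosen for illustration; observations with missing wanted or ym stay missing.

      Code:
      * Sketch only: the want_ym_ prefix is a hypothetical naming choice
      levelsof ym, local(months)
      foreach m of local months {
          * equals wanted within the current month, zero in other months
          generate want_ym_`m' = wanted * (ym == `m') if !missing(wanted, ym)
      }
      With the example data in #1 this would create variables such as want_ym_436 and want_ym_440. Factor-variable notation handles the same family virtually, without adding anything to the data set.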

      That said, I have some concerns about this approach altogether. In your example data, the values of ym run between 432 and 443. While not every value in between is instantiated in the example, if they are in the full data set, then there are 12 distinct values of ym. Perhaps in the full data set there are even more. Do you really intend to estimate a model in which the marginal effect of the variable wanted not only differs in every month, but does so in an arbitrary, non-systematic way? Do you really want to ignore the fact that ym is in fact at least an ordinal-level variable, or even an interval-level one? Maybe you really want c.ym##c.wanted?
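
      If the continuous specification is the one you want, a sketch of how it pairs with -margins- might look like the following. It assumes a dependent variable named outcome (hypothetical, not in the example data) and picks three ym values from the example purely for illustration.

      Code:
      * Sketch only: "outcome" is a hypothetical dependent variable
      regress outcome c.ym##c.wanted
      * marginal effect of wanted evaluated at a few illustrative months
      margins, dydx(wanted) at(ym = (436 440 443))
      Here the marginal effect of wanted changes linearly with ym, so it is summarized by a single interaction coefficient rather than one coefficient per month.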



      • #4
        So kind and generous of both of you for giving me the direction I was missing. Highly obliged for the meaningful advice.

        On the topic of c.ym, Mr. Schechter, you are absolutely right. With my method I was going nowhere, but as soon as I took the c.ym approach it worked and gave me the result. So grateful to have such a kind community for novice students like us! Have a good rest of the day, everyone.



        • #5
          Mr. Schechter, if your time allows, I am looking forward to your guidance once more on the following issue.

          reg dependent male ismarried wasmarried age black asian hispanic lths hsdegree i.county c.ym##c.distance , cluster(county)

          Here, c.ym##c.distance is my IV, where ym is the year-month indicator and distance is a continuous variable in my data. With this IV I'm predicting the dependent variable.

          However, because the IV is an interaction, I'm struggling to do the first-stage F-test.

          * F test on the excludability of c.ym##c.distance in the county from the first stage regression.

          test c.ym##c.distance

          I'm getting an error message saying there is a problem with the matrix.

          Would you kindly let me know how I can do the first-stage F-test, or whether there is something I'm doing wrong?



          • #6
            Code:
            test c.ym#c.distance // NOTE ONLY ONE #
            If you have already read -help fvvarlist-, re-read it so you understand the distinction between the ## and # operators and where each is used. c.x##c.y represents three variables: x, y, and the x#y interaction. The -test- command works with the contents of the e(b) matrix that estimation commands leave behind. In your case, that matrix has columns for ym, distance, and c.ym#c.distance. But there is no column in that matrix named c.ym##c.distance: there can't be because each matrix column must correspond to a single variable.
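
            One way to see the names that -test- can refer to, assuming the regression from #5 has just been run and its results are still in memory, is to replay the results with the coeflegend option or to list e(b) directly:

            Code:
            * Sketch only: run immediately after the -reg- command in #5
            regress, coeflegend     // replay the output showing the _b[] name of each coefficient
            matrix list e(b)        // column names are what -test- matches against
            The interaction should appear under the name c.ym#c.distance, with a single #.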

            So the test command I show at the top of this post will test the interaction between ym and distance. But there is no need for this, because all it will end up doing is showing you the exact same results that are already in the c.ym#c.distance row of the regression output table itself.

            If you wish to test the excludability of ym, distance, and their interaction, that is a different matter. That would be done with:
            Code:
            test ym distance c.ym#c.distance



            • #7
              That's so clear and informative, and now I know what I did wrong. Thanks again, Mr. Schechter, for clarifying my misunderstanding of econometrics and Stata!



              • #8
                I am happy to be able to clarify your understanding of Stata. But please let me be clear: I am not an economist, nor an econometrician. I am an epidemiologist with an emphasis on statistics and computation. You should definitely not be learning econometrics from me.
