
  • Comparing change in R squared over time, two groups

    Dear all,

    I have a question about how to run a specific linear regression in Stata 14.2. I would like to compare the change in adjusted R squared over a number of consecutive years for two groups, in order to say something about the value relevance of accounting information for these two groups. I know that this might not be the best statistical method, but it seems to be an accepted method in the field of accounting research.

    The regression equation I want to use is from this paper: Francis, J., & Schipper, K. (1999). Have Financial Statements Lost Their Relevance? Journal of Accounting Research, 37(2), 319-352. doi:10.2307/2491412. The equation for the linear OLS regression is as follows: ADJUSTEDR2_{i,t} = k_0 + k_1 HIGH_{i,t} * t + k_2 LOW_{i,t} * t + ε_{i,t}. In this equation, the variable HIGH has the value 1 if a company is classified as a hightech company and the value 0 if a company is classified as a lowtech company. The opposite applies to the variable LOW. The variable t is an indicator for the year, which starts at 1 for the first year and, in this paper, ends at 43 for the final year. Please let me know if this description is too elaborate.

    As I am unfortunately unable to share my real data on the internet, here is some code to generate some 'fake' data.

    Code:
    * Start from an empty dataset; -input- creates the observations itself
    clear
    input fyear hightech lowtech
    1993 0 1
    1994 0 1
    1995 0 1
    1996 0 1
    1997 0 1
    1998 0 1
    1999 0 1
    2000 0 1
    2001 0 1
    2002 0 1
    2003 0 1
    2004 0 1
    1993 1 0
    1994 1 0
    1995 1 0
    1996 1 0
    1997 1 0
    1998 1 0
    1999 1 0
    2000 1 0
    2001 1 0
    2002 1 0
    2003 1 0
    2004 1 0
    end
    * Fake outcome: uniform random numbers standing in for adjusted R squared
    set seed 12345
    gen adj_r2 = runiform()
    label variable fyear "Fiscal year"
    label variable hightech "Classified as a hightech company"
    label variable lowtech "Classified as a lowtech company"
    label variable adj_r2 "Adjusted R squared"
    * Year index starting at 1 for 1993
    gen yearid = fyear-1992
    label variable yearid "ID for the fiscal year"
    Based on these data, I tried to estimate the equation in Stata using the following code:
    Code:
     regress adj_r2 i.hightech#i.yearid i.lowtech#i.yearid
    The output is quite long, so I won't post it here. However, after running the regression, Stata shows several notes saying that i.hightech#i.yearid and i.lowtech#i.yearid terms have been omitted because of collinearity. As a result, I only get values for the coefficients, but not for the standard errors, t-values, etc. I was therefore wondering whether I am doing something wrong in Stata, or whether this model is subject to the dummy variable trap (because of the collinearity). As both the hightech and lowtech indicators appear in the regression equation, the collinearity seems logical to me.

    Thanks in advance!
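
    A minimal sketch for inspecting the symptom described above, assuming the fake data from this post are in memory. It only uses results that -regress- stores. In the fake data there is one observation per group-year cell (24 observations), so a full set of group-by-year indicators plus a constant can use up all the degrees of freedom, which would be consistent with coefficients being reported without standard errors.

    Code:
    * One row per group-year cell in the fake data
    tabulate hightech yearid
    * The regression as posted
    regress adj_r2 i.hightech#i.yearid i.lowtech#i.yearid
    * Compare the number of observations with the degrees of freedom used
    display "observations: " e(N)
    display "model df:     " e(df_m)
    display "residual df:  " e(df_r)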


  • #2
    You will note that when lowtech equals 1, hightech equals 0, and vice versa. Thus one variable determines the other and you have perfect collinearity. One indicator variable, say hightech, contains all the relevant information. If you drop one of the two variables you should find that the equation runs. When I first read this, I assumed that the journal example you cited actually had three (or more) levels of tech involvement. If there were three, for example, hightech and lowtech would provide contrasts to, say, midtech. On the other hand, if there are only two levels of tech, the equation you quote would not run, as you discovered in your example. All that said, I can't quite figure out what you mean with regard to increments to R-squared.
    Richard T. Campbell
    Emeritus Professor of Biostatistics and Sociology
    University of Illinois at Chicago
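
    The collinearity described in #2 can be sketched as follows, assuming the fake data from #1 are in memory. The -assert- line checks that the two indicators are exact complements; the two regressions are illustrative only and are not the model from the paper.

    Code:
    * Every observation is either hightech or lowtech, never both
    assert lowtech == 1 - hightech
    * Both indicators plus the constant: Stata omits one of them because of collinearity
    regress adj_r2 hightech lowtech
    * One indicator carries the same information
    regress adj_r2 hightech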



    • #3
      Roel:
      welcome to the list.
      As an aside to Dick's helpful advice (and without being that clear about your code), please note that it is quite unusual to plug in an interaction among predictors without also including the conditional main effect of each predictor.
      If needed, -testparm i.yearid- can test whether that variable reaches statistical significance (with all the well-known limitations that this statement usually brings about).
      Kind regards,
      Carlo
      (Stata 19.0)
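
      A minimal sketch of the -testparm- suggestion in #3, assuming the fake data from #1 are in memory. In Stata's factor-variable notation, a##b expands to the main effects of a and b plus their interaction, whereas a#b is the interaction only. Because the fake data have a single observation per group-year cell, the sketch keeps to the main effects so that residual degrees of freedom remain for the test; that choice is an assumption made for the fake data, not a recommendation for the real analysis.

      Code:
      * Main effects of group and of year (year as a set of indicators)
      regress adj_r2 i.hightech i.yearid
      * Joint test that all year indicators are zero
      testparm i.yearid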



      • #4
        Dear Dick and Carlo,

        Thank you very much for your helpful replies!

        Originally posted by Dick Campbell View Post
        You will note that when lowtech equals 1, hightech equals 0, and vice versa. Thus one variable determines the other and you have perfect collinearity. One indicator variable, say hightech, contains all the relevant information. If you drop one of the two variables you should find that the equation runs. When I first read this, I assumed that the journal example you cited actually had three (or more) levels of tech involvement. If there were three, for example, hightech and lowtech would provide contrasts to, say, midtech. On the other hand, if there are only two levels of tech, the equation you quote would not run, as you discovered in your example. All that said, I can't quite figure out what you mean with regard to increments to R-squared.
        Dick, I've just read the article (Francis, J., & Schipper, K. (1999). Have Financial Statements Lost Their Relevance? Journal of Accounting Research, 37(2), 319-352. doi:10.2307/2491412) again and think that your point about midtech companies might be right. The midtech companies would then function as a kind of base level, as is normal with dummy variables? The main problem in that case is that I've used another method to select the hightech companies, and classified all the other firms as lowtech (or non-hightech), so I don't have midtech firms in my sample.

        To answer your question, value relevance refers to whether accounting numbers such as book value per share and earnings per share can predict the price or returns of stocks. This concept is operationalized through R squared, that is, the percentage of the variance in the price or returns of a stock that can be predicted by accounting numbers. I would like to examine whether there is a significant difference in the change of value relevance between these two groups over a number of years (and thus in the change in R squared).

        Originally posted by Carlo Lazzaro View Post
        Roel:
        welcome to the list.
        As an aside to Dick's helpful advice (and without being that clear about your code), please note that it is quite unusual to plug in an interaction among predictors without also including the conditional main effect of each predictor.
        If needed, -testparm i.yearid- can test whether that variable reaches statistical significance (with all the well-known limitations that this statement usually brings about).
        Carlo, I was aware of that. Thank you for the code regarding how to test for statistical significance.
        Last edited by Roel Roeleveld; 13 Jul 2017, 03:15.



        • #5
          Roel:
          thanks for providing further details.
          Just an aside: as other listers might be interested in the article you quoted, full reference would be highly welcomed (as recommended by the FAQ).
          Kind regards,
          Carlo
          (Stata 19.0)



          • #6
            So, as your original code sort of indicated, you are, as near as I can tell, interested in an interaction: specifically, do the effects of your independent variables on some outcome change over time? At least that is what is implied when you say "I would like to examine whether there is a significant difference in the change of value relevance between these two groups over a number of years (and thus the change in R squared)." I'm not quite sure what "change in value relevance" means, however. The statement seems to imply that you have other variables that you haven't told us about. You could do this by looking at equations within year and comparing regression coefficients and R2 statistics, but that is not the best way to go given my understanding of what you want to know. Instead, I think you want an interaction involving tech sector and time. The equation you showed in your first post sort of gets at that but, as Carlo indicated, it is incorrectly specified.

            Finally, you don't say how many years are involved in your actual data, but I would guess it is more than you show in the example data. In any case, you probably want to treat year in linear form or perhaps in some other way such as quadratic (year^2), but you almost certainly don't want to use i.year, which tells Stata to construct an indicator variable for each year less one. You might clarify your thinking about all of this by sketching a graph or two that give you a visual guide to what you think is going on and then writing Stata code to properly capture the equations.
            Richard T. Campbell
            Emeritus Professor of Biostatistics and Sociology
            University of Illinois at Chicago
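
            The suggestion in #6 (an interaction of tech sector with a linear time trend) could be sketched as follows, assuming the fake data from #1 are in memory. The c. prefix tells Stata to treat yearid as continuous rather than as a set of indicators. This is one possible specification under that assumption, not necessarily the one used in the paper.

            Code:
            * Main effects plus interaction: separate intercept and separate linear trend per group
            regress adj_r2 i.hightech##c.yearid
            * Estimated slope of adj_r2 on yearid within each group
            margins hightech, dydx(yearid)
            * Closer to the equation quoted in #1: one intercept, one slope per group
            regress adj_r2 i.hightech#c.yearid
            * Test whether the two group-specific slopes are equal
            test 0.hightech#c.yearid = 1.hightech#c.yearid

            In the first model, the coefficient on 1.hightech#c.yearid is the difference in yearly trend between the hightech and lowtech groups. A quadratic trend, as mentioned in #6, could be added with c.yearid#c.yearid.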

