
  • Scatter plot with fixed effects

    I am trying to draw a scatter plot of two variables after accounting for fixed effects (state and year). The idea is to analyse the correlation between gonorrhea rates and broadband diffusion, so I ran the following regression:

    Code:
    tsset state year
    xtreg gonorrhea internet i.year, fe
    Then I collected the residuals using

    Code:
    predict gonorrhea_res, e
    and I plotted the residuals against broadband, by state, by running

    Code:
    scatter internet gonorrhea_res, by(state)
    Is this the right way to proceed? I tried to follow some other sources but I am not 100% sure about the code I wrote here.
    Last edited by Alessandro Sovera; 19 Jul 2019, 07:22.

  • #2
    UPDATE:

    I realized I probably made a mistake in my code above. After the regression, I think I should run

    Code:
    predict gonorrhea_hat, xb
    but in this case I would obtain the fitted values of gonorrhea, my dependent variable, computed as b_1*ratio + b_2*state_1+...+b_x*year_1+... Instead, my plan was to net out the fixed effects and then draw a scatter plot of ratio against gonorrhea.

    How should I proceed?
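
    For reference, here is a minimal sketch of what each predict option returns after xtreg, fe. It uses the grunfeld example dataset as a stand-in, which is an assumption (it is not the data from this thread). After xtreg, fe, the xb prediction holds the constant plus the slope and year-dummy terms but not the panel fixed effect, which predict returns separately with the u option.

    ```stata
    * Sketch on a stand-in dataset (assumption: not the poster's data)
    clear all
    webuse grunfeld
    xtset company year
    xtreg mvalue invest i.year, fe

    predict double xb_hat,  xb    // constant + slope and year terms, no fixed effect
    predict double u_hat,   u     // estimated company fixed effect u_i
    predict double e_hat,   e     // idiosyncratic residual e_it
    predict double xbu_hat, xbu   // xb plus the fixed effect u_i

    * The outcome splits into the three pieces: y = xb + u + e
    assert reldif(mvalue, xb_hat + u_hat + e_hat) < 1e-6
    ```

    With the variables in this thread, state would play the role of company and gonorrhea the role of mvalue.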



    • #3
      Alessandro:
      I've read your original post more than once, but I'm still not clear on what you're after (and I suspect this is why you have not received any reply so far).
      That said, I assume you want to visually inspect whether your regression model suffers from heteroskedasticity.
      Maybe something along the following lines can help:
      Code:
      . use "http://www.stata-press.com/data/r15/nlswork.dta"
      . xtreg ln_wage age, fe
      
      Fixed-effects (within) regression               Number of obs     =     28,510
      Group variable: idcode                          Number of groups  =      4,710
      
      R-sq:                                           Obs per group:
           within  = 0.1026                                         min =          1
           between = 0.0877                                         avg =        6.1
           overall = 0.0774                                         max =         15
      
                                                      F(1,23799)        =    2720.20
      corr(u_i, Xb)  = 0.0314                         Prob > F          =     0.0000
      
      ------------------------------------------------------------------------------
           ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
               age |   .0181349   .0003477    52.16   0.000     .0174534    .0188164
             _cons |   1.148214   .0102579   111.93   0.000     1.128107     1.16832
      -------------+----------------------------------------------------------------
           sigma_u |  .40635023
           sigma_e |  .30349389
               rho |  .64192015   (fraction of variance due to u_i)
      ------------------------------------------------------------------------------
      F test that all u_i=0: F(4709, 23799) = 8.81                 Prob > F = 0.0000
      
      . predict fitted, xb
      
      . predict idiosyncratic, e
      
      . scatter idiosyncratic fitted
      
      . scatter idiosyncratic age
      I would also check whether your model, with only two predictors, is correctly specified.
      Kind regards,
      Carlo
      (Stata 18.0 SE)



      • #4
        Thank you Carlo.

        The fact is that it is not 100% clear to me either. I have been asked to draw a scatter plot of the two variables, x and y, after taking into account state and year fixed effects, and that is what I came up with. But it seems quite odd to me to run a regression and then check for correlation.

        I think this is meant as some kind of preliminary analysis to see whether it is worth moving on with the project.



        • #5
          You might want to have a look at avplot, which I think does what you want. The only problem is that it only works after regress. But you can change
          Code:
          xtreg gonorrhea internet i.year, fe
          to
          Code:
          reg gonorrhea internet i.year i.state
          and then use avplot.
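
          As a rough check that the switch from xtreg to regress is harmless, the sketch below compares the two slope estimates. It uses the grunfeld example dataset as a stand-in, which is an assumption (it is not the data from this thread), and the scalar names b_fe and b_lsdv are made up for illustration.

          ```stata
          * Sketch on a stand-in dataset (assumption: not the poster's data)
          clear all
          webuse grunfeld
          xtset company year

          * Within estimator
          xtreg mvalue invest i.year, fe
          scalar b_fe = _b[invest]

          * Same model with explicit panel dummies
          regress mvalue invest i.year i.company
          scalar b_lsdv = _b[invest]

          display "xtreg, fe: " b_fe "   regress with dummies: " b_lsdv
          assert reldif(b_fe, b_lsdv) < 1e-6

          * avplot is available because the last estimation command was regress
          avplot invest
          ```

          With the variables in this thread, the last two steps would be regress gonorrhea internet i.year i.state followed by avplot internet.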



          • #6
            Originally posted by Wouter Wakker
            You might want to have a look at avplot, which I think does what you want. The only problem is that it only works after regress. But you can change
            Code:
            xtreg gonorrhea internet i.year, fe
            to
            Code:
            reg gonorrhea internet i.year i.state
            and then use avplot.
            I think this makes sense. Thanks for the advice!



            • #7
              Just to clarify--what avplot is doing is equivalent to plotting the residuals of the y and x variables after removing the means with fixed effects. Remember that using fixed effects is equivalent to demeaning both the x and y values in the data by the fixed effect groups. So what you are after with such a plot is to examine the variation in the x and y data after they have been demeaned, which is the variation giving you the coefficient of interest.
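
              The demeaning argument can be written out as follows. This is a sketch assuming a balanced panel, and the symbols (i for the panel unit, t for the year, tildes for demeaned variables) are introduced here for illustration only.

              ```latex
              % Two-way fixed effects model
              y_{it} = \beta x_{it} + \alpha_i + \gamma_t + \varepsilon_{it}

              % Demeaned variables (balanced panel); x is transformed the same way
              \tilde{y}_{it} = y_{it} - \bar{y}_{i\cdot} - \bar{y}_{\cdot t} + \bar{y}_{\cdot\cdot},
              \qquad
              \tilde{x}_{it} = x_{it} - \bar{x}_{i\cdot} - \bar{x}_{\cdot t} + \bar{x}_{\cdot\cdot}

              % The fixed effects slope is the slope of the demeaned scatter plot
              \hat{\beta} = \frac{\sum_{i,t} \tilde{x}_{it}\,\tilde{y}_{it}}{\sum_{i,t} \tilde{x}_{it}^{2}}
              ```

              In an unbalanced panel the simple demeaning formula no longer applies, but the residuals from regressing each variable on the unit and year dummies play the same role.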

              Here is a quick example illustrating how the avplot command works, along with equivalent ways of getting the same graph with -reg- or -xtreg-:
              Code:
              clear all
              webuse grunfeld
              xtset
              *0. Final model where "invest" is coefficient of interest
              xtreg mvalue invest i.year, fe 
              
              *1. avplot approach
              reg mvalue invest i.year i.company
              avplot invest, name(avplot)
              
              *2. manual avplot with reg
              reg mvalue i.year i.company  //get demeaned y
              predict y_res, res
              reg invest i.year i.company //get demeaned x 
              predict x_res, res
              scatter y_res x_res || lfit y_res x_res, name(not_avplot) ytitle("e( mvalue | X)") xtitle("e( invest | X)") legend(off) note("The same!")
              
              *3. manual avplot with xtreg
              xtreg mvalue i.year, fe  //get demeaned y
              predict y_resfe, e
              xtreg invest i.year, fe  //get demeaned x 
              predict x_resfe, e
              scatter y_resfe x_resfe || lfit y_resfe x_resfe, name(not_avplot_fe) ytitle("e( mvalue | X)") xtitle("e( invest | X)") legend(off) note("Still the same!")
              
              *Note all these graphs show the same thing!
              So Alessandro, you were on the right track with your first post--but the key is that you want to plot residuals for the x variable also. In other words, you want to know the variation in gonorrhea and internet after removing averages by year and state for both variables.
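
              To see numerically that the residual scatter plot carries the coefficient of interest, the slope of the fitted line through the residuals can be compared with the fixed effects estimate. This is a minimal sketch on the grunfeld data; the variable and scalar names (y_res, x_res, b_full) are illustrative.

              ```stata
              clear all
              webuse grunfeld
              xtset company year

              * Coefficient of interest from the full fixed effects model
              xtreg mvalue invest i.year, fe
              scalar b_full = _b[invest]

              * Residualize y and x on the year and company dummies
              regress mvalue i.year i.company
              predict double y_res, residuals
              regress invest i.year i.company
              predict double x_res, residuals

              * The slope through the residual scatter reproduces the coefficient
              regress y_res x_res
              assert reldif(_b[x_res], b_full) < 1e-6
              ```

              Only the slope carries over. The standard errors from the last regression ignore the degrees of freedom used up by the dummies, so they should not be reported. With the variables in this thread, gonorrhea, internet and state would replace mvalue, invest and company.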

