
  • #46
    Clyde, thank you again…

    As you recommended, I drew the scatter plot of ‘lconsum’ (vertical axis) against ‘heatscore’ (horizontal axis). This is what I obtained:

    twoway scatter lconsum heatscore
    [Figure: 1.png]

    twoway scatter lconsum heatscore if calday < td(02feb2018)
    [Figure: 3.png]

    It is hard to tell what kind of relationship there is between these variables.
    Last edited by Katherine Adams; 13 Jan 2019, 08:32.



    • #47
      Carlo, yes, I totally agree (#42).



      • #48
        I think Carlo's response in #45 may well account for much of the disciplinary difference. I would add another possible source: randomized studies are uncommon in economics, but common in health care. In a randomized study, the independence of the error terms from the fixed effects is a very reasonable assumption, and under those circumstances the random effects model coefficient estimates are consistent.



        • #49
          Re #46, because your graph has large numbers of points superimposed on it, I agree it is difficult to see what is going on here. One suggestion that might help is to try
          Code:
          lowess lconsum heatscore if calday < td(02feb2018)
          The lowess curve will probably be more informative.
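
          On a large daily panel, lowess can be slow, and one option is to run it on a random subsample. The following is only a minimal sketch: the 5% sampling fraction and the seed are arbitrary assumptions, and lconsum, heatscore, and calday are the variables from #46.

          ```stata
          * Sketch: lowess on a random subsample of the pre-treatment data.
          * The 5% fraction and the seed are illustrative assumptions.
          preserve
          keep if calday < td(02feb2018)
          set seed 20190113
          sample 5                      // keep a 5% random sample of observations
          lowess lconsum heatscore
          restore
          ```

          The preserve/restore pair leaves the full dataset untouched. If the curve looks unstable, a larger fraction can be tried.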



          • #50
            Clyde, thank you for your help! Unfortunately, I was not able to see the result of the 'lowess' command because it took too long to run.


            However, now I have another quick question. Suppose I do not use the full factorial in the following code (the code is the same as that in your reply #39; I just use 'areg' instead of 'xtreg', and I also use different names for some of the vars):

            areg lconsum i.randomgr##i.tp i.month c.calmonth, absorb(location) vce(cluster location)

            So, my new code will be:

            areg lconsum i.randomgr#i.tp i.month c.calmonth, absorb(location) vce(cluster location)


            What effect will this change have on my estimates?

            In both cases, I will have some omitted variables because of collinearity, which is OK (in the first case with ##, I will have i.randomgrp omitted; in the second case with #, I will have i.randomgrp#i.tp omitted). So, what is the difference?
            Last edited by Katherine Adams; 21 Jan 2019, 15:25.



            • #51
              In your data, randomgrp is constant within location, and tp (I assume this is your new name for post_intervention) is constant within time (c.calmonth). So both of these variables will be omitted automatically by Stata. If you change the code from ## to #, as you propose, you are simply doing that work for Stata. There is no real harm in that: everything will come out the same. But I don't recommend it for two reasons.

              1. It is better to get in the habit of always specifying interactions with ## and not #, because when you use # it is all too easy to forget to also include the constituent effects. Any model that includes an interaction without also including the constituent effects is mis-specified, unless those constituent effects are omitted due to collinearity (as here). While it is perfectly OK to go to # when the collinearity is present, it is much too easy to omit one of the constituent effects by accident in circumstances where collinearity is not present. So the use of ## is foolproof; # is not.

              2. Even though no damage is caused by using # instead of ##, using ## has a diagnostic benefit. When you know that the data design creates collinearity between constituents of the interaction and other model variables, and you see in your output that the expected omissions did not occur, you know that there is an error in your data. It is far better to find this out now than after you have blundered along farther in your analysis plan and invested time in creating spurious results. The sooner you discover problems with the data, the better. So ## provides you the added benefit of giving you a validity check on one aspect of your data.
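
              The validity check in point 2 can also be run directly on the data. This is a minimal sketch, assuming the variable names used in this thread (location, randomgrp, tp, calmonth); it is not part of the original analysis.

              ```stata
              * randomgrp should be constant within location;
              * assert stops with an error if any location has more than one value
              bysort location (randomgrp): assert randomgrp[1] == randomgrp[_N]

              * Inspect how tp lines up with calendar months around the treatment start
              tabulate calmonth tp if inrange(calmonth, tm(2018m1), tm(2018m3))
              ```

              Because the cutoff used earlier in the thread is td(02feb2018), the tabulation may show both values of tp within 2018m2.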

              You mentioned that i.randomgrp#i.tp gets omitted. You really don't want that to happen, as that is the key variable for interpreting your results to answer your research question. (The results are equivalent either way, and, in principle, it would still be possible to recover the i.randomgrp#i.tp effect, but that calculation involves a bunch of matrix algebra that is easy to get wrong.) It means that i.randomgrp#i.tp is collinear with something else in the model. What you should do is eliminate that something else instead. The simplest way to do that is to rearrange the order in which the variables are listed in the regression command. Stata chooses to break collinearities based on the order of appearance of the variables, so if you don't like what it chose, rearranging will solve that.
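
              As an illustration of the two points above, here is a hedged sketch using the variable names from #50 (spelled randomgrp, as in the data). The reordered command is only an assumption about how the rearrangement might be written; which terms end up omitted should be checked in the output.

              ```stata
              * i.randomgrp##i.tp is shorthand for the main effects plus the interaction
              areg lconsum i.randomgrp##i.tp i.month c.calmonth, absorb(location) vce(cluster location)

              * Same terms written out, with the interaction listed first
              areg lconsum i.randomgrp#i.tp i.randomgrp i.tp i.month c.calmonth, absorb(location) vce(cluster location)

              * Joint test of the interaction terms that remain in the model
              testparm i.randomgrp#i.tp
              ```

              Comparing the omission notes from the two commands shows whether the reordering changed which terms were dropped.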



              • #52
                Dear all,

                I am joining this list to ask some advice as well. I have recently asked a question on Statalist which is related to this thread. Unfortunately I have got no response yet. I am new to Statalist, so I am still learning how to post properly. I apologize if my post is not appropriate under this conversation. I would be very glad if any of you could maybe have a look at my post as well. Thanks.

                Harold



                • #53
                  I have another question (I hope it will be among the last ones) about my model (see post #38).

                  So, I have the following regression model:
                  xtreg lconsum i.heatscore##i.randomgr##i.tp i.month c.daymntemp##c.daymntemp c.calmonth, fe vce(cluster location)

                  Now, I need to modify it; in particular, my ‘new’ model should have month and household fixed effects, as well as the calendar month time trend. So, my code is now as follows:

                  reghdfe lconsum i.heatscore##i.randomgr##i.tp c.calmonth, absorb(location month) vce(cluster location)

                  Also, I was asked to add month-of-sample, week-of-sample, and month by household fixed effects. Could you please tell me how I can do this? And, if I add these fixed effects, should I drop the month fixed effect I had before?

                  Thank you.


                  P.S. I am also struggling with an event study for the model (https://www.statalist.org/forums/for...study-analysis). I would appreciate any help.



                  • #54
                    Katherine:
                    - you have too many interactions in your model(s); I guess that collinearity would be an issue with all those time variables, and that disseminating your results would be difficult.
                    Can't you simplify things a bit?
                    Kind regards,
                    Carlo
                    (Stata 18.0 SE)



                    • #55
                      Carlo,

                      Yes, I should have done this earlier…

                      I have panel data for 2017-2018 from an RCT. The treatment (started on February 2, 2018) is a specific type of bill sent to a household, which includes a comparison between the household’s energy use and that of its neighbors. It is expected that the treatment will reduce the energy use of treated households.

                      Suppose my simplified diff-in-diff model is:
                      xtreg lconsum i.randomgr##i.tp, fe vce(cluster location)

                      Now, I need to modify it; in particular, my ‘new’ model should have household fixed effects, as well as the calendar month time trend. So, my code is now as follows:
                      reghdfe lconsum i.randomgr##i.tp c.calmonth, absorb(location) vce(cluster location)

                      Then, I was asked to add month-of-sample, week-of-sample, and month by household fixed effects. How can I do this?

                      Code:
                      * Example generated by -dataex-. To install: ssc install dataex
                      clear
                      input long location str9 date float(year month day calday calmonth lconsum) byte randomgrp float tp
                      500001 "01-JAN-17" 2017 1  1 20820 684  4.331219 0 0
                      500001 "02-JAN-17" 2017 1  2 20821 684  4.395176 0 0
                      500001 "03-JAN-17" 2017 1  3 20822 684 4.4484995 0 0
                      500001 "04-JAN-17" 2017 1  4 20823 684 4.4349075 0 0
                      500001 "05-JAN-17" 2017 1  5 20824 684 4.3300653 0 0
                      500001 "06-JAN-17" 2017 1  6 20825 684  3.984616 0 0
                      500001 "07-JAN-17" 2017 1  7 20826 684 4.2140527 0 0
                      500001 "08-JAN-17" 2017 1  8 20827 684 4.4064745 0 0
                      500001 "09-JAN-17" 2017 1  9 20828 684 4.2368575 0 0
                      500001 "10-JAN-17" 2017 1 10 20829 684 4.3243986 0 0
                      end
                      format %td calday
                      format %tm calmonth

                      Variables:
                      location; household’s location id
                      date
                      year
                      month
                      day
                      calday; daily date (e.g., 01jan2017)
                      calmonth; monthly date (e.g., 2017m1)
                      lconsum; log of energy consumption
                      randomgrp; treatment indicator; control (0) or one of three treatment groups (1, 2, 3)
                      tp; post-treatment variable; gen tp = (calday >= td(02feb2018))


                      P.S. A difficulty with an event study for this model...
                      https://www.statalist.org/forums/for...study-analysis
                      Last edited by Katherine Adams; 31 Jan 2019, 10:19.



                      • #56
                        Katherine:
                        I think that a different specification of your model is mandatory:
                        Code:
                        . reghdfe lconsum i.randomgr##i.tp c.calmonth, absorb(location) vce(cluster location)
                        (converged in 1 iterations)
                        note: 0.randomgrp omitted because of collinearity
                        note: 0.tp omitted because of collinearity
                        note: 0.randomgrp#0.tp omitted because of collinearity
                        note: calmonth omitted because of collinearity
                        
                        HDFE Linear regression                            Number of obs   =         10
                        Absorbing 1 HDFE group                            F(   0,      0) =       0.00
                        Statistics robust to heteroskedasticity           Prob > F        =          .
                                                                          R-squared       =    -0.0000
                                                                          Adj R-squared   =    -0.0000
                                                                          Within R-sq.    =     0.0000
                        Number of clusters (location) =          1        Root MSE        =     0.1386
                        
                                                       (Std. Err. adjusted for 1 clusters in location)
                        ------------------------------------------------------------------------------
                                     |               Robust
                             lconsum |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                        -------------+----------------------------------------------------------------
                         0.randomgrp |          0  (empty)
                                0.tp |          0  (empty)
                                     |
                        randomgrp#tp |
                                0 0  |          0  (empty)
                                     |
                            calmonth |          0  (omitted)
                        ------------------------------------------------------------------------------
                        
                        Absorbed degrees of freedom:
                        ---------------------------------------------------------------+
                         Absorbed FE |  Num. Coefs.  =   Categories  -   Redundant     |
                        -------------+-------------------------------------------------|
                            location |            0               1              1 *   |
                        ---------------------------------------------------------------+
                        * = fixed effect nested within cluster; treated as redundant for DoF computation
                        
                        .
                        Technically, you can add the other time variables as predictors via -fvvarlist- notation (i.e., the -i.- prefix).
                        But I'm afraid that they will only worsen the collinearity issue.
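
                        For reference, here is a minimal sketch of how the extra fixed effects asked about in #55 might be absorbed rather than entered as predictors. It rests on several assumptions: weekofsample is a hypothetical variable created here, "month by household" is read as household by calendar month (location by month), and the community-contributed -reghdfe- is installed. Since calmonth already identifies the month of the sample, absorbing it makes both i.month and the linear trend c.calmonth redundant, so they are left out.

                        ```stata
                        * Week-of-sample identifier built from the daily date (hypothetical variable name)
                        gen int weekofsample = wofd(calday)
                        format %tw weekofsample

                        * Household, month-of-sample, week-of-sample,
                        * and household-by-calendar-month fixed effects
                        reghdfe lconsum i.randomgrp##i.tp, ///
                            absorb(location calmonth weekofsample i.location#i.month) ///
                            vce(cluster location)
                        ```

                        With this many time effects absorbed, the main effect of tp is likely to be identified from very few days, if at all, so the omission notes deserve a careful look; the randomgrp-by-tp interaction is the term of interest.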
                        Kind regards,
                        Carlo
                        (Stata 18.0 SE)



                        • #57
                          Carlo,

                          I generated an alternative treatment measure:
                          gen treatalt = (calday >= td(02feb2018) & randomgrp>0)

                          Code:
                          * Example generated by -dataex-. To install: ssc install dataex
                          clear
                          input long location str9 date float(year month day calday calmonth lconsum) byte randomgrp float(tp treatalt)
                          500001 "01-JAN-17" 2017 1  1 20820 684  4.322219 0 0 0
                          500001 "02-JAN-17" 2017 1  2 20821 684  4.386176 0 0 0
                          500001 "03-JAN-17" 2017 1  3 20822 684 4.4473995 0 0 0
                          500001 "04-JAN-17" 2017 1  4 20823 684 4.4338075 0 0 0
                          500001 "05-JAN-17" 2017 1  5 20824 684 4.3310753 0 0 0
                          500001 "06-JAN-17" 2017 1  6 20825 684  3.974716 0 0 0
                          500001 "07-JAN-17" 2017 1  7 20826 684 4.2140517 0 0 0
                          500001 "08-JAN-17" 2017 1  8 20827 684 4.4054755 0 0 0
                          500001 "09-JAN-17" 2017 1  9 20828 684 4.2358565 0 0 0
                          500001 "10-JAN-17" 2017 1 10 20829 684 4.3244976 0 0 0
                          end
                          format %td calday
                          format %tm calmonth
                          and ran the following regression:
                          reghdfe lconsum treatalt tp c.calmonth, absorb(location) vce(cluster location)


                          It should work this time...



                          • #58
                            Katherine:
                            I'm afraid that the answer is, again, no:
                            Code:
                            . reghdfe lconsum treatalt tp c.calmonth, absorb(location) vce(cluster location)
                            (converged in 1 iterations)
                            note: treatalt omitted because of collinearity
                            note: tp omitted because of collinearity
                            note: calmonth omitted because of collinearity
                            
                            HDFE Linear regression                            Number of obs   =         10
                            Absorbing 1 HDFE group                            F(   0,      0) =       0.00
                            Statistics robust to heteroskedasticity           Prob > F        =          .
                                                                              R-squared       =     0.0000
                                                                              Adj R-squared   =     0.0000
                                                                              Within R-sq.    =     0.0000
                            Number of clusters (location) =          1        Root MSE        =     0.1402
                            
                                                           (Std. Err. adjusted for 1 clusters in location)
                            ------------------------------------------------------------------------------
                                         |               Robust
                                 lconsum |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                            -------------+----------------------------------------------------------------
                                treatalt |          0  (omitted)
                                      tp |          0  (omitted)
                                calmonth |          0  (omitted)
                            ------------------------------------------------------------------------------
                            
                            Absorbed degrees of freedom:
                            ---------------------------------------------------------------+
                             Absorbed FE |  Num. Coefs.  =   Categories  -   Redundant     |
                            -------------+-------------------------------------------------|
                                location |            0               1              1 *   |
                            ---------------------------------------------------------------+
                            * = fixed effect nested within cluster; treated as redundant for DoF computation
                            
                            .
                            However, if Stata gave you back something better using the full sample, please share it with the list. Thanks.
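
                            The excerpt in #57 contains one location, one calendar month, and only zeros for randomgrp, tp, and treatalt, which is consistent with every term being dropped. A few quick checks along these lines can show whether an estimation sample has the variation the model needs. This is a sketch only, and loc_tag is a hypothetical helper variable.

                            ```stata
                            * Is there variation in the treatment group and the period indicator?
                            tabulate randomgrp tp, missing

                            * A single month makes c.calmonth constant
                            tabulate calmonth

                            * Number of distinct locations, i.e., clusters
                            egen byte loc_tag = tag(location)   // hypothetical helper variable
                            count if loc_tag
                            drop loc_tag
                            ```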
                            Kind regards,
                            Carlo
                            (Stata 18.0 SE)



                            • #59
                              Strange... It works well on my data (the full dataset, I mean):

                              Attached Files



                              • #60
                                Katherine:
                                now it sounds good (on the whole sample, I mean).
                                Kind regards,
                                Carlo
                                (Stata 18.0 SE)
