
  • #31
    Maria:
so the issue is to choose the regression model that offers the truest and fairest view of the data generating process underlying the sample under investigation.
    Otherwise, you can report both -fe- and -re- specification, explaining in your paper the pros and cons of both.
    Kind regards,
    Carlo
    (Stata 19.0)



    • #32
Thank you. I'll do just that. I just have to know which version is the most true and fair.
Is it okay to take, e.g., a log of the DV and the IVs, since they were heavily skewed to the right?
What would that imply for my results? It seems to matter, since it will change the model I'll use. Are there any downsides to taking either the DV, the IVs, or both as log values? The dummies always stay dummies, right?
      best regards




        • #34
Sorry for the repeated replies. Keyboard stuck.



          • #35
            Maria:
            - usually, literature (not statistics) points you to the best specified model;
- logging a dummy is technically meaningless, as its numbers are really just levels;
- there are no downsides to logging right-skewed variables, but their interpretation changes. See any decent econometrics textbook.
            Kind regards,
            Carlo
            (Stata 19.0)
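As an editorial note on the logging point above, here is a minimal sketch (the variable names RD, ln_sales, and industry_dummy are placeholders, not from the thread): with a logged DV, coefficients on unlogged regressors are read approximately as percentage changes, and a log-log pair is read as an elasticity.
```stata
* sketch: log-transforming a right-skewed outcome
* note: ln() returns missing for zero or negative values,
* so inspect those cases first
generate RD_log = ln(RD)

* with a logged DV, the coefficient b on an unlogged regressor is
* roughly a 100*b percent change in RD; if the regressor is logged
* too, b is an elasticity. Dummies enter untransformed:
regress RD_log ln_sales i.industry_dummy
```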



            • #36
Dear Carlo, how are you? I hope you had a wonderful Christmas!
I have another question about my regression; maybe you have an answer.
I want to divide a variable measuring a fine into three categories, namely small fine, medium fine, and large fine. At first, I wanted to create three dummy variables. Then I used the
              Code:
 g Categorysize=recode(SIZE_AVERAGE_REVENUE,10000000000,50000000000,100000000000)
function. Is that possible?
Now I would like to see whether the level of the fine had an effect on R&D investment. In my regression, I have a pre_fine/post_fine dummy comparing the level of investments before and after the fine.
              Can I just create an interaction variable: categorysize * post_fine_dummy?
              And, do I have to put
              Code:
               i.
              in front of this term?
              best regards



              • #37
                Maria:
                thanks. I do hope the same for you and your dears.
What you have in mind is feasible, as per the following toy example (where -foreign- is basically replaced by -country_car-):
                Code:
                . use "C:\Program Files (x86)\Stata15\ado\base\a\auto.dta"
                (1978 Automobile Data)
                
                . g country_car=recode(foreign ,0,1)
                
                . regress price i.country_car##i.rep78
                note: 1.country_car#1b.rep78 identifies no observations in the sample
                note: 1.country_car#2.rep78 identifies no observations in the sample
                note: 1.country_car#5.rep78 omitted because of collinearity
                
                      Source |       SS           df       MS      Number of obs   =        69
                -------------+----------------------------------   F(7, 61)        =      0.39
                       Model |    24684607         7  3526372.43   Prob > F        =    0.9049
                    Residual |   552112352        61  9051022.16   R-squared       =    0.0428
                -------------+----------------------------------   Adj R-squared   =   -0.0670
                       Total |   576796959        68  8482308.22   Root MSE        =    3008.5
                
                -----------------------------------------------------------------------------------
                            price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                ------------------+----------------------------------------------------------------
                    1.country_car |   2088.167   2351.846     0.89   0.378     -2614.64    6790.974
                                  |
                            rep78 |
                               2  |   1403.125   2378.422     0.59   0.557    -3352.823    6159.073
                               3  |   2042.574   2204.707     0.93   0.358    -2366.011    6451.159
                               4  |   1317.056   2351.846     0.56   0.578    -3385.751    6019.863
                               5  |       -360   3008.492    -0.12   0.905    -6375.851    5655.851
                                  |
                country_car#rep78 |
                             1 1  |          0  (empty)
                             1 2  |          0  (empty)
                             1 3  |  -3866.574   2980.505    -1.30   0.199    -9826.462    2093.314
                             1 4  |  -1708.278   2746.365    -0.62   0.536    -7199.973    3783.418
                             1 5  |          0  (omitted)
                                  |
                            _cons |     4564.5   2127.325     2.15   0.036      310.651    8818.349
                -----------------------------------------------------------------------------------
                Kind regards,
                Carlo
                (Stata 19.0)



                • #38
                  Dear Carlo,

                  thank you for the reply.

                  So my regression looks as follows:
                  Code:
                   xtreg RDlog POST_FINE_DUMMY LENIENCY_DUMMY post_len_inter fine_category fine_cat_inter i.year , fe vce(robust)
where POST_FINE_DUMMY compares the periods before and after the fine,
LENIENCY_DUMMY is whether the firm was granted full leniency,
post_len_inter is the interaction of POST_FINE_DUMMY and LENIENCY_DUMMY,
fine_category is the new variable covering small, medium, and large fines, and
fine_cat_inter is the interaction of fine_category and POST_FINE_DUMMY.

                  Code:
                    xtreg RDlog POST_FINE_DUMMY LENIENCY_DUMMY post_len_inter fine_category fine_cat_inter i.year , fe vce(robust)
                  note: LENIENCY_DUMMY omitted because of collinearity
                  note: fine_category omitted because of collinearity
                  
                  Fixed-effects (within) regression               Number of obs     =      1,446
                  Group variable: ID                              Number of groups  =        145
                  
                  R-sq:                                           Obs per group:
                       within  = 0.0941                                         min =          7
                       between = 0.0547                                         avg =       10.0
                       overall = 0.0000                                         max =         19
                  
                                                                  F(22,144)         =       7.59
                  corr(u_i, Xb)  = -0.0541                        Prob > F          =     0.0000
                  
                                                        (Std. Err. adjusted for 145 clusters in ID)
                  ---------------------------------------------------------------------------------
                                  |               Robust
                            RDlog |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                  ----------------+----------------------------------------------------------------
                  POST_FINE_DUMMY |   .1311123   .0464526     2.82   0.005     .0392952    .2229293
                   LENIENCY_DUMMY |          0  (omitted)
                   post_len_inter |  -.0608626   .0858443    -0.71   0.479    -.2305403    .1088152
                    fine_category |          0  (omitted)
                   fine_cat_inter |   .1857077   .7588183     0.24   0.807    -1.314154    1.685569
                                  |
                             year |
                            1997  |   .1408881   .0962422     1.46   0.145    -.0493418     .331118
                            1998  |   .2455628   .0897055     2.74   0.007     .0682532    .4228724
                            1999  |   .2301396   .0954123     2.41   0.017     .0415501    .4187291
                            2000  |   .4015061    .096725     4.15   0.000     .2103219    .5926903
                            2001  |   .3462281   .1049296     3.30   0.001     .1388269    .5536293
                            2002  |   .3215827   .1077026     2.99   0.003     .1087004     .534465
                            2003  |   .2631099   .1081642     2.43   0.016     .0493152    .4769046
                            2004  |   .2568348   .1106968     2.32   0.022     .0380343    .4756352
                            2005  |   .2527749   .1147696     2.20   0.029     .0259241    .4796257
                            2006  |     .27069   .1195602     2.26   0.025     .0343704    .5070096
                            2007  |   .2378767   .1273098     1.87   0.064    -.0137606     .489514
                            2008  |   .1761791   .1325034     1.33   0.186    -.0857237     .438082
                            2009  |   .1640074   .1349029     1.22   0.226    -.1026383    .4306531
                            2010  |   .2286983   .1406743     1.63   0.106    -.0493551    .5067516
                            2011  |   .2681223   .1481219     1.81   0.072    -.0246518    .5608963
                            2012  |   .3901673   .1577436     2.47   0.015     .0783753    .7019594
                            2013  |   .1361936   .1717045     0.79   0.429    -.2031932    .4755803
                            2014  |   .1637635    .181262     0.90   0.368    -.1945144    .5220414
                            2015  |   .5215185   .2107859     2.47   0.015     .1048844    .9381526
                                  |
                            _cons |   18.81203   .1144009   164.44   0.000     18.58591    19.03815
                  ----------------+----------------------------------------------------------------
                          sigma_u |  2.0577645
                          sigma_e |  .33642112
                              rho |  .97396728   (fraction of variance due to u_i)
                  ---------------------------------------------------------------------------------
                  
                  .
does that make sense?



                  • #39
                    Maria:
I would stop at the first interaction and rewrite your code in a more efficient way, relying on -fvvarlist- for creating categorical variables and interactions:

                    Code:
                    xtreg RDlog i.POST_FINE_DUMMY##i.fine_category  i.LENIENCY_DUMMY  i.year , fe vce(robust)
                    Kind regards,
                    Carlo
                    (Stata 19.0)



                    • #40
thank you, Carlo.
I have some trouble understanding the regression correctly. For clarification purposes:
                      1)
                      Code:
                       ##
accounts for the main effects and the interaction effect of the variables, right? So POST_FINE_DUMMY, fine_cat, and their interaction?
                      2) why is the
                      Code:
                       i.
                      in front of the POST_FINE_DUMMY and the LENIENCY_DUMMY needed?
3) what happened to the interaction between POST_FINE_DUMMY and LENIENCY_DUMMY? Was that just left out because my regression looked too confusing and you may have overlooked it, or does it have a reason?
4) the dummies are already created. Would I still need the
                      Code:
                       i.
in front? Or would that be double? I mean, POST_FINE_DUMMY already indicates whether it is pre or post fine. Would I still put the prefix
                      Code:
                      i.
                      in front?
thank you very much for the help!

EDIT: unfortunately, the fine_categories include non-integer values since they are so small (0.02..), since they are ratios, so I cannot make a factor variable out of them, I guess?
                      Last edited by Maria Kohnen; 28 Dec 2017, 09:06.



                      • #41
                        Maria:
1) yes, you're right. The double # allows both conditional main effects and interactions;
2) the -i.- operator tells Stata to treat the variables included in the interactions as categorical. You can omit it when dealing with a two-level categorical variable. However, I've made a habit of always using it, as it is one of the take-home messages I've learnt from -help fvvarlist-.
3) your regression code looked a bit sparse, actually. Besides, if you include another interaction, I suspect that you will lose something along the way due to collinearity.
                        Anyway, assuming it makes sense in your research field, you may want to try:
                        Code:
                        xtreg RDlog i.POST_FINE_DUMMY##i.fine_category i.POST_FINE_DUMMY##i.LENIENCY_DUMMY i.year , fe vce(robust)

                        Kind regards,
                        Carlo
                        (Stata 19.0)
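For readers following the factor-variable notation: -##- is shorthand for the main effects plus the interaction, so the two calls below fit the same model. A sketch on the -auto- dataset shipped with Stata (mirroring the earlier toy example):
```stata
* load the example dataset shipped with Stata
sysuse auto, clear

* a##b expands to the main effects and the interaction ...
regress price i.foreign##i.rep78

* ... which is the same model as spelling the terms out with #
regress price i.foreign i.rep78 i.foreign#i.rep78
```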



                        • #42
                          Dear Carlo,

thank you for the reply. Unfortunately, my fine_category variable, indicating the three different fine levels (small, medium, and large), has non-integer values, because they are ratios and very small. Should I rather just include dummy variables for medium_fine and large_fine (with small as the baseline), or is there something that can be done about the categorical variable?



                          • #43
                            Maria:
                            you can recode -fine_category-.
                            Perhaps the following toy-example can give you some hints about how to do it:
                            Code:
                            . set obs 100
                            number of observations (_N) was 0, now 100
                            
                            . g x=runiform()
                            
                            . pctile tertiles = x, nq(3)
                            
                            . tab tertiles
                            
                            percentiles |
                                   of x |      Freq.     Percent        Cum.
                            ------------+-----------------------------------
                               .3200437 |          1       50.00       50.00
                               .7167162 |          1       50.00      100.00
                            ------------+-----------------------------------
                                  Total |          2      100.00
                            
                            . su x
                            
                                Variable |        Obs        Mean    Std. Dev.       Min        Max
                            -------------+---------------------------------------------------------
                                       x |        100    .4973389      .30869   .0030522   .9874847
                            
                            . g index=0 if x<=.3200437
                            (67 missing values generated)
                            
                            . replace index=1 if x>.3200437 & x<=.7167162
                            (34 real changes made)
                            
                            . replace index=2 if x>.7167162 & x!=.
                            (33 real changes made)
                            
                            . label define index 0 "small" 1 "medium" 2 "large"
                            
                            . label val index index
                            
                            . tab index
                            
                                  index |      Freq.     Percent        Cum.
                            ------------+-----------------------------------
                                  small |         33       33.00       33.00
                                 medium |         34       34.00       67.00
                                  large |         33       33.00      100.00
                            ------------+-----------------------------------
                                  Total |        100      100.00
                            
                            .
                            Kind regards,
                            Carlo
                            (Stata 19.0)
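As an aside, the -pctile-/-replace- sequence above can be collapsed into one step with -xtile-, which assigns each observation directly to its tertile (a sketch on the same simulated data; the group codes run 1-3 here rather than 0-2):
```stata
* sketch: one-step tertile grouping with xtile
set obs 100
generate x = runiform()
xtile size3 = x, nq(3)          // 1 = bottom, 3 = top tertile
label define size3 1 "small" 2 "medium" 3 "large"
label values size3 size3
tab size3
```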



                            • #44
thank you very much Carlo, much appreciated!



                              • #45
                                Dear Carlo,
                                I have a question about the
                                Code:
                                 xtoverid
function. I used it to choose between the -fe- and -re- models, since the Hausman test cannot be done with robust SEs.
1) When do I have to use robust SEs? When I suspect heteroskedasticity? If there is none, is the Hausman test okay?
2) I want to describe why I used the xtoverid function, but I need to base my explanation on the literature. I looked through the Arellano paper and through Wooldridge, but honestly I am not familiar enough with econometrics to interpret these two properly. Could you please explain briefly why xtoverid is used, what the test is based on (the Sargan-Hansen test? Arellano? ...), and whether there are any sources to quote that may be easier to understand?
                                thank you very much

