  • Panel data vs year dummies

    Hi,
    I am struggling to understand the difference between using panel data and using year dummies. Are there any advantages to using panel data over year dummies?

    Much appreciated.

  • #2
    I don't see that these are mutually exclusive or even equivalent in any sense. I suggest you firm up what precisely you have in mind as the meaning of "using panel data".


    • #3
      Thanks. The panel approach is time-serious cross-sectional in one regression.
      I am not sure whether this explanation is enough.


      • #4
        Presumably "serious" is a typo for "series", but I am still not sure what contrast you have in mind.

        I won't be answering authoritatively and am just trying to elicit a sharper question.


        • #5
          Jo, is this a theoretical question or are you trying to find out how to do something specific in Stata with a given set of data? If the latter, it would be helpful if you could describe your data and problem in more detail. The Statalist FAQ has good advice on how to ask questions.


          • #6
            I reviewed my question. Put more clearly, I am looking for the difference between the panel approach and pooled OLS. Are they the same?
            Another question: I have data for all manufacturing and service sectors, so how should I handle the Industry variable? Can I create one new variable for manufacturing and another for service, or should I keep Industry as a single dummy, i.e. 1 = manufacturing, 0 = service?

            Thanks very much


            • #7
              Thanks Friedrich,
              It is a theoretical question.


              • #8
                So, to be concrete, let's say you have panel data with panel variable id and time variable time. For linear regression, certain things are equivalent:

                Code:
                regress outcome predictors i.id
                
                // IS EQUIVALENT FOR SOME PURPOSES TO
                
                xtset id
                xtreg outcome predictors, fe
                By some purposes, I mean that both will give the same coefficients, standard errors, t-tests and p-values. The -xtreg, fe- model will give you three different R2 values (between, within, and overall) none of which is exactly the same as the R2 you will get from -regress-. -xtreg, fe- will also give you statistics about within- and between- variation in the outcome and an estimate of the intraclass correlation. The overall model F-tests for -regress- and -xtreg, fe- will not be the same. The differences in overall R2 and overall model F tests occur because -regress- counts the i.id indicators as predictors, whereas the -xtreg, fe- statistics exclude them.
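
                A minimal sketch of how one might check this equivalence, assuming the Grunfeld example data that can be loaded with -webuse grunfeld-. The dataset, the variables (invest, mvalue, kstock, company) and the stored-estimate names are illustrative choices, not part of the original question:

                Code:
                // Illustrative data only: Grunfeld panel stands in for "outcome", "predictors" and "id"
                webuse grunfeld, clear
                
                // Least-squares dummy variable version
                regress invest mvalue kstock i.company
                estimates store lsdv
                
                // Within (fixed-effects) version
                xtset company
                xtreg invest mvalue kstock, fe
                estimates store within
                
                // Side-by-side coefficients and standard errors for the shared predictors
                estimates table lsdv within, keep(mvalue kstock) se
                The two columns for mvalue and kstock should agree, while the R2 and overall F statistics shown in the two regression outputs should differ, for the reason given above.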

                If you also want your model to include the possibility of time-dependent outcome shocks, and if you do not have repeated values of time within any values of id, then the following two will be equivalent in the same sense:

                Code:
                regress outcome predictors i.id i.time
                
                // EQUIVALENT FOR SOME PURPOSES TO
                
                xtset id time
                xtreg outcome predictors i.time, fe
                Note that you need to explicitly include the i.time indicators in the -xtreg- model even though -xtset- has already identified the time variable. The -fe- option to any of the -xt- commands will incorporate the panel variable fixed effects automatically, but not the time variable.
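
                As a hedged illustration of the point about i.time, again assuming the Grunfeld example data (company and year play the roles of id and time here; they are not the original poster's variables):

                Code:
                webuse grunfeld, clear
                xtset company year
                
                // Firm fixed effects come from -fe-; year effects must be requested explicitly
                xtreg invest mvalue kstock i.year, fe
                
                // Joint test that all year indicators are zero
                testparm i.year
                Leaving i.year out of the -xtreg- line would fit a model with firm effects only, even though -xtset- knows about the time variable.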

                Finally, for non-linear models, the analogous codes would not, in general, be equivalent.

                With regard to your second question, if the categories manufacturing and service exhaustively characterize your industries and are mutually exclusive, then only one variable is needed: a single 0/1 indicator. If you create a second indicator coded 1 in the opposite category, then the two variables will be collinear: manufacturing_indicator + service_indicator = 1. The second variable adds no new information, and if you specify both as predictors in any regression model, Stata will drop one of them.
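
                A small sketch of that collinearity, assuming the auto example data shipped with Stata, where foreign is a 0/1 indicator. The variable domestic is created purely for illustration:

                Code:
                sysuse auto, clear
                
                // domestic is the complement of foreign, so foreign + domestic = 1
                generate byte domestic = 1 - foreign
                
                // Both indicators are specified; Stata omits one because of collinearity
                regress price weight foreign domestic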


                • #9
                  Thanks very much, Clyde.
                  Your comments are very helpful.
                  So you suggest that I use an Industry dummy (1 if manufacturing, 0 if service).
                  If I do so and the p-value is not significant (more than 0.01), does this mean that there is no difference between manufacturing firms and service firms regarding their effect on the dependent variable? And if the p-value is significant, does this mean that manufacturing firms have a larger effect on the dependent variable than service firms?


                  • #10
                    "does this mean that there is no difference between manufacturing firms and service firms regarding their effect on the dependent variable? And if the p-value is significant, does this mean that manufacturing firms have a larger effect on the dependent variable than service firms?"
                    I'm not a big fan of that way of interpreting null hypothesis significance tests. I prefer a more nuanced interpretation. First, I wouldn't make a fetish of any particular p-value. It's nice to target 0.01 as a significance level, but I wouldn't get flustered if you got p = 0.011. Similarly I wouldn't get enormously excited over p = 0.009. There are many sources of error in empirical studies and the p-value captures only sampling variation.

                    Second, if I conclude that my coefficient for this variable is not statistically significant, I would not say that this means there is no difference. It means that the difference is small enough, relative to the precision with which the data can estimate it, that we could not detect it with confidence. But absence of evidence of a difference should not be interpreted as evidence of absence of a difference.

                    Third, if you do get a p-value you consider significant, it tells you nothing about the direction of the effect. The p-values are two-tailed and can arise whether the manufacturing firms have higher outcomes than the service firms or the other way around. To see which direction the difference goes you have to look at the sign of the coefficient.
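
                    A minimal sketch of reading the direction from the coefficient rather than from the p-value, assuming the auto example data with foreign standing in for a manufacturing vs service indicator (an illustrative substitution, not the poster's data):

                    Code:
                    sysuse auto, clear
                    regress price weight i.foreign
                    
                    // The sign of this coefficient gives the direction of the difference
                    // between the foreign == 1 group and the foreign == 0 reference group
                    display _b[1.foreign]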

                    Fourth, whether you get significant findings or not, you will need to consider the limitations of your model as a representation of the real world: are there important variables that were omitted because you did not have data for them? Is the specification of the variables used in your model correct--or are there perhaps non-linear relationships that are not captured? How valid and reliable are your data as measurements of the intended underlying constructs: might some industries be misclassified in the manufacturing vs service variable, or is that dichotomy even just too crude with appreciable numbers of industries being "grey" cases? How valid is your outcome measure? How well does your model fit the data: have you looked at plots of predicted vs observed outcomes or other regression diagnostics? All of the usual things that need to be considered when interpreting any model apply here.

                    "And if the p-value is significant, does this mean that manufacturing firms have a larger effect on the dependent variable than service firms?"
                    Finally, and this is very important, in your model there is no such thing as the difference between the manufacturing effect and the service effect on the dependent variable. There is a single effect that distinguishes the levels of outcomes between manufacturing and service firms, and that one effect is estimated by the coefficient of your manufacturing vs service indicator variable. It would only be meaningful to speak of separate manufacturing and services effects (and one being larger than the other) if there were another reference category of industries to which these could be (separately) compared.


                    • #11
                      Thanks Clyde,

                      If I split my industries further (the 40 manufacturing firms into 4 subcategories and the 62 service firms into 7 subcategories), do you think this would be better?
                      And how can I do this, given that a single 0/1 indicator will no longer be applicable?
                      Is it doable even if some subcategories have only 3 or 4 firms?

                      Many thanks


                      • #12
                        "If I split my industries further (the 40 manufacturing firms into 4 subcategories and the 62 service firms into 7 subcategories), do you think this would be better?"
                        I can't answer that because a) I don't know what your underlying research goals are here, and b) even if I did, I have no expertise in finance or econometrics, so I wouldn't be able to judge what is better and what is not.

                        "How can I do this, given that a single 0/1 indicator will no longer be applicable?"
                        So first create a single variable taking on 11 different values (0 through 10 is fine, or 1 through 11, but any 11 different non-negative integer values will do), each value corresponding to one of the 4 + 7 subcategories you refer to. Let's call this variable industry_type. To keep your sanity when you read your outputs, I recommend also creating a value label that maps those 11 numbers into 11 labels that have some mnemonic value. (See [D] label if you don't know how to create and apply value labels for variables. It's a very important data management skill that you will need to use in Stata.) Then in the regression model you can enter this variable using factor variable notation (-help fvvarlist-):

                        Code:
                        regress outcome other_predictors i.industry_type
                        Note that this also works with -xtreg- or almost any other Stata regression command. When you get your output remember that an n-category variable is always represented by n-1 indicator ("dummy") variables. Stata will, by default, use the lowest numbered level of industry_type as the omitted reference category. (If you want to override that default and select a different level for the reference category, use the ib. notation [again, -help fvvarlist- for more information].)
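
                        A hedged sketch of the labelling and reference-category steps, assuming industry_type is coded 1 through 11. The label name indlbl, the label texts and the choice of level 3 as reference are all hypothetical:

                        Code:
                        // Hypothetical value labels; only three of the 11 levels are shown
                        label define indlbl 1 "Food manufacturing" 2 "Textiles" 11 "Retail services"
                        label values industry_type indlbl
                        
                        // Use level 3 instead of the lowest level as the omitted reference category
                        regress outcome other_predictors ib3.industry_type
                        With the value label attached, the regression output shows the label text for each labelled level instead of the bare number.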

                        "Is it doable even if some subcategories have only 3 or 4 firms?"
                        It is doable, yes, but usually not advisable. For one thing, when you run your regression, any observation that has missing values on any variable in the regression model is excluded. So if one of these small subcategories has an observation with some missing values, your 3 or 4 firms could easily drop down to 1 or 2, or even zero in the regression. But more important, if you have only 3 or 4 firms representing a category, your estimates of that category's effect will be very imprecise: the standard errors will be quite large. If possible, it is usually better to use a less fine-grained set of categories. Combining a small category with another one that is reasonably similar from the perspective of the meaning of the classification is often a good idea.
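
                        One possible way to check the category sizes actually used and then coarsen them, sketched under the assumption that industry_type is coded 1 through 11. Merging level 11 into level 10 and the name industry_coarse are hypothetical choices for illustration:

                        Code:
                        regress outcome other_predictors i.industry_type
                        
                        // Observations per subcategory that actually entered the estimation sample
                        tabulate industry_type if e(sample)
                        
                        // Hypothetical merge of a small category into a similar one
                        recode industry_type (11 = 10), generate(industry_coarse)
                        regress outcome other_predictors i.industry_coarse
                        Which categories are similar enough to combine is a substantive judgment that the code cannot make.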


                        • #13
                          Thanks Clyde,
                          Your comments are highly valuable.

                          You advised me to "See [D] label".
                          What is this?


                          • #14
                            It's the section on the -label- commands in the [D] (Data Management) section of the Stata user manual. The easiest way to find it is to first type -help label-. The help file for -label- will open in a Viewer window. Near the top of that you will find, in blue, [D] label. That's a link to the user manual: click on that and the -label- section of [D] will open in your PDF reader.


                            • #15
                              How should I interpret the coefficient for i.Year?
                                           |               Robust
                                     FDI_1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                              -------------+----------------------------------------------------------------
                                      MS_1 |   .8362559   .0846373     9.88   0.000     .6691876    1.003324
                                        TO |   .0084677   .0009072     9.33   0.000     .0066769    .0102584
                                   INFRA_1 |   .1229908   .0991032     1.24   0.216    -.0726323    .3186139
                                        HC |  -1.824411   .3869833    -4.71   0.000     -2.58829   -1.060531
                                           |
                                      Year |
                                     1990  |  -.2391093    .279986    -0.85   0.394    -.7917831    .3135645
