  • Time dummy variables do not work

    Hi, my name is Shane and I am an MSc Economics student. I have recently joined this forum, so apologies in advance if I accidentally break any of the rules.

    I am currently working on my dissertation and have come across an issue with Stata. My dataset is a 4-year panel of university tuition fees and university characteristics. There are 5 dummy variables which are fixed over the 4 years. Naturally, in a fixed-effects regression Stata eliminates these because they do not vary over time. However, these dummies are crucial. My supervisor has advised me to run the model as a cross section rather than a panel, and to include time dummies instead, i.e. Year2016, Year2015, Year2014, etc.

    I created Year2016, Year2015, Year2014 and Year2013, and included Year2016, Year2015 and Year2014 in the regression to avoid a multicollinearity problem. Unfortunately, Stata returns the "no observations" error even though data are available, so I'm baffled as to what the issue is.

    Thanks for reading this.

    Shane

    This is how I created the dummies. There are 4 years in the dataset: 2016, 2015, 2014 and 2013.

    gen Year2016=1 if year==2016
    replace Year2016=0 if year==2015

    gen Year2015=1 if year==2015
    replace Year2015=0 if year==2014

    gen Year2014=1 if year==2014
    replace Year2014=0 if year==2013

    gen Year2013=1 if year==2013
    replace Year2013=0 if year==2014

  • #2
    Shane:
    welcome to the list.
    First off, I would recommend that you not create categorical variables by hand, but rely instead on the wonderful capabilities of -fvvarlist-;
    second, I suspect there's something wrong with the way you created the dummies by hand. I think it should be (the 2016 example applies to all the remaining dummies):
    Code:
    gen Year_2016=1 if year==2016
    replace Year_2016=0 if year!=2016 & year!=.
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      So Year2016 will be missing for 2013 and 2014, Year2015 will be missing for 2013 and 2016, and so forth. Put all your variables in, and there will be missing values in every observation.

      I'd guess you'd be better off with (0, 1) variables, namely

      Code:
      forval y = 2013/2016 {
            gen year`y' = year == `y'
      }
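
      As an illustration, here is a minimal sketch that runs the loop above on made-up data and checks the result. The toy dataset (8 observations, two per year) is an assumption for demonstration only; the check itself uses standard -tabulate- and -assert-.

      ```stata
      * Toy data standing in for the real panel (hypothetical values)
      clear
      set obs 8
      gen year = 2013 + mod(_n - 1, 4)

      * The loop from above: one (0, 1) indicator per year
      forval y = 2013/2016 {
          gen byte year`y' = year == `y'
      }

      * Each indicator should contain only 0 and 1, never missing
      tabulate year year2016, missing
      assert inlist(year2016, 0, 1)
      ```

      Note that a logical expression such as year == 2016 evaluates to 0 when year itself is missing, so with real data you may prefer to add -if !missing(year)- to the -gen- line. -tabulate year, generate(yr)- is another standard way to get a full set of indicators.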


      • #4
        First of all, there is no need to create explicit year indicator ("dummy") variables. Use factor-variable notation instead and Stata will create "virtual" dummy variables for you on the fly. So your regression command would look like this:

        Code:
        regression_command outcome_variable predictors i.year
        Not only is it more convenient, it will be done correctly. Your code contains errors that are causing the problem you describe. Here's what's going on.

        Let's look first at the variable Year2016:
        Code:
        gen Year2016=1 if year==2016
        replace Year2016=0 if year==2015
        What does this do for any observation where year is 2013, or 2014? You don't get a value of 1, because it's not 2016. And you don't get a value of 0, because it's not 2015. So your Year2016 variable has missing values for any observation where year is 2013 or 2014. Each of your Year20* variables has this same problem--the only difference is which years lead to missing values.

        In any regression command, an observation can only enter the estimation sample if it has non-missing values for every variable in the regression. In your situation, because of the errors in the code for the Year20* variables, every observation will contain a missing value for at least one of the three indicators you included. That's why your estimation sample is empty and you get the "no observations" error message.
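
        To see this concretely, the following sketch rebuilds the three indicators from post #1 on a made-up dataset and counts the missing values per observation. The toy data and the variable name nmiss are assumptions for illustration; -egen ... rowmiss()- is a standard function.

        ```stata
        * Toy data: two observations per year, 2013 through 2016 (hypothetical)
        clear
        set obs 8
        gen year = 2013 + mod(_n - 1, 4)

        * The original construction from post #1, reproduced as written
        gen Year2016 = 1 if year == 2016
        replace Year2016 = 0 if year == 2015
        gen Year2015 = 1 if year == 2015
        replace Year2015 = 0 if year == 2014
        gen Year2014 = 1 if year == 2014
        replace Year2014 = 0 if year == 2013

        * Number of missing values among the three regressors, by observation
        egen nmiss = rowmiss(Year2016 Year2015 Year2014)
        tabulate year nmiss

        * Observations with no missing values: these are the only ones
        * a regression on all three indicators could use
        count if nmiss == 0
        ```

        The final -count- reports how many complete observations exist; with these indicators there are none, which is what produces the error.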

        Added: Crossed with Carlo and Nick's responses.


        • #5
          Clyde, Carlo and Nick, thank you so much for your detailed explanations; I really appreciate them. I now understand why factor-variable notation is better, and it makes sense. Is this how the regression should look (image attached)? I ask because 2014 is missing for some reason.
          Last edited by Shane Jameson; 20 Jun 2017, 14:10.


          • #6
            Maybe it's OK and maybe it's an error. Let's clarify some things about your data. My understanding is that:

            1. The year variable takes on exactly four values: 2013, 2014, 2015, and 2016.
            2. There are actually observations that contain non-missing values on all of the regression variables in each of those years.

            Is that correct? If so, we would ordinarily expect Stata to give you output for 2014, 2015, and 2016 (omitting 2013 as the reference category). There are two explanations that pop into my head for why you see nothing for 2014, and each one seems equally likely to me, so you'll have to check which is actually going on.

            A. You do not in fact have any observations for 2014 that are completely free of missing values on all of your regression variables. There is a simple way to detect this. Run -count if e(sample) & year == 2014-. If Stata says zero, then this is the source of the missing 2014 variable. Your next step would then be to determine why you don't have any suitable observations in 2014 (corrupted data set, error in data management while creating it, missing values in variables that shouldn't be missing?) and fix that problem.

            B. You have another variable in your model that serves, either intentionally or inadvertently, to distinguish time periods. For example, there might have been some policy that went into effect in 2015, and you included a variable that distinguishes the pre-policy period from the post-policy period. In that case, your pre vs post variable would be collinear with the remaining time indicators and something would have to be omitted--Stata chose to omit 2014.year rather than one of the others or the pre- vs post- variable itself. If this is the case, and if it matters to you which variable Stata omits, there are ways to deal with that. (See -help fvvarlist- for details on specifying which level(s) of a factor variable to omit.) If your sense is that both this pre-post variable and all years 2014 through 2016 are crucial, then you are going to be sorely disappointed because it is mathematically impossible to have them all included when they are collinear.
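
            Both checks can be sketched on made-up data as follows. Everything about the dataset here is an assumption for illustration: the variable names y, x and post, the 2015 policy date, and the simulated values. The commands themselves (-tabulate-, -count-, and the ib#. base-level operator) are standard.

            ```stata
            * Simulated stand-in for the real data (all names and values hypothetical)
            clear
            set seed 12345
            set obs 200
            gen year = 2013 + mod(_n - 1, 4)
            gen x = rnormal()
            * Hypothetical pre/post-policy indicator, constant within each year
            gen byte post = year >= 2015
            gen y = 1 + 0.5*x + 0.2*post + rnormal()

            * post is collinear with the year indicators, so Stata omits something
            regress y x post i.year

            * Check A: how many observations per year entered the estimation sample?
            tabulate year if e(sample)
            count if e(sample) & year == 2014

            * Check B: a variable that is constant within every year is a
            * candidate for collinearity with i.year
            tabulate year post if e(sample)

            * Choose the base (reference) year yourself with the ib#. operator
            regress y x post ib2014.year
            ```

            Changing the base year does not remove the collinearity; it only changes which year serves as the reference category.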

            I advise you to read the FAQ, especially #12, for excellent advice on how to post most effectively on this Forum. Screenshots are highly discouraged. Yours happens to be readable (on my computer, at least), but many are not. And even though it is readable, it does not show the complete output, nor does it show the command that you gave to produce it. So clues about your situation that might have enabled me to guess which of A or B applies (or perhaps that neither does and something else is wrong) are not there. I don't know your sample size. I don't know what kind of regression this is. I don't even know the complete list of variables used as regressors. The most helpful way to post Stata results is to copy directly from the Stata Results window or your log file and then paste that, without any further editing, between code delimiters in the Forum editor. (FAQ #12 explains how to set up code delimiters.) When you provide better information, you get quicker and more accurate answers!


            • #7
              Hi, my name is Manuel, and I am also new to the forum.
              Just a follow-up: I have the same problem when I include i.year. I'm estimating a panel regression and I included -i.year-, but in the output only one year (2005) comes up and not the others. I ran -count if e(sample) & year==2005- and Stata says zero; I did the same for the remaining years and Stata also says zero.

              I appreciate any thoughts on this.
              Last edited by Manuel Pulido Velasquez; 05 Jul 2017, 11:11.


              • #8
                Manuel.
                welcome to the list.
                Are you sure that -time- is not in -string- format?
                Kind regards,
                Carlo
                (Stata 19.0)


                • #9
                  Hi Carlo,
                  Yes, I just double-checked; it's in numeric format.
                  Thanks a lot for your reply,

                  Regards,


                  • #10
                    Manuel:
                    as per the FAQ, please post an example/excerpt of your dataset via -dataex- (type -search dataex- from within Stata to install it).
                    Kind regards,
                    Carlo
                    (Stata 19.0)


                    • #11
                      Hi,
                      I explored the dataset a little more and found that there is one variable that unintentionally serves to distinguish time periods. I think I should check -help fvvarlist- to deal with this problem (as suggested a couple of posts above).
                      Thanks for your help and the -dataex- command. I might be using it soon.

                      Kind regards,
