
  • Sufficient control for seasonality?

    Hello,
    I have a question about how to remove seasonality in panel data. I want to evaluate how sales of a specific product change as its price increases (for a reason that is entirely exogenous). The price increase is gradual and ongoing for 2-3 years, and the rate of change differs between stores (due to, among other things, how much they have in stock). I have monthly data from a few hundred stores, spread across the country, with information about current price, quantity sold, income level, store size, etc.

    I have verified with a Hausman test that I can use a random-effects model (the estimates are virtually identical to the fixed-effects model). And I control for heteroscedasticity and within-group autocorrelation by using Eicker–Huber–White standard errors (clustered at the store level), which allow correlation within but not between stores.
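    In Stata terms, that step looked roughly like this (a sketch with illustrative variable names, not my real ones, and with conventional rather than clustered standard errors, since hausman is not valid after vce(cluster)):

    xtset store date                          // declare the store-month panel
    xtreg qty price income size i.month, fe   // fixed-effects fit
    estimates store fixed
    xtreg qty price income size i.month, re   // random-effects fit
    estimates store random
    hausman fixed random                      // H0: differences are not systematic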

    The problem is that sales are severely affected by seasonality, and I want to make sure that I have sufficiently controlled for that. I have data from the month before the price began to change (the exogenous event) until one year after all stores reached price parity, in total about 48 months. To control for seasonality, I have used monthly time dummies (i.month).

    My questions are:
    • Are monthly time dummies enough to remove seasonality?
    • What is the difference in interpretation between using i.month and i.date in Stata (the latter creates 48 dummies, one for each sample month)? A sketch of the distinction follows this list.
    • Is the period from which I have data, about 4 years, enough to remove seasonality and to give me an accurate estimate of the effect size? I could include more observations at the end of the dataset (but there is no price variation there, so what would be the point?) but not before my current starting point.
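
    To make the second question concrete, this is roughly what I mean (illustrative names again; date is a monthly %tm variable):

    generate month = month(dofm(date))        // calendar month, 1-12
    * i.month: 12 dummies that repeat every year, absorbing a stable
    * seasonal pattern but leaving year-to-year common shocks in the error
    xtreg qty price i.month, re vce(cluster store)
    * i.date: one dummy per sample month (48 here), absorbing all common
    * time variation, seasonal or not, at the cost of 48 parameters
    xtreg qty price i.date, re vce(cluster store)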

    Help would really be appreciated = )



  • #2
    We really can't make promises or even confident predictions about what will be sufficient for your purposes.

    If the seasonal pattern is constant apart from random fluctuations with a symmetric distribution, dummy variables for each month should work quite well.

    If the seasonal pattern is mixed up with a trend or with fluctuations on time scales of years, then you may need more terms to capture that as far as possible.

    Adding dozens of parameters isn't particularly parsimonious, but you may have enough data points not to feel any pain from that. Also, much depends on how far your other predictors capture time effects.
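
    For concreteness, one way to add such terms, reusing the illustrative names from earlier in the thread (a sketch, not a recommendation for your data):

    generate year = year(dofm(date))          // calendar year of each observation
    xtreg qty price i.month i.year, re vce(cluster store)    // season + year effects
    * or a smooth quadratic trend in place of year dummies:
    xtreg qty price i.month c.date##c.date, re vce(cluster store)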

    Please note: the following request from more than 3 years ago still stands. Please contact the list administrators to re-register with a family name.

    01 Feb 2016, 12:22
    Mikael: Please change your registration to use a family name, not "STATA". http://www.statalist.org/forums/help#realnames explains our policy and how to fix this.

    It would be a good idea for you to read the entire document.



    • #3
      Originally posted by Nick Cox
      We really can't ...
      Thank you Nick, I really appreciated your comments. The seasonal pattern was constant (apart from random fluctuation and price changes), so monthly dummy variables should be sufficient. I do have another problem, though. I'm trying to calculate the prediction interval for predictions at specific real values (so I can use them in a graph).

      I ran the model using: xreg y x x^2 z zx zx^2 w wx wx^2 i.month, re vce(cluster id)

      Then I ran:

      set obs 18001                     // N+1: one extra observation to hold the prediction point
      replace x = 123... in 18001
      replace x^2 = 123... in 18001
      replace z = 123... in 18001
      replace zx = 123... in 18001

      ...
      //(123... stands for the specific covariate values at the prediction point)

      predict newht                     // linear prediction at those values
      generate tmult = invttail(18001,.025)
      predict CIstderror, stdp          // SE of the linear prediction
      predict PIstderror, stdf          // <--- this command does not work

      //The plan was to run these commands and get the lower and upper CI and PI:
      generate lowerCI = newht - tmult*CIstderror
      generate upperCI = newht + tmult*CIstderror
      generate lowerPI = newht - tmult*PIstderror
      generate upperPI = newht + tmult*PIstderror

      list newht lowerCI upperCI lowerPI upperPI in 18001
      //This would give me the values I needed

      I think I know why the stdf option does not work (Link), but my question is how exactly to calculate PIstderror so I can get the prediction interval for my predicted values. Help would really be appreciated.
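
      My best guess so far, which I cannot verify, is to build the forecast SE from the two error components that xtreg, re stores in e(sigma_u) and e(sigma_e). A sketch with a shortened covariate list for readability; it ignores the clustering correction and the uncertainty in the variance components, so I do not know whether it is valid:

      xtreg y x z w i.month, re vce(cluster id)
      scalar s_u = e(sigma_u)                 // SD of the store-level component
      scalar s_e = e(sigma_e)                 // SD of the idiosyncratic error
      predict newht, xb                       // linear prediction
      predict CIstderror, stdp                // SE of the linear prediction
      generate PIstderror = sqrt(CIstderror^2 + s_u^2 + s_e^2)
      generate zmult = invnormal(.975)        // z-based, as xtreg, re reports z
      generate lowerPI = newht - zmult*PIstderror
      generate upperPI = newht + zmult*PIstderror

      Is something along these lines defensible?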



      • #4
        Do you mean xtreg? Not all of what is available after regress is available after xtreg.
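
        You can check directly which options survive (illustrative only, reusing your variable names):

        regress y x
        predict fse, stdf                     // stdf is documented after regress
        xtreg y x, re
        capture noisily predict fse2, stdf    // rejected after xtreg
        help xtreg postestimation             // lists what predict does support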



        • #5
          Originally posted by Nick Cox
          Do you mean xtreg? Not all of what is available after regress is available after xtreg.
          Yes, xtreg (typo). Is there another way to calculate the prediction interval (or PIstderror)?



          • #6
            I guess that whatever you calculate depends on your error structure. Stata's developers decided either that it didn't make sense or that it was something they might puzzle out later. I don't understand the theory well enough to know which is right. Undoubtedly some other people watching the forum know immensely more about this.



            • #7
              Originally posted by Nick Cox
              I guess that whatever you calculate depends on your error structure. Stata's developers decided either that it didn't make sense or that it was something they might puzzle out later. I don't understand the theory well enough to know which is right. Undoubtedly some other people watching the forum know immensely more about this.
              Given the lack of responses, should I make the second question into a separate forum thread, or should I just give this thread more time? Most people might read only the initial question.



              • #8
                We give advice on that; see #1 in https://www.statalist.org/forums/help#adviceextras
                Asking the same question again is not usually a good idea. Trying to improve on the question may be.
