Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Standard deviation of a variable for the last 16 quarters

    Dear reader,

    For my thesis I want to include the standard deviation of sales (amongst others) over the past 16 quarters as a control variable. From WRDS I extracted the sales numbers per quarter from 2001 until 2015 as the data I use for my thesis is from 2005-2015. However, I do not know how to code stata to create a variable which has the SD of sales over the past 16 quarters. Would anyone happen to know how to do this?

    The variables I have are the GVKEY, fyear (fiscal year), fqtr (fiscal quarter - values 1-4), datafqtr (fyear and quarter combined - e.g. 2005Q2) and saleq (sales for that quarter).

    Additionally, this raises the question how to make this SD useful as it is now created per quarter and the rest of my data is all yearly per fiscal year. How do I change this to make it significant and useful? So in other words, how do I change the SD's of 4 quarters into 1 year to make it useful in my regression? Is that by taking the average of the SD's over the four quarters of 1 fiscal year?

    Many thanks for your help!

    Kind regards,

    Daniel

  • #2
    Please do study http://www.statalist.org/forums/help#stata as all members are asked to do before posting.

    Your post falls into specific and general parts. I comment here mostly on the specific part.

    Absent a data example in your post, I give below an example with similar structure.

    Adapting the code to your data would need to confront some ambiguities.

    1. datafqtr (fyear and quarter combined - e.g. 2005Q2) is ambiguous. That could mean

    1a. A numeric variable with appropriate display format.

    1b. A numeric variable with value labels.

    1c. A string variable.

    If 1c or even 1b, your variable probably needs surgery.

    2. Last 16 quarters: does that include or exclude the present value?

    Here's a reproducible example:

    Code:
    webuse grunfeld, clear  
    
    * this need only be done once
    ssc inst rangestat 
    
    rangestat (count) count=mvalue (sd) sd=mvalue, int(year -16 -1) by(company) 
    
    list mvalue count sd if company == 1 
    
         +----------------------------+
         | mvalue   count          sd |
         |----------------------------|
      1. | 3078.5       .           . |
      2. | 4661.7       1           . |
      3. | 5387.1       2   1119.4916 |
      4. | 2792.2       3   1180.5622 |
      5. | 4313.2       4   1247.4263 |
         |----------------------------|
      6. | 4643.9       5    1090.539 |
      7. | 4551.2       6    1005.432 |
      8. | 3244.1       7   930.51326 |
      9. | 4053.7       8   925.92143 |
     10. | 4379.3       9   866.17903 |
         |----------------------------|
     11. | 4840.9      10   822.08518 |
     12. | 4900.9      11   810.39589 |
     13. | 3526.5      12   800.44996 |
     14. | 3254.7      13   791.31723 |
     15. | 3700.2      14   799.69295 |
         |----------------------------|
     16. | 3755.6      15   778.05594 |
     17. |   4833      16   756.26783 |
     18. | 4924.9      16   730.01285 |
     19. | 6241.7      16     744.476 |
     20. | 5593.6      16   857.82234 |
         +----------------------------+
    At best your code might be one of

    Code:
    rangestat (count) count=saleq (sd) sd=saleq, int(datafqtr -16 -1) by(gvkey) 
    
    rangestat (count) count=saleq (sd) sd=saleq, int(datafqtr -15  0) by(gvkey)
    I certainly recommend keeping track of how many values participate in each sd.

    Your general part asks how to make that interesting and useful given other annual data. I am not even an economist and I am not familiar with your project and in any case aren't these questions for your advisor/supervisor? You should, I suggest, not average 4 sds produced in this way as they are not independent of each other. Perhaps you should use the SD for the last quarter of each year. You need substantive advice there.

    Comment


    • #3
      Thank you for your answer Nick! I decided that for the general part I will eventually only use the 4th quarter (end of year) values for my regression.


      Now for the specific part, datafqtr is a string variable and hence needs surgery. Does it become useful if I use the following?

      egen fyq=concat(fyearq fqtr)
      destring fyq, replace

      This yields the following data as an example:

      gvkey fyearq fqtr fyq saleq
      001004 2001 1 20011 202.993
      001004 2001 2 20012 144.889
      001004 2001 3 20013 143.457
      001004 2001 4 20014 147.382
      001004 2002 1 20021 151.165
      001004 2002 2 20022 153.051
      001004 2002 3 20023 156.992
      001004 2002 4 20024 145.129
      001004 2003 1 20031 152.114
      001004 2003 2 20032 156.287
      001004 2003 3 20033 159.233
      001004 2003 4 20034 176.835
      001004 2004 1 20041 163.773
      001004 2004 2 20042 176.448
      001004 2004 3 20043 197.701
      001004 2004 4 20044 209.926
      001004 2005 1 20051 199.588
      001004 2005 2 20052 212.879
      001004 2005 3 20053 223.398
      001004 2005 4 20054 249.653
      001013 2002 1 20021 293.5
      001013 2002 2 20022 298.3
      001013 2002 3 20023 235.1
      001013 2002 4 20024 220.8
      001013 2003 1 20031 194.6
      001013 2003 2 20032 110.5
      001013 2003 3 20033 129.8
      001013 2003 4 20034 154.5
      001013 2004 1 20041 136.7
      001013 2004 2 20042 148.3
      001013 2004 3 20043 224.9
      001013 2004 4 20044 263.5
      001013 2005 1 20051 241
      001013 2005 2 20052 288.8
      001013 2005 3 20053 305.6
      001013 2005 4 20054 294
      001013 2006 1 20061 272.8
      001013 2006 2 20062 358.1
      001013 2006 3 20063 343.6
      001013 2006 4 20064 307.4
      001013 2007 1 20071 297.2
      001013 2007 2 20072 349.4
      001013 2007 3 20073 346.1
      001013 2007 4 20074 329.5
      001013 2008 1 20081 329.1
      001013 2008 2 20082 393.2
      001013 2008 3 20083 381.8
      001013 2008 4 20084 352.3
      001013 2009 1 20091 254.3
      001013 2009 2 20092 275.1
      001013 2009 3 20093 283.4

      This is partial example data for the sales data. Does the code you provided work for these? And does it take into account that for certain periods there is no data available?

      Comment


      • #4



        You could have tried the code.

        Your date variable is not fit for purpose! Your "dates" imply gaps of 1,1,1,7 since the previous quarter as in particular they jump from (e.g.) 20084 to 20091. So, you can't use them for calculations.

        However, a fix follows from reading the help for dates and times, which really can't be skipped if you want to use Stata with dates.


        Code:
        * required reading 
        help datetime 
        
        
        clear 
        input gvkey fyearq fqtr fyq saleq
        001004 2001 1 20011 202.993
        001004 2001 2 20012 144.889
        001004 2001 3 20013 143.457
        001004 2001 4 20014 147.382
        001004 2002 1 20021 151.165
        001004 2002 2 20022 153.051
        001004 2002 3 20023 156.992
        001004 2002 4 20024 145.129
        001004 2003 1 20031 152.114
        001004 2003 2 20032 156.287
        001004 2003 3 20033 159.233
        001004 2003 4 20034 176.835
        001004 2004 1 20041 163.773
        001004 2004 2 20042 176.448
        001004 2004 3 20043 197.701
        001004 2004 4 20044 209.926
        001004 2005 1 20051 199.588
        001004 2005 2 20052 212.879
        001004 2005 3 20053 223.398
        001004 2005 4 20054 249.653
        001013 2002 1 20021 293.5
        001013 2002 2 20022 298.3
        001013 2002 3 20023 235.1
        001013 2002 4 20024 220.8
        001013 2003 1 20031 194.6
        001013 2003 2 20032 110.5
        001013 2003 3 20033 129.8
        001013 2003 4 20034 154.5
        001013 2004 1 20041 136.7
        001013 2004 2 20042 148.3
        001013 2004 3 20043 224.9
        001013 2004 4 20044 263.5
        001013 2005 1 20051 241
        001013 2005 2 20052 288.8
        001013 2005 3 20053 305.6
        001013 2005 4 20054 294
        001013 2006 1 20061 272.8
        001013 2006 2 20062 358.1
        001013 2006 3 20063 343.6
        001013 2006 4 20064 307.4
        001013 2007 1 20071 297.2
        001013 2007 2 20072 349.4
        001013 2007 3 20073 346.1
        001013 2007 4 20074 329.5
        001013 2008 1 20081 329.1
        001013 2008 2 20082 393.2
        001013 2008 3 20083 381.8
        001013 2008 4 20084 352.3
        001013 2009 1 20091 254.3
        001013 2009 2 20092 275.1
        001013 2009 3 20093 283.4
        end 
        replace fyq = yq(fyearq, fqtr) 
        rangestat (count) count=saleq (sd) sd=saleq, interval(fyq -15 0) by(gvkey) 
        
        l in 1/16 , sep(4) 
        
             | gvkey   fyearq   fqtr   fyq     saleq   count          sd |
             |-----------------------------------------------------------|
          1. |  1004     2001      1   164   202.993       1           . |
          2. |  1004     2001      2   165   144.889       2   41.085724 |
          3. |  1004     2001      3   166   143.457       3   33.967285 |
          4. |  1004     2001      4   167   147.382       4    28.92067 |
             |-----------------------------------------------------------|
          5. |  1004     2002      1   168   151.165       5   25.333886 |
          6. |  1004     2002      2   169   153.051       6    22.74839 |
          7. |  1004     2002      3   170   156.992       7   20.766436 |
          8. |  1004     2002      4   171   145.129       8    19.68882 |
             |-----------------------------------------------------------|
          9. |  1004     2003      1   172   152.114       9   18.454505 |
         10. |  1004     2003      2   173   156.287      10   17.402216 |
         11. |  1004     2003      3   174   159.233      11    16.55074 |
         12. |  1004     2003      4   175   176.835      12   16.918959 |
             |-----------------------------------------------------------|
         13. |  1004     2004      1   176   163.773      13   16.293005 |
         14. |  1004     2004      2   177   176.448      14   16.416247 |
         15. |  1004     2004      3   178   197.701      15   18.673997 |
         16. |  1004     2004      4   179   209.926      16   21.680573 |
             +-----------------------------------------------------------+
        You didn't indicate whether the present observation was included, so I jumped one way/

        Comment


        • #5
          This works! Thanks a lot!!!!

          Comment

          Working...
          X