Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Can rangestat or asrol solve this problem? creating statistics within industry in a rolling fashion

    Dear All

    I have panel data that includes year, quarter, month and week, firm, industry, and (firm) profit
    The year ranges between 1981 to 2020
    The quarter takes values from 1 to 4 to represent each quarter within the year
    The month takes values from 1 to 3 to represent each month within the quarter
    The week takes values from 1 to 4 to represent each week within the month

    I want to convert this panel data into time-series data representing the mean profits of firms. Most importantly, these means should be calculated in an increasing rolling window fashion such that the mean of profits in week 1 represents the mean of profit in week 1, the mean of profits in week 2 represents the mean of profits in week 1 and 2, the mean of profits in week 3 represents the mean of profits in week 1,2 and 3; and so on.

    I did the following:

    Code:
    *generating "series" to take values from 1 to 12 representing data available at each week of the quarter
    gen real_month = 3*(quarter-1) + month
    gen quarter_date = qofd(mdy(real_month, 1, year))
    format quarter_date %tq
    gen series = 4*(month-1) + week
    
    * generating increasing window rolling means for the weekly profit data within each quarter of the year
    rangestat (mean) profit, interval(series 0 0) by(quarter_date ) casewise 
    
    * keep only one profit_mean observation for each series in the quarter (i.e. each week within the quarter)
    duplicates drop quarter_date series profit_mean, force
    
    sort series quarter_date
    xtset series quarter_date, quarterly
    Now my data becomes quarterly time-series while including mean profit at each week of the quarter. If I want, for example, to plot the time-series of profit when it was "available" in week 1 of any quarter, I will do the following:

    Code:
    tsline profit_mean if series==1
    If I instead want to plot it as it is available in week 7 for example, I will do the following:

    Code:
    tsline profit_mean if series==7

    And so on. In other words, any analysis I do will then be conditioned on the series as it represents the point (week) when this data was available.
    At this point, you probably see my aim. I want to have profit time series as they are available in real-time each week.


    My problem:

    1- First, I hope the code I have above will correctly produce what I have described?! I will be thankful if someone can confirm this.

    2- Second, I also want to create time-series of profits in an identical way but these time series should be for each industry. Specifically, the code above produces a time-series of profit calculated from ALL FIRMS' profit in the cross-section (at different weeks of the quarter). Now, if I have 10 industries, I would like to have 10 profit time-series variables in the same way as above (i.e. calculated at different weeks of the quarter).

    I really hope someone can help with that

    My data looks like:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long firmID byte industry float(year quarter month week) long profit
      5709 30 1981 1 2 3   354240
     10618 20 1981 1 2 3    77657
      2497 20 1981 1 2 4    51141
      8577 55 1981 1 3 1    99506
      7922 25 1981 1 3 1   130866
      6829 25 1981 1 3 1   205798
      1410 20 1981 1 3 2    63145
     11259 30 1981 1 3 2   554184
      4598 20 1981 1 3 3   152512
      4990 25 1981 1 3 3   125319
      5667 10 1981 1 3 3    93135
      7116 25 1981 1 3 3   205362
      7138 60 1981 1 3 3    57686
      9538 30 1981 1 3 3    90739
      6307 25 1981 1 3 3  4441594
      9818 25 1981 1 3 3  1274557
      3813 25 1981 1 3 3  1428899
      5018 20 1981 1 3 4   497461
      1913 15 1981 1 3 4   152901
      9906 55 1981 1 3 4   120843
      8579 25 1981 1 3 4    40898
      9599 45 1981 1 3 4     3545
      8151 10 1981 2 1 1   349232
      2817 20 1981 2 1 2  2190700
     10984 50 1981 2 1 2   534012
     11264 30 1981 2 1 2   489653
     10507 55 1981 2 1 3   555374
      1045 20 1981 2 1 3   952544
      6178 55 1981 2 1 3    87216
      1161 45 1981 2 1 2    80318
      3036 50 1981 2 1 3    81209
      5742 55 1981 2 1 3   609402
      4029 55 1981 2 1 3   554203
      3980 50 1981 2 1 3   235066
      9846 55 1981 2 1 3   908514
     14912 55 1981 2 1 3   908514
      7366 55 1981 2 1 3   588403
     10601 55 1981 2 1 3   108232
      4798 55 1981 2 1 2   604031
      5237 20 1981 2 1 3    14835
      5256 20 1981 2 1 3   182286
      7938 10 1981 2 1 3   125411
      4517 55 1981 2 1 2   604031
      7985 20 1981 2 1 3   435200
      7993 55 1981 2 1 3   111131
      8215 15 1981 2 1 2   963700
      9828 55 1981 2 1 3   176961
      8272 55 1981 2 1 2   775281
      4988 50 1981 2 1 3   304738
     10867 20 1981 2 1 3  1511519
     11161 45 1981 2 1 3     4716
     11185 20 1981 2 1 3    30813
     11555 55 1981 2 1 2   140134
     65095 55 1981 2 1 2   140134
     66591 55 1981 2 1 3   387850
     12612 55 1981 2 1 4   238925
     10860 55 1981 2 1 4   240450
      1440 55 1981 2 1 3  1099957
      3814 55 1981 2 1 4   238925
      7437 55 1981 2 1 4    95374
      1209 15 1981 2 1 3   413030
      1743 20 1981 2 1 4    90622
      2137 50 1981 2 1 4  1406957
      3170 30 1981 2 1 4  1326361
      3439 55 1981 2 1 4   822618
     13948 55 1981 2 1 4   822618
      7241 35 1981 2 1 4   469909
     12564 55 1981 2 1 4   229625
      3851 20 1981 2 1 4   881574
      4091 20 1981 2 1 4    43288
      4093 55 1981 2 1 4   498256
      2783 55 1981 2 1 4   349428
      4331 55 1981 2 1 4    22149
      8099 55 1981 2 1 4   308837
      5903 10 1981 2 1 4  1512000
      7260 50 1981 2 1 3   108026
      6867 55 1981 2 1 4    72538
      7017 10 1981 2 1 4  2281727
      7585 45 1981 2 1 4   858897
      7875 25 1981 2 1 4    37117
      8810 55 1981 2 1 4   982938
     12749 55 1981 2 1 4   982938
      8762 30 1981 2 1 4  2949001
      1075 55 1981 2 1 3   191582
      1742 55 1981 2 1 3   191582
      8455 55 1981 2 1 4   314529
    145348 55 1981 2 1 4   314529
     23465 55 1981 2 1 4   149435
      9698 20 1981 2 1 3    17389
      9324 55 1981 2 1 4     5764
     11012 15 1981 2 1 3   100253
     31596 55 1981 2 1 4   918607
      1104 20 1981 2 1 4     9966
     15448 55 1981 2 1 4   185578
      3580 20 1981 2 1 4   374885
      2991 10 1981 2 1 4 11352000
      3897 55 1981 2 1 4   497396
     65089 55 1981 2 1 4   497396
      3413 55 1981 2 1 4  1221300
      4241 55 1981 2 1 4    54400
    end
    Thank you
    Mike

  • #2
    Dear all

    Just wanted to clarify that it does not have to be rangestat or asrol as in the title of this post.

    If my code is correct in the sense that it creates the time-series data by calculating the rolling median for all firms in the cross section, a lengthy way to solve my question in (2) would be:


    Code:
    use profit.dta, clear
    
    keep if industry==10
    
    rangestat (mean) profit, interval(series 0 0) by(quarter_date ) casewise
    
    duplicates drop quarter_date series profit_mean, force
    
    rename profit_mean profit_mean_ind10
    
    sort series quarter_date
    
    xtset series quarter_date, quarterly
    
    save ind10.dat, replace
    Then I repeat the same steps for each industry and I merge them at the end?

    Is there anything more efficient than that (especially because the number of industries can be large)?

    Comment


    • #3
      Dear all
      This has been for some time now, so I bring it up again.
      Please ask me if there is anything unclear in my explanation so I can make it easier to understand which may increase the chance of me getting some help.
      Thank you
      Mike

      Comment


      • #4
        I have not read the full post, but finding median in rolling window for each industry can look something like this:
        Code:
        bys industry: asrol profit, window(quarter_date 10) stat(median) perc(.5)
        You can replace perc(.5) with any other value to find a given percentile. I have assumed a 10-period rolling window.
        Regards
        --------------------------------------------------
        Attaullah Shah, PhD.
        Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
        FinTechProfessor.com
        https://asdocx.com
        Check out my asdoc program, which sends outputs to MS Word.
        For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.

        Comment


        • #5
          After reading your posts, it seems you are not asking for a rolling window. Therefore, here is another working example using asrol. The following code will do the calculation for all industries, requiring one line code of asrol. However, please note that even with your code in # 3, once you merge the data from different industries, you cannot xtset it, because there will be repeated values within each series quarter_date.
          Code:
          bys industry series: asrol profit , stat(mean) gen(prft_mean)
          More on asrol can be found here https://fintechprofessor.com/asrol-f...tics-in-stata/

          Here is the full code with the data:

          clear
          Code:
          input long firmID byte industry float(year quarter month week) long profit
            5709 30 1981 1 2 3   354240
           10618 20 1981 1 2 3    77657
            2497 20 1981 1 2 4    51141
            8577 55 1981 1 3 1    99506
            7922 25 1981 1 3 1   130866
            6829 25 1981 1 3 1   205798
            1410 20 1981 1 3 2    63145
           11259 30 1981 1 3 2   554184
            4598 20 1981 1 3 3   152512
            4990 25 1981 1 3 3   125319
            5667 10 1981 1 3 3    93135
            7116 25 1981 1 3 3   205362
            7138 60 1981 1 3 3    57686
            9538 30 1981 1 3 3    90739
            6307 25 1981 1 3 3  4441594
            9818 25 1981 1 3 3  1274557
            3813 25 1981 1 3 3  1428899
            5018 20 1981 1 3 4   497461
            1913 15 1981 1 3 4   152901
            9906 55 1981 1 3 4   120843
            8579 25 1981 1 3 4    40898
            9599 45 1981 1 3 4     3545
            8151 10 1981 2 1 1   349232
            2817 20 1981 2 1 2  2190700
           10984 50 1981 2 1 2   534012
           11264 30 1981 2 1 2   489653
           10507 55 1981 2 1 3   555374
            1045 20 1981 2 1 3   952544
            6178 55 1981 2 1 3    87216
            1161 45 1981 2 1 2    80318
            3036 50 1981 2 1 3    81209
            5742 55 1981 2 1 3   609402
            4029 55 1981 2 1 3   554203
            3980 50 1981 2 1 3   235066
            9846 55 1981 2 1 3   908514
           14912 55 1981 2 1 3   908514
            7366 55 1981 2 1 3   588403
           10601 55 1981 2 1 3   108232
            4798 55 1981 2 1 2   604031
            5237 20 1981 2 1 3    14835
            5256 20 1981 2 1 3   182286
            7938 10 1981 2 1 3   125411
            4517 55 1981 2 1 2   604031
            7985 20 1981 2 1 3   435200
            7993 55 1981 2 1 3   111131
            8215 15 1981 2 1 2   963700
            9828 55 1981 2 1 3   176961
            8272 55 1981 2 1 2   775281
            4988 50 1981 2 1 3   304738
           10867 20 1981 2 1 3  1511519
           11161 45 1981 2 1 3     4716
           11185 20 1981 2 1 3    30813
           11555 55 1981 2 1 2   140134
           65095 55 1981 2 1 2   140134
           66591 55 1981 2 1 3   387850
           12612 55 1981 2 1 4   238925
           10860 55 1981 2 1 4   240450
            1440 55 1981 2 1 3  1099957
            3814 55 1981 2 1 4   238925
            7437 55 1981 2 1 4    95374
            1209 15 1981 2 1 3   413030
            1743 20 1981 2 1 4    90622
            2137 50 1981 2 1 4  1406957
            3170 30 1981 2 1 4  1326361
            3439 55 1981 2 1 4   822618
           13948 55 1981 2 1 4   822618
            7241 35 1981 2 1 4   469909
           12564 55 1981 2 1 4   229625
            3851 20 1981 2 1 4   881574
            4091 20 1981 2 1 4    43288
            4093 55 1981 2 1 4   498256
            2783 55 1981 2 1 4   349428
            4331 55 1981 2 1 4    22149
            8099 55 1981 2 1 4   308837
            5903 10 1981 2 1 4  1512000
            7260 50 1981 2 1 3   108026
            6867 55 1981 2 1 4    72538
            7017 10 1981 2 1 4  2281727
            7585 45 1981 2 1 4   858897
            7875 25 1981 2 1 4    37117
            8810 55 1981 2 1 4   982938
           12749 55 1981 2 1 4   982938
            8762 30 1981 2 1 4  2949001
            1075 55 1981 2 1 3   191582
            1742 55 1981 2 1 3   191582
            8455 55 1981 2 1 4   314529
          145348 55 1981 2 1 4   314529
           23465 55 1981 2 1 4   149435
            9698 20 1981 2 1 3    17389
            9324 55 1981 2 1 4     5764
           11012 15 1981 2 1 3   100253
           31596 55 1981 2 1 4   918607
            1104 20 1981 2 1 4     9966
           15448 55 1981 2 1 4   185578
            3580 20 1981 2 1 4   374885
            2991 10 1981 2 1 4 11352000
            3897 55 1981 2 1 4   497396
           65089 55 1981 2 1 4   497396
            3413 55 1981 2 1 4  1221300
            4241 55 1981 2 1 4    54400
          end
          
          gen real_month = 3*(quarter-1) + month
          
          gen quarter_date = qofd(mdy(real_month, 1, year))
          
          format quarter_date %tq
          
          gen series = 4*(month-1) + week
          
          bys industry series: asrol profit , stat(mean) gen(prft_mean)
          
          duplicates drop quarter_date series profit_mean, force
          
          sort industry series quarter_date
          
           list industry series quarter_date prft_mean in 1/20
          
               +------------------------------------------+
               | industry   series   quarte~e   prft_mean |
               |------------------------------------------|
            1. |       10        1     1981q2      349232 |
            2. |       10        3     1981q2      125411 |
            3. |       10        4     1981q2   5048575.7 |
            4. |       10       11     1981q1       93135 |
            5. |       15        2     1981q2      963700 |
               |------------------------------------------|
            6. |       15        3     1981q2    256641.5 |
            7. |       15       12     1981q1      152901 |
            8. |       20        2     1981q2     2190700 |
            9. |       20        3     1981q2   449226.57 |
           10. |       20        4     1981q2      280067 |
               |------------------------------------------|
           11. |       20        7     1981q1       77657 |
           12. |       20        8     1981q1       51141 |
           13. |       20       10     1981q1       63145 |
           14. |       20       11     1981q1      152512 |
           15. |       20       12     1981q1      497461 |
               |------------------------------------------|
           16. |       25        4     1981q2       37117 |
           17. |       25        9     1981q1      168332 |
           18. |       25       11     1981q1   1495146.2 |
           19. |       25       12     1981q1       40898 |
           20. |       30        2     1981q2      489653 |
               +------------------------------------------+
          Regards
          --------------------------------------------------
          Attaullah Shah, PhD.
          Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
          FinTechProfessor.com
          https://asdocx.com
          Check out my asdoc program, which sends outputs to MS Word.
          For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.

          Comment


          • #6
            Originally posted by Attaullah Shah View Post
            I have not read the full post, but finding median in rolling window for each industry can look something like this:
            Code:
            bys industry: asrol profit, window(quarter_date 10) stat(median) perc(.5)
            You can replace perc(.5) with any other value to find a given percentile. I have assumed a 10-period rolling window.
            Thanks a lot Attaullah. I always find asrol very helpful.
            However, it seems here to produce industry moving medians for a window of 10 quarters. This is not what I aim for. In fact, each quarter_date includes profit data for each week of the quarter and my aim is to calculate the moving medians for these weekly available data within the quarter. For example, median of week 1; median of week 1 and 2; median of week 1,2 and 3; and so on. Weeks are labeled as "series" in my data.

            So I made a small change to your code to give me increasing window moving medians for the 12 weeks within each quarter and for each industry as follows

            Code:
            bys quarter_date industry: asrol profit, window(series 12) stat(median) perc(.5)
            Note that I use 12 as there are 12 weeks within each quarter and I want to calculate the medians for each quarter using weekly data.

            But I still end up with the problem that it will create only one variable representing these medians while I need one variable for each industry to have industry time-series data.

            Ultimately I want my data after sorting on "series" to be as following:
            qaurter_date quarter month week series median12_profit_ind1 median_12_profit_ind2 median12_profit_ind3 median12_profit_ind4
            1981q1 1 1 1 1 value value value value
            1981q2 2 1 1 1 value value value value
            1981q3 3 1 1 1 value value value value
            1981q4 4 1 1 1
            1982q1 1 1 1 1
            1982q2 2 1 1 1
            1982q3 3 1 1 1
            1982q4 4 1 1 1
            1981q1 1 1 2 2
            1981q2 2 1 2 2
            1981q3 3 1 2 2
            1981q4 4 1 2 2
            Once I successfully arrive at this data structure, I can then run my analysis while conditioning on the series (which is the week of the quarter when the profit is available). For example:

            Code:
            reg median12_profit_ind1 median12_ind2 if series==2
            
            reg median12_profit_ind1 median12_ind2 if series==5

            But after asrol, the data I get will look be as below:

            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input byte industry long profit double median12_profit float(quarter_date series month week)
            10    93135     93135 84 11 3 3
            15   152901    152901 84 12 3 4
            20    63145     63145 84 10 3 2
            20    51141     64399 84  8 2 4
            20   152512     70401 84 11 3 3
            20   497461     77657 84 12 3 4
            20    77657     77657 84  7 2 3
            25  1428899    205798 84 11 3 3
            25   205362    205798 84 11 3 3
            25  1274557    205798 84 11 3 3
            25   205798    168332 84  9 3 1
            25   130866    168332 84  9 3 1
            25   125319    205798 84 11 3 3
            25  4441594    205798 84 11 3 3
            25    40898    205580 84 12 3 4
            30   554184    454212 84 10 3 2
            30    90739    354240 84 11 3 3
            30   354240    354240 84  7 2 3
            45     3545      3545 84 12 3 4
            55   120843  110174.5 84 12 3 4
            55    99506     99506 84  9 3 1
            60    57686     57686 84 11 3 3
            10    99375  144157.5 85 10 3 2
            10  2281727 1896863.5 85  4 1 4
            10   125411  237321.5 85  3 1 3
            10    77811    349232 85  6 2 2
            10    65739    160327 85  7 2 3
            10 11352000 1896863.5 85  4 1 4
            10   495635    495635 85  5 2 1
            10   127988    495635 85  5 2 1
            10   349232    349232 85  1 1 1
            10    84458    349232 85  6 2 2
            10   160327    495635 85  5 2 1
            10   108301    160327 85  7 2 3
            10  1512000 1896863.5 85  4 1 4
            10  2958498 1896863.5 85  4 1 4
            15    25086    100253 85 11 3 3
            15   168708  134480.5 85 12 3 4
            15   413030    413030 85  3 1 3
            15   963700    963700 85  2 1 2
            15    26094  256641.5 85  9 3 1
            15   100253    413030 85  3 1 3
            20    65497    162729 85 10 3 2
            20    30813    308743 85  3 1 3
            20    11822    162729 85  5 2 1
            20    82768    162729 85  8 2 4
            20    62237    182286 85  4 1 4
            20    90622    182286 85  4 1 4
            20   150842    162729 85  5 2 1
            20     9966    182286 85  4 1 4
            20   952544    308743 85  3 1 3
            20   435200    308743 85  3 1 3
            end
            format %tq quarter_date

            I hope it is clear now. Can I get help with that?



            Last edited by Mike Kraft; 26 Sep 2020, 04:41.

            Comment


            • #7
              Please see my reply in #5
              Regards
              --------------------------------------------------
              Attaullah Shah, PhD.
              Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
              FinTechProfessor.com
              https://asdocx.com
              Check out my asdoc program, which sends outputs to MS Word.
              For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.

              Comment


              • #8
                Originally posted by Attaullah Shah View Post
                Please see my reply in #5
                Sorry when I wrote my message, it was not available.

                In #5 you have:
                Code:
                bys industry series: asrol profit , stat(mean) gen(prft_mean)
                In #6 I have:

                Code:
                bys quarter_date industry: asrol profit, window(series 12) stat(median) perc(.5)
                I am not sure if they are equivalent. But my aim is to calculate the median (or the mean) by rolling from series 1 to series 12 within the same quarter_date. That is why I did it as in #6 to achieve that.

                Unfortunately, both #5 and #6 will produce one median variable while I want a median variable for each industry as per the hypothetical table I presented in #6.

                Do I need to reshape the data or probably create new variables for each industry from the median variable after that?! I do not really know how?!
                Last edited by Mike Kraft; 26 Sep 2020, 04:47.

                Comment


                • #9
                  For median within an industry, this code seems good to me
                  Code:
                   
                   bys industry quarter_date : asrol profit, window(series 12) stat(median)
                  And yes, you would need to reshape the data.
                  Regards
                  --------------------------------------------------
                  Attaullah Shah, PhD.
                  Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
                  FinTechProfessor.com
                  https://asdocx.com
                  Check out my asdoc program, which sends outputs to MS Word.
                  For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.

                  Comment


                  • #10

                    I tried

                    Code:
                    bys industry quarter_date : asrol profit, window(series 12) stat(median)
                    
                    duplicates drop quarter_date series industry median12_profit, force
                    
                    reshape wide median12_profit, i(quarter_date series)  j(industry)
                    But this gives me lots of missing values... I am not sure how to fix that?!


                    Code:
                    * Example generated by -dataex-. To install: ssc install dataex
                    clear
                    input float(quarter_date series year quarter month week) double(median12_profit10 median12_profit15 median12_profit20 median12_profit25 median12_profit30)
                    84  7 1981 1 2 3         .        .     77657         .    354240
                    84  8 1981 1 2 4         .        .     64399         .         .
                    84  9 1981 1 3 1         .        .         .    168332         .
                    84 10 1981 1 3 2         .        .     63145         .    454212
                    84 11 1981 1 3 3     93135        .     70401    205798    354240
                    84 12 1981 1 3 4         .   152901     77657    205580         .
                    85  1 1981 2 1 1    349232        .         .         .         .
                    85  2 1981 2 1 2         .   963700   2190700         .    489653
                    
                    end
                    format %tq quarter_date
                    As mentioned in #6, I want my data after sorting on "series" to be as follows.
                    qaurter_date quarter month week series median12_profit_ind1 median_12_profit_ind2 median12_profit_ind3 median12_profit_ind4
                    1981q1 1 1 1 1 value value value value
                    1981q2 2 1 1 1 value value value value
                    1981q3 3 1 1 1 value value value value
                    1981q4 4 1 1 1
                    1982q1 1 1 1 1
                    1982q2 2 1 1 1
                    1982q3 3 1 1 1
                    1982q4 4 1 1 1
                    1981q1 1 1 2 2
                    1981q2 2 1 2 2
                    1981q3 3 1 2 2
                    1981q4 4 1 2 2
                    How can I arrive at this data structure as per the previous hypothetical table?

                    Comment


                    • #11
                      It is a bit difficult to pick up this thread so late in the game. But it looks like Attaullah Shah is giving you good advice. You seem to have pretty much reached your goal at this point, although you are concerned about missing values. If the example data you show in #6 is representative of your whole data set, these missing values are just reflecting gaps in the data itself. When I cross-tabulate industry with series, we see that there are plenty of combinations that are not instantiated (at least in the example data):

                      Code:
                      . table industry series
                      
                      ----------------------------------------------------------------------------------
                                |                                 series                                
                       industry |    1     2     3     4     5     6     7     8     9    10    11    12
                      ----------+-----------------------------------------------------------------------
                             10 |    1           1     4     3     2     2                 1     1      
                             15 |          1     2                                   1           1     2
                             20 |                3     3     2           1     2           2     1     1
                             25 |                                                    2           5     1
                             30 |                                        1                 1     1      
                             45 |                                                                      1
                             55 |                                                    1                 1
                             60 |                                                                1      
                      ----------------------------------------------------------------------------------
                      Where there is no data to start with, there will be nothing corresponding in the results.

                      Comment


                      • #12

                        Thanks @attaullah shah

                        Thanks @Clyde Schechter #11

                        If my use of reshape is correct then the problem would be in my use of asrol. The way I implemented it seems to lead to missing median observations for the industry within the quarter when there are no firms reporting profit during a certain week.

                        Ideally, I want no missing observations for all subsequent weeks in the quarter once there are firms reporting profit in a given week. For example, suppose for industry number 10, there is profit data in week 1 of quarter 1981q1, then in week 2 of the same quarter, there is no profit for this industry in the cross section. In this case, I want in week 1 that median(or mean) profit is calculated based on week 1 cross section profit data; then in week 2, it will be calculated using week 1 and week 2 data which in this case will be that of week 1 as week 2 is missing, in week 3 t it should be calculated for week 1,2 and 3 cross section profit data regardless of the missing data; and so on. Therefore, there must be NO missing median observations in this case for all weeks from 1 to 12 for that quarter.

                        In another example, suppose for industry number 15, there is NO profit data in the cross section for this industry in week 1 and week 2 of quarter 1981q1, then in week 3 of the same quarter, there is profit for this industry in the cross section. In this case, the median profit for this industry will be only missing in week 1 and 2 but once we have profit data in week 3 for this industry, it should never be missing from that point until the end of the quarter (i.e. for all weeks from 3 to 12).

                        So, the idea is simply that I want to watch profit data in real-time when they become available in each week of the quarter and calculate moving medians in each week of the quarter to represent the median profit of the industry that is updated every week and never goes missing ONCE firm profit for the industry becomes available in a certain week of the quarter. Hope this helps to solve the problem. Recall that "series" represents the weeks in my data.

                        Does the use of asrol below mean that? Also does 12 in the window mean that the rolling window will be calculated for week 1; week 1 and 2; week 1,2, and3; then week 1,2,3,and 4; and so on up till 1-12? If so, how to prevent it from producing missing data as per my explanation above? If this is not possible; can rangesatat solve that?

                        Code:
                         
                         bys industry quarter_date : asrol profit, window(series 12) stat(median)

                        I look forward to hearing back


                        Comment


                        • #13
                          No, -rangestat- will not solve that. If a combination of industry and week does not exist in the starting data, -rangestat- will not create it. What you need to do is go back to the original data, and add additional observations so that every combination of industry and week is instantiated. Those observations can contain missing values for the profit, and for everything else. Once those are there, -rangestat-* would calculate the rolling medians or means the way you want. I do not myself know much about -asrol-, but I would guess that -asrol- would also do that.

                          To expand your data set in the way you need, the -fillin- command will do the job. Run -help fillin- for instructions and details

                          *Note: To use -rangestat- you have to get the coding right. What you attempted in #1 with -interval(series 0 0)- will not give you rolling statistics. For the kind of recursive rolling means/medians you are interested in it should be -interval(series . 0)-.

                          Comment


                          • #14
                            Dear Clyde Schechter, you advice is very helpful as always. I really thank you so much.

                            I think I am about to solve the issue now.

                            I checked -fillin- as you suggested and applied it before I use -asrol- Please see below

                            Code:
                            * create additional missing observations so every combination of industry and series is instantiated
                            fillin quarter_date series industry
                            
                            * Using asrol to create moving medians from series (week) 1 to series 12 for any given quarter for each industry
                            
                            bys industry quarter_date : asrol profit, window(series 12) stat(median)
                            
                            sort series quarter_date
                            Before I proceed with the remaining part of the code I thought to check the resulting dataset. I give an example below for industry 40:


                            Code:
                            * Example generated by -dataex-. To install: ssc install dataex
                            clear
                            input byte industry float(year quarter_date series) byte _fillin double median12_profit
                            40    . 91  1 1       .
                            40    . 91  2 1       .
                            40    . 91  3 1       .
                            40    . 91  4 1       .
                            40    . 91  5 1       .
                            40    . 91  6 1       .
                            40    . 91  7 1       .
                            40    . 91  8 1       .
                            40    . 91  9 1       .
                            40    . 91 10 1       .
                            40    . 91 11 1       .
                            40    . 91 12 1       .
                            40    . 92  1 1       .
                            40 1983 92  2 0   36934
                            40    . 92  3 1   36934
                            40    . 92  4 1   36934
                            40    . 92  5 1   36934
                            40    . 92  6 1   36934
                            40    . 92  7 1   36934
                            40    . 92  8 1   36934
                            40 1983 92  9 0 21384.5
                            40    . 92 10 1 21384.5
                            40    . 92 11 1 21384.5
                            40    . 92 12 1 21384.5
                            40 1983 93  1 0   41867
                            40    . 93  2 1   41867
                            40    . 93  3 1   41867
                            40    . 93  4 1   41867
                            40 1983 93  5 0   23993
                            40    . 93  6 1   23993
                            40    . 93  7 1   23993
                            40    . 93  8 1   23993
                            40    . 93  9 1   23993
                            40    . 93 10 1   23993
                            40    . 93 11 1   23993
                            40    . 93 12 1   23993
                            40    . 94  1 1       .
                            40    . 94  2 1       .
                            40 1983 94  3 0 28517.5
                            40 1983 94  3 0 28517.5
                            40    . 94  4 1 28517.5
                            40    . 94  5 1 28517.5
                            40    . 94  6 1 28517.5
                            40    . 94  7 1 28517.5
                            40    . 94  8 1 28517.5
                            40    . 94  9 1 28517.5
                            40    . 94 10 1 28517.5
                            40    . 94 11 1 28517.5
                            40    . 94 12 1 28517.5
                            end
                            format %tq quarter_date


                            It appears that the industry will have a median value for all remaining weeks of the quarter once one week has a median value. This is great.

                            Now, I need to drop the duplicates and reshape my data as before.

                            Code:
                            duplicates drop quarter_date series industry median12_profit, force
                            
                            keep quarter_date year quarter month week series industry median12_profit _fillin
                            
                            sort series quarter_date
                            
                            drop year quarter month week  _fillin // if I do not drop them I will have error when I reshape wide
                            
                            reshape wide median12_profit, i(series quarter_date)  j(industry)
                            Apparently, this seems to fix the problem.

                            I make these tests to check:

                            Code:
                            dataex if series==1 
                            
                            * Example generated by -dataex-. To install: ssc install dataex
                            clear
                            input float(quarter_date series) double(median12_profit10 median12_profit15 median12_profit20 median12_profit25 median12_profit30 median12_profit35 median12_profit40 median12_profit45 median12_profit50 median12_profit55 median12_profit60)
                             84 1         .        .         .         .         .        .       .         .        .        .       .
                             85 1    349232        .         .         .         .        .       .         .        .        .       .
                             86 1    264591        .         .         .         .        .       .         .        .        .       .
                             87 1         .        .         .         .         .        .       .         .        .        .       .
                             88 1    244023        .         .         .         .        .       .         .        .        .       .
                             89 1    173752   169683 2479874.5         .         .        .       .      3277        .    46120       .
                             90 1    220786        .         .   1079383         .        .       .     32409        .        .   13836
                             91 1         .   170110         .         .         .        .       .         .        .        .       .
                             92 1    221632        .         .    141728    526570        .       .         .        .        .       .
                             93 1     44155        .         .         .         .        .   41867         .        .        .       .
                             94 1    185700        .     39231         .         .        .       .     28379        .        .       .
                             95 1         .        .         .         .     25185        .       .      6776        .        .       .
                             96 1         .        .     11582         .    613015        .       .         .        .        .       .
                             97 1     64310  54855.5         .     60393         .        .       .   97344.5        .        .       .
                             98 1    294875        .         .     77097         .        .       .    234266        .        .       .
                             99 1         .        .     11425         .     79010        .       .     12714        .        .       .
                            100 1         .        .         .   54442.5    392174        .       .         .        .        .       .
                            101 1     11591        .     27282    164667         .     4719       .         .        .        .       .
                            102 1         .        .         .         .         .        .       .         .        .        .       .
                            103 1     25783        .         .     28300         .        .       .     13670        .        .       .
                            104 1         .        .         .     90784    817740     7998       .         .        .        .       .
                            105 1    289629        .  858560.5     10378         .        .       .         .        .        .       .
                            106 1         .        .   1754000         .         .        .       .      2330        .        .       .
                            107 1         .        .  959127.5         .     55260        .       .     21863        .        .       .
                            108 1         .   139852   1038005         .         .        .       .         .        .        .       .
                            109 1         .   181251   1893000         .         .    14125    2692     13797        .    18640   72014
                            110 1         .        .    754725     26878         .        .       .         .        .        .       .
                            111 1         .   344435   1758000    205287         .        .       .     29499        .        .       .
                            112 1         .        .    102279      5894     91149        .       .     60306        .   187313       .
                            113 1      8381   373900    873179         .         .  1394247       .    635807    45688    99741   20714
                            114 1         .        .  979176.5     87081   1924542        .       .         .        .        . 1702500
                            115 1         .        .  975294.5     61912    746402        .       .    397892        .        .       .
                            116 1     15736        .   56019.5     80214         .    54745       .         .        .   184044       .
                            117 1     15822        .   56229.5     42250    149369   237564       .      3737        .   512400       .
                            118 1    143427  1027591     49098  179959.5    754588 966145.5       .         .        .   203278 1698000
                            119 1         .        .    178848     73909    768892   237542       .         .        .        .       .
                            120 1     16212        .    224273     41697         .        .       .     37703        .        .       .
                            121 1     30148        .     60742     60734  436157.5   279315       .    6001.5        .   179090  169615
                            end
                            format %tq quarter_date

                            This shows that in the first week of the quarter, there does not seem to be many firms within the industry reporting profit and hence the median is missing.
                            I do another check to see what happens to the data when we move to week 5 of the quarter for example.

                            Code:
                            dataex if series==5
                            
                            * Example generated by -dataex-. To install: ssc install dataex
                            clear
                            input float(quarter_date series) double(median12_profit10 median12_profit15 median12_profit20 median12_profit25 median12_profit30 median12_profit35 median12_profit40 median12_profit45 median12_profit50 median12_profit55 median12_profit60)
                             84 5        .        .        .         .         .        .        .        .        .        .         .
                             85 5   495635   413030   162729 7880509.5  950430.5   469909   218809    80318   269902   240450         .
                             86 5   421406   387641   230862    282420    592600 466115.5   215385   901385   349532   237320         .
                             87 5   364446 211665.5   185508     42022    953480   681536   206854   876094   335595   258716     44609
                             88 5   244023   284244 364851.5    270129 1675094.5        .   205796    91036   294543   299134    210437
                             89 5 170244.5   205390   199544    598831    613900   625080        .    49878   338859   285737         .
                             90 5   157750   312766 247025.5    276197    593500   538468        .   142681   384407   242935     13836
                             91 5   126535   245589   131761    260854  566604.5   850256        .   316086   368145 285441.5         .
                             92 5   132375   186481 203418.5     93184  503031.5    36276    36934   214293   430157 263735.5    200474
                             93 5   103343   424018 139322.5   1407588  622268.5    53601    23993    56749   315675   299600      5914
                             94 5   118807   214739   113875     53286    593100    69365  28517.5    28379   358501   270909         .
                             95 5   139348   408572 116202.5     52567    594729    44755    29704  14629.5   331639   291983         .
                             96 5   102693   415871   123015     66464    593832     7679    42230 113124.5   302158   326744    233349
                             97 5   133583   104782   177549   99331.5  675182.5    84862    23037    46722   407300 396463.5      7414
                             98 5 224849.5   235764   137906  112326.5    685045    88833    23043    25377   483318   290457      4458
                             99 5   161355    21292   134756     56950    100413    42805    23968    34170 381165.5   317530      4346
                            100 5   104944   236926   142813   60206.5    655072     7651        .  55434.5   570084   307660    236111
                            101 5   395280 497159.5 147012.5   98657.5    875317  73530.5     8830    57590   482688 427604.5      6407
                            102 5 482685.5 126337.5   140594    215891    792644  72963.5        .    16117 451666.5 302594.5      5531
                            103 5   143073 466913.5   130591     60659   76790.5    42672        .    22818   442262   320078      4943
                            104 5   165769   315117   152386     39227  810256.5    26694        .    27850   667495   344501    227409
                            105 5 219893.5   200074   135862     54128   1002301    37640        .    27686   614741 324415.5    7894.5
                            106 5   166060   226509   120272  119371.5    913900    58206        .    23639   716177   295452    4987.5
                            107 5 229670.5   237337   130306     71034     59753    27466        .  27856.5   689059   341960      6224
                            108 5    90689   368507   159573     42737  540951.5   105237    61508    36382  1398110   335606    846926
                            109 5 186154.5   281866   158755     84429    895644    76519    29662    31903   556098   283998  844678.5
                            110 5   140055   342718   177225    210360  640523.5    98763   302200    30269   789761 281348.5   1769114
                            111 5   336500 344225.5   189075     96071    354857    56992    40066    26475   406459   252983     55951
                            112 5 110593.5   387072   244298     78311    229147    22673 434895.5    49118  2152700 350624.5    252724
                            113 5   137203   343140   139070    126987    855519   145931    33340    70078   264649   387064   25286.5
                            114 5 142079.5   381148 204449.5    207090    383122    51257    61782    34475 250100.5   338765   1702500
                            115 5   153423   342998 179954.5    157620  394412.5  28886.5    33692    36248   293242   317963   70016.5
                            116 5   183961   430593   207804   87862.5    611345    47688    90680  46348.5   620499   364921    285405
                            117 5   150765   329837   196577  102967.5    324082   141876    82996    53422   226212   361272     24557
                            118 5    89027 389516.5   141451    272267    367109   147025    87999  47102.5   241715   304240     40023
                            119 5   105714   369765   152513    124222  895483.5    48587  67134.5    43045   234947 317633.5   60247.5
                            120 5  74892.5   404732   239406   61474.5   1237502    62916    49522    46486   657449   382377   2516000
                            
                            end
                            format %tq quarter_date
                            Now I can see that the number of missing medians will be less as more firms within the industry are reporting profits. The number of missing values will be much lower when I move to week 11 for example.

                            I hope there are no mistakes in the way I applied the advice here.
                            In fact, I also give much credit to -asrol- of Attaullah Shah as it seems to be working well too.


                            Please @ Clyde Schechter and@attaullah Shah, let me know if my application of the commands you suggested is actually achieving what is meant to do in line with my objective. Of course, the question is open to everyone in the forum and I do appreciate any help


                            I look forward to hearing back

                            Comment


                            • #15
                              This looks right to me. I cannot vouch for the correct use of -asrol- because it is a program I don't myself use. But on its face, that command also appears correct, and, it is very similar to the syntax recommend to you by Attaullah Shah, so that is probably correct as well.

                              Comment

                              Working...
                              X