Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Calculating a mean within a time period

    Dear Statalist members,

    I have a data sample in which I want to calculate the mean (among other things) within a time window of 1.5 years before and 1.5 years after the announcement date. The observations also have to be within the same industry (SIC code). My data sample looks like this:

    Code:
    clear
    input str4 tsic str10 date float(DateAnn FSPUebitdam FSPUtassets)
    "1021" "11-19-2006" 17124 5.8200002 15270.264
    "1041" "06-21-2000" 14782 14.81 615.62207
    "1041" "06-25-2001" 15151 15.37 1729.0763
    "1041" "02-04-2002" 15375 26.16 416.08307
    "1041" "09-27-2002" 15610 293.67001 9.1161404
    "1041" "04-27-2005" 16553 . 192.69046
    "1041" "07-15-2005" 16632 . .
    "1041" "08-31-2006" 17044 107.818 753.26996
    "1041" "02-05-2007" 17202 . 136.82915
    "1041" "03-05-2007" 17230 68.138 33.648335
    end
    In another post I found this, which enables to calculate the mean of the previous 1.5 year:

    Code:
    sort tsic DateAnn
    local more 1
    local i 0
    local vtype : type FSPUebitdam
    local window = 365 * 1.5
    while `more' {
        by tsic: gen `vtype' pv_`++i' = FSPUebitdam[_n-`i'] ///
            if (DateAnn - DateAnn[_n-`i']) < `window'
        qui count if !mi(pv_`i')
        local more = r(N)
    }
    However, as this does not include the 1.5 years after the announcement date I added this as another requirement:

    Code:
    | (DateAnn[_n+`i'] - DateAnn) < `window'
    It also doesn't include the value of the current observation, that's why I reformulated the code above to:

    Code:
    by tsic: gen `vtype' pv_`++i' = FSPUebitdam[_n-`i'] + FSPUebitdam///
    However, both my adjustment unfortunately do not work. Optionally, I would like to add another requirement, the assets have to be within a range of -1.5x +1.5x of the current observation.
    I hope someone can help me on this.

    Best,
    Max
    Last edited by Max Kruitwagen; 27 Jul 2015, 08:30.

  • #2
    Thanks for providing an easy to work with data example. You were almost there. Here's an example that looks both forward and backward for observations that are within the same sic code and within a +/- 1.5 year window from the current observation. The value is copied only if the assets are within +/-1.5 of those of the current observation. Once the temporary variables are created, you use egen row functions to calculate the mean (and whatever else is needed).

    Code:
    clear
    input str4 tsic float(DateAnn FSPUebitdam FSPUtassets)
    "1021" 17124 5.8200002 15270.264 
    "1041" 14782 14.81 615.62207 
    "1041" 15151 15.37 1729.0763 
    "1041" 15375 26.16 416.08307 
    "1041" 15610 293.67001 9.1161404 
    "1041" 16553 . 192.69046 
    "1041" 16632 . . 
    "1041" 17044 107.818 753.26996 
    "1041" 17202 . 136.82915 
    "1041" 17230 68.138 33.648335 
    end
    format %td DateAnn
    
    * the following solution assumes that there are no missing for these
    assert !mi(tsic,DateAnn)
    
    * look back in time
    sort tsic DateAnn
    local more 1
    local i 0
    local vtype : type FSPUebitdam
    local window = 365 * 1.5
    qui while `more' {
        by tsic: gen doit = (DateAnn - DateAnn[_n-`i']) < `window'
        count if doit
        if r(N) {
            by tsic: gen `vtype' p_`++i' = FSPUebitdam[_n-`i'] if doit & ///
                inrange(FSPUtassets[_n-`i'], -1.5*FSPUtassets, 1.5*FSPUtassets)
            local more 1
        }
        else local more 0
        drop doit
    }
    
    * look forward in time
    local more 1
    local i 0
    qui while `more' {
        by tsic: gen doit = (DateAnn[_n+`i'] - DateAnn) < `window'
        count if doit
        if r(N) {
            by tsic: gen `vtype' f_`++i' = FSPUebitdam[_n+`i'] if doit & ///
                inrange(FSPUtassets[_n+`i'], -1.5*FSPUtassets, 1.5*FSPUtassets)
            local more 1
        }
        else local more 0
        drop doit
    }
    
    * use egen row functions to calculate descriptive statistics; the
    * following includes the current observation
    egen mFSPUebitdam = rowmean(p_* FSPUebitdam f_*)

    Comment


    • #3
      Thanks a lot for you help Robert, it works perfectly!

      Comment


      • #4
        Dear Statalist,

        As I thought I had figured out my problem described above with the code of Robert there seems to be a small problem. With the variables I create from the code above I'm doing a regression analysis and I noticed when I reopened Stata the results were different. After analyzing where the problem was I found out that every time I run the code above I get a different result. It seems like I should sort the data or something like that. However, I've tried:
        Code:
        by tsic DateAnn, sort : egen total = rowtotal(p_* FSPUebitdam f_*)
        When I do this I get an error saying that: egen .... rowtotal() may not be combined with by. Also a simple sort above the code does not seem to work. Can someone help me out?

        Best,
        Max

        Comment


        • #5
          To adjust my comment #4, I found out that it has nothing to do with sorting the data. I used the code below to create a portfolio:
          Code:
          *ebitdam portfolio sic4*
          sort tsic DateAnn FSPUebitdam
          local more 1
          local i 0
          local vtype : type FSPUebitdam
          local window = 365 * 1.5
          qui while `more' {
              by tsic: gen doit = (DateAnn - DateAnn[_n-`i']) < `window'
              count if doit
              if r(N) {
                  by tsic: gen `vtype' p_`++i' = FSPUebitdam[_n-`i'] if doit
                  local more 1
              }
              else local more 0
              drop doit
          }
          
          local more 1
          local i 0
          qui while `more' {
              by tsic: gen doit = (DateAnn[_n+`i'] - DateAnn) < `window'
              count if doit
              if r(N) {
                  by tsic: gen `vtype' f_`++i' = FSPUebitdam[_n+`i'] if doit
                  local more 1
              }
              else local more 0
              drop doit
          }
          
          egen nvals = rownonmiss(p_* FSPUebitdam f_*)
          egen total = rowtotal(p_* FSPUebitdam f_*)
          gen mgross = total / nvals
          egen mean = rowmean(p_* FSPUebitdam f_*)
          egen median = rowmedian(p_* FSPUebitdam f_*)
          
          drop p_* f_*
          After doing this I repeat this step but instead of using sic4 I use sic3 (I relax my requirements). Code:
          Code:
          *ebitdam portfolio sic3*
          sort TSIC3 DateAnn FSPUebitdam
          local more 1
          local i 0
          local vtype : type FSPUebitdam
          local window = 365 * 1.5
          qui while `more' {
              by TSIC3: gen doit = (DateAnn - DateAnn[_n-`i']) < `window'
              count if doit
              if r(N) {
                  by TSIC3: gen `vtype' p_`++i' = FSPUebitdam[_n-`i'] if doit
                  local more 1
              }
              else local more 0
              drop doit
          }
          
          local more 1
          local i 0
          qui while `more' {
              by TSIC3: gen doit = (DateAnn[_n+`i'] - DateAnn) < `window'
              count if doit
              if r(N) {
                  by TSIC3: gen `vtype' f_`++i' = FSPUebitdam[_n+`i'] if doit
                  local more 1
              }
              else local more 0
              drop doit
          }
          
          egen nvals2 = rownonmiss(p_* FSPUebitdam f_*)
          egen total2 = rowtotal(p_* FSPUebitdam f_*)
          gen mgross2 = total / nvals
          egen mean2 = rowmean(p_* FSPUebitdam f_*)
          egen median2 = rowmedian(p_* FSPUebitdam f_*)
          
          drop p_* f_*
          First I compared the data if I run my dofile only for sic4 or sic3 (without the other) and the data was every time exactly the same. However, when I run both of the codes together, and I do this twice and compare the data it is not the same anymore. To be honest I'm not sure how this happens? I hope someone else does and can provide me an answer.

          Best,

          Max

          Edit:
          I'm noticing now that if I want to calculate something with, for example, the row mean, the results are inconsistent as wel. When I for example use this:
          Code:
          gen discount = 1 - (multiple / mean)
          Should I do something special when using rowmean, rowmedian etc? Is it caused by this?
          Last edited by Max Kruitwagen; 05 Aug 2015, 05:43.

          Comment


          • #6
            This is almost certainly due to the sort order that changes at every run. I don't understand what you mean when you say you run both of the codes together. What's different in what you do compared to when you process each sic codes separately?

            An quick way to isolate where the randomness of the sort order occurs is to do save a copy of the dataset up to that point and run again, this time checking for changes using the cf command. Something like:

            Code:
            clear
            input str4 tsic str10 date float(DateAnn FSPUebitdam FSPUtassets)
            "1021" "11-19-2006" 17124 5.8200002 15270.264
            "1041" "06-21-2000" 14782 14.81 615.62207
            "1041" "06-25-2001" 15151 15.37 1729.0763
            "1041" "02-04-2002" 15375 26.16 416.08307
            "1041" "09-27-2002" 15610 293.67001 9.1161404
            "1041" "04-27-2005" 16553 . 192.69046
            "1041" "07-15-2005" 16632 . .
            "1041" "08-31-2006" 17044 107.818 753.26996
            "1041" "02-05-2007" 17202 . 136.82915
            "1041" "03-05-2007" 17230 68.138 33.648335
            end
            
            sort FSPUtassets
            sort tsic
            save "deleteme.dta", replace
            
            clear
            input str4 tsic str10 date float(DateAnn FSPUebitdam FSPUtassets)
            "1021" "11-19-2006" 17124 5.8200002 15270.264
            "1041" "06-21-2000" 14782 14.81 615.62207
            "1041" "06-25-2001" 15151 15.37 1729.0763
            "1041" "02-04-2002" 15375 26.16 416.08307
            "1041" "09-27-2002" 15610 293.67001 9.1161404
            "1041" "04-27-2005" 16553 . 192.69046
            "1041" "07-15-2005" 16632 . .
            "1041" "08-31-2006" 17044 107.818 753.26996
            "1041" "02-05-2007" 17202 . 136.82915
            "1041" "03-05-2007" 17230 68.138 33.648335
            end
            
            sort FSPUtassets
            sort tsic
            
            cf _all using "deleteme.dta", all

            Comment


            • #7
              I found the issue, it had to do with sorting, tanks. I started looking into my dofile since every time I run my dofile and end with a regression, different results occur from this regression. I use a simple regress, robust regression and I've checked all variables and they seem consistent when running my dofile again. Do you have any idea how this is possible? (I know it's exactly a new threat).

              Comment


              • #8
                I don't understand, you found the issue but you still have a problem?

                Comment


                • #9
                  I'm sorry if I didn't make myself clear. I thought there was a problem with my (your) code but this was not true, so this is "solved". The problem is, when I perform a regression my results are like Regression1, which I added. However, when I run my dofile completely over again and do the same regression I get the results from Regression2. While when analyzing my variables they are exactly the same (same median, mean and percentiles). I was wondering how this is possible, or am I missing something?

                  By the way, I'm using regress, robust as test.

                  Comment


                  • #10
                    No, this is not right, these results are not even close except for the last column. Clearly, you are feeding something different to these regressions at each run of the do-file. You can insert the following command in several places in your do-file and compare the results at each run to identify where the data changes.

                    Code:
                    datasignature

                    Comment


                    • #11
                      When I use the datasignature it does not find any changes, also not with the strict option. However as I was analyzing my data I thought it could be due to the fact that some dates occur multiple times within the a sic code, which could have an effect.

                      When I analyzed my data I also noticed that the formula appears not be working like it is supposed to. For some reason not all time windows are working correctly, here are two examples:

                      Code:
                      clear
                      input str10 date float DateAnn str4 tsic float(FSPUebitdam nvals total mgross)
                      "06-01-2005" 16588 "3021" 7.0799999 2 19.369999 9.6849995
                      "08-02-2005" 16650 "3021" 12.29 2 19.369999 9.6849995
                      "05-06-2008" 17658 "3021" . 2 26.250999 13.1255
                      "07-05-2012" 19179 "3021" 13.961 1 13.961 13.961
                      "01-16-2013" 19374 "3021" . 1 13.961 13.961
                      "05-29-2000" 14759 "3052" . 0 0 .
                      end
                      In the third observation there are 2 values found which a total of 26.25. However, as the window is 365*1.5 I'm wondering how the previous observation (12.29) which is a 1000 days ago is included and the fourth observation as well (13.961) while the difference in days is 1500 days.

                      The same is found in the example below when looking at the fifth and sixth observation.
                      Code:
                      clear
                      input str10 date float DateAnn str4 tsic float(FSPUebitdam nvals total mgross)
                      "03-13-2000" 14682 "2892" 14.24 1 14.24 14.24
                      "07-26-2000" 14817 "2899" . 1 10.85 10.85
                      "08-30-2000" 14852 "2899" 10.85 1 10.85 10.85
                      "10-07-2004" 16351 "2899" . 3 44.646004 14.882001
                      "01-03-2006" 16805 "2899" 12.585 4 53.914001 13.4785
                      "03-08-2006" 16868 "2899" 21.211 4 53.914001 13.4785
                      "08-31-2006" 17044 "2899" . 3 43.064003 14.354668
                      "07-11-2008" 17724 "2899" 9.2679996 2 19.550999 9.7754993
                      "07-11-2011" 18819 "2899" 10.283 3 28.769999 9.5899992
                      "01-27-2012" 19019 "2899" 9.2189999 3 28.769999 9.5899992
                      end

                      Comment


                      • #12
                        You are indeed quite right that there was something wrong in the original solution, sorry about that. The tracking of the lag/lead observations via `i' was not correctly handled. The following should fix the problem (using the combined examples in the previous post).

                        Code:
                        clear
                        input str10 date float DateAnn str4 tsic float(FSPUebitdam nvals total mgross)
                        "06-01-2005" 16588 "3021" 7.0799999 2 19.369999 9.6849995
                        "08-02-2005" 16650 "3021" 12.29 2 19.369999 9.6849995
                        "05-06-2008" 17658 "3021" . 2 26.250999 13.1255
                        "07-05-2012" 19179 "3021" 13.961 1 13.961 13.961
                        "01-16-2013" 19374 "3021" . 1 13.961 13.961
                        "05-29-2000" 14759 "3052" . 0 0 .
                        "03-13-2000" 14682 "2892" 14.24 1 14.24 14.24
                        "07-26-2000" 14817 "2899" . 1 10.85 10.85
                        "08-30-2000" 14852 "2899" 10.85 1 10.85 10.85
                        "10-07-2004" 16351 "2899" . 3 44.646004 14.882001
                        "01-03-2006" 16805 "2899" 12.585 4 53.914001 13.4785
                        "03-08-2006" 16868 "2899" 21.211 4 53.914001 13.4785
                        "08-31-2006" 17044 "2899" . 3 43.064003 14.354668
                        "07-11-2008" 17724 "2899" 9.2679996 2 19.550999 9.7754993
                        "07-11-2011" 18819 "2899" 10.283 3 28.769999 9.5899992
                        "01-27-2012" 19019 "2899" 9.2189999 3 28.769999 9.5899992
                        
                        end
                        format %td DateAnn
                        
                        * the following solution assumes that there are no missing for these
                        assert !mi(tsic,DateAnn)
                        
                        * look back in time
                        sort tsic DateAnn
                        local more 1
                        local i 1
                        local vtype : type FSPUebitdam
                        local window = 365 * 1.5
                        qui while `more' {
                            by tsic: gen doit = (DateAnn - DateAnn[_n-`i']) < `window'
                            count if doit
                            if r(N) {
                                by tsic: gen `vtype' p_`i' = FSPUebitdam[_n-`i'] if doit
                                local ++i
                                local more 1
                            }
                            else local more 0
                            drop doit
                        }
                        
                        * look forward in time
                        local more 1
                        local i 1
                        qui while `more' {
                            by tsic: gen doit = (DateAnn[_n+`i'] - DateAnn) < `window'
                            count if doit
                            if r(N) {
                                by tsic: gen `vtype' f_`i' = FSPUebitdam[_n+`i'] if doit
                                local ++i
                                local more 1
                            }
                            else local more 0
                            drop doit
                        }
                        
                        * use egen row functions to calculate descriptive statistics; the
                        * following includes the current observation
                        egen mFSPUebitdam = rowmean(p_* FSPUebitdam f_*)

                        Comment


                        • #13
                          Thanks it works great now!

                          Comment

                          Working...
                          X