Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Estimating the mean of a variable for all other firms in the same industry excluding the current firm

    Dear all,

    I think I have a relatively simple question which is giving me a hard time. I have daily data with abnormal returns (ar) for a sample of companies for a number of years. id is the company identifier and ind is the industry identifier. yearq is the year-quarter.

    I would like to estimate the mean of ar for all other companies in the same industry, for every yearq. In other words, let's say for example that my first 10 companies (i.e., id=1, 2, ..., 10) belong to the first industry (ind==1). Given that my data is daily, I have daily abnormal returns for each of these companies for a number of years. When id=1, I would like to calculate the mean(ar) of the other 9 companies without taking this company into consideration. How can I do that? I would appreciate your help.

    Thank you in advance. Please let me know if you need additional information.

    Best,
    Nikos


  • #2
    Often asked and answered here.

    For an old-style answer, see https://www.stata.com/support/faqs/d...ng-properties/ (which was written largely in response to questions on Statalist)

    As that FAQ's most recent edit advises, a newer-style answer is to look at rangestat (SSC).

    You don't give a data example (describing your data is helpful but not as helpful as a real(istic) example: FAQ Advice #12), so consider this game with Stata's Grunfeld data.

    Code:
    . webuse grunfeld , clear
    
    . gen industry = company > 5
    
    . rangestat (mean) mvalue, int(year 0 0) by(industry) excludeself
    
    . sort year industry company
    
    . list in 1/5
    
         +------------------------------------------------------------------------+
         | company   year   invest   mvalue   kstock   time   industry   mvalue~n |
         |------------------------------------------------------------------------|
      1. |       1   1935    317.6   3078.5      2.8      1          0     777.05 |
      2. |       2   1935    209.9   1362.4     53.8      1          0   1206.075 |
      3. |       3   1935     33.1   1170.6     97.8      1          0   1254.025 |
      4. |       4   1935    40.29    417.5     10.5      1          0     1442.3 |
      5. |       5   1935    39.68    157.7    183.2      1          0    1507.25 |
         +------------------------------------------------------------------------+
    
    . di (mvalue[1] + mvalue[2] + mvalue[3] + mvalue[4])/4
    1507.25
    Using the option name excludeself as a search term will turn up several other examples.
    Last edited by Nick Cox; 23 Jun 2018, 08:53.

    Comment


    • #3
      Dear Nick,

      Thank you very much for your reply. Your code is very helpful. However, it is not doing exactly what I want. Let me share a data example:

      Code:
      id    date    yearq    year    ar    ind
      18    04jan2010    2010q1    2010    -.0015109    31
      18    06jan2010    2010q1    2010    -.016261    31
      18    07jan2010    2010q1    2010    .0010408    31
      28    20jan2010    2010q1    2010    .0119183    31
      28    21jan2010    2010q1    2010    .0009229    31
      28    22jan2010    2010q1    2010    -.0033737    31
      28    25jan2010    2010q1    2010    .0022921    31
      28    28jan2010    2010q1    2010    .0038215    31
      28    29jan2010    2010q1    2010    .003191    31
      28    01feb2010    2010q1    2010    -.0142662    31
      28    02feb2010    2010q1    2010    -.0153596    31
      42    02mar2010    2010q1    2010    .0092511    31
      42    03mar2010    2010q1    2010    .0035187    31
      42    04mar2010    2010q1    2010    .0025556    31
      134    01feb2010    2010q1    2010    .014595    32
      134    02feb2010    2010q1    2010    .0052227    32
      134    03feb2010    2010q1    2010    -.0042054    32
      134    04feb2010    2010q1    2010    -.0075812    32
      134    05feb2010    2010q1    2010    -.0158029    32
      134    08feb2010    2010q1    2010    .0052976    32
      134    09feb2010    2010q1    2010    .0251309    32
      134    10feb2010    2010q1    2010    .0231058    32
      134    11feb2010    2010q1    2010    .0061567    32
      134    12feb2010    2010q1    2010    .0014083    32
      If I use excludeself as you suggest, it only excludes the current observation. This is not exactly what I would like it to do because I am dealing with daily data. Specifically, I would like to estimate the mean(ar) for all other firms that belong to the same industry (and I want to do this for every year-quarter). So, I need to exclude from the calculations all observations that belong to the same id, and not just the current observation. In the above example, if we focus on industry==31, yearq==2010q1 and id==18, in the first 3 observations (i.e., the ones that belong to id==18) I need to estimate the mean of ar for firms 28 and 42 (i.e., the ids that belong to the same industry) WITHOUT taking into consideration the values of ar for id==18.

      Thank you.

      Best,
      Nikos



      Comment


      • #4
        A mean is just the total divided by the number of observations. To get what you want, calculate the total and number of observations within the desired group (ind yearq) and do the same for groups whose values you want to exclude (ind yearq id) and then calculate the desired mean from these components. Something like:

        Code:
        sort ind yearq id
        by ind yearq: egen tot_all = total(ar)
        by ind yearq: egen N_all   = count(ar)
        by ind yearq id: egen tot_self = total(ar)
        by ind yearq id: egen N_self   = count(ar)
        gen wanted = (tot_all - tot_self) / (N_all - N_self)

        Comment


        • #5
          Dear Robert,

          Thank you very much for reply. Yes, what you say makes sense and I have also thought about it. However, I was wondering whether there is another way of doing it, e.g., using rangestat as Nick Cox suggested above, because I need to do the same for the mean, sd, skewness and kurtosis. I haven't been able to achieve the desired outcome using rangestat unfortunately. The excludeself option that rangestat offers is very useful, but it does not solve my problem as I explain above, because I don't want to exclude just the current observations but all observations that belong to that firm. Your solution would work for the mean, but I am not sure about the sd, skewness and kurtosis. And I am not sure if I can do it with rangestat.

          Any help would be highly appreciated. Thank you.

          Best,
          Nikos

          Comment


          • #6
            This can be done with rangestat but would require you write your own Mata routines to exclude the desired observations. With your problem, it's simpler to use rangerun (also from SSC):
            Code:
            program drop _all
            program stat_exclude
                sum ar if id != rr_id, detail
                gen double ar_N    = r(N)    
                gen double ar_mean = r(mean)    
                gen double ar_sd   = r(sd)    
                gen double ar_skew = r(skewness)
                gen double ar_kurt = r(kurtosis)    
            end
            rangerun stat_exclude, by(ind yearq) interval(id . .) sprefix(rr_)

            Comment


            • #7
              Dear Robert,

              Thank you very much for your reply. It is very helpful indeed. I appreciate it a lot.

              Best,
              Nikos

              Comment

              Working...
              X