Estimating the mean of a variable for all other firms in the same industry excluding the current firm

Nikos Tsileponis

Join Date: Jul 2014

Posts: 72
#1

Estimating the mean of a variable for all other firms in the same industry excluding the current firm

23 Jun 2018, 07:57

Dear all,

I think I have a relatively simple question which is giving me a hard time. I have daily data with abnormal returns (ar) for a sample of companies for a number of years. id is the company identifier and ind is the industry identifier. yearq is the year-quarter.

I would like to estimate the mean of ar for all other companies in the same industry, for every yearq. In other words, let's say for example that my first 10 companies (i.e., id=1, 2, ..., 10) belong to the first industry (ind==1). Given that my data is daily, I have daily abnormal returns for each of these companies for a number of years. When id=1, I would like to calculate the mean(ar) of the other 9 companies without taking this company into consideration. How can I do that? I would appreciate your help.

Thank you in advance. Please let me know if you need additional information.

Best,
Nikos
Tags: None

Nick Cox

Join Date: Mar 2014
Posts: 35696

23 Jun 2018, 08:50

Often asked and answered here.

For an old-style answer, see https://www.stata.com/support/faqs/d...ng-properties/ (which was written largely in response to questions on Statalist)

As that FAQ's most recent edit advises, a newer-style answer is to look at rangestat (SSC).

You don't give a data example (describing your data is helpful but not as helpful as a real(istic) example: FAQ Advice #12), so consider this game with Stata's Grunfeld data.

Code:

. webuse grunfeld , clear

. gen industry = company > 5

. rangestat (mean) mvalue, int(year 0 0) by(industry) excludeself

. sort year industry company

. list in 1/5

     +------------------------------------------------------------------------+
     | company   year   invest   mvalue   kstock   time   industry   mvalue~n |
     |------------------------------------------------------------------------|
  1. |       1   1935    317.6   3078.5      2.8      1          0     777.05 |
  2. |       2   1935    209.9   1362.4     53.8      1          0   1206.075 |
  3. |       3   1935     33.1   1170.6     97.8      1          0   1254.025 |
  4. |       4   1935    40.29    417.5     10.5      1          0     1442.3 |
  5. |       5   1935    39.68    157.7    183.2      1          0    1507.25 |
     +------------------------------------------------------------------------+

. di (mvalue[1] + mvalue[2] + mvalue[3] + mvalue[4])/4
1507.25

Using the option name excludeself as a search term will turn up several other examples.

Last edited by Nick Cox; 23 Jun 2018, 08:53.

Comment

Nikos Tsileponis

Join Date: Jul 2014
Posts: 72

23 Jun 2018, 11:29

Dear Nick,

Thank you very much for your reply. Your code is very helpful. However, it is not doing exactly what I want. Let me share a data example:

Code:

id    date    yearq    year    ar    ind
18    04jan2010    2010q1    2010    -.0015109    31
18    06jan2010    2010q1    2010    -.016261    31
18    07jan2010    2010q1    2010    .0010408    31
28    20jan2010    2010q1    2010    .0119183    31
28    21jan2010    2010q1    2010    .0009229    31
28    22jan2010    2010q1    2010    -.0033737    31
28    25jan2010    2010q1    2010    .0022921    31
28    28jan2010    2010q1    2010    .0038215    31
28    29jan2010    2010q1    2010    .003191    31
28    01feb2010    2010q1    2010    -.0142662    31
28    02feb2010    2010q1    2010    -.0153596    31
42    02mar2010    2010q1    2010    .0092511    31
42    03mar2010    2010q1    2010    .0035187    31
42    04mar2010    2010q1    2010    .0025556    31
134    01feb2010    2010q1    2010    .014595    32
134    02feb2010    2010q1    2010    .0052227    32
134    03feb2010    2010q1    2010    -.0042054    32
134    04feb2010    2010q1    2010    -.0075812    32
134    05feb2010    2010q1    2010    -.0158029    32
134    08feb2010    2010q1    2010    .0052976    32
134    09feb2010    2010q1    2010    .0251309    32
134    10feb2010    2010q1    2010    .0231058    32
134    11feb2010    2010q1    2010    .0061567    32
134    12feb2010    2010q1    2010    .0014083    32

If I use excludeself as you suggest, it only excludes the current observation. This is not exactly what I would like it to do because I am dealing with daily data. Specifically, I would like to estimate the mean(ar) for all other firms that belong to the same industry (and I want to do this for every year-quarter). So, I need to exclude from the calculations all observations that belong to the same id, and not just the current observation. In the above example, if we focus on industry==31, yearq==2010q1 and id==18, in the first 3 observations (i.e., the ones that belong to id==18) I need to estimate the mean of ar for firms 28 and 42 (i.e., the ids that belong to the same industry) WITHOUT taking into consideration the values of ar for id==18.

Thank you.

Best,
Nikos

Comment

Robert Picard

Join Date: Mar 2014

Posts: 1536
#4

24 Jun 2018, 08:29

A mean is just the total divided by the number of observations. To get what you want, calculate the total and number of observations within the desired group (ind yearq) and do the same for groups whose values you want to exclude (ind yearq id) and then calculate the desired mean from these components. Something like:

Code:

sort ind yearq id by ind yearq: egen tot_all = total(ar) by ind yearq: egen N_all = count(ar) by ind yearq id: egen tot_self = total(ar) by ind yearq id: egen N_self = count(ar) gen wanted = (tot_all - tot_self) / (N_all - N_self)
1 like
Comment
Nikos Tsileponis

Join Date: Jul 2014

Posts: 72
#5

24 Jun 2018, 08:42

Dear Robert,

Thank you very much for reply. Yes, what you say makes sense and I have also thought about it. However, I was wondering whether there is another way of doing it, e.g., using rangestat as Nick Cox suggested above, because I need to do the same for the mean, sd, skewness and kurtosis. I haven't been able to achieve the desired outcome using rangestat unfortunately. The excludeself option that rangestat offers is very useful, but it does not solve my problem as I explain above, because I don't want to exclude just the current observations but all observations that belong to that firm. Your solution would work for the mean, but I am not sure about the sd, skewness and kurtosis. And I am not sure if I can do it with rangestat.

Any help would be highly appreciated. Thank you.

Best,
Nikos
Comment

Robert Picard

Join Date: Mar 2014
Posts: 1536

24 Jun 2018, 09:02

This can be done with rangestat but would require you write your own Mata routines to exclude the desired observations. With your problem, it's simpler to use rangerun (also from SSC):

Code:

program drop _all
program stat_exclude
    sum ar if id != rr_id, detail
    gen double ar_N    = r(N)    
    gen double ar_mean = r(mean)    
    gen double ar_sd   = r(sd)    
    gen double ar_skew = r(skewness)
    gen double ar_kurt = r(kurtosis)    
end
rangerun stat_exclude, by(ind yearq) interval(id . .) sprefix(rr_)

Comment

Nikos Tsileponis

Join Date: Jul 2014

Posts: 72
#7

25 Jun 2018, 07:30

Dear Robert,

Thank you very much for your reply. It is very helpful indeed. I appreciate it a lot.

Best,
Nikos
Comment

Announcement