Average values across variables

Raul Athwall

Join Date: Dec 2022

Posts: 96
#1

19 Dec 2022, 04:01

Hi,
I have 30 variables, one per word, each with a value on every day from 2004 to 2022. I want to average the words' values on each day t. How would I go about creating a new variable that holds this daily average?
Maarten Buis

Join Date: Mar 2014

Posts: 3449
#2

19 Dec 2022, 04:23

It depends on how your data is organized. Can you give us an example?

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------

Raul Athwall

Join Date: Dec 2022
Posts: 96

19 Dec 2022, 04:38

Code:

date         cost    cheap    donation    asset    Competitiveadvantage    France    Gold
1/1/2004     10      70       0           0        0                       23        6
1/2/2004     15      75       0           6        0                       23        10
1/3/2004     10      65       0           0        0                       20        9
1/4/2004     11      76       0           0        0                       24        10
1/5/2004     0       44       0           0        0                       22        21
1/6/2004     15      74       0           15       0                       25        10
1/7/2004     10      67       5           15       0                       22        11
1/8/2004     16      64       0           14       0                       24        12
1/9/2004     15      58       3           11       5                       24        9
1/10/2004    14      79       0           0        5                       25        14
1/11/2004    14      81       0           18       7                       24        14
1/12/2004    19      74       0           5        0                       27        9
1/13/2004    19      70       0           13       2                       24        11
1/14/2004    17      65       0           10       2                       24        11
1/15/2004    20      61       0           7        4                       21        10
1/16/2004    14      63       0           8        0                       19        10

The data look like this: I have many words, and I want to average their values on each day.

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17704
#4

19 Dec 2022, 06:04

Raul:
you may want to consider the -rowmean- function available from -egen-.
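
A minimal sketch of that suggestion, using the first three rows of the data example posted above. The new variable name avg_words is only a placeholder. rowmean() averages the nonmissing values of the listed variables within each observation, so for the first row it returns (10 + 70 + 0 + 0 + 0 + 23 + 6)/7, about 15.57.

Code:

clear
input str10 date cost cheap donation asset Competitiveadvantage France Gold
"1/1/2004" 10 70 0 0 0 23 6
"1/2/2004" 15 75 0 6 0 23 10
"1/3/2004" 10 65 0 0 0 20 9
end

* one new value per observation (day): the mean across the listed variables
egen avg_words = rowmean(cost cheap donation asset Competitiveadvantage France Gold)
list date avg_words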

Kind regards,
Carlo
(Stata 19.0)
Raul Athwall

Join Date: Dec 2022

Posts: 96
#5

19 Dec 2022, 08:47

Thank you Carlo.

I regressed each of my 50 words on another variable and computed a t-statistic. My code for the average is

Code:

egen UKIS = rmean(varlist)

but I only want to include those words that had a negative t-statistic. Is there a shortcut for this?
Hemanshu Kumar

Join Date: Mar 2015

Posts: 1379
#6

19 Dec 2022, 20:53

Raul Athwall: I don't think I understand what you are doing. It is not clear to me whether taking a mean across these variables (as the rowmean function for egen does) makes conceptual sense. Is that really what you want?

On the other hand, finding the mean of every variable (separately) for each day would not make sense either, since your data example suggests you have just one observation per day.

Could you please clarify?
Raul Athwall

Join Date: Dec 2022

Posts: 96
#7

21 Dec 2022, 03:13

I essentially want the mean across the variables on each day. For example, looking at my data on the first day, I want to add up the observations for 'cost', 'cheap', 'donation', etc. and take their average for that particular day, and then do this for every day in my dataset (2004-2022). It isn't a mean for each variable but an average observation on each day.
Hemanshu Kumar

Join Date: Mar 2015

Posts: 1379
#8

21 Dec 2022, 03:22

Could I ask what the larger purpose is? I am struggling to imagine what this "average observation" would be useful for.
Raul Athwall

Join Date: Dec 2022

Posts: 96
#9

21 Dec 2022, 03:26

Essentially, my project creates an index of Google Trends words and relates it to the UK stock market to see if there is a relationship. I have data on each of these words and have calculated the log daily difference for each word. To create my index, I want the average daily difference across all the words I have chosen on day t.
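
One way such log daily differences could be computed is sketched below, using the first five 'cost' values from the data example. This is an assumption about the workflow, not the code actually used in the project, and the names ddate and ldiffcost are placeholders. Note that ln(0) is missing in Stata, so any day with a zero value (and the day after it) gets a missing difference.

Code:

clear
input str10 date cost
"1/1/2004" 10
"1/2/2004" 15
"1/3/2004" 10
"1/4/2004" 11
"1/5/2004" 0
end

* convert the string date to a Stata daily date and declare the time variable
generate ddate = daily(date, "MDY")
format ddate %td
tsset ddate

* log difference from the previous day; missing on the first day and wherever a value is 0
generate ldiffcost = ln(cost) - ln(L.cost)
list date cost ldiffcost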
Hemanshu Kumar

Join Date: Mar 2015

Posts: 1379
#10

21 Dec 2022, 03:51

Ah I see, thanks for the explanation. You mentioned you wanted to restrict it to the variables with negative t statistics. Could you show your code for generating those statistics?

Raul Athwall

Join Date: Dec 2022
Posts: 96

#11

21 Dec 2022, 04:08

Code:

foreach var of varlist ldiffcost_w-ldiffexpense_w {
    regress `var' rmrf, robust
}

Linear regression                               Number of obs     =      3,536
                                                F(1, 3534)        =       0.98
                                                Prob > F          =     0.3225
                                                R-squared         =     0.0003
                                                Root MSE          =     .11711

------------------------------------------------------------------------------
             |               Robust
 ldiffcost_w | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        rmrf |   .1873577   .1893697     0.99   0.323    -.1839274    .5586427
       _cons |   -.003509   .0019697    -1.78   0.075    -.0073708    .0003528
------------------------------------------------------------------------------

Around 30 of them have a negative t-statistic, and these are the only ones I want to include in my average.

Hemanshu Kumar

Join Date: Mar 2015

Posts: 1379
#12

21 Dec 2022, 04:19

So from what little I have understood, here is the path I would take:
First, note that the sign of a t statistic is completely determined by the sign of the coefficient itself, because the standard error is always positive.

Second, we can access the coefficients using the e(b) matrix that is stored after any regression. For a simple regression, the slope coefficient is accessed by e(b)[1,1].
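
As a hedged illustration on Stata's shipped auto dataset (not the data in this thread), the slope can be read either by position from e(b) or by name with _b[], and the t statistic is the coefficient divided by its standard error. Subscripting e(b) directly inside an expression relies on a fairly recent Stata; in older versions it may be necessary to copy e(b) into a matrix first, or simply to use _b[].

Code:

sysuse auto, clear
regress price mpg, robust

* slope by position in the coefficient vector (slope first, constant last)
display e(b)[1,1]

* the same slope by name, and its t statistic
display _b[mpg]
display _b[mpg]/_se[mpg]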

So in the loop running the regressions, I would do something like this (assuming you want to collect the names of the dependent variables from the regressions for which the slope coefficient is negative):

Code:

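* start with an empty list of variable names
local negvars
foreach var of varlist ldiffcost_w-ldiffexpense_w {
    regress `var' rmrf, robust
    * keep the dependent variable if its slope on rmrf is negative
    if e(b)[1,1] < 0 local negvars `negvars' `var'
}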

and then later, I would use the local macro we created in the egen command described before:

Code:

egen UKIS = rowmean(`negvars')
Raul Athwall

Join Date: Dec 2022

Posts: 96
#13

22 Dec 2022, 03:20

Just so I have understood correctly: this isolates the variables with a negative slope coefficient (and hence a negative t statistic) and puts them into the rowmean function?
Hemanshu Kumar

Join Date: Mar 2015

Posts: 1379
#14

22 Dec 2022, 03:26

Yes: this collects the dependent variables (among those in the variable list in the foreach var of varlist ... statement) which have a negative slope coefficient when regressed on rmrf, and puts them into the local macro negvars, so they can be averaged using the rowmean function of the egen command.
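
To check what ended up in the macro, something like the sketch below can be run straight after the loop in #12. It assumes negvars has just been filled by that loop; n_neg is a placeholder name. Keep in mind that a local macro is visible only within the do-file (or the block of lines run at one time) in which it was defined, so the loop, this check and the egen command need to be run together.

Code:

* show the collected variable names
display "`negvars'"

* count how many names the macro holds
local n_neg : word count `negvars'
display "`n_neg' variables have a negative slope"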
Nick Cox

Join Date: Mar 2014

Posts: 35662
#15

22 Dec 2022, 04:05

Whether this is a good idea is another question. For a start, the criterion necessarily lets through many variables with a weak relationship that merely qualifies as being negative. But if this were set for me as an assignment, I wouldn't want to lump words together anyway.
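
If a stricter rule were wanted, one hypothetical variant of the loop in #12 keeps a variable only when its t statistic falls below some cutoff. The cutoff of -1.96 and the names cutoff, tstat and UKIS_strict are illustrative assumptions, not something proposed in this thread, and the sketch assumes the poster's variables are in memory. It does nothing about the second concern, that of lumping words together.

Code:

* hypothetical cutoff; any other negative threshold could be used
local cutoff -1.96
local negvars
foreach var of varlist ldiffcost_w-ldiffexpense_w {
    quietly regress `var' rmrf, robust
    * t statistic of the slope: coefficient divided by its robust standard error
    local tstat = _b[rmrf]/_se[rmrf]
    if `tstat' < `cutoff' local negvars `negvars' `var'
}

* egen will stop with an error if no variable passed the cutoff
egen UKIS_strict = rowmean(`negvars')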