Difference in Differences with multiple events

David HajHooj

Join Date: Apr 2020

Posts: 18
#1

16 May 2020, 09:54

Hey all,

I'm trying to apply difference in differences in Stata to stock market traders, comparing the period before they use robot investing with the period after. My main dependent variable is portfolio performance, so I want to see how their portfolios perform before and after using robot investing. What concerns me is that most of the examples I saw on the internet use a calendar time variable (for example, before 1994 and after 1994), whereas in my case I want to compare before switching to the robot with after switching to the robot.

I have a variable for switching, which equals 1 if the investor switched to the robot and 0 otherwise,
and a variable that tells me whether a trade was made by a robot or a human: 1 = robot, 0 = human.

Can someone please tell me how I can do the difference in differences in this case? Thanks
Kye Lippold

Join Date: Jun 2019

Posts: 67
#2

16 May 2020, 16:17

This is a straightforward enough problem--diff-in-diff can be easily done with the treatment (robo-investing in this case) turning on at different times. But to make sure my suggestions are relevant, could you please post an example of your data, using the -dataex- command? (See -help dataex- for information).

David HajHooj

Join Date: Apr 2020
Posts: 18

16 May 2020, 16:55

Thank you for responding, Kye.

My data looks like this:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input InvestorId StockId Switching AutomaticTrading Performance
20             32390   0                0              .
20             32411   0                0               .09867152
20             34733   0                0               .05027843
20             37041   0                0               .5676744
20             38587   0                0               .14006527
20             40238   0                0              -.13145956
20             42419   0                0               .03951561
20             43284   0                0               .0186967
20             44037   1                1              -.0415748
20             44060   0                0              1.1997385
20             44278   0                0              -.7503465
20             44435   0                0               .8813246
20             44953   0                0                .3260774
20             47458   0                0                .27319214
20             48892   0                0                .25726753
20             49943   0                0                .2555361
20             52181   0                0                 .252478
20             54990   1                1                 .23942897
20             55036   0                1                 .23070805
20             55060   0                1                 .2226026
20             55071   0                1                 .26550338
20             55075   0                1                 .20598054
end
label values AutomaticTrading autotrade
label def autotrade 0 "Human Trading", modify
label def autotrade 1 "Robot Trading", modify

Please note that Switching is a dummy: 1 = switched to robot, 0 = didn't switch. AutomaticTrading = 1 means the trade was attempted by the robot, and 0 means it was a human trade.

Last edited by David HajHooj; 16 May 2020, 17:20.

Kye Lippold

Join Date: Jun 2019

Posts: 67
#4

16 May 2020, 17:47

Great, the example is very helpful. I see you have investors trading multiple stocks, but no time variable for when the trade took place. You will need some sort of time ordering to be able to talk about results "after" switching to robo-investing.

Do you have any way to get a time or date for each trade? Or can we assume that stock IDs are assigned in order, meaning a higher value always means the sale took place after an earlier value? This information will determine how to set up the method.
David HajHooj

Join Date: Apr 2020

Posts: 18
#5

16 May 2020, 17:56

Thank you Kye. Yes, I have the time of each trade, and they are in order. I have a variable that has the exact time of each trade, and based on that time variable I generated tradeId, where tradeId is 1, 2, 3, 4, and so on: 1 means the first trade.
Kye Lippold

Join Date: Jun 2019

Posts: 67
#6

16 May 2020, 18:15

Ok, great. Then to set up a diff-in-diff with multiple events, you would first merge the time variable on to the dataset (I will assume you name it time). You will want to have observations across multiple investors for each time period, so defining time at an aggregated level (such as day) would be better than using a very short period (such as second).
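As a rough sketch of the aggregation step just described, the exact trade time could be collapsed to a daily date. This assumes the timestamp is stored as a Stata datetime (%tc) value; the name tradetime is hypothetical, since the thread does not give the variable's name, and a string timestamp would first need converting with clock().

```stata
* Assumption: tradetime is a hypothetical name for the exact trade timestamp,
* stored as a Stata datetime (%tc) value.
gen time = dofc(tradetime)      // datetime in milliseconds -> daily date
format time %td

* Count how many distinct investors trade on each day
egen byte tag = tag(InvestorId time)
bysort time: egen n_investors = total(tag)
tabulate n_investors
```

If most days show only one investor, a coarser period (such as week or month) may be a better choice for time.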

You can then run the following code. (I assume from your other posts that you are familiar with reghdfe already).

Code:

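ssc install reghdfe // if needed
// xtset is optional (reghdfe does not require it) and errors if an investor has several trades in one time period
xtset InvestorId time
rename AutomaticTrading treated
// investor and time fixed effects, standard errors clustered by investor
reghdfe Performance treated, absorb(InvestorId time) cluster(InvestorId)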

In this case, the use of robo-investing is your indicator for being "treated". This indicator can turn on or off for each trade, but the coefficient on the treated variable will tell you the average increase in performance when trades are done while robo-investing, relative to what is expected for a non-robo trade by that investor at that time period (i.e. including investor and time fixed effects).

The "Switching" variable could also be useful if you want to see how the effects of robo-trading change over time... but that is going beyond the basic diff-in-diff.
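One possible way to use Switching for a plain before/after comparison is sketched below. It assumes that tradeId orders each investor's trades in time and that "after" means every trade from the investor's first switch onward; the variable name post is made up for illustration.

```stata
* Assumption: tradeId orders each investor's trades in time (1 = first trade).
* post = 1 from the investor's first switch onward, 0 before it.
bysort InvestorId (tradeId): gen byte post = sum(Switching) > 0

* Same specification as before, with post as the treatment indicator
reghdfe Performance post, absorb(InvestorId time) cluster(InvestorId)
```

Unlike the per-trade robot indicator, post never turns off once it is on, so its coefficient compares all trades after the first switch (robot or human) with the trades before it.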

Hope that helps!
David HajHooj

Join Date: Apr 2020

Posts: 18
#7

16 May 2020, 18:31

Thank you so much Kye for your help! I appreciate it
Said Kaawach

Join Date: Mar 2019

Posts: 32
#8

17 May 2020, 10:15

Hello Kye,

If someone ran that regression, how can we tell what the effect is before versus after using robot investing?

Thanks
Kye Lippold

Join Date: Jun 2019

Posts: 67
#9

18 May 2020, 15:34

Hi Said-- If I understand your question correctly, you would look at the coefficient on the treated variable. Here is an example using the Grunfeld data (of several US companies). We will pretend that there was a (fake) treatment that applied from 1945 forward to company 1.

Code:

webuse grunfeld, clear
xtset company year
gen treated = year>=1945 & company==1
reghdfe kstock treated, absorb(company year) cluster(company)

The coefficient on treated tells us that the capital stock (kstock) of company 1 was 748 units higher than expected (for that company and year) during the period of treatment. So that is the DiD estimate of the effect of the treatment.
Said Kaawach

Join Date: Mar 2019

Posts: 32
#10

18 May 2020, 16:37

Thank you Kye!