  • Test on Herd Behavior

    Greetings,

    I am new to Stata and needless to say, this forum. I apologise in advance for any ignorant question or statement that ensues.

    I am trying to detect herd behaviour in a stock market and run the following empirical specification:
    [Attached image: Capture.PNG, showing the empirical specification referred to below as Eq. (3) and Eq. (4)]

    Before we go into the details: I have the stock prices, trading volume, and opening price of a market. I need to find the market return, R_(m,t), which is given by R_t = 100*[log P_t - log P_(t-1)], where P_t is the stock price at time t. Is it possible to do this in Stata? Secondly, after I have derived R_(m,t), how do I go about running the regression in Eq. (3)?

    Any help would be very much appreciated.
    Last edited by sladmin; 09 Apr 2018, 08:53. Reason: anonymize poster

  • #2
    I think that, unless there is somebody out there who is already familiar with this particular problem, you will need to clarify your question in two ways.

    1. You will need to show an example of your data, using the -dataex- command. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code. Without example data, the chances of getting a response with code that does not actually work in your data are high.

    When asking for help with code, always show example data. When showing example data, always use -dataex-.
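
    For instance, posting an extract can be as simple as the following (a minimal sketch; the variable names here are just placeholders for whatever is actually in your data):

    Code:
    ssc install dataex                              // only if it is not already part of your Stata
    dataex date price open high low vol change in 1/20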

    2. Next, in your description you refer to R_(m,t), which you then say is given by R_t = ... (what happened to the m?). And what is R_(i,t)? Are your logs natural logs (I would guess), or base 10?




    • #3
      Dear Clyde,

      Thanks for your reply. I am currently using version 14.1, but following your instructions I have installed -dataex-.

      I will assume that R_(m,t) = R_t in this case; also, ignore R_(i,t) for now. The logs are natural logs.

      Thank you for your help. Below is the example data as requested; I hope I have made my question clearer.

      Thank you.

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str12 date float(price open high low) str6 vol float change
      "Dec 29, 2017" 341.81 340.83 341.94 340.78 "16.14M"   .37
      "Dec 28, 2017" 340.55 338.26 340.79 338.06 "22.07M"    .7
      "Dec 27, 2017" 338.17 338.06 339.07 337.37 "17.79M"   .12
      "Dec 26, 2017" 337.75 336.11 337.82 335.81 "15.35M"   .53
      "Dec 25, 2017" 335.96 337.42 338.32 335.53 "15.30M"  -.52
      "Dec 22, 2017" 337.71  337.7 338.22 337.33 "10.84M"   .02
      "Dec 21, 2017" 337.63 336.71 338.48 335.74 "19.49M"    .3
      "Dec 20, 2017" 336.61 338.34 338.48 336.05 "12.25M"  -.51
      "Dec 19, 2017" 338.32 336.98 338.42 336.76 "13.08M"   .35
      "Dec 18, 2017" 337.15 336.45 337.94 336.21 "16.89M"   .15
      "Dec 15, 2017" 336.65 336.76 337.16  334.8 "21.66M"  -.03
      "Dec 14, 2017" 336.76 336.91 337.68 336.03 "12.36M"  -.18
      "Dec 13, 2017" 337.38 336.74 337.53 335.61 "18.30M"   .16
      "Dec 12, 2017" 336.84 338.27  338.3 336.21 "14.79M"  -.37
      "Dec 11, 2017" 338.09 335.57 338.19 335.57 "19.64M"   .82
      "Dec 08, 2017" 335.35 332.43 335.77 332.32 "18.00M"   .83
      "Dec 07, 2017" 332.58 334.21 334.46 332.25 "14.80M"  -.58
      "Dec 06, 2017" 334.51    333  334.7 331.15 "27.30M"   .31
      "Dec 05, 2017" 333.49 341.06 341.17 332.77 "39.00M" -2.21
      "Dec 04, 2017" 341.03 342.96 343.18 340.26 "21.40M"  -.45
      "Dec 01, 2017" 342.57 340.83 342.93 340.54 "15.20M"   .35
      "Nov 30, 2017" 341.38 341.81 342.79 339.74 "26.60M"  -.04
      "Nov 29, 2017" 341.53 340.51  341.9 339.57 "24.50M"   .29
      "Nov 28, 2017" 340.55 337.31 340.67 337.18 "19.70M"   .96
      "Nov 27, 2017" 337.32 341.96 342.27 337.05 "0.26K"   -1.3
      "Nov 24, 2017" 341.78 339.72 342.13 339.55 "0.26K"    .45
      "Nov 23, 2017" 340.24 344.03 344.93 340.16 "0.35K"  -1.16
      "Nov 22, 2017" 344.22 343.18 345.42 342.74 "0.30K"    .34
      "Nov 21, 2017" 343.07 342.35 343.62 341.78 "0.29K"    .11
      "Nov 20, 2017"  342.7 341.41 342.77 336.78 "0.39K"    .18
      "Nov 17, 2017"  342.1 349.56 350.43 341.54 "0.52K"  -2.15
      "Nov 16, 2017" 349.62 349.77 351.02 349.44 "0.28K"   -.11
      "Nov 15, 2017"    350 352.65 352.65 349.91 "0.35K"   -.79
      "Nov 14, 2017"  352.8 354.23 355.08 352.02 "0.50K"   -.36
      "Nov 13, 2017" 354.09 352.98 354.14 352.34 "0.44K"    .35
      "Nov 10, 2017" 352.84 352.32 352.95 351.83 "0.34K"    .16
      "Nov 09, 2017" 352.28 349.93 352.55 349.93 "0.43K"    .69
      "Nov 08, 2017" 349.86 349.02  350.9 348.62 "0.37K"    .27
      "Nov 07, 2017" 348.91 346.55    349 346.05 "0.34K"    .74
      "Nov 06, 2017" 346.33 344.95 346.63 343.22 "0.34K"    .39
      "Nov 03, 2017" 344.98 347.78 348.52 343.72 "0.38K"   -.82
      "Nov 02, 2017" 347.82  351.1 351.49 347.57 "0.37K"    -.8
      "Nov 01, 2017" 350.61 350.78 351.77 350.41 "0.32K"    .01
      "Oct 31, 2017" 350.58 348.46  351.3 348.46 "0.39K"    .45
      "Oct 30, 2017" 349.02 352.99 353.41 348.58 "0.52K"  -1.23
      "Oct 27, 2017" 353.37 356.28 356.34 353.01 "0.32K"   -.82
      "Oct 26, 2017" 356.29 354.62 356.42 353.94 "0.43K"    .56
      "Oct 25, 2017" 354.31 351.37 354.42 351.01 "0.32K"    .85
      "Oct 24, 2017" 351.32 351.96 352.32 350.82 "0.23K"    -.2
      "Oct 23, 2017" 352.01 353.22  353.4 351.42 "0.36K"   -.25
      "Oct 20, 2017" 352.89  347.3 352.92 347.16 "0.51K"   1.69
      "Oct 19, 2017" 347.02    349 349.62 346.72 "0.34K"   -.57
      "Oct 18, 2017" 349.02 351.44 352.75  348.6 "0.44K"   -.65
      "Oct 17, 2017" 351.31 355.48 356.31  350.8 "0.52K"  -1.46
      "Oct 16, 2017" 356.52 360.09 360.47 355.98 "0.40K"   -.99
      "Oct 13, 2017"  360.1 358.83 360.18 358.77 "0.28K"     .3
      "Oct 12, 2017" 359.02 362.67 363.21 357.18 "0.60K"  -1.06
      "Oct 11, 2017" 362.88 363.19 363.58 362.58 "0.42K"   -.17
      "Oct 10, 2017" 363.48 363.11 363.61 361.89 "0.45K"    .09
      "Oct 09, 2017" 363.16 363.12  364.7 362.78 "0.70K"    .42
      "Sep 29, 2017" 361.63 359.31 361.98 359.31 "0.41K"    .65
      "Sep 28, 2017"  359.3 358.94 359.54 358.19 "0.31K"     .1
      "Sep 27, 2017" 358.95 355.33 358.96 355.13 "0.41K"    .99
      "Sep 26, 2017" 355.42 354.17 355.62 354.17 "0.33K"    .34
      "Sep 25, 2017"  354.2    358    358 354.04 "0.59K"  -1.01
      "Sep 22, 2017" 357.82 358.01  358.3 355.95 "0.36K"   -.25
      "Sep 21, 2017"  358.7 358.79 360.76 358.44 "0.52K"   -.12
      "Sep 20, 2017" 359.12 356.56 359.17 354.78 "0.44K"    .74
      "Sep 19, 2017" 356.49 356.44 357.76 356.09 "0.34K"    .06
      "Sep 18, 2017" 356.26 355.09 356.88 354.82 "0.38K"     .3
      "Sep 15, 2017"  355.2 357.73 357.86 354.08 "0.66K"   -.79
      "Sep 14, 2017" 358.04 359.33 361.39 357.61 "0.80K"   -.41
      "Sep 13, 2017" 359.53 357.66 359.61 357.23 "0.59K"    .45
      "Sep 12, 2017" 357.91 360.05 360.49 357.38 "0.57K"    -.5
      "Sep 11, 2017" 359.71 355.54 359.78 355.24 "0.67K"   1.18
      "Sep 08, 2017" 355.51 347.84 358.14 347.84 "1.18K"   2.32
      "Sep 07, 2017" 347.44  347.7 348.96 347.04 "0.50K"   -.11
      "Sep 06, 2017" 347.83 346.03 348.03 345.73 "0.46K"     .4
      "Sep 05, 2017" 346.45  344.9  346.5 344.54 "0.38K"    .54
      "Sep 04, 2017"  344.6 345.33 346.07 344.05 "0.47K"   -.13
      "Sep 01, 2017" 345.06 345.55  347.2 344.88 "0.57K"   -.04
      "Aug 31, 2017"  345.2 343.75  345.4 343.26 "0.47K"    .46
      "Aug 30, 2017" 343.62 340.81 344.12 340.68 "0.51K"    .82
      "Aug 29, 2017" 340.81 341.61 341.92 340.41 "0.32K"   -.33
      "Aug 28, 2017" 341.93 338.96 342.34 338.96 "0.46K"    .82
      "Aug 25, 2017" 339.15 336.47 339.26 336.47 "0.37K"    .81
      "Aug 24, 2017" 336.44 337.56 338.02 336.08 "0.27K"   -.22
      "Aug 23, 2017" 337.18 337.69 338.02 336.11 "0.22K"   -.27
      "Aug 22, 2017" 338.08 339.02 339.09 337.72 "0.27K"   -.29
      "Aug 21, 2017" 339.07 337.96 339.08 337.93 "0.34K"    .45
      "Aug 18, 2017" 337.54 336.39 337.64 335.93 "0.33K"     .2
      "Aug 17, 2017" 336.88 335.22 336.89 335.22 "0.34K"    .64
      "Aug 16, 2017" 334.75 334.42 335.04 333.88 "0.31K"    .07
      "Aug 15, 2017" 334.53 334.83 335.32 334.19 "0.28K"   -.12
      "Aug 14, 2017" 334.93 331.81 335.05 331.72 "0.37K"   1.03
      "Aug 11, 2017" 331.51 334.91 334.91 331.35 "0.48K"  -1.19
      "Aug 10, 2017" 335.49 336.03 336.73 334.31 "0.38K"   -.22
      "Aug 09, 2017" 336.24 334.84 336.26 334.61 "0.29K"    .44
      "Aug 08, 2017" 334.76 334.75 335.05  334.1 "0.26K"    .07
      "Aug 07, 2017" 334.54 333.96 334.59 333.02 "0.20K"    .18
      end



      • #4
        Well, thank you for the example data. But now I am even more confused. I don't see how I can ignore Rit, as it's needed to calculate CSADt in equation (3). And without CSADt there is no regression possible.

        As far as calculating Rmt, the key is that you need a numeric Stata internal format date variable, not the string version that you have. Once that is done, it is easy to sort the data chronologically and then calculate the log of the ratio of the price on consecutive dates.

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str12 date float(price open high low) str6 vol float change
        "Dec 29, 2017" 341.81 340.83 341.94 340.78 "16.14M"   .37
        "Dec 28, 2017" 340.55 338.26 340.79 338.06 "22.07M"    .7
        "Dec 27, 2017" 338.17 338.06 339.07 337.37 "17.79M"   .12
        "Dec 26, 2017" 337.75 336.11 337.82 335.81 "15.35M"   .53
        "Dec 25, 2017" 335.96 337.42 338.32 335.53 "15.30M"  -.52
        "Dec 22, 2017" 337.71  337.7 338.22 337.33 "10.84M"   .02
        "Dec 21, 2017" 337.63 336.71 338.48 335.74 "19.49M"    .3
        "Dec 20, 2017" 336.61 338.34 338.48 336.05 "12.25M"  -.51
        "Dec 19, 2017" 338.32 336.98 338.42 336.76 "13.08M"   .35
        "Dec 18, 2017" 337.15 336.45 337.94 336.21 "16.89M"   .15
        "Dec 15, 2017" 336.65 336.76 337.16  334.8 "21.66M"  -.03
        "Dec 14, 2017" 336.76 336.91 337.68 336.03 "12.36M"  -.18
        "Dec 13, 2017" 337.38 336.74 337.53 335.61 "18.30M"   .16
        "Dec 12, 2017" 336.84 338.27  338.3 336.21 "14.79M"  -.37
        "Dec 11, 2017" 338.09 335.57 338.19 335.57 "19.64M"   .82
        "Dec 08, 2017" 335.35 332.43 335.77 332.32 "18.00M"   .83
        "Dec 07, 2017" 332.58 334.21 334.46 332.25 "14.80M"  -.58
        "Dec 06, 2017" 334.51    333  334.7 331.15 "27.30M"   .31
        "Dec 05, 2017" 333.49 341.06 341.17 332.77 "39.00M" -2.21
        "Dec 04, 2017" 341.03 342.96 343.18 340.26 "21.40M"  -.45
        "Dec 01, 2017" 342.57 340.83 342.93 340.54 "15.20M"   .35
        "Nov 30, 2017" 341.38 341.81 342.79 339.74 "26.60M"  -.04
        "Nov 29, 2017" 341.53 340.51  341.9 339.57 "24.50M"   .29
        "Nov 28, 2017" 340.55 337.31 340.67 337.18 "19.70M"   .96
        "Nov 27, 2017" 337.32 341.96 342.27 337.05 "0.26K"   -1.3
        "Nov 24, 2017" 341.78 339.72 342.13 339.55 "0.26K"    .45
        "Nov 23, 2017" 340.24 344.03 344.93 340.16 "0.35K"  -1.16
        "Nov 22, 2017" 344.22 343.18 345.42 342.74 "0.30K"    .34
        "Nov 21, 2017" 343.07 342.35 343.62 341.78 "0.29K"    .11
        "Nov 20, 2017"  342.7 341.41 342.77 336.78 "0.39K"    .18
        "Nov 17, 2017"  342.1 349.56 350.43 341.54 "0.52K"  -2.15
        "Nov 16, 2017" 349.62 349.77 351.02 349.44 "0.28K"   -.11
        "Nov 15, 2017"    350 352.65 352.65 349.91 "0.35K"   -.79
        "Nov 14, 2017"  352.8 354.23 355.08 352.02 "0.50K"   -.36
        "Nov 13, 2017" 354.09 352.98 354.14 352.34 "0.44K"    .35
        "Nov 10, 2017" 352.84 352.32 352.95 351.83 "0.34K"    .16
        "Nov 09, 2017" 352.28 349.93 352.55 349.93 "0.43K"    .69
        "Nov 08, 2017" 349.86 349.02  350.9 348.62 "0.37K"    .27
        "Nov 07, 2017" 348.91 346.55    349 346.05 "0.34K"    .74
        "Nov 06, 2017" 346.33 344.95 346.63 343.22 "0.34K"    .39
        "Nov 03, 2017" 344.98 347.78 348.52 343.72 "0.38K"   -.82
        "Nov 02, 2017" 347.82  351.1 351.49 347.57 "0.37K"    -.8
        "Nov 01, 2017" 350.61 350.78 351.77 350.41 "0.32K"    .01
        "Oct 31, 2017" 350.58 348.46  351.3 348.46 "0.39K"    .45
        "Oct 30, 2017" 349.02 352.99 353.41 348.58 "0.52K"  -1.23
        "Oct 27, 2017" 353.37 356.28 356.34 353.01 "0.32K"   -.82
        "Oct 26, 2017" 356.29 354.62 356.42 353.94 "0.43K"    .56
        "Oct 25, 2017" 354.31 351.37 354.42 351.01 "0.32K"    .85
        "Oct 24, 2017" 351.32 351.96 352.32 350.82 "0.23K"    -.2
        "Oct 23, 2017" 352.01 353.22  353.4 351.42 "0.36K"   -.25
        "Oct 20, 2017" 352.89  347.3 352.92 347.16 "0.51K"   1.69
        "Oct 19, 2017" 347.02    349 349.62 346.72 "0.34K"   -.57
        "Oct 18, 2017" 349.02 351.44 352.75  348.6 "0.44K"   -.65
        "Oct 17, 2017" 351.31 355.48 356.31  350.8 "0.52K"  -1.46
        "Oct 16, 2017" 356.52 360.09 360.47 355.98 "0.40K"   -.99
        "Oct 13, 2017"  360.1 358.83 360.18 358.77 "0.28K"     .3
        "Oct 12, 2017" 359.02 362.67 363.21 357.18 "0.60K"  -1.06
        "Oct 11, 2017" 362.88 363.19 363.58 362.58 "0.42K"   -.17
        "Oct 10, 2017" 363.48 363.11 363.61 361.89 "0.45K"    .09
        "Oct 09, 2017" 363.16 363.12  364.7 362.78 "0.70K"    .42
        "Sep 29, 2017" 361.63 359.31 361.98 359.31 "0.41K"    .65
        "Sep 28, 2017"  359.3 358.94 359.54 358.19 "0.31K"     .1
        "Sep 27, 2017" 358.95 355.33 358.96 355.13 "0.41K"    .99
        "Sep 26, 2017" 355.42 354.17 355.62 354.17 "0.33K"    .34
        "Sep 25, 2017"  354.2    358    358 354.04 "0.59K"  -1.01
        "Sep 22, 2017" 357.82 358.01  358.3 355.95 "0.36K"   -.25
        "Sep 21, 2017"  358.7 358.79 360.76 358.44 "0.52K"   -.12
        "Sep 20, 2017" 359.12 356.56 359.17 354.78 "0.44K"    .74
        "Sep 19, 2017" 356.49 356.44 357.76 356.09 "0.34K"    .06
        "Sep 18, 2017" 356.26 355.09 356.88 354.82 "0.38K"     .3
        "Sep 15, 2017"  355.2 357.73 357.86 354.08 "0.66K"   -.79
        "Sep 14, 2017" 358.04 359.33 361.39 357.61 "0.80K"   -.41
        "Sep 13, 2017" 359.53 357.66 359.61 357.23 "0.59K"    .45
        "Sep 12, 2017" 357.91 360.05 360.49 357.38 "0.57K"    -.5
        "Sep 11, 2017" 359.71 355.54 359.78 355.24 "0.67K"   1.18
        "Sep 08, 2017" 355.51 347.84 358.14 347.84 "1.18K"   2.32
        "Sep 07, 2017" 347.44  347.7 348.96 347.04 "0.50K"   -.11
        "Sep 06, 2017" 347.83 346.03 348.03 345.73 "0.46K"     .4
        "Sep 05, 2017" 346.45  344.9  346.5 344.54 "0.38K"    .54
        "Sep 04, 2017"  344.6 345.33 346.07 344.05 "0.47K"   -.13
        "Sep 01, 2017" 345.06 345.55  347.2 344.88 "0.57K"   -.04
        "Aug 31, 2017"  345.2 343.75  345.4 343.26 "0.47K"    .46
        "Aug 30, 2017" 343.62 340.81 344.12 340.68 "0.51K"    .82
        "Aug 29, 2017" 340.81 341.61 341.92 340.41 "0.32K"   -.33
        "Aug 28, 2017" 341.93 338.96 342.34 338.96 "0.46K"    .82
        "Aug 25, 2017" 339.15 336.47 339.26 336.47 "0.37K"    .81
        "Aug 24, 2017" 336.44 337.56 338.02 336.08 "0.27K"   -.22
        "Aug 23, 2017" 337.18 337.69 338.02 336.11 "0.22K"   -.27
        "Aug 22, 2017" 338.08 339.02 339.09 337.72 "0.27K"   -.29
        "Aug 21, 2017" 339.07 337.96 339.08 337.93 "0.34K"    .45
        "Aug 18, 2017" 337.54 336.39 337.64 335.93 "0.33K"     .2
        "Aug 17, 2017" 336.88 335.22 336.89 335.22 "0.34K"    .64
        "Aug 16, 2017" 334.75 334.42 335.04 333.88 "0.31K"    .07
        "Aug 15, 2017" 334.53 334.83 335.32 334.19 "0.28K"   -.12
        "Aug 14, 2017" 334.93 331.81 335.05 331.72 "0.37K"   1.03
        "Aug 11, 2017" 331.51 334.91 334.91 331.35 "0.48K"  -1.19
        "Aug 10, 2017" 335.49 336.03 336.73 334.31 "0.38K"   -.22
        "Aug 09, 2017" 336.24 334.84 336.26 334.61 "0.29K"    .44
        "Aug 08, 2017" 334.76 334.75 335.05  334.1 "0.26K"    .07
        "Aug 07, 2017" 334.54 333.96 334.59 333.02 "0.20K"    .18
        end
        
        gen sif_date = daily(date, "MDY")
        assert missing(sif_date) == missing(date)
        format sif_date %td
        gen r_mt = 100*log(price/price[_n-1])
        Notes:

        1. There are gaps in the dates in your data. It appears, at least casually, that those gaps are all weekends or US holidays where the stock markets are closed. If that is true of your entire data set, this code will be correct. But if there are other dates that are missing, then it will incorrectly calculate r_mt by using the price from some earlier date (whichever date is closest to it and before it in your data) that is inappropriate. If your data has this problem, then you need to create a business calendar so that you can have numeric date variables that correspond properly to dates when the stock market is open. See -help bcal- and -help business calendars- if you need to do this.
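
        If you do turn out to need a business calendar, a minimal sketch of how it would be used is below. This is not tested, and the calendar name sseb is just a placeholder; the calendar itself would have to be built from the exchange's actual list of trading days (see -help bcal-) rather than from your possibly incomplete data.

        Code:
        // assumes sif_date from the code above, and a business calendar file sseb.stbcal
        gen bdate = bofd("sseb", sif_date)       // business-calendar date
        format bdate %tbsseb
        tsset bdate
        gen r_mt_bc = 100*log(price/L.price)     // replaces the [_n-1] calculation; missing whenever the prior trading day is absent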

        2. I have implemented the calculation as the log of the ratio rather than the difference of the logarithms. Mathematically they are the same thing. But for computational efficiency (important if your data set is large), it is much quicker to calculate one quotient and one logarithm than to calculate two logarithms and one subtraction.
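
        If you want to reassure yourself that the two formulations agree, a quick check along these lines (assuming r_mt was created as above) will do it:

        Code:
        gen double r_mt_check = 100*(log(price) - log(price[_n-1]))
        assert reldif(r_mt, r_mt_check) < 1e-6 if !missing(r_mt)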

        Once you figure out what's going on with Rit, post back and we can go through the calculation of CSADt and the regression.



        • #5
          Dear Clyde,

          That was my mistake. R_(i,t) is the observed stock return of firm i at time t, and R_(m,t) is the cross-sectional average of the N stock returns in the portfolio at time t.

          To go into detail, I am testing for the presence of herd behaviour in the Chinese stock market. The i in R_(i,t) refers to the returns of dual-listed firms (across A and B shares); as far as I know, there are around 87 of them in the Chinese stock market, but I am still struggling to acquire the data (stock prices, trading volume) for the said companies. The idea of the regression is that the closer R_(i,t) and R_(m,t) are, the smaller the CSAD will be, indicating herding; CSAD is merely a measure of return dispersion. The intuition is straightforward: if a firm's return almost mirrors that of the overall market, its investors are following the market's consensus. The R^2_(m,t) term captures the non-linearity of the relationship between CSAD and the market return, R_(m,t). Hence, after running regression (3), if γ_2 is significantly negative, we know that herding behaviour exists. I hope my explanation above makes sense to you.
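
          Written out, the specification I am estimating (which should match Eq. (3) in the screenshot above) is:

          CSAD_t = α + γ_1*|R_(m,t)| + γ_2*(R_(m,t))^2 + ε_t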

          Thank you.

          P.S. For what it's worth, my paper follows the methodology of "Herding behavior in Chinese stock markets: An examination of A and B shares" by Tan et al. (2008).



          • #6
            Nice explanation, thank you. But I don't see how you can calculate Rit in the data you have: it seems to have just one observation per date, and I guess it is the overall market price information; there doesn't seem to be anything to identify different shares of different firms.

            Let's just say that when you do get the individual shares data I'll be happy to help you with the rest of the calculations.



            • #7
              To add on to #5 and #6 with respect to acquiring data on Chinese stock markets: Datastream would be suitable if you have access to it.

              You can, for example, provide a list of companies to gather this kind of data here.

              Then you will have:

              1. data on the overall market price (the data you are currently working with), and
              2. data downloaded via Datastream for the individual firms and different share classes,

              in order to run these regressions.



              • #8
                Dear Clyde and Wessel,

                Clyde: In Eq. (4), do I have to collect the data for every dual-listed firm (87 of them) and merge them to form R_(i,t)? With i = 1, 2, ..., 87, I will have to collect 87 different data sets over the period of interest and compile them together, right? Correct me if I am wrong; I hope we are on the same page.

                Wessel: Thanks for the recommendation. Unfortunately, I do not have access to Datastream; I have tried investing.com and Bloomberg, but I guess I am out of luck.

                Thank you.



                • #9
                  Dear Clyde and Wessel,

                  Clyde: In Eq. (4), do I have to collect the data for every dual-listed firm (87 of them)? For simplicity's sake, I will assume that there are only 3 firms of interest. Eq. (4) is then going to be
                  CSAD_t = 1/3 [ |R_(1,t) - R_(m,t)| + |R_(2,t) - R_(m,t)| + |R_(3,t) - R_(m,t)| ], in which R_(m,t) = 100*[log(P_t) - log(P_(t-1))]. Once we have derived the value of CSAD_t, we can then proceed to run the regression in Eq. (3), with the aforementioned CSAD_t on the LHS and the specification on the RHS, thus obtaining estimates for the parameters of interest: α, γ1, and γ2. Please correct me if I am wrong; I hope we are on the same page.
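
                  For a concrete (made-up) numerical example: if on some date R_(1,t) = 1.2, R_(2,t) = 0.8, R_(3,t) = -0.4 and R_(m,t) = 0.5, then CSAD_t = (|1.2 - 0.5| + |0.8 - 0.5| + |-0.4 - 0.5|)/3 = (0.7 + 0.3 + 0.9)/3 ≈ 0.633.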

                  Wessel: Thanks for the recommendation. Unfortunately, I do not have access to Datastream; I have tried investing.com and Bloomberg, but I guess I am out of luck.

                  Thank you.



                  • #10
                    Yes, your formula for CSADt is correct. Once you have calculated that, the next step is to retain only one observation per time period, because the regression shown in equation (3) has only t subscripts in it. Then you calculate a variable equal to abs(Rmt) and another equal to the square of Rmt, and regress CSADt on those. Actually, since the latter regressor is the square of the former, you can just do it as:

                    Code:
                    regress CSAD c.abs_r_mt##c.abs_r_mt
                    where abs_r_mt is the variable that contains the absolute value of Rmt.
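
                    If the factor-variable notation is unfamiliar, the long-hand equivalent is below. This is just a sketch, assuming CSAD and abs_r_mt already exist as described; the last line shows one way to get a one-sided p-value for the herding hypothesis that γ2 < 0.

                    Code:
                    gen r_mt_sq = abs_r_mt^2                            // same as squaring Rmt itself
                    regress CSAD abs_r_mt r_mt_sq
                    display ttail(e(df_r), -_b[r_mt_sq]/_se[r_mt_sq])   // one-sided p-value for gamma_2 < 0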



                    • #11
                      Dear Clyde,

                      Good news: I have managed to identify the dual-listed companies manually (it took me hours), and I now have a comprehensive list of the said companies and their data. The old saying goes "haste makes waste," so I will carry out my regressions one at a time. Given my subpar explanations, I am unclear how much of what I am trying to achieve has come across. Essentially, I have 34 dual-listed companies listed in both the A-share and B-share markets on the Shanghai Stock Exchange, and another 33 dual-listed companies listed in both the A-share and B-share markets on the Shenzhen Stock Exchange. I am testing herding in 4 markets, namely the Shanghai A market (SSEA), Shanghai B market (SSEB), Shenzhen A market (SZSA), and Shenzhen B market (SZSB), so I suppose 4 regressions will be run?

                      The example data given above is from the SSEB over the period Dec '14 to Jan '18. Taking one step at a time, I would first like to test for the presence of herding in the SSEB; to reiterate, I now have the data for 34 individual firms over the same period. How do I go about getting my CSAD_t, the LHS of Eq. (3)? Following your instructions, I have successfully derived R_(m,t); however, I am still struggling to find my R_(i,t). Do I have to merge the data of the said 34 firms first?

                      Again, I hope I made sense to you, and that I am / we are making progress.

                      Thank you.



                      • #12
                        however, I am still struggling to find my R_(i,t). Do I have to merge the data of the said 34 firms first?
                        Yes, the first step is to merge the data on the 34 firms with the SSEB data you currently have. To accomplish this merge you will need the date variable to be encoded the same way in both data sets. Not having the full layout of the 34-firms data, the following is pseudo-code, but it should give you the idea.

                        Code:
                        // PREPARE DATA SETS FOR MERGING
                        use 34_firms_data, clear
                        gen sif_date = daily(date, "MDY")
                        assert missing(sif_date) == missing(date)
                        format sif_date %td
                        
                        // CALCULATE Rit
                        by firm (sif_date), sort: gen r_it = 100*log(price/price[_n-1])
                        rename price share_price
                        tempfile for_merging
                        save `for_merging'
                        
                        //  MARKET RETURNS FOR SSEB DATA
                        use SSEB_data, clear
                        gen sif_date = daily(date, "MDY")
                        assert missing(sif_date) == missing(date)
                        format sif_date %td
                        sort sif_date
                        gen r_mt = 100*log(price/price[_n-1])
                        rename price market_price
                        
                        //  MERGE THEM
                        merge 1:m sif_date using `for_merging'
                        
                        // CALCULATE CSAD
                        gen abs_diff = abs(r_it-r_mt)
                        by sif_date, sort: egen csad = mean(abs_diff)
                        
                        //  GO TO ONE OBS PER DATE
                        collapse (first) csad r_mt, by(sif_date)
                        
                        //  SET UP AND DO REGRESSION
                        gen abs_r_mt = abs(r_mt)
                        regress csad c.abs_r_mt##c.abs_r_mt
                        The key assumptions about the 34_firms_data set are that it contains a date variable that looks like the one in the SSEB data, that it contains a price variable, and that the dates covered in 34_firms_data substantially match the ones in the SSEB data.

                        You might want to consider using robust standard errors on your regression, if you think that the residuals are likely to be heteroscedastic. That's not a statistical question so much as one about what to expect of this kind of data, so it is out of my area of expertise (or even rudimentary knowledge!). You might consult an experienced finance colleague about that.
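
                        If you do decide to use them, it is only a one-option change to the regression already shown, e.g.:

                        Code:
                        regress csad c.abs_r_mt##c.abs_r_mt, vce(robust)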

                        As you are a self-described Stata beginner, let me, at this point, offer you some advice about how to really get moving forward with it. Open the PDF documentation that comes with your installation (Select PDF Documentation from the Help menu.) Read in full the Getting Started [GS] and User's Guide [U] volumes. It is a fairly long read, and you won't remember everything there. But it will introduce you to the general way Stata works, the generic syntax and the overall approach to data management and analysis. It will also take you through the most basic commands that Stata users need for routine, everyday work. Once you have been through those, you will have acquired enough knowledge that in most situations, you will be able to figure out which commands are likely to be needed for your particular problem, and then you can return to the PDF documentation or the help files for the details of how to apply them.

                        Another strategy for learning about Stata (in addition to, not as replacement for, the above) is to frequent this Forum and read the threads started by other people. This was an important part of my learning process when I was a beginner, and I learned an enormous amount that way. (I also learned a lot about statistics doing that.) So pick threads whose topics are not clearly over your head or focused on exotic issues you will probably never face, and read along.





                        • #13
                          Dear Clyde,

                          I am sorry that I did not make it clear enough: the data I have are 34 different data sets, each in its own CSV file. In other words, I have 34 CSV files. Each of them contains the same information, including price and date, as you've mentioned above. Do I have to merge them manually using Excel first? It might be a good idea, as you suggested, to post the example data.

                          This is my first example data:

                          Code:
                          * Example generated by -dataex-. To install: ssc install dataex
                          clear
                          input str12 date float(price open high low) str7 vol float change
                          "Dec 29, 2017" 5.28 5.35 5.35 5.23 "10.07M" -1.31
                          "Dec 28, 2017" 5.35 5.33 5.38 5.31 "4.80M" .19
                          "Dec 27, 2017" 5.34 5.34 5.37 5.31 "4.90M" 0
                          "Dec 26, 2017" 5.34 5.32 5.36 5.28 "4.92M" 0
                          "Dec 25, 2017" 5.34 5.44 5.44 5.3 "5.81M" -2.02
                          "Dec 22, 2017" 5.45 5.37 5.47 5.34 "5.73M" 1.3
                          "Dec 21, 2017" 5.38 5.36 5.42 5.29 "5.88M" 0
                          "Dec 20, 2017" 5.38 5.43 5.44 5.33 "6.22M" -.92
                          "Dec 19, 2017" 5.43 5.42 5.46 5.41 "6.36M" .18
                          "Dec 18, 2017" 5.42 5.55 5.56 5.4 "5.95M" -2.17
                          "Dec 15, 2017" 5.54 5.48 5.56 5.42 "6.63M" .91
                          "Dec 14, 2017" 5.49 5.54 5.55 5.46 "5.40M" -1.08
                          "Dec 13, 2017" 5.55 5.5 5.58 5.46 "5.82M" 1.28
                          "Dec 12, 2017" 5.48 5.59 5.59 5.46 "6.52M" -2.32
                          "Dec 11, 2017" 5.61 5.55 5.63 5.52 "6.08M" 1.08
                          "Dec 08, 2017" 5.55 5.51 5.63 5.47 "5.22M" .73
                          "Dec 07, 2017" 5.51 5.51 5.58 5.44 "5.20M" -.54
                          "Dec 06, 2017" 5.54 5.51 5.56 5.36 "8.77M" .18
                          "Dec 05, 2017" 5.53 5.61 5.62 5.45 "6.20M" -1.07
                          "Dec 04, 2017" 5.59 5.7 5.76 5.58 "8.39M" -1.93
                          "Dec 01, 2017" 5.7 5.78 5.79 5.69 "10.07M" -1.55
                          "Nov 30, 2017" 5.79 5.82 5.85 5.76 "4.86M" -.52
                          "Nov 29, 2017" 5.82 5.84 5.85 5.77 "3.74M" -.51
                          "Nov 28, 2017" 5.85 5.79 5.86 5.74 "4.66M" 1.74
                          "Nov 27, 2017" 5.75 5.8 5.82 5.75 "3.25M" -.69
                          "Nov 24, 2017" 5.79 5.75 5.83 5.72 "4.31M" .17
                          "Nov 23, 2017" 5.78 5.83 5.88 5.76 "4.87M" -.86
                          "Nov 22, 2017" 5.83 5.8 5.88 5.76 "4.24M" .17
                          "Nov 21, 2017" 5.82 5.85 5.87 5.78 "3.80M" -.34
                          "Nov 20, 2017" 5.84 5.92 6.01 5.7 "7.34M" -2.01
                          "Nov 17, 2017" 5.96 6.23 6.25 5.92 "8.93M" -4.18
                          "Nov 16, 2017" 6.22 6.25 6.29 6.21 "5.93M" -.32
                          "Nov 15, 2017" 6.24 6.23 6.26 6.21 "4.70M" -.16
                          "Nov 14, 2017" 6.25 6.22 6.29 6.2 "6.05M" .32
                          "Nov 13, 2017" 6.23 6.33 6.34 6.2 "8.19M" -1.42
                          "Nov 10, 2017" 6.32 6.38 6.38 6.28 "7.23M" -1.56
                          "Nov 09, 2017" 6.42 6.45 6.45 6.29 "10.08M" -.31

                          I would like to post more example data, but I have left my school and I am currently accessing Stata through Remote Desktop, which really isn't ideal for running Stata. I tried to import more CSV files so that you could get a clearer picture of what is going on, but it crashed every time; unfortunately, I will only have access to my school computer again tomorrow morning (my time zone is GMT+0). However, I still hope that you understand my question. Basically, I have 34 data sets like the one you see above, but I do not know how to put them together so that I can aggregate them to find my R_(i,t).

                          Thank you.



                          • #14
                            OK. So I'm going to make some assumptions about your 34 .csv files.

                            1. They are all in your current working directory, and they are the only .csv files in your current working directory.
                            2. Since it appears that the content of the file does not mention what firm it is about, I assume that the filename itself gives the firm name. That is, the filename is: X.csv, where X is the name of the firm whose data appears in that file.
                            3. The first row in each csv file contains the variable names date, price, etc.

                             With these assumptions, you can do this:

                            Code:
                             clear
                             // list every .csv file in the working directory and strip the extensions
                             local files: dir "." files "*.csv"
                             local stubs: subinstr local files ".csv" "", all
                             
                             // start an empty data set to which each firm's file will be appended
                             tempfile building
                             save `building', emptyok
                             
                             // read each firm's file, tag it with the firm name, and accumulate
                             foreach s of local stubs {
                                 import delimited `s'.csv, varnames(1)
                                 gen firm = `"`s'"'
                                 append using `building'
                                 save `"`building'"', replace
                             }
                             
                             use `building', clear
                             isid firm year, sort
                             quietly compress
                             save 34_firms_data, replace
                            The above is not tested, so beware of typos, but the logic is, I'm pretty sure, correct. This code creates the 34_firms_data file that you will need for the code suggested earlier in #12. Note, by the way, that this is a series of -append- operations, not -merge-ing.

                            I cannot say too emphatically that you should not put these files together in Excel. In fact, you should never do any data management in Excel. It leaves no audit trail of what you've done. If you have to subsequently explain your work to somebody, there is no record of it to refer to. If you have to go back to this project some months from now and you do not recall the details of what you did, there is no record of it to refer to. Excel is fine for exchanging data sets with others who have Excel on their machines and don't have any real statistics package in common with you. It is fine for some visual displays of data. But it is totally unacceptable as a data management and analysis tool. If you want to occasionally use it to "play around" with data or do a "quick and dirty back-of-the-envelope calculation," I suppose that's OK. But it should never be used for work that is intended to be taken seriously, and certainly not for work that somebody might actually rely on. When you acquire data in Excel (or related formats like .csv) the first thing you should do is import it to Stata (or some other program that is suitable for analysis work.)



                            • #15
                              Dear Clyde,

                              I have instead used the code you provided here: https://www.statalist.org/forums/for...iple-csv-files . Hence, I have
                              Code:
                              . local myfilelist : dir . files "*.csv"
                              
                              .
                              . foreach file of local myfilelist {
                                2.
                              . drop _all
                                3.
                              . insheet using "`file'"
                                4.
                              . local building = subinstr("`file'",".csv","",.)
                                5.
                              . save "`building'", replace
                                6.
                              . }
                              (7 vars, 757 obs)
                              file a anhui gujing.dta saved
                              (7 vars, 731 obs)
                              file a bengang.dta saved
                              (7 vars, 757 obs)
                              file a changchai.dta saved
                              (7 vars, 725 obs)
                              file a china bicycle.dta saved
                              (7 vars, 744 obs)
                              file a china fangda.dta saved
                              (7 vars, 646 obs)
                              file a china national accord.dta saved
                              (7 vars, 703 obs)
                              file a chongqing changan.dta saved
                              (7 vars, 731 obs)
                              file a csg.dta saved
                              (7 vars, 737 obs)
                              file a dalian.dta saved
                              (7 vars, 658 obs)
                              file a dongxu jan15.dta saved
                              (7 vars, 750 obs)
                              file a fawer.dta saved
                              (7 vars, 750 obs)
                              file a fiyta.dta saved
                              (7 vars, 744 obs)
                              file a foshan electrical.dta saved
                              (7 vars, 727 obs)
                              file a guangdong electrical.dta saved
                              (7 vars, 679 obs)
                              file a guangdong provincial.dta saved
                              (7 vars, 631 obs)
                              file a hainan dadonghai.dta saved
                              (7 vars, 630 obs)
                              file a hainan pearl.dta saved
                              (7 vars, 739 obs)
                              file a hefei meiling.dta saved
                              (7 vars, 464 obs)
                              file a hubei sanonda.dta saved
                              (7 vars, 757 obs)
                              file a jiangling.dta saved
                              (7 vars, 672 obs)
                              file a konka.dta saved
                              (7 vars, 755 obs)
                              file a lu thai.dta saved
                              (7 vars, 748 obs)
                              file a shandong chenming.dta saved
                              (7 vars, 668 obs)
                              file a shenbao toaug17.dta saved
                              (7 vars, 607 obs)
                              file a shenzhen chiwan wharf tonov17.dta saved
                              (7 vars, 640 obs)
                              file a shenzhen nanshan.dta saved
                              (7 vars, 757 obs)
                              file a shenzhen prop.dta saved
                              (7 vars, 671 obs)
                              file a shenzhen seg.dta saved
                              (7 vars, 741 obs)
                              file a shenzhen tellus.dta saved
                              (7 vars, 667 obs)
                              file a shenzhen textile.dta saved
                              (7 vars, 710 obs)
                              file a shenzhen wongtee tooct17.dta saved
                              (7 vars, 587 obs)
                              file a shenzhen zhongheng frmar15.dta saved
                              (7 vars, 741 obs)
                              file a sino great wall.dta saved
                              (7 vars, 757 obs)
                              file a wuxi little swan.dta saved
                              (7 vars, 757 obs)
                              file a yantai changyu.dta saved
                              (7 vars, 751 obs)
                              file aboetech.dta saved
                               I suppose I have now imported all the relevant data needed to find my R_(i,t); in fact, if you notice, there are 36 firms, as I retrieved some extra data, which of course does not really matter. Moving on, I used your code from #14:
                              Code:
                               gen firm = `"`s'"'
                               append using `building'
                               save `"`building'"', replace
                               but Stata showed '751 missing values generated', and when I proceeded anyway, using your code
                              Code:
                                  
                               use `building', clear
                               isid firm year, sort
                               quietly compress
                               save 34_firms_data, replace
                               The second line did not go through; it simply showed
                              Code:
                               . isid firm year, sort
                               variable year not found
                               r(111);
                              Did I do something wrong from the very beginning?

                              P.S. You've mentioned in #12 that

                              Originally posted by Clyde Schechter View Post

                              The key assumptions about the 34_firms_data sets are that it contains a date variable that looks like the one in the SSEB data, that it contains a price variable, and that the dates covered in the 34_firms_data substantially match the ones in the SSEB data.
                              The 36 data sets above do fit your criteria. They are from the same source, so naturally all of their 7 variables are the same. The dates covered in the said data sets match the ones in the SSEB data to a great extent; as you can infer from the number of observations, there are some outliers (those with only roughly 500 observations) due to irrecoverable data. I have consulted my supervisor on this issue, and he is fine with it.
                              Last edited by sladmin; 09 Apr 2018, 08:53. Reason: anonymize poster

