I am a beginner in both Stata and statistics in general, so I apologize in advance if my questions don't make much sense. I am studying the spillover effects of data breach disclosures (i.e., I want to see how a data breach at one company affects the disclosure quality of its competitors).
My first concern is the regression specification, which I think should look something like: DisclosureQuality_i,t = beta0 + beta1*PeerBreach_i,t + control variables_i,t + error_i,t.
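Written out more formally, the specification might look like the following. This is only a sketch: the definition of the peer-breach dummy (equal to 1 if at least one of firm i's top-5 TNIC peers has a breach recorded in year t) is an assumption about how the variable would be built, and the symbols Top5 and Breach are introduced here purely for illustration.

```latex
\[
\text{DisclosureQuality}_{i,t}
  = \beta_0 + \beta_1\,\text{PeerBreach}_{i,t}
  + \gamma'\,\text{Controls}_{i,t} + \varepsilon_{i,t},
\]
\[
\text{PeerBreach}_{i,t}
  = \mathbf{1}\bigl\{\exists\, j \in \text{Top5}_{i,t} : \text{Breach}_{j,t} = 1\bigr\}.
\]
```

Under this definition, beta1 compares the readability of firms with a breached close competitor in a given year to that of firms without one, holding the controls fixed.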
For this I gathered the data, which consists of the following 4 datasets. However, I do not know how I should merge them or which other (dummy) variables I should generate:
(1) PRC data breaches - this document contains the company name, year of breach, gvkey and cik code.
(2) TNIC data from Hoberg and Phillips - here I have 4 columns: year, gvkey1, gvkey2 and score. The score measures the similarity between the two companies. I sorted by score in descending order and kept only the top 5 competitors for each gvkey1.
clear
input int year long(gvkey1 gvkey2) float(score competitor_rank competitor_rank1)
1989 1003 8579 .0322 9 1
1989 1003 14536 .0271 8 1
1989 1003 4526 .0203 7 1
1989 1003 12770 .0193 6 1
1989 1003 9118 .0161 5 1
1989 1004 5130 .2058 1501 2
1989 1004 7548 .2044 1500 2
1989 1004 10877 .1933 1499 2
1991 1004 1573 .1924 1498 2
1991 1004 8386 .1894 1497 2
1994 1009 13616 .0395 1538 3
1991 1009 9742 .0266 1537 3
1993 1009 13616 .0146 1536 3
1992 1009 9742 .0057 1535 3
1989 1009 9742 .004 1534 3
1993 1011 29365 .308 1805 4
1994 1011 29731 .2118 1804 4
1994 1011 28698 .2048 1803 4
1994 1011 29255 .2026 1802 4
1994 1011 29365 .1978 1801 4
2001 1013 7980 .1422 4424 6
1993 1013 10388 .1416 4423 6
1996 1013 10553 .1385 4422 6
1995 1013 3705 .1381 4421 6
2001 1013 2537 .1345 4420 6
1991 1017 5070 .0836 4478 7
1990 1017 5070 .067 4477 7
1992 1017 4159 .0655 4476 7
1989 1017 5070 .0537 4475 7
1992 1017 5070 .0494 4474 7
1994 1021 11065 .1905 5115 8
1994 1021 4045 .1391 5114 8
1994 1021 3806 .1336 5113 8
1995 1021 7216 .122 5112 8
1991 1021 4051 .1182 5111 8
1989 1028 3513 .1181 5156 9
1989 1028 14264 .0585 5155 9
1990 1028 11218 .04 5154 9
1990 1028 7932 .0332 5153 9
1989 1028 4169 .0284 5152 9
2007 1034 63645 .1073 6129 10
1996 1034 25813 .0987 6128 10
2003 1034 14446 .0982 6127 10
2003 1034 63051 .0964 6126 10
2006 1034 63645 .0898 6125 10
1988 1036 2698 .0393 6196 11
1988 1036 5788 .031 6195 11
1991 1036 8859 .0262 6194 11
1991 1036 3580 .0246 6193 11
1992 1036 3580 .024 6192 11
1991 1038 12669 .272 6518 12
1991 1038 10399 .2432 6517 12
1990 1038 12669 .2389 6516 12
1993 1038 16979 .2377 6491 12
1993 1038 17165 .2377 6479 12
1996 1043 3693 .0144 6540 13
1989 1043 12470 .0126 6539 13
1996 1043 6790 .0118 6538 13
1990 1043 12470 .0112 6537 13
1989 1043 4454 .0104 6536 13
2002 1045 1230 .27 7508 14
1999 1045 3851 .2391 7507 14
1993 1045 3851 .2349 7506 14
1998 1045 3851 .2341 7505 14
2005 1045 3851 .2303 7504 14
2013 1050 7281 .1581 7671 15
2015 1050 8423 .1073 7670 15
2010 1050 7281 .0736 7669 15
2009 1050 7281 .0657 7668 15
2011 1050 7281 .061 7667 15
1993 1054 28325 .1968 8778 16
1993 1054 24352 .1894 8777 16
1993 1054 14459 .1881 8776 16
1991 1054 12142 .1798 8775 16
1992 1054 24409 .1764 8774 16
1993 1055 27873 .1677 10422 17
1989 1055 13588 .1609 10421 17
1991 1055 21251 .1574 10420 17
1989 1055 1573 .1498 10418 17
1989 1055 14489 .1498 10419 17
1989 1056 12657 .127 11456 18
1990 1056 12657 .116 11455 18
1997 1056 4900 .1084 11454 18
1993 1056 5594 .0878 11453 18
1992 1056 5594 .0876 11452 18
1990 1065 19798 .0351 11532 19
1989 1065 10032 .031 11531 19
1989 1065 11083 .029 11530 19
1988 1065 5122 .028 11529 19
1989 1065 1392 .0274 11528 19
2004 1072 25848 .1252 11705 20
2003 1072 25848 .1234 11704 20
2000 1072 11191 .1223 11703 20
1999 1072 11191 .1198 11702 20
1996 1072 11191 .1178 11701 20
1991 1073 1004 .1439 12402 21
1991 1073 13930 .1431 12401 21
1991 1073 12627 .1415 12400 21
1991 1073 15855 .1301 12399 21
1991 1073 8480 .1283 12398 21
end
(3) Readability score - this dataset contains a score that measures disclosure quality. It contains the CIK code (as in dataset (1)), the year and the readability score, which as far as I understand should be my dependent variable.
(4) Control variables - here I have year, gvkey, and some control variables (e.g., return on assets, leverage ratio).
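One possible way to line these four files up is sketched below. It is a rough outline under several assumptions, not a tested solution: the file names (tnic.dta, breaches.dta, readability.dta, controls.dta, link.dta) and the variable names readability, roa and leverage are hypothetical, gvkey and cik are assumed to be numeric, and link.dta stands for some CIK-to-gvkey link table with one row per cik, which is not among the four datasets described. Note also that the listing in (2) ranks peers within gvkey1 across all years (gvkey1 1004, for instance, has rows from both 1989 and 1991), whereas the sketch keeps the top 5 peers per gvkey1 and year.

```stata
* --- Step 1: keep the 5 most similar peers per focal firm and year ---
use tnic.dta, clear                          // hypothetical file: year gvkey1 gvkey2 score
bysort gvkey1 year (score): keep if _n > _N - 5
tempfile top5
save `top5'

* --- Step 2: one row per breached firm-year, keyed like the peer column ---
use breaches.dta, clear                      // hypothetical file: gvkey cik year ...
keep gvkey year
duplicates drop
rename gvkey gvkey2
gen byte breach = 1
tempfile breached
save `breached'

* --- Step 3: PeerBreach = 1 if any top-5 peer was breached in that year ---
use `top5', clear
merge m:1 gvkey2 year using `breached', keep(master match) nogenerate
replace breach = 0 if missing(breach)
collapse (max) PeerBreach = breach, by(gvkey1 year)
rename gvkey1 gvkey
tempfile peer
save `peer'

* --- Step 4: build the firm-year panel around the dependent variable ---
use readability.dta, clear                   // hypothetical file: cik year readability
merge m:1 cik using link.dta, keep(match) nogenerate          // assumed cik-gvkey link
merge 1:1 gvkey year using controls.dta, keep(match) nogenerate
merge 1:1 gvkey year using `peer', keep(master match) nogenerate
replace PeerBreach = 0 if missing(PeerBreach)  // firms with no TNIC peers listed

* --- Step 5: baseline regression from the specification above ---
regress readability PeerBreach roa leverage
```

The 1:1 merges only work if gvkey and year uniquely identify observations in each file, so running `isid gvkey year` (or `duplicates report gvkey year`) beforehand would be a sensible check.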
I would appreciate any help, however small, in understanding which steps I need to perform next in order to measure the spillover effects. Thank you in advance for any suggestions!