Dear all,
I recently adapted my original dataset to fix a missing data issue. This means my new dataset contains more firms than the previous one but other than that, my code has remained the same. I want to use my dataset to see if the diversity (gender, age, nationality) of the team at every quarter influences firm survival using stcox.
However, when running stset I noticed something I had not encountered before: the number of subjects does not match the number of failures, even though this is single-failure-per-subject data. In my previous dataset the two numbers matched exactly, but the new dataset contains many more firms that did not fail and are therefore censored (after 40 quarters or when my data collection ended).
Now I am wondering:
a) Is this normal/OK, and what could be the reason for the difference? (Sorry, I am not yet very well versed in survival analysis.)
--> I can't immediately spot any issues, but the dataset contains quite a lot of firms. The Cox results and the PH assumption graphs (estat phtest, plot) look similar for both datasets.
b) If it is not supposed to be like this, how can I identify the problem, i.e. which firms (BvdIdNumber) cause the issue?
Below is an excerpt of my old and new datasets, generated with dataex.
Old dataset
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input str16 BvdIdNumber long(Country Industry) float(Gender Age Nationality) byte Quarters_num float FirmFailure2
"AT9070350951" 12 6 0 .25851214 0 1 0
"AT9070350951" 12 6 0 .25851214 0 2 0
"AT9070350951" 12 6 0 .2530698 0 3 0
"AT9070350951" 12 6 0 .2530698 0 4 0
"AT9070350951" 12 6 0 .2530698 0 5 0
"AT9070350951" 12 6 0 .2530698 0 6 0
"AT9070350951" 12 6 0 .24785185 0 7 0
"AT9070350951" 12 6 0 .24785185 0 8 0
"AT9070350951" 12 6 0 .24785185 0 9 0
"AT9070350951" 12 6 0 .24785185 0 10 0
"AT9070350951" 12 6 0 .24284475 0 11 0
"AT9070350951" 12 6 0 .24284475 0 12 0
"AT9070350951" 12 6 0 .24284475 0 13 0
"AT9070350951" 12 6 0 .24284475 0 14 0
"AT9070350951" 12 6 0 .23803593 0 15 0
"AT9070350951" 12 6 0 .23803593 0 16 0
"AT9070350951" 12 6 0 .23803593 0 17 0
"AT9070350951" 12 6 0 .23803593 0 18 0
"AT9070350951" 12 6 0 .23341388 0 19 0
"AT9070350951" 12 6 0 .23341388 0 20 0
"AT9070350951" 12 6 0 .23341388 0 21 1
"AT9070350951" 12 6 0 .23341388 0 22 1
"AT9070350951" 12 6 0 .2289679 0 23 1
"AT9070350951" 12 6 0 .2289679 0 24 1
"AT9070350951" 12 6 0 .2289679 0 25 1
"AT9070350951" 12 6 0 .2289679 0 26 1
"AT9070350951" 12 6 0 .22468813 0 27 1
"AT9070350951" 12 6 0 .22468813 0 28 1
"AT9070350951" 12 6 0 .22468813 0 29 1
"AT9070350951" 12 6 0 .22468813 0 30 1
"AT9070350951" 12 6 0 .2205654 0 31 1
"AT9070350951" 12 6 0 .2205654 0 32 1
"AT9070350951" 12 6 0 .2205654 0 33 1
"AT9070350951" 12 6 0 .2205654 0 34 1
"AT9070350951" 12 6 0 .21659125 0 35 1
"AT9070350951" 12 6 0 .21659125 0 36 1
"AT9070350951" 12 6 0 .21659125 0 37 1
"AT9070350951" 12 6 0 .21659125 0 38 1
"AT9070350951" 12 6 0 .2127578 0 39 1
"AT9070350951" 12 6 0 .2127578 0 40 1
"AT9070422953" 12 6 0 0 0 1 0
"AT9070422953" 12 6 0 0 0 2 0
"AT9070422953" 12 6 0 .0288615 0 3 0
"AT9070422953" 12 6 0 .05656854 0 4 0
"AT9070422953" 12 6 0 .05656854 0 5 0
"AT9070422953" 12 6 0 .05656854 0 6 0
"AT9070422953" 12 6 0 0 0 7 0
"AT9070422953" 12 6 0 0 0 8 1
"AT9070422953" 12 6 0 0 0 9 1
"AT9070422953" 12 6 0 0 0 10 1
"AT9070422953" 12 6 0 0 0 11 1
"AT9070422953" 12 6 0 0 0 12 1
"AT9070422953" 12 6 0 0 0 13 1
"AT9070422953" 12 6 0 0 0 14 1
"AT9070422953" 12 6 0 0 0 15 1
"AT9070422953" 12 6 0 0 0 16 1
"AT9070422953" 12 6 0 0 0 17 1
"AT9070422953" 12 6 0 0 0 18 1
"AT9070422953" 12 6 0 0 0 19 1
"AT9070422953" 12 6 0 0 0 20 1
"AT9070422953" 12 6 0 0 0 21 1
"AT9070422953" 12 6 0 0 0 22 1
"AT9070422953" 12 6 0 0 0 23 1
"AT9070422953" 12 6 0 0 0 24 1
"AT9070422953" 12 6 0 0 0 25 1
"AT9070422953" 12 6 0 0 0 26 1
"AT9070422953" 12 6 0 0 0 27 1
"AT9070422953" 12 6 0 0 0 28 1
"AT9070422953" 12 6 0 0 0 29 1
"AT9070422953" 12 6 0 0 0 30 1
"AT9070422953" 12 6 0 0 0 31 1
"AT9070422953" 12 6 0 0 0 32 1
"AT9070422953" 12 6 0 0 0 33 1
"AT9070422953" 12 6 0 0 0 34 1
"AT9070422953" 12 6 0 0 0 35 1
"AT9070422953" 12 6 0 0 0 36 1
"AT9070422953" 12 6 0 0 0 37 1
"AT9070422953" 12 6 0 0 0 38 1
"AT9070422953" 12 6 0 0 0 39 1
"AT9070422953" 12 6 0 0 0 40 1
"AT9110939024" 12 9 0 .08318903 0 1 0
"AT9110939024" 12 9 0 .06865115 0 2 0
"AT9110939024" 12 9 0 .08158924 0 3 0
"AT9110939024" 12 9 0 .08158924 0 4 0
"AT9110939024" 12 9 0 .08158924 0 5 0
"AT9110939024" 12 9 0 .0673435 0 6 0
"AT9110939024" 12 9 0 .08004982 0 7 0
"AT9110939024" 12 9 0 .08004982 0 8 0
"AT9110939024" 12 9 0 .08004982 0 9 0
"AT9110939024" 12 9 0 .06608474 0 10 0
"AT9110939024" 12 9 0 .07856742 0 11 0
"AT9110939024" 12 9 0 .07856742 0 12 0
"AT9110939024" 12 9 0 .07856742 0 13 0
"AT9110939024" 12 9 0 .06487218 0 14 0
"AT9110939024" 12 9 0 .07713892 0 15 1
"AT9110939024" 12 9 0 .07713892 0 16 1
"AT9110939024" 12 9 0 .07713892 0 17 1
"AT9110939024" 12 9 0 .06370331 0 18 1
"AT9110939024" 12 9 0 .07576144 0 19 1
"AT9110939024" 12 9 0 .07576144 0 20 1
end
label values Country country_cat2
label def country_cat2 12 "other", modify
label values Industry NACE_cat2
label def NACE_cat2 6 "J - Information and communication", modify
label def NACE_cat2 9 "M - Professional, scientific and technical activities", modify
for which I used the following stset command:
Code:
stset Quarters_num, id(BvdIdNumber) failure(FirmFailure2)
New dataset
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input str16 BvdIdNumber long(countrycat industrycat) float(BlauGender VariationAge BlauNationality) byte Quarters_num float FirmFailure3
"AT9010104892" 18 7 0 0 0 1 0
"AT9010104892" 18 7 0 0 0 2 0
"AT9010104892" 18 7 0 0 0 3 0
"AT9010104892" 18 7 0 0 0 4 0
"AT9010104892" 18 7 0 0 0 5 0
"AT9010104892" 18 7 0 0 0 6 0
"AT9010104892" 18 7 0 0 0 7 0
"AT9010104892" 18 7 0 0 0 8 0
"AT9010104892" 18 7 0 0 0 9 0
"AT9010104892" 18 7 0 0 0 10 0
"AT9010104892" 18 7 0 0 0 11 0
"AT9010104892" 18 7 0 0 0 12 0
"AT9010104892" 18 7 0 0 0 13 0
"AT9010104892" 18 7 0 0 0 14 0
"AT9010104892" 18 7 0 0 0 15 0
"AT9010104892" 18 7 0 0 0 16 0
"AT9010104892" 18 7 0 0 0 17 0
"AT9010104892" 18 7 0 0 0 18 0
"AT9010104892" 18 7 0 0 0 19 0
"AT9010104892" 18 7 0 0 0 20 0
"AT9010104892" 18 7 0 0 0 21 0
"AT9010104892" 18 7 0 0 0 22 0
"AT9010104892" 18 7 0 0 0 23 0
"AT9010104892" 18 7 0 0 0 24 0
"AT9010104892" 18 7 0 .15427783 .5 25 0
"AT9010104892" 18 7 0 .15427783 .5 26 0
"AT9010104892" 18 7 0 .15427783 .5 27 0
"AT9010104892" 18 7 0 .15152287 .5 28 0
"AT9010104892" 18 7 0 .15152287 .5 29 0
"AT9010104892" 18 7 0 .15152287 .5 30 0
"AT9010104892" 18 7 0 .15152287 .5 31 0
"AT9010104892" 18 7 0 .14886458 .5 32 0
"AT9010104892" 18 7 0 .14886458 .5 33 0
"AT9010104892" 18 7 0 .14886458 .5 34 0
"AT9010104892" 18 7 0 .14886458 .5 35 0
"AT9010104892" 18 7 0 .14629795 .5 36 0
"AT9010104892" 18 7 0 .14629795 .5 37 0
"AT9010104892" 18 7 0 .14629795 .5 38 0
"AT9010104892" 18 7 0 .14629795 .5 39 0
"AT9010104892" 18 7 0 .14381832 .5 40 0
"AT9010106150" 18 9 0 .5191417 0 1 0
"AT9010106150" 18 9 0 .5191417 0 2 0
"AT9010106150" 18 9 0 .50632334 0 3 0
"AT9010106150" 18 9 0 .50632334 0 4 0
"AT9010106150" 18 9 0 .50632334 0 5 0
"AT9010106150" 18 9 0 .50632334 0 6 0
"AT9010106150" 18 9 0 .4941228 0 7 0
"AT9010106150" 18 9 0 .4941228 0 8 0
"AT9010106150" 18 9 0 .4941228 0 9 0
"AT9010106150" 18 9 0 .4941228 0 10 0
"AT9010106150" 18 9 0 .4824964 0 11 0
"AT9010106150" 18 9 0 .4824964 0 12 0
"AT9010106150" 18 9 0 .4824964 0 13 0
"AT9010106150" 18 9 0 .4824964 0 14 0
"AT9010106150" 18 9 0 .4714045 0 15 0
"AT9010106150" 18 9 0 .4714045 0 16 0
"AT9010106150" 18 9 0 .4714045 0 17 0
"AT9010106150" 18 9 0 .4714045 0 18 0
"AT9010106150" 18 9 0 .4608111 0 19 0
"AT9010106150" 18 9 0 .4608111 0 20 0
"AT9010106150" 18 9 0 .4608111 0 21 0
"AT9010106150" 18 9 0 .4608111 0 22 0
"AT9010106150" 18 9 0 .4506834 0 23 0
"AT9010106150" 18 9 0 .4506834 0 24 0
"AT9010106150" 18 9 0 .4506834 0 25 0
"AT9010106150" 18 9 0 .4506834 0 26 0
"AT9010106150" 18 9 0 .4409913 0 27 0
"AT9010106150" 18 9 0 .4409913 0 28 0
"AT9010106150" 18 9 0 .4409913 0 29 0
"AT9010106150" 18 9 0 .4409913 0 30 0
"AT9010106150" 18 9 0 .4317073 0 31 0
"AT9010106150" 18 9 0 .4317073 0 32 0
"AT9010106150" 18 9 0 .4317073 0 33 0
"AT9010106150" 18 9 0 .4317073 0 34 0
"AT9010106150" 18 9 0 .4228061 0 35 0
"AT9010106150" 18 9 0 .4228061 0 36 0
"AT9010106150" 18 9 0 .4228061 0 37 0
"AT9010106150" 18 9 0 .4228061 0 38 0
"AT9010106150" 18 9 0 .4142646 0 39 0
"AT9010106150" 18 9 0 .4142646 0 40 0
"AT9010119613" 18 9 0 .26444644 0 1 0
"AT9010119613" 18 9 0 .26444644 0 2 0
"AT9010119613" 18 9 0 .26444644 0 3 0
"AT9010119613" 18 9 0 .2602153 0 4 0
"AT9010119613" 18 9 0 .2602153 0 5 0
"AT9010119613" 18 9 0 .2602153 0 6 0
"AT9010119613" 18 9 0 .2602153 0 7 0
"AT9010119613" 18 9 0 .25611743 0 8 0
"AT9010119613" 18 9 0 .25611743 0 9 0
"AT9010119613" 18 9 0 .25611743 0 10 0
"AT9010119613" 18 9 0 .25611743 0 11 0
"AT9010119613" 18 9 0 .2521466 0 12 0
"AT9010119613" 18 9 0 .2521466 0 13 0
"AT9010119613" 18 9 0 .2521466 0 14 0
"AT9010119613" 18 9 0 .2521466 0 15 0
"AT9010119613" 18 9 0 .24829705 0 16 0
"AT9010119613" 18 9 0 .24829705 0 17 0
"AT9010119613" 18 9 0 .24829705 0 18 0
"AT9010119613" 18 9 0 .24829705 0 19 0
"AT9010119613" 18 9 0 .24456325 0 20 0
end
label values countrycat countrycat
label def countrycat 18 "other", modify
label values industrycat industrycat
label def industrycat 7 "K - Financial and insurance activities", modify
label def industrycat 9 "M - Professional, scientific and technical activities", modify
Code:
stset Quarters_num, id(BvdIdNumber) failure(FirmFailure3)
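For question b), this is roughly the kind of check I have in mind, although I am not sure it is the right approach. It is only a sketch: ever_failed and firm_tag are helper names I made up for illustration, and I am assuming that a firm whose FirmFailure3 is 0 in every quarter is a censored firm.
Code:
* Sketch only, to be run after the stset command above.
* ever_failed and firm_tag are hypothetical helper variables, not part of my data.

* 1 if the firm records a failure in any quarter, 0 if it never does
bysort BvdIdNumber: egen byte ever_failed = max(FirmFailure3)

* flag exactly one row per firm so that firms are counted once
egen byte firm_tag = tag(BvdIdNumber)

* summary of subjects and failures as stset sees them
stdescribe

* number of firms with and without a failure
tabulate ever_failed if firm_tag

* IDs of the firms that never fail
list BvdIdNumber if firm_tag & ever_failed == 0, noobs
If the assumption holds, the firms with ever_failed equal to 0 should account for the gap between the number of subjects and the number of failures.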
Thanks in advance for your help!
Side note: I use Stata/MP 15 on my work laptop and Stata/SE 18 on my personal PC.
Best regards,
Laura
*edited to add tags to the original post