I'm trying to get an article published that assesses whether there was a higher risk of a condition lasting longer in one year compared with the previous year. I would sincerely appreciate any help you could provide. Apologies for the extra-long post and for asking two questions on this subreddit on the same day, but I feel this is a different (though related) issue. If any mod thinks differently, I will of course merge them.
I have completed the descriptive analysis of a condition and written a table showing, for each variable, the N, % and the p25, median and p75 duration of that condition. I have also fitted a Cox proportional hazards model. All analyses have been stratified by sex (so I have two Cox models, one for each sex).
In most of the literature I've read, the total N is shown at the top of the table. For some reason, my ex-boss and co-author insisted that I use a final row, below all the variables, showing "Total". He also insisted on removing the missing data information from the descriptive table. Now, after submitting the article for publication, my reviewer asks me to add the missing data to the descriptive table and also the number of observations for each Cox model (there are two, as the analysis has been stratified by sex). I'm using Stata, and running into some difficulties here.
1) When running the Cox model, I used "if" conditions to restrict the analysis to the condition that was of interest to us. Let's say there is condition type A and condition type B. As I understand it, when calculating the total N I should NOT include condition type B, even though it is in the database (please correct me if I'm wrong). I've also restricted the analysis to condition episodes that lasted at most X days (my supervisors told me that episodes exceeding that length were virtually impossible and more likely to be data collection errors). Should those excluded episodes be counted when calculating the total N?
2) Is the total N then obtained just by dropping the subjects with condition type B (the one I'm not interested in) and using the "describe" command? I seem to get the total number of subjects that way. And the total N is supposed to be the total number of subjects included in the study, right? So in theory, when adding up each variable's categories including the missing category, every variable should add up to the same number, right?
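To make questions 1 and 2 concrete, here is a minimal sketch of what I think I should be doing. It is not my real code: it uses the cancer dataset that ships with Stata as a stand-in, and treating drug == 3 as "condition type B", the cut-off of 35, the variable agegrp and the artificial missing values are all made up for illustration.

```stata
* Toy stand-in data shipped with Stata; my real variables are different
sysuse cancer, clear

* Hypothetical restrictions: pretend drug == 3 is "condition type B"
* and that episodes longer than 35 time units are implausible
generate byte insample = (drug != 3) & (studytime <= 35)

* Hypothetical covariate with some missing values added on purpose
generate agegrp = cond(age < 55, 1, 2)
replace agegrp = . if mod(_n, 10) == 0

* Total N for the table: everyone who meets the restrictions
count if insample
scalar totalN = r(N)

* With the missing option, the categories plus the missing row
* should add up to the same total
tabulate agegrp if insample, missing
```

If my understanding is right, the total reported by tabulate with the missing option should match the count for every variable in the table.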
3) Is it OK to show the total N as I described (in a bottom row)? As I'm the main author, would it be OK to change it to the usual N in the top row?
4) My last problem is with the number of observations for the Cox model. My reviewer asks me to include the number of observations for each model. I understand he is not referring to the "XXXXX total observations 0 exclusions" line. I get that number right (meaning the total observations coincide with the number of observations I get with the describe command).
However, I think he is referring to the number on the "Number of obs = XXXXX" line. Assuming I'm using complete case analysis, that number should be lower than the total number of subjects for each variable, right? (The program drops any subject with missing data on any variable in the model.) My co-author instead says that "if there are no missing, models should have an N that's the same as the total of the sample". I can't wrap my head around this. There are missing values; we are just not showing them. How do I reconcile this number with showing information on the missing data?
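Here is a small sketch of the comparison I have in mind for question 4, again on the cancer dataset that ships with Stata rather than my own data. The variable agegrp, the artificial missing values and the drug != 3 restriction are assumptions for illustration only. As far as I can tell, the "total observations" line comes from stset, while "Number of obs" comes from the model itself and is stored in e(N).

```stata
* Toy stand-in data shipped with Stata; my real variables are different
sysuse cancer, clear

* Hypothetical covariate with some missing values added on purpose
generate agegrp = cond(age < 55, 1, 2)
replace agegrp = . if mod(_n, 10) == 0

* Declare the survival data (this prints the "total observations" line)
stset studytime, failure(died)

* Total N under the same if restriction used in the model
count if drug != 3
scalar totalN = r(N)

* Complete case Cox model: observations with missing agegrp are dropped
stcox i.agegrp i.drug if drug != 3
display "Total N = " totalN "   Number of obs in model = " e(N)

* Observations lost to missing covariate data
count if drug != 3 & missing(agegrp)
```

In this toy example the model N equals the total N minus the observations with a missing covariate, which is the reconciliation I would expect to report alongside the missing data in the descriptive table.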
Again, thanks a lot for any help, and apologies for the length of this post.
PS: I'm aware user names need to be composed of a full name and username. Is there any way I can change my username to follow the rules? Thanks!