Sequential logistic regression

Emeka Dim

Join Date: Jan 2018

Posts: 44
#1

21 Jan 2018, 04:10

Hi,

Please, I need some help with commands for conducting the analysis shown below. Any commands that would enable me to conduct it would be appreciated. I would also be grateful if anyone could refer me to a primer or textbook where I can read up on sequential logistic regression. Please see the kind of regression I am talking about below:

I can provide more details on the table if necessary.

Thank you.

Emeka Dim.
Maarten Buis

Join Date: Mar 2014

Posts: 3459
#2

22 Jan 2018, 01:29

Terminology differs from field to field. So it happens that the same term is used for very different things. This is the case here. I would not have called that a sequential logit, and I suspect that you searched for that term and found descriptions of very different models, and got (understandably) very confused.

That table looks like a set of different models with different explanatory variables. I don't think it requires a name, as the whole table is not a model, but just a description of different models. Moreover, comparing models with different explanatory variables is very problematic when you have logistic regressions. See e.g. (Mood 2010). I have been repeatedly very critical of that article on this list, but that has to do with her claim about not being able to compare groups; her point on not being able to compare logit models with different explanatory variables is fine.

So I would not do this, but if you want to ignore my warning you can just estimate the different models with logit.

Carina Mood (2010) Logistic regression: why we cannot do what we think we can do, and what we can do about it. European Sociological Review, 26(1): 67-82. https://doi.org/10.1093/esr/jcp006

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Emeka Dim

Join Date: Jan 2018

Posts: 44
#3

22 Jan 2018, 02:04

Thanks for the reply.

My aim for this paper is to replicate the analysis in this study (please see attached) because I have more recent data. I am a doctoral student in the Department of Sociology, University of Saskatchewan. The author of the article employed a 1999 dataset, while I intend to use a 2014 dataset from the same survey. I also want to replicate Tables 5 and 6 of the article.

Normally, I would simply conduct bivariate and multivariate logistic regressions with the dataset for the manuscript I intend to write. However, I was interested in the insights the author of this paper brought into the study via the sequential logistic regression shown in Table 6 and the comparison of odds ratios using a t-test in Table 5.

Table 5:

Table 6:

Please, what do you advise? I am new to these new forms of analysis. I am doing some reading of these analyses. I have been introduced to hierarchical, stepwise, and nested logistic regressions and I can show you the commands I have tried so far and the results I am getting (with another dataset available to me). I am just curious to know how various blocks of models affect or relate to an outcome variable. Thanks and looking forward to your reply.

Emeka Dim.

Attached Files

IPV Against Aboriginal Men in Canada.pdf (87.6 KB, 2 views)
Maarten Buis

Join Date: Mar 2014

Posts: 3459
#4

22 Jan 2018, 03:41

Based on the table, the authors just estimated 4 different logit models. Presumably they called that a sequential logit model, because they sequentially added variables. Is that clearer?

I'll repeat my warning that this is bad practice when dealing with a logit model.
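For reference, a minimal sketch of what "sequentially adding variables" amounts to in Stata. It uses the nlsw88 example data; the outcome, the blocks of variables, and the names touse and m1 to m4 are illustrative assumptions, not the specification used in the article. The warning above about comparing coefficients across these models still applies.

Code:

// open example data and create a binary outcome
sysuse nlsw88, clear
gen byte highoc = occupation < 3 if !missing(occupation)

// keep the estimation sample identical across all four models
gen byte touse = !missing(highoc, race, grade, ttl_exp, south)

// model 1: first block of explanatory variables
logit highoc i.race if touse, or
estimates store m1

// model 2: add education
logit highoc i.race grade if touse, or
estimates store m2

// model 3: add work experience
logit highoc i.race grade ttl_exp if touse, or
estimates store m3

// model 4: add region
logit highoc i.race grade ttl_exp i.south if touse, or
estimates store m4

// show the four sets of odds ratios side by side
estimates table m1 m2 m3 m4, eform b(%9.3f) star stats(N)

Each column of the resulting table is a separate logit model; nothing is estimated jointly.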

Emeka Dim

Join Date: Jan 2018

Posts: 44
#5

22 Jan 2018, 11:48

Thanks for the feedback. So would you advise that instead of trying to replicate the analyses in Table 6, I should only run a multivariate logistic regression?

Also, what is your take on how Table 5 was arrived at? The author estimated two separate multivariate logistic models and used a t-test to compare the odds ratios across the two models to check for significance. I am also thinking of replicating that table. I have tried the following commands on another dataset:

. logistic PhyVio V024 V102 V012 Religion1 Witness FamVio1 Alcohol Perp [pweight = V005] if V501 == 1
. estimates store a
. logistic PhyVio V024 V102 V012 Religion1 Witness FamVio1 Alcohol Perp [pweight = V005] if V501 == 2
. lrdrop1

I can show you the results I got from the above commands.

Last edited by Emeka Dim; 22 Jan 2018, 12:00.
Maarten Buis

Join Date: Mar 2014

Posts: 3459
#6

23 Jan 2018, 02:02

For table 6 you need to run 4 separate logistic regressions. I will repeat: don't do that; it is bad practice. See the reference in #2.

lrdrop1 is a user-written program. If you refer to a user-written program, you need to tell us that and tell us where you got it. Quite often there are multiple versions floating around in cyberspace, and the discussion is useless if we are talking about different versions. I assume you got this one from SSC. It will not give you the test you are looking for: it tests, for each explanatory variable, whether or not you can leave that variable out of your second model. What you want is a test, for each explanatory variable, of whether its coefficient is the same in the first and second model. For that you can use the suest command, followed by the test command.

Code:

// open example data
sysuse nlsw88, clear

// prepare the data
gen byte black = race == 2 if !missing(race)
label variable black "respondent's race"
label define black 0 "not black" ///
                   1 "black"
label value black black

gen byte edcat = cond(grade <  12, 1,     ///
                 cond(grade == 12, 2,     ///
                 cond(grade <  16, 3,4))) ///
                 if !missing(grade)
label variable edcat "respondent's education"
label define edcat 1 "< highschool"   ///
                   2 "highschool"     ///
                   3 "some college"   ///
                   4 "college"
label value edcat edcat

gen byte highoc = occupation < 3 if !missing(occupation)
label variable highoc "high occupation"
label define highoc 1 "higher" ///
                    0 "lower"
label value highoc highoc

// estimate the models
logit highoc i.edcat ttl_exp i.south if black == 0, or
est store white
logit highoc i.edcat ttl_exp i.south if black == 1, or
est store black

// combine them in one
suest white black

// test
test ([white_highoc=black_highoc]:2.edcat) ///
     ([white_highoc=black_highoc]:3.edcat) ///
     ([white_highoc=black_highoc]:4.edcat) ///
     ([white_highoc=black_highoc]:ttl_exp) ///
     ([white_highoc=black_highoc]:1.south), mtest

This will give you chi square statistics instead of z statistics, but with 1 degree of freedom the z-statistic is just the square root of the chi-square statistic. So you can report whichever you prefer.
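A minimal sketch of that chi-square/z relationship, using a stripped-down two-group setup on the nlsw88 example data (fewer covariates, chosen only for brevity; this is an illustration, not a recommended specification). Note that taking the square root of the chi-square loses the sign of the difference; lincom reports the signed z-statistic directly.

Code:

// open example data and prepare the variables
sysuse nlsw88, clear
gen byte black  = race == 2 if !missing(race)
gen byte highoc = occupation < 3 if !missing(occupation)

// estimate the model separately in each group and combine
quietly logit highoc ttl_exp i.south if black == 0
estimates store white
quietly logit highoc ttl_exp i.south if black == 1
estimates store black
quietly suest white black

// Wald test that the ttl_exp coefficient is equal across groups
test [white_highoc=black_highoc]: ttl_exp
display "chi2(1) = " r(chi2) "   |z| = " sqrt(r(chi2))

// the same comparison as a signed difference with a z-statistic
lincom [white_highoc]ttl_exp - [black_highoc]ttl_exp

The absolute value of the z reported by lincom should match the square root displayed after test.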

Two further warnings:

1. If you want to submit this research to a journal, chances are they will reject it because you are comparing odds ratios across groups. See the reference in #2. In this case I disagree (see: http://maartenbuis.nl/wp/oddsratio.html), but I (still) hold a minority position.

2. The author did not do a t-test, and labeling these values as t is incorrect. That column should have been labeled z; that is, these values are not to be compared with a t-distribution but with a standard normal distribution.

So in short, it may very well be that the article you are replicating is substantively very good, but you don't want to learn your statistics from it.

Emeka Dim

Join Date: Jan 2018

Posts: 44
#7

23 Jan 2018, 20:33

Someone shared this website with me and it has your name as the author:
http://www.maartenbuis.nl/software/seqlogit.html

It has the sequential logistic regression commands. Please what do you think?
Maarten Buis

Join Date: Mar 2014

Posts: 3459
#8

24 Jan 2018, 01:44

I stated in #2 that terminology in statistics is not harmonized, so occasionally the same term is used for different things. What you want to do is not what I would have called a sequential logit. So that command does not apply to your situation.

Emeka Dim

Join Date: Jan 2018

Posts: 44
#9

28 Jan 2018, 03:40

Concerning the replication of Table 5, I saw this link of yours: http://www.maartenbuis.nl/software/ftest.html. Please, would the commands and materials at that link enable me to replicate the analysis in Table 5?
Maarten Buis

Join Date: Mar 2014

Posts: 3459
#10

29 Jan 2018, 01:33

No, ftest is for linear regression only.

I don't understand why you are looking for additional commands. As far as I can determine you already have everything you need. Apparently, you are still stuck. It will be more productive if you tell us where you got stuck, that is: what you want to do in the problematic step, what you tried and what Stata told you in return, and why you think that is not what you want.

Emeka Dim

Join Date: Jan 2018

Posts: 44
#11

08 Feb 2018, 16:11

Thank you for the reply.

I recently found a way to do the analysis by that author. It was quite simple and it did not involve any other commands.

I did an analysis recently and one of my former lecturers questioned the validity of the analysis. The analysis is in the attached document. The first table shows how I conducted a sequential logistic regression but the table I am more interested in is the second table.

My lecturer said that there appear to be errors in the perpetration of violence variable (in Models 3 for NDHS 2008 and 2013 of the second table). He said that the distribution may not be good enough for any meaningful analysis. He suggested that if the `n' for a particular category is too small, one is likely to get a very high exponentiated B value. I personally have some objections to his suggestions, as I feel that there is nothing wrong with the tables. Perhaps you feel differently, and I expect you would know far better than I do.

Please, what do you suggest with this result? Are the models valid? Is there anything wrong with the variable on the perpetration of violence in the two models?

Thank you and looking forward to your reply.

Attached Files

Sequential logistic regression analysis.docx (18.4 KB, 1 view)
Maarten Buis

Join Date: Mar 2014

Posts: 3459
#12

09 Feb 2018, 02:32

What happens if you add a binary variable to a model? Then you compare the expected outcome of perpetrators with the expected outcome of non-perpetrators. The expected outcome is estimated, so if you have a very small number of perpetrators, that estimate is going to be very imprecise and susceptible to outliers. If one of the expected outcomes is problematic, then the difference (or ratio) of the two will also be problematic. An unrealistically large odds ratio is one possible indicator of that.
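One way to inspect this is to look at the cell counts behind the odds ratio and at the width of its confidence interval. A minimal sketch on the nlsw88 example data, where the indicator other (race == 3, roughly 1% of respondents) is only a stand-in for a rare category such as the perpetration variable; it is an assumption for illustration and says nothing about the NDHS data.

Code:

// open example data and create a binary outcome
sysuse nlsw88, clear
gen byte highoc = occupation < 3 if !missing(occupation)

// a rare binary explanatory variable (stand-in for a small category)
gen byte other = race == 3 if !missing(race)

// how many observations are behind each expected outcome?
tabulate highoc other

// the odds ratio for the rare category and its confidence interval
logit highoc i.other, or

If one column of the cross-tabulation contains only a handful of observations, the confidence interval around the corresponding odds ratio will typically be wide.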

There is no hard cut-off between small and too small. You just need to know the potential problems and make an informed decision. So it is now up to you to make your case that the number of perpetrators is big enough and leave it in your model or the case that it is too small and leave that variable out. Others may and will challenge that choice, and you need to find the right balance between explaining and defending your decision and being open to and responsive to the arguments of the people challenging you. If you get that balance right, then you are doing real science.

Emeka Dim

Join Date: Jan 2018

Posts: 44
#13

09 Feb 2018, 10:58

What if the perpetration variable has a distribution like this: Non-perpetrator 98.5% and Perpetrator 1.5% (for the 2013 data), and Non-perpetrator 98.2% and Perpetrator 1.8% (for the 2008 data)? Can one say that this variable is too skewed to be used as a predictor variable in a logistic regression analysis?
Maarten Buis

Join Date: Mar 2014

Posts: 3459
#14

11 Feb 2018, 11:52

I told you in #12 what the issues are. That is all we can do. The final decision is a tradeoff that you alone can make. The 1.5% is small, but whether that is too small depends on many other issues, e.g. the size of the sample, the importance of that variable, etc. That is not a decision a random person on the internet can make for you.

Last edited by Maarten Buis; 11 Feb 2018, 12:10.
