
  • Question about the results in piecewise Poisson regression

    Hi all,

    I am running the following Poisson regression, which includes the interaction between iv1 and iv2 together with the squared terms of both variables:
    Code:
    poisson income c.iv1##c.iv2 c.iv1#c.iv1 c.iv2#c.iv2 cv1 cv2 cv3 cv4 i.indcode i.areacode, vce(robust)
    [Image attachment: 0911_1.png showing the regression output]

    Then I drop one of the squared terms, c.iv1#c.iv1, and the remaining squared term becomes significant.
    Code:
    poisson income c.iv1##c.iv2 c.iv2#c.iv2 cv1 cv2 cv3 cv4 i.indcode i.areacode, vce(robust)
    [Image attachment: 0913_1.png showing the regression output]


    But if I then also drop the other squared term, c.iv2#c.iv2, the coefficients become insignificant again:
    [Image attachment: 0916_1.png showing the regression output]

    I have checked the correlations between the variables, but found no serious collinearity:

    Code:
    . corr income iv1 iv2
    (obs=267)

                 |   income      iv1      iv2
    -------------+---------------------------
          income |   1.0000
             iv1 |   0.0318   1.0000
             iv2 |  -0.0178   0.2790   1.0000


    Is there something wrong here?

    Thanks,
    David
    Last edited by David Lu; 11 May 2016, 01:25.

  • #2
    David:
    you may have a singleton dummy among your predictors (please see http://www.stata.com/statalist/archi.../msg00851.html).
    You also report a very high pseudo R-squared with mostly non-significant predictors: I would take a look at -estat vce- after -poisson-.
    Kind regards,
    Carlo
    (Stata 19.0)
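    As a rough sketch of how one might look for singleton dummies (categories of indcode or areacode that occur only once), reusing the variable names from the model in #1; the exact check below is only illustrative, not something prescribed in this thread:

    Code:
    * count the observations in each industry and area category
    bysort indcode: gen int ind_n = _N
    bysort areacode: gen int area_n = _N

    * categories with a single observation are singleton dummies
    tab indcode if ind_n == 1
    tab areacode if area_n == 1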



    • #3
      Originally posted by Carlo Lazzaro View Post
      David:
      you may have a singleton dummy among your predictors (please see http://www.stata.com/statalist/archi.../msg00851.html).
      You also report a very high pseudo R-squared with mostly non-significant predictors: I would take a look at -estat vce- after -poisson-.
      Hi Carlo,

      You're absolutely right. The model has tons of dummies because it controls for industry and regional effects. But after looking at -estat vce- after -poisson-, I am still confused about what the problem is. Can you explain a bit?



      Thanks,
      David



      • #4
        David:
        I meant that you may have two problems:
        - a singleton dummy, which makes the calculation of the Wald test unfeasible;
        - multicollinearity, which inflates your pseudo-Rsq but leaves non-significant coefficients: you may want to eyeball the -estat vce, corr- output to sniff out the potential culprit(s) if that is the issue.
        Kind regards,
        Carlo
        (Stata 19.0)
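        A minimal sketch of the check described above, assuming the full model from #1 has just been fit; the 0.8 cut-off in the comment is only a rough rule of thumb, not a formal threshold:

        Code:
        * refit the full model from #1
        poisson income c.iv1##c.iv2 c.iv1#c.iv1 c.iv2#c.iv2 cv1 cv2 cv3 cv4 i.indcode i.areacode, vce(robust)

        * correlation matrix of the estimated coefficients; very large
        * off-diagonal entries (say, |r| > 0.8) flag nearly collinear terms
        estat vce, corr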



        • #5
          Originally posted by Carlo Lazzaro View Post
          David:
          I meant that you may have two problems:
          - a singleton dummy, which makes the calculation of the Wald test unfeasible;
          - multicollinearity, which inflates your pseudo-Rsq but leaves non-significant coefficients: you may want to eyeball the -estat vce, corr- output to sniff out the potential culprit(s) if that is the issue.
          Hi Carlo,

          I checked, and there is no singleton dummy in my model. I have also checked for multicollinearity: none of the variables are highly correlated (all below 0.4). So that is why I am confused about the erratic significance of the coefficients.

          Thanks,
          David



          • #6
            David:
            - missing Wald test: there was a similar thread on this forum some time ago: http://www.statalist.org/forums/foru...andard-errors;
            - if multicollinearity is not the issue, it may be that you have included too many interactions or, in general terms, too many predictors (by the way: are all of them useful for your research purposes? What does the literature in your research field suggest for dealing with the same research topic?).
            Kind regards,
            Carlo
            (Stata 19.0)



            • #7
              Originally posted by Carlo Lazzaro View Post
              David:
              - missing Wald test: there was a similar thread on this forum some time ago: http://www.statalist.org/forums/foru...andard-errors;
              - if multicollinearity is not the issue, it may be that you have included too many interactions or, in general terms, too many predictors (by the way: are all of them useful for your research purposes? What does the literature in your research field suggest for dealing with the same research topic?).
              Hi Carlo,

              The missing Wald test is the lesser concern for me. What I am really worried about is the erratic significance of the coefficients. My models include two variables as squared terms, and what I want to know is why they are insignificant when both are in the same model but become significant when one is dropped. I suspect multicollinearity, but I can find no evidence of it. And if I do piecewise regression, I doubt my logic will make sense to a reviewer: why drop one squared variable while keeping the other in the model? Since I would assume that all of the insignificant squared terms should be dropped, how can I convince others that keeping one, dropping the other, and thereby improving the model is justified?

              By the way, it seems that I misunderstood the concept of a singleton dummy in my context. If industries that appear only once count as singleton dummies, then the sample has 7 of them.

              Best,
              David
              Last edited by David Lu; 11 May 2016, 07:21.
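              As a possible way to look at the two squared terms jointly rather than one at a time, here is a sketch of a joint Wald test after the full model; this is only an illustration based on the specification in #1, not something suggested earlier in the thread:

              Code:
              * full model with both squared terms
              poisson income c.iv1##c.iv2 c.iv1#c.iv1 c.iv2#c.iv2 cv1 cv2 cv3 cv4 i.indcode i.areacode, vce(robust)

              * joint test that both squared terms are zero; terms that are
              * individually insignificant but jointly significant can be a
              * symptom of collinearity between the two quadratics
              test c.iv1#c.iv1 c.iv2#c.iv2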



              • #8
                David:
                I understand your concerns: that's why I previously suggested that you skim through the literature in your research field and see whether any example of what you're after already exists (even better if published in your target journal).
                Kind regards,
                Carlo
                (Stata 19.0)



                • #9
                  Originally posted by Carlo Lazzaro View Post
                  David:
                  I understand your concerns: that's why I previously suggested that you skim through the literature in your research field and see whether any example of what you're after already exists (even better if published in your target journal).
                  Hi Carlo,

                  Yes, I have seen some similar cases. But they don't do the piecewise regression and only report the results of the final model. So I have lost the trail and can find no concrete explanation of why this happens. Have you come across a similar example before?

                  Best,
                  David



                  • #10
                    David:
                    just out of curiosity: as income is a continuous variable, why use -poisson- instead of -regress-?
                    Kind regards,
                    Carlo
                    (Stata 19.0)



                    • #11
                      Originally posted by Carlo Lazzaro View Post
                      David:
                      just out of curiosity: as income is a continuous variable, why use -poisson- instead of -regress-?
                      Hi Carlo,

                      Yes, it is a continuous variable, but it is highly right-skewed, and -regress- is not well suited to that kind of data. I have to either transform the data or use another estimator. I consulted the literature and asked questions here; most replies suggested not taking the log of the data and instead using -poisson- or -glm- with a log link, which also fits nonnegative continuous variables, not only counts.

                      Best,
                      David
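                      A minimal sketch of the two estimators mentioned above for a nonnegative, right-skewed outcome; the covariate list simply reuses the one from #1, and robust standard errors are assumed in both calls:

                      Code:
                      * Poisson pseudo-maximum likelihood for a continuous, nonnegative depvar
                      poisson income c.iv1##c.iv2 c.iv1#c.iv1 c.iv2#c.iv2 cv1 cv2 cv3 cv4 i.indcode i.areacode, vce(robust)

                      * equivalent GLM formulation with a log link
                      glm income c.iv1##c.iv2 c.iv1#c.iv1 c.iv2#c.iv2 cv1 cv2 cv3 cv4 i.indcode i.areacode, family(poisson) link(log) vce(robust)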



                      • #12
                        David:
                        I suppose you refer to http://blog.stata.com/2011/08/22/use...tell-a-friend/.
                        However, if this suggestion does not ease your research procedure, the choice between -regress- and -poisson- may depend on some features of your data, for instance how many zeros you have in the depvar before deciding to log income.
                        As an aside, normality of the depvar is not required for -regress-.
                        Kind regards,
                        Carlo
                        (Stata 19.0)
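                        A quick sketch of the data checks implied here, assuming the depvar is income as in #1:

                        Code:
                        * how many zeros are in the depvar?
                        count if income == 0

                        * inspect the skewness of income before deciding whether to log it
                        summarize income, detail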



                        • #13
                          Originally posted by Carlo Lazzaro View Post
                          David:
                          I suppose you refer to http://blog.stata.com/2011/08/22/use...tell-a-friend/.
                          However, if this suggestion does not ease your research procedure, the choice between -regress- and -poisson- may depend on some features of your data, for instance how many zeros you have in the depvar before deciding to log income.
                          As an aside, normality of the depvar is not required for -regress-.
                          Hi Carlo,

                          That's where I started from; it is a very inspiring and helpful post. My depvar is strictly positive, with no zeros. I previously decided to use the log, but some scholars disagree with that. Personally, both -regress- and -poisson- fit my data, and both yield significant results. But since the argument for -poisson- with skewed data is much stronger, and it also handles nonnegative variables well, I go for -poisson- or -glm-. What I am worried about is whether it makes sense to other researchers to drop one squared term while retaining the other without a specific explanation or argument. Does it make sense to you?

                          David



                          • #14
                            David:
                            I'm afraid it does not, unless some theoretical argument can support your choice.
                            Kind regards,
                            Carlo
                            (Stata 19.0)



                            • #15
                              Originally posted by Carlo Lazzaro View Post
                              David:
                              I'm afraid it does not, unless some theoretical argument can support your choice.
                              Hi Carlo,

                              That's what I am confused about. The studies that skip the piecewise regression escape the theoretical question of why one squared term is dropped and the other kept. But I prefer the piecewise approach and would like to explain why it is reasonable. Do you know of any literature that could help me explain keeping only one squared term in the model and dropping the other?

                              Thanks,
                              David

