Can I ignore bins with high p-values

Laura Muhr

Join Date: Jun 2019

Posts: 10
#1

26 Jun 2019, 06:27

Hello!
I am analyzing the effect of wind on the fertility rate using a quadratic term of wind:

I get the following output in Stata:

The utest showed that the inverse U-shape is significant at the 5% level.
Now I wanted to do a robustness check using wind bins of 25 km/h, which gives me the following output, with a constant of 146.3851:

Plotting both yields the following graph, where the red parts are significant and the grey ones are not.

This shows me that there is not really a strong U-shape, as some bins in the positive "area" are negative and vice versa. What stands out is that all the bins that break ranks are insignificant.
So I am confused about how to interpret the results. Can I just ignore the insignificant bins and talk about a U-shape? I have around 90 observations in the first 9 bins; is that maybe too few? I don't know if I can talk about a causal effect in this case. Maybe the only thing I can really say is that quite low wind speeds increase the fertility rate while really high ones decrease it?
I was also wondering why the graph of the bins lies so much higher than the polynomial graph.
So, basically, the question is: does the result of the binning "destroy" my assumption about the U-shape?

Thank you a lot in advance!

FernandoRios

Join Date: Apr 2014

Posts: 2531
#2

26 Jun 2019, 06:42

Hi Laura,
While binned data may intuitively seem like an alternative to the quadratic model for estimating nonlinearities, the two are not nested models, and you cannot make direct comparisons across them.
The quadratic model, for instance, imposes a smooth, continuous function whose slope (first derivative) changes linearly with wind, so its curvature (second derivative) is fixed. Under that assumption, your U-shape test finds evidence of an inverse U-shape.
Using bins can be considered a non-smooth, nonparametric model. You are basically saying that the relationship between wind and fertility can have any shape. As a result, you find an almost linear effect for most of the distribution, with a large negative effect for the highest bin.
It seems to me that the quadratic term is basically picking up that last bin to get its inverse U-shape.
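
To illustrate how a single extreme bin can produce an inverse U in a quadratic fit, here is a minimal Python sketch. The data are simulated and deliberately simplified (flat everywhere, with a drop only at the highest wind speeds); the numbers 150, 100 and the 225 km/h threshold are assumptions for illustration, not values from this thread.

```python
import numpy as np

# Simulated data (NOT the poster's): the outcome is flat except for a drop
# in the highest wind range. All numbers here are illustrative assumptions.
wind = np.arange(0, 251, dtype=float)              # hypothetical wind speeds, 0..250 km/h
fertility = np.where(wind >= 225, 100.0, 150.0)    # 150 everywhere, 100 at the top end

# Least-squares fit of fertility = b0 + b1*wind + b2*wind^2.
b2, b1, b0 = np.polyfit(wind, fertility, deg=2)

# Turning point of the fitted parabola.
peak = -b1 / (2 * b2)

print(f"b1 = {b1:.4f}, b2 = {b2:.6f}, turning point at {peak:.1f} km/h")
```

Even though nothing in these simulated data rises and then falls, the fitted squared term comes out negative with a turning point inside the data range, which is the pattern a U-shape test would pick up.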

As for why the two sets of effects do not match each other: the reason may have to do with what is used as the baseline. I think it may depend on how you are estimating the numbers behind those figures. In general, binned data and other parametric forms may not be comparable because they have different comparison groups or, as seems to be the case here, because of how those variables interact with your fixed effects.
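
The baseline point can be sketched in Python under simple assumptions: simulated, noise-free data, no fixed effects, and the first 25 km/h bin omitted as the reference group. The coefficients 146, 0.06 and -0.0003 are made up for the illustration, and this is not necessarily how the figures in this thread were produced.

```python
import numpy as np

# Simulated, noise-free data (NOT the poster's): an inverse U in wind.
wind = np.arange(0, 250, dtype=float)
fertility = 146.0 + 0.06 * wind - 0.0003 * wind**2

# Dummies for 25 km/h bins, omitting the first bin as the reference group.
bin_id = (wind // 25).astype(int)
n_bins = int(bin_id.max()) + 1
dummies = (bin_id[:, None] == np.arange(1, n_bins)).astype(float)
X = np.column_stack([np.ones_like(wind), dummies])
coef, *_ = np.linalg.lstsq(X, fertility, rcond=None)

ref_mean = fertility[bin_id == 0].mean()
bin_effects = coef[1:]   # each one is (bin mean - reference-bin mean)

# The polynomial part of the curve is measured from wind = 0 instead.
poly_part = 0.06 * wind - 0.0003 * wind**2

print("constant (reference-bin mean):", round(float(coef[0]), 3))
print("bin effects relative to the reference bin:", np.round(bin_effects, 3))
```

In this setup the constant equals the mean of the reference bin, and every bin coefficient is a difference from that mean, so whether the constant is added before plotting shifts the whole binned curve relative to the polynomial one.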

One thing I would suggest, as a third way to analyze your data, is the user-written command xtsemipar. It may provide a more flexible approach to estimating the nonlinear effect of wind on fertility, one that is still continuous but not constrained to be quadratic.
Hope this helps.
Fernando
Laura Muhr

Join Date: Jun 2019

Posts: 10
#3

26 Jun 2019, 06:59

Hello Fernando,

Thank you very much. I will try the recommended command.
So in general you suggest that I drop the idea of binning in my approach and instead try another continuous nonlinear model. Did I understand you correctly on that point?
FernandoRios

Join Date: Apr 2014

Posts: 2531
#4

26 Jun 2019, 07:13

Yes, that is my suggestion.
Binning is great for exploration, as you just noticed, but it is harder to explain when multiple bins are used. And many people do not like it either, because the results can be very sensitive to the binning strategy (how wide the bins are and why).
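
A minimal Python sketch of that sensitivity, using simulated data with an assumed smooth inverse U plus noise (the coefficients, noise level and sample size are made up, and `bin_means` is a hypothetical helper):

```python
import numpy as np

def bin_means(x, y, width):
    """Mean of y within consecutive bins [0, width), [width, 2*width), ..."""
    idx = (x // width).astype(int)
    return {int(k) * width: float(y[idx == k].mean()) for k in np.unique(idx)}

# Simulated data (NOT the poster's): smooth inverse U plus noise.
rng = np.random.default_rng(0)
wind = rng.uniform(0, 250, size=500)
fertility = 150 + 0.06 * wind - 0.0003 * wind**2 + rng.normal(0, 5, size=500)

# The same data summarised with two different bin widths.
for width in (25, 50):
    means = bin_means(wind, fertility, width)
    print(f"width {width}:", {k: round(v, 1) for k, v in means.items()})
```

Comparing the two printed summaries shows how much of the bin-to-bin movement depends on the chosen width rather than on the underlying curve.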
Laura Muhr

Join Date: Jun 2019

Posts: 10
#5

26 Jun 2019, 09:19

Unfortunately I was not able to use the recommended command (I do not know what I did wrong; Stata just gives me error messages no matter what I try).
Anyway, Fernando, as you said that it might not be a quadratic curve, I tried a third-degree polynomial.
The problem now is that I cannot really say which one fits my data better.
The values of the second-degree polynomial are as stated above
(β1=0.625 with p=0.024 and β2=-0.0003 with p=0.003).
The third-degree polynomial gives the following values:
β1=0.1368187 with p=0.015, β2=-0.0011 with p=0.022 and β3=0.00000203 with p=0.088.

Is it more appropriate to use the second- or third-degree polynomial given the p-values?
Or is it maybe better to decide based on the scatterplot?
My data give the following scatterplot, and the regression results give the following graphs:

2nd degree:

3rd degree:

Moreover, I think utest is an appropriate test for a U-shape. Is there something similar for a third-degree polynomial? Or does it maybe make sense to run two U-shape tests after dividing the data range into two parts?
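
As a rough starting point, the turning points implied by a third-degree fit can at least be located by setting its slope, β1 + 2·β2·x + 3·β3·x², to zero. The Python sketch below does only that arithmetic, using the rounded cubic coefficients quoted above; it is not a formal test like utest, `cubic_turning_points` is a hypothetical helper, and the result is sensitive to the rounding of β2.

```python
import math

def cubic_turning_points(b1, b2, b3):
    """Wind speeds where the slope b1 + 2*b2*x + 3*b3*x**2 of the cubic fit is zero."""
    a, b, c = 3.0 * b3, 2.0 * b2, b1
    if a == 0.0:                 # no cubic term: fall back to the quadratic case
        return [] if b == 0.0 else [-c / b]
    disc = b * b - 4.0 * a * c
    if disc < 0.0:               # slope never changes sign: the curve is monotone
        return []
    root = math.sqrt(disc)
    return sorted([(-b - root) / (2.0 * a), (-b + root) / (2.0 * a)])

# Rounded coefficients as quoted in this post (beta1, beta2, beta3).
points = cubic_turning_points(0.1368187, -0.0011, 0.00000203)
print([round(p, 1) for p in points])
```

With a positive β3, the smaller turning point is a local maximum and the larger one a local minimum, so what matters for the interpretation is whether each of them falls inside the observed wind range.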

Thanks so much