  • Can I ignore bins with high p-values?

    Hello!
    I am analyzing the effect of wind on the fertility rate using a quadratic term of wind:

    [Image: Bildschirmfoto 2019-06-26 um 14.13.50.png]

    I get the following output in Stata:

    [Image: Bildschirmfoto 2019-06-26 um 14.16.26.png]

    The utest showed that the inverse U-shape is significant at the 5% level.
    Now I wanted to do a robustness check using wind bins of 25 km/h, which gives me the following output with a constant of 146.3851:

    [Image: Bildschirmfoto 2019-06-26 um 14.18.05.png]


    Plotting both yields the following graph, where the red parts are significant while the grey ones are nonsignificant.

    [Image: Bildschirmfoto 2019-06-26 um 14.21.42.png]


    This shows me that there is not really a strong U-shape, as some bins in the positive "area" are negative and the other way around. What stands out is that all the bins that break ranks are insignificant.
    So I am confused about how to interpret the results. Can I just ignore the insignificant ones and talk about a U-shape? I have around 90 observations in the first 9 bins; is that perhaps too few? I don't know if I can talk about a causal effect in this case. Maybe the only thing I can really say is that quite low wind speeds increase the fertility rate while really high ones decrease it?
    I was also wondering why the graph of the bins is so much higher than the polynomial graph.
    So, basically, the question is: does the result of the binning process "destroy" my assumption about the U-shape?

    Thank you a lot in advance!

  • #2
    Hi Laura,
    While binned data may intuitively seem like an alternative way to estimate nonlinearities compared to the quadratic model, the two are not nested models, so you cannot compare them directly.
    The quadratic model, for instance, imposes a continuous function whose rate of change (first derivative) varies linearly with wind, that is, whose curvature (second derivative) is fixed. Under that assumption, your U-shape test finds evidence that suggests an inverse U-shape.
    Using bins can be seen as a non-smooth, nonparametric model: you are basically saying that the relationship between wind and fertility can have any shape. As a result, you find an almost linear effect over most of the distribution, with a large negative effect for the highest bin.
    It seems to me that the quadratic term is basically picking up that last bin to get its inverse U-shape.
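
    The point about the last bin can be illustrated with a minimal Python sketch on made-up data (not Laura's). It assumes fertility rises linearly with wind and drops only above a hypothetical 225 km/h cutoff, ignores fixed effects, and uses two small helper functions (`solve`, `polyfit`) written just for this example.

```python
# Synthetic illustration: a drop confined to the highest wind speeds is enough
# to make a quadratic fit look like an inverse U. All numbers are made up.

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients [b0, b1, ...] via normal equations."""
    k = degree + 1
    xtx = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum((x ** i) * y for x, y in zip(xs, ys)) for i in range(k)]
    return solve(xtx, xty)


kmh = list(range(0, 250, 5))                  # wind speeds 0, 5, ..., 245 km/h
wind = [k / 100 for k in kmh]                 # rescaled to units of 100 km/h
fertility = [146.0 + 2.0 * w - (30.0 if k >= 225 else 0.0)
             for k, w in zip(kmh, wind)]

# Quadratic fit on all observations.
b_full = polyfit(wind, fertility, 2)

# Same fit after dropping the top range (wind >= 225 km/h).
keep = [i for i, k in enumerate(kmh) if k < 225]
b_trim = polyfit([wind[i] for i in keep], [fertility[i] for i in keep], 2)

print("squared-term coefficient, all data:        ", round(b_full[2], 4))
print("squared-term coefficient, top range dropped:", round(b_trim[2], 4))
```

    In this toy setup the squared term is negative only because of the top range; once those observations are dropped, the fit is a straight line. Whether the same is true in the real data would have to be checked there.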

    As for why the two sets of effects do not match each other, the reason may have to do with what is used as the baseline. I think it may depend on how you are estimating the numbers behind those figures. In general, binned models and other parametric forms may not be comparable because they have different comparison groups or, as seems to be the case here, because of how those variables interact with your fixed effects.
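
    The baseline issue can be sketched in Python on made-up data. The sketch assumes plain dummy coding with the lowest bin omitted and no fixed effects; the variable names and values are hypothetical. With a constant plus one dummy per non-reference bin, OLS reproduces the bin means, so they are computed directly here.

```python
# Synthetic illustration of dummy-coded bins: each coefficient is a difference
# from the omitted (reference) bin, so the level only appears once the constant
# is added back. Data and names are made up; fixed effects are ignored.
from statistics import mean

BIN_WIDTH = 25  # km/h, the bin width mentioned in the thread

wind_kmh = [5, 20, 30, 45, 60, 70, 80, 95]
fertility = [145.0, 147.0, 149.0, 151.0, 150.0, 152.0, 140.0, 142.0]

# Group outcomes by bin index (0 = 0-24 km/h, 1 = 25-49 km/h, ...).
by_bin = {}
for w, y in zip(wind_kmh, fertility):
    by_bin.setdefault(w // BIN_WIDTH, []).append(y)

bin_means = {b: mean(ys) for b, ys in by_bin.items()}

# Constant = mean of the reference bin; coefficient = bin mean minus constant.
constant = bin_means[0]
coef = {b: m - constant for b, m in bin_means.items()}

for b in sorted(coef):
    print(f"bin {b}: coefficient {coef[b]:+.1f}, level {constant + coef[b]:.1f}")
```

    A plot of raw bin coefficients is anchored at zero for the reference bin, while a plot of predicted levels sits near the constant. If two graphs use different conventions, one curve will appear shifted relative to the other, which may be one reason for a vertical gap.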

    One thing I may suggest, as a third way to analyze your data, is the user-written command xtsemipar. It may provide a more flexible approach to estimating the nonlinear effect of wind on fertility, one that is still continuous but not constrained to be quadratic.
    Hope this helps.
    Fernando



    • #3
      Hello Fernando,

      Thank you very much. I will try the recommended command.
      So, in general, you suggest that I drop the idea of "binning" in my approach and instead try another continuous nonlinear model. Did I understand you correctly on that point?



      • #4
        Yes, that is my suggestion.
        Binning is great for exploration, as you just noticed, but it is harder to explain when multiple bins are used. And many people do not like bins because the results can be very sensitive to the binning strategy (how wide the bins are and why).
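
        The sensitivity to the binning strategy can be shown with a small Python sketch on made-up data. It assumes a hypothetical drop in fertility above 225 km/h and compares bin widths of 25 and 100 km/h; none of the numbers come from the thread's data.

```python
# Synthetic illustration: the same data summarised with two bin widths.
# All numbers are made up; the drop at 225 km/h is an assumption.
from statistics import mean

wind_kmh = list(range(0, 250, 5))
fertility = [146.0 + 0.02 * w - (30.0 if w >= 225 else 0.0) for w in wind_kmh]


def bin_means(width):
    """Mean outcome per wind bin of the given width, ordered by bin."""
    groups = {}
    for w, y in zip(wind_kmh, fertility):
        groups.setdefault(w // width, []).append(y)
    return [mean(groups[b]) for b in sorted(groups)]


def steepest_drop(width):
    """Most negative change between adjacent bin means."""
    means = bin_means(width)
    return min(later - earlier for earlier, later in zip(means, means[1:]))


for width in (25, 100):
    print(f"width {width:>3} km/h: steepest drop {steepest_drop(width):.2f}")
```

        In this toy example the wider bins mix observations from before and after the drop, so the apparent effect of high wind is less than half as large as with the narrow bins.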



        • #5
          Unfortunately, I was not able to use the recommended command (I do not know what I did wrong; Stata just gives me error messages no matter what I try).
          Anyway, Fernando, since you said that it might not be a quadratic curve, I tried a third-degree polynomial.
          The problem now is that I cannot really say which one fits my data better.
          The values of the second-degree polynomial are as stated above
          (β1=0.625 with p=0.024 and β2=-0.0003 with p=0.003).
          The third-degree polynomial gives the following values:
          β1=0.1368187 with p=0.015, β2=-0.0011 with p=0.022 and β3=0.00000203 with p=0.088.

          Given the p-values, is it more appropriate to use the second- or the third-degree polynomial?
          Or is it better to decide based on the scatterplot?
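
          Because the quadratic is nested in the cubic, one common way to frame this choice is an F-test on the added term. The Python sketch below uses made-up data and two helper functions written for this example (`solve`, `polyfit`); the sample size, noise level, and coefficients are assumptions. With a single added term and conventional standard errors, this F-statistic equals the squared t-statistic of the cubic coefficient, so the p-value reported for β3 already carries that comparison.

```python
# Synthetic illustration of comparing nested polynomial fits via an F-statistic.
# Data are simulated; nothing here comes from the thread's data set.
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients [b0, b1, ...] via normal equations."""
    k = degree + 1
    xtx = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum((x ** i) * y for x, y in zip(xs, ys)) for i in range(k)]
    return solve(xtx, xty)


def rss(xs, ys, coefs):
    """Residual sum of squares of a fitted polynomial."""
    return sum((y - sum(b * x ** p for p, b in enumerate(coefs))) ** 2
               for x, y in zip(xs, ys))


rng = random.Random(0)                              # fixed seed for repeatability
wind = [k / 100 for k in range(0, 251, 5)]          # units of 100 km/h
fertility = [146 + 14 * w - 11 * w ** 2 + 2 * w ** 3 + rng.gauss(0, 0.5)
             for w in wind]

n = len(wind)
rss2 = rss(wind, fertility, polyfit(wind, fertility, 2))
rss3 = rss(wind, fertility, polyfit(wind, fertility, 3))

# One restriction (the cubic coefficient), n - 4 residual degrees of freedom.
f_stat = (rss2 - rss3) / (rss3 / (n - 4))
print(f"RSS quadratic {rss2:.2f}, RSS cubic {rss3:.2f}, F = {f_stat:.1f}")
```

          The cubic can never fit worse in-sample than the quadratic, so a lower residual sum of squares alone is not evidence for it; the question is whether the improvement is large relative to the noise.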
          My data give the following scatterplot, and the regressions give the following fitted curves:

          [Image: scatterplot.png]


          2nd degree:

          [Image: Bildschirmfoto 2019-06-26 um 17.15.16.png]


          3rd degree:
          [Image: Bildschirmfoto 2019-06-26 um 17.14.27.png]



          Moreover, I think utest is an appropriate test for a U-shape. Is there something similar for a third-degree polynomial? Or does it make sense to run two U-shape tests after dividing the data range into two parts?
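
          One way to see what shape a cubic implies is to locate its turning points by setting the first derivative to zero. The Python sketch below does this with the third-degree coefficients posted above; since those are rounded, the results are only approximate, and the variable names are chosen for this example.

```python
# Turning points of the cubic b1*x + b2*x**2 + b3*x**3, using the rounded
# coefficients posted in the thread; results are therefore only approximate.
import math

b1, b2, b3 = 0.1368187, -0.0011, 0.00000203

# First derivative: b1 + 2*b2*x + 3*b3*x**2 = 0, a quadratic equation in x.
a, b, c = 3 * b3, 2 * b2, b1
disc = b * b - 4 * a * c

turning_points = []
if disc > 0:
    root = math.sqrt(disc)
    for x in sorted([(-b - root) / (2 * a), (-b + root) / (2 * a)]):
        curvature = 2 * b2 + 6 * b3 * x      # second derivative at x
        kind = "maximum" if curvature < 0 else "minimum"
        turning_points.append((x, kind))

for x, kind in turning_points:
    print(f"local {kind} near wind = {x:.0f}")
```

          With these rounded values the fitted cubic peaks at roughly 80 and bottoms out at roughly 281, in the units of the wind variable. Whether the second turning point lies inside the observed wind range would need to be checked against the data.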

          Thanks so much

