  • Confusion with fixed effects results using only two groups

    Hello, I am studying the effect of painting cycling lanes bright green on the number of cycling collisions in Santa Monica. As the image below shows, I have geographical data on where cycling collisions occurred, marked by the red and green dots. The green lines are cycling lanes that have been painted green, and the red lines are cycling lanes that have not. The highlighted green area surrounding the green lines is a buffer that accounts for possible rerouting of cycling trips after the treatment. This gives me two areas: the area inside the green buffer and the area outside it. My aim is to use these areas as location fixed effects in a Poisson fixed-effects regression.
    [Image: Screen Shot 2023-06-24 at 12.51.27 PM.png (map of collision points, painted and unpainted cycling lanes, and the buffer area)]

    My data is structured as follows:
    I am using daily data, with two observations per day.

    example:

    Date     | t | num_collisions | area | percentage complete of green bike lanes | other variables...
    1/1/2017 | 1 | 0              | 1    | 1                                       |
    1/1/2017 | 2 | 1              | 0    | 2                                       |
    1/2/2017 | 3 | 2              | 1    | 2.5                                     |
    1/2/2017 | 4 | 1              | 0    | 3                                       |

    etc...

    My goal was to try to capture the number of collisions each day inside of the treatment area and outside of the treatment area, represented with a dummy variable coded 1 for collisions occurring within the treatment area and 0 for collisions which occurred outside of the area. These represent the red and green dots on the map above.

    What I am having trouble with now is running the Poisson fixed-effects regression.
    Here is the code I used:

    xtset area date
    xi i.season
    xi i.area

    xtpoisson num_collisions perc_complete_shapeLength2 metro_trip_frequency precip_accum_24_hour_mm_sq i.season ,fe irr vce(robust)

    Where:
    num_collisions = daily collision count, within either area 0 or 1
    perc_complete_shapeLength2 = percentage of green lanes finished being painted based on the length of each given segment, also measured daily (scaled from 0-100)
    metro_trip_frequency = daily counts of metro bike share trips in the Santa Monica area, to control for cycling volume.
    precip_accum_24_hour_mm_sq = daily rain accumulation in Santa Monica in mm^2 (controlling for weather)
    i.season = indicator variable of the season


    What I am confused about is the effect on the number of collisions. I want to know how much the treatment area was affected, but all I have is the effect on the overall number of collisions from both areas (or at least I think this is the case).
    Does anyone know how I can estimate the effect on the treatment area?
    Also, I was a bit suspicious of how low the p-values were for each variable. Is this something to be concerned about?
    I would love some feedback on these regression results, and to find out whether I am doing this correctly, because this is new to me.

    All the best and thank you to anyone who responds,
    Clark Easley

    Results:
    [Image: Screen Shot 2023-06-24 at 1.19.13 PM.png (regression output)]

    Last edited by Clark Easley; 24 Jun 2023, 05:29.

  • #2
    I'm not at all sure I understand what you are doing. But if what you want to do is see whether the completion of green painting is differently associated with the number of collisions in the two areas, your regression model does not capture that. You would need to add an area X percent completed interaction term to your model.

    So, something like this:
    Code:
    xtpoisson num_collisions c.perc_complete_shapeLength2##i.area  metro_trip_frequency precip_accum_24_hour_mm_sq i.season, fe irr
    The coefficient of the interaction term would then give you a test of the difference in effect of perc_complete... on num_collisions across the two groups. Perhaps even more helpful would be to look at the average collisions:percent complete association separately in both areas. You could get that by following your regression with
    Code:
    margins area, dydx(perc_complete_shapeLength2)
    Note also that I have removed -vce(robust)-, which is inappropriate when there are only two groups.
    Last edited by Clyde Schechter; 24 Jun 2023, 10:54.
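
    One possible follow-up to the interaction model above, offered as a sketch and not as part of the original answer: -lincom- can combine the main and interaction coefficients to report an incidence-rate ratio for each area directly. This assumes the data and variable names from #1 are in memory and that area is coded 0 (outside the buffer) and 1 (inside the buffer). The /// line continuation assumes the commands are run from a do-file.

    ```stata
    * Sketch only: assumes the dataset from post #1 is in memory,
    * with area coded 0 (outside buffer) and 1 (inside buffer).
    xtset area date

    * Interaction model from #2. The main effect of area is absorbed
    * by the area fixed effects and will be reported as omitted.
    xtpoisson num_collisions c.perc_complete_shapeLength2##i.area ///
        metro_trip_frequency precip_accum_24_hour_mm_sq i.season, fe irr

    * IRR per one-point increase in percent complete, outside the buffer (area == 0)
    lincom perc_complete_shapeLength2, irr

    * IRR per one-point increase in percent complete, inside the buffer (area == 1):
    * main coefficient plus interaction coefficient, exponentiated
    lincom perc_complete_shapeLength2 + 1.area#c.perc_complete_shapeLength2, irr
    ```

    The second -lincom- line is the quantity asked about in #1: the estimated multiplicative change in expected collisions inside the treatment area for each additional percentage point of lanes painted, holding the other covariates fixed.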



    • #3
      Thank you so much for the response Clyde,

      Can you possibly give a more detailed explanation as to why robust standard errors are not appropriate with 2 groups?

      Thanks so much,

      Clark Easley



      • #4
        I can't really go into too much detail. Basically, cluster-robust standard errors are asymptotically correct, meaning that they become increasingly accurate as the number of clusters grows towards infinity. Various theoretical and simulation studies using samples with small numbers of clusters have shown that you really do need a large number for accuracy. Just how large a number is needed is actually pretty complicated and depends a lot on other properties of the data set. Various rules of thumb for how many clusters you need have been proposed. The most lenient I am aware of is 15 clusters. Most people seem to think somewhere in the 30-50 range is where they become usefully correct, although I have seen simulation studies showing that in some situations even 100 clusters yielded seriously incorrect results. In any event, 2 is definitely not enough.
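
        As a rough illustration of why the number of clusters matters (standard textbook notation, not taken from this thread), the cluster-robust variance estimator has the sandwich form:

        ```latex
        \widehat{V}_{\text{cluster}}
          = \widehat{A}^{-1}
            \left( \sum_{g=1}^{G} \hat{s}_g \hat{s}_g^{\top} \right)
            \widehat{A}^{-1},
        \qquad
        \hat{s}_g = \sum_{i \in g} \hat{s}_{ig}
        ```

        Here $G$ is the number of clusters, $\hat{s}_{ig}$ is the estimated score contribution of observation $i$ in cluster $g$, and $\widehat{A}$ is the estimated Hessian of the log likelihood. The middle term is a sum of only $G$ outer products, so its rank is at most $G$, and its accuracy relies on averaging over many clusters. With $G = 2$ it has rank at most 2, while the model in #2 estimates more than two coefficients, so the estimated variance matrix cannot be of full rank.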
