  • Heckman probit with hierarchical data structure

    My data come from a travel survey in which respondents record all trips they took on a single day. My study examines the effect of the physical environment, socio-demographics, and the weather on whether someone used active transport for a trip or any other travel mode. Each row in the dataset is a single trip. However, some respondents did not make any trips and hence cannot be included in the study. This is a possible source of selection bias, since these respondents or their environments may have characteristics different from those of respondents who did make trips.

    To correct for this, I want to use the Heckman model. Since my outcome variable is binary (used active transport or not), I need heckprobit. The selection equation models whether someone made any trips on the survey day, and the outcome equation models, for each trip made by those who did travel, whether active transport was used. My code looks like this:

    Code:
    // "///" continues the command on the next line (required when running from a do-file)
    heckprobit used_active_transport density age educ hourly_temperature trip_distance time_of_day, ///
        select(made_anytrips = density age educ daily_temperature day_of_week) vce(cluster respondent_id)
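    For reference, here is a sketch of the model the command above fits, written in what I understand to be the notation of the Stata heckprobit manual (j indexes rows of the dataset, which here are trips; the symbols are my own shorthand). x_j holds the outcome covariates (density, age, educ, hourly_temperature, trip_distance, time_of_day) and z_j the selection covariates (density, age, educ, daily_temperature, day_of_week):

    \[
    \begin{aligned}
    y_j &= \mathbf{1}\left(x_j\beta + u_{1j} > 0\right) \quad \text{observed only if } s_j = 1,\\
    s_j &= \mathbf{1}\left(z_j\gamma + u_{2j} > 0\right),\\
    \begin{pmatrix} u_{1j} \\ u_{2j} \end{pmatrix} &\sim N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right).
    \end{aligned}
    \]

    In this form every row j gets its own independent pair of errors (u_{1j}, u_{2j}), which is the row-level assumption my question is about.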
    My problem is that, as far as I can tell, heckprobit assumes that the selection and outcome equations are at the same level. This is the case in the example datasets for both heckman and heckprobit. In my data, however, selection happens at the respondent level, while the outcome is at the trip level, with multiple trips clustered within a single respondent. I have found the xtheckman command, which might help by including respondent-level random effects, but it seems to have been designed for a continuous rather than a binary outcome variable. Someone suggested using clustered standard errors, as I do in the code above, but I'm not sure whether that is an adequate solution.
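    To make the mismatch concrete, here is a sketch of the two levels written out separately. The notation is my own assumption, not taken from any Stata manual: i indexes respondents, j = 1, ..., J_i indexes the trips of respondent i, and both errors are standard normal:

    \[
    \begin{aligned}
    s_i &= \mathbf{1}\left(z_i\gamma + v_i > 0\right),\\
    y_{ij} &= \mathbf{1}\left(x_{ij}\beta + u_{ij} > 0\right), \quad j = 1, \dots, J_i, \quad \text{observed only if } s_i = 1,\\
    \operatorname{corr}(u_{ij}, v_i) &= \rho .
    \end{aligned}
    \]

    Here a single selection error v_i is shared by all J_i trips of respondent i, whereas heckprobit's likelihood treats each trip row as an independent draw with its own selection error. Adding vce(cluster respondent_id) makes the standard errors robust to correlation within respondents, but it does not change the likelihood being maximized, so the point estimates are the same as without clustering.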

    For my research question it is important that I analyze the individual trips; aggregating trips to the respondent level is not preferred. Am I correct that using heckprobit for this data structure is problematic, and if so, what options do I have?
    Last edited by Maarten Hogeweij; 13 Aug 2025, 09:28.

  • #2
    Cross-posted at https://stats.stackexchange.com/ques...structure-in-s

    Please note our request https://www.statalist.org/forums/help#crossposting that you tell people about cross-posting.

    • #3
      Sorry, I saw that in the FAQ but somehow forgot to follow it in my post!
      Last edited by Maarten Hogeweij; 13 Aug 2025, 10:31.
