  • CMP with ordered probit main outcome and endogenous ordered probit in first stage

    I have panel data on farmers who decide where to sell their output (first-stage ordered probit) and which mode of transportation to use to reach that place (main-equation ordered probit). My main variable of interest is gender, which appears in both stages. To keep things simple, I am not adding interactions. I focus on ordered probits because they give more nuanced results. After reading the Stata help files on the conditional (recursive) mixed-process estimator (cmp), the examples, and the papers by Roodman (2011), Chyi and Mao (2012) and Botezat and Pfeiffer (2014), I developed the following code. It seems to work well, so my main question is whether someone could take a look and tell me if I am missing something, as cmp and eoprobit are new commands to me.


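    * cmp setup defines $cmp_oprobit and the other indicator macros used below
    cmp setup
    * "controls" stands in for the list of control variables
    cmp (modetransp_order = i.FHH_clean controls i.year) (placesale_resources_hh = i.FHH_clean controls i.year), indicators($cmp_oprobit $cmp_oprobit) vce(cluster hhdid)

    * Average marginal effects on Pr(modetransp_order = 0, 1, 2): equation 1
    margins , dydx(*) predict(eq(#1) outcome(0) pr)
    margins , dydx(*) predict(eq(#1) outcome(1) pr)
    margins , dydx(*) predict(eq(#1) outcome(2) pr)

    * Average marginal effects on Pr(placesale_resources_hh = 0, 1, 2): equation 2
    margins , dydx(*) predict(eq(#2) outcome(0) pr)
    margins , dydx(*) predict(eq(#2) outcome(1) pr)
    margins , dydx(*) predict(eq(#2) outcome(2) pr)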
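    One way to check a specification like this is to simulate data from a model whose parameters are known and see whether cmp recovers them. The sketch below is only an illustration: the variables (female, x, transp, place), the true coefficients, the cut points and the error correlation of 0.5 are all hypothetical choices, and, as in the cmp line above, neither outcome appears on the other's right side. It also writes the six margins commands as a loop.

```stata
* Hypothetical simulated data: two ordered outcomes (0/1/2) with correlated
* errors (rho = 0.5) and no outcome on the other's right side (SUR setup).
* Assumes cmp is installed from SSC (ssc install cmp).
clear
set seed 12345
set obs 2000
matrix C = (1, 0.5 \ 0.5, 1)
drawnorm e_transp e_place, corr(C)
generate female = runiform() < 0.4
generate x = rnormal()

* Latent indices with arbitrary true coefficients (no intercepts)
generate lat_transp = -0.4*female + 0.6*x + e_transp
generate lat_place  =  0.5*female + 0.8*x + e_place

* Arbitrary cut points turn each latent index into categories 0, 1, 2
generate transp = cond(lat_transp < 0, 0, cond(lat_transp < 1, 1, 2))
generate place  = cond(lat_place < -0.5, 0, cond(lat_place < 0.7, 1, 2))

cmp setup
cmp (transp = i.female x) (place = i.female x), indicators($cmp_oprobit $cmp_oprobit)

* The six margins commands as a loop over equations and outcome values
forvalues eq = 1/2 {
    forvalues k = 0/2 {
        margins, dydx(*) predict(eq(#`eq') outcome(`k') pr)
    }
}
```

    With 2,000 observations, the coefficient, cut-point and error-correlation estimates should land reasonably close to the values used to generate the data; a large gap would suggest the command does not express the intended model.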



    I obtain similar magnitudes and significance levels when I use the endogenous ordered probit command (eoprobit) for the main equation, but cmp is more efficient and consistent.
    Code:
    eoprobit modetransp_order FHH_clean controls i.year , endogenous(placesale_resources_hh = FHH_clean controls i.year, oprobit) vce(cluster hhdid)
    • Nothing is significant when I use the endogenous ordered probit (eoprobit) for the first stage.
    • When I estimate the first stage separately with an ordered probit (oprobit), rather than as part of a system of equations, I obtain magnitudes and significance levels similar to cmp for some controls.
    Code:
    oprobit placesale_resources_hh FHH_clean controls i.year , vce(cluster hhdid)

    margins , dydx(*) post


    Thanks so much for your time,

    Laura
    Last edited by Laura Maratou-Kolias; 22 Feb 2023, 11:15.

  • #2
    At a glance, the cmp and margins commands look good to me. Of course, I can't speak to whether they exactly express the model you intend.
    --David



    • #3
      Thanks so much, David! I am excited about the possibilities of cmp, and thanks for the comprehensive documentation.



      • #4
        Hello David and everyone,

        I have the following results and am having trouble interpreting them. What is the baseline with which the ordered categorical variable from the first stage enters the marginal effects for the second ordered variable? For example, in column 2 of the table, place reduces the probability that the household uses the first type of transportation (value 0) by 0.43, i.e. 43 percentage points. Which category of the first-stage ordered variable affects the first category of the second-stage ordered variable, and likewise for every combination of the two variables? Each ordered dependent variable takes the values 0, 1 and 2.


        Thanks so much,

        Laura





          • #6
            Hi Laura. I don't see a table...



            • #7
              Sorry, I can't get a table to work, so I have pasted it as text here. The estimate for the first-stage endogenous variable is 1.60, while its marginal effects are -0.43 for category 1, 0.18 for category 2 and 0.25 for category 3. The second-stage marginal effects are calculated for each outcome category, so I was wondering how the categories of the first stage come into play there. I also calculated marginal effects for the first stage separately.

              Regression                                   (1)        (2)               (3)               (4)
              Transportation category                                 Cat. 1 (value 0)  Cat. 2 (value 1)  Cat. 3 (value 2)
              Description                                  Estimate   Marginal effect   Marginal effect   Marginal effect
              Place (first-stage ordered variable, 0/1/2)  1.60***    -0.43***          0.18**            0.25**
                                                           [0.23]     [0.04]            [0.08]            [0.11]

              Notes: Standard errors in brackets. *, ** and *** indicate 10%, 5% and 1% significance levels, respectively.
              Last edited by Laura Maratou-Kolias; 06 Jun 2023, 19:02.
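              A quick consistency check on columns (2) to (4): the three outcome probabilities always sum to one, so for any regressor x_k the marginal effects across the three categories must sum to zero. The reported values satisfy this.

              $$\sum_{j=0}^{2}\Pr(\text{TRANSP}=j\mid x)=1 \quad\Longrightarrow\quad \sum_{j=0}^{2}\frac{\partial \Pr(\text{TRANSP}=j\mid x)}{\partial x_k}=0$$
              $$-0.43+0.18+0.25=0.00$$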



              • #8
                The cmp command line in the original post does not put either equation's left-side variable on the right side of the other, so it looks like a "seemingly unrelated" model. Commands such as "margins , dydx(*) predict(eq(#1) outcome(0) pr)" will, I believe, tell you the average change in the probability of the given outcome in the given equation from a one-unit increase in each right-side variable, averaged over the actual data values of all observations.
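                For reference, these are the textbook ordered probit formulas behind such a margins call, written for one equation with outcomes 0, 1, 2, index x_iβ, cut points κ1 < κ2 and error variance normalized to 1 (the notation is mine, not cmp's). For a factor variable such as i.FHH_clean, margins reports the average discrete change from the base level instead of this derivative.

                $$\Pr(y_i = j \mid x_i) = \Phi(\kappa_{j+1} - x_i\beta) - \Phi(\kappa_j - x_i\beta), \qquad j = 0,1,2,\quad \kappa_0=-\infty,\ \kappa_3=+\infty$$
                $$\widehat{\mathrm{AME}}_k(j) = \frac{1}{N}\sum_{i=1}^{N}\bigl[\phi(\kappa_j - x_i\beta) - \phi(\kappa_{j+1} - x_i\beta)\bigr]\,\beta_k$$

                With β_k > 0, the effect on outcome 0 is always negative and the effect on outcome 2 always positive; the middle outcome can go either way.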



                • #9
                  Thanks so much. In subsequent analysis, I put the left-side variable of the second equation on the right side of the first equation, as shown in the code below. The model is written out in equations (1) and (2). I understand that this is a two-stage ordered probit.

                  Would the interpretation of the marginal effects stay as you described above, now that place (categorical) enters the transportation (categorical) equation and hence its marginal effects?

                  cmp (modetransp_order = i.FHH_clean controls placesale_resources_hh i.year) ( placesale_resources_hh = i.FHH_clean controls i.year) , indicators($cmp_oprobit $cmp_oprobit) vce(cluster hhdid)

                  $$\text{PLACE} = \gamma_0 + \gamma_1 G + \gamma_2 X + \xi \tag{1}$$
                  $$\text{TRANSP} = \beta_0 + \beta_1 G + \beta_2 K + \beta_3\,\text{PLACE} + \eta \tag{2}$$
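                  Spelled out in latent-variable form, which is how I understand cmp to treat two oprobit equations, the system is as follows. Stars mark unobserved latent indices; the intercepts γ0 and β0 are absorbed into the cut points μ and κ, as in a standard ordered probit; and ρ is the error correlation that makes PLACE endogenous (with ρ = 0 the model splits into two separate ordered probits). The observed category PLACE, not PLACE*, enters the transportation index. The cut-point symbols are my own notation, not cmp's output labels.

                  $$\text{PLACE}^* = \gamma_1 G + \gamma_2 X + \xi, \qquad \text{PLACE} = p \iff \mu_p < \text{PLACE}^* \le \mu_{p+1}$$
                  $$\text{TRANSP}^* = \beta_1 G + \beta_2 K + \beta_3\,\text{PLACE} + \eta, \qquad \text{TRANSP} = j \iff \kappa_j < \text{TRANSP}^* \le \kappa_{j+1}$$
                  $$\begin{pmatrix}\xi\\ \eta\end{pmatrix} \sim N\!\left(\begin{pmatrix}0\\ 0\end{pmatrix}, \begin{pmatrix}1 & \rho\\ \rho & 1\end{pmatrix}\right), \qquad \mu_0=\kappa_0=-\infty,\ \mu_3=\kappa_3=+\infty$$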



                  • #10
                    I think the one point of complexity is just this: the first equation does not "know" that placesale_resources_hh is being modeled as an ordered probit. It treats it like any other variable. Its single coefficient, β3, represents the impact on the outcome's index of a 1-unit increase, whether from 0 to 1 or from 1 to 2. That carries over to other "margins" you might compute. For outcomes such as those probabilities, where according to the estimates the impact of a 1-unit increase depends on the values of PLACE and the other variables, the answer you get from margins depends on how you tell it to compute the average marginal effect. I think by default it averages over the predicted effect in each of your observations.
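                    To make that concrete for equation (2), with the intercept absorbed into the cut points κ: treating PLACE as a given value p and using the marginal distribution of η (my assumption about what a per-equation probability prediction does, not something confirmed in cmp's documentation here), the probability of transport category j and the change from moving PLACE up one category are

                    $$\Pr(\text{TRANSP}=j \mid G,K,p) = \Phi(\kappa_{j+1} - \beta_1 G - \beta_2 K - \beta_3 p) - \Phi(\kappa_j - \beta_1 G - \beta_2 K - \beta_3 p)$$
                    $$\Delta_j(p) = \Pr(\text{TRANSP}=j \mid G,K,p+1) - \Pr(\text{TRANSP}=j \mid G,K,p), \qquad p \in \{0,1\}$$

                    Both changes use the same β3, but because Φ is nonlinear, Δ_j(0) and Δ_j(1) generally differ and also vary with G and K, so any single summary number depends on how margins is told to average.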



                    • #11
                      Thanks for the reply and clarification. I tried various code options, but cmp does not let me specify the at() option in the margins command for the first-stage endogenous variable. I tried at(placesale_resources_hh=(0 1 2)), but it did not work, so I opted for the SUR framework, which lets me use the standard cmp margins commands and is conceptually closer to my problem. I also estimated an endogenous ordered probit (eoprobit) for the two stages. Using cmp, I obtained the marginal effects for each outcome category in both equations.

                      My follow-up question: I get different numbers of observations in the three models I analysed for sensitivity purposes. Do you know why cmp gives a different N, and why the N reported by margins does not match the N for the estimates?
                      1. Ordered probit of place (standalone) and ordered probit of transportation (standalone): N=742 for estimates and marginal effects
                      2. Endogenous ordered probit (eoprobit) of place (1st stage) and transportation (2nd stage): N=565 for marginal effects (I cannot see the estimates)
                      3. cmp SUR of place and transportation (no endogenous variable in the transportation equation): N=750 for estimates, N=595 for marginal effects



                      • #12
                        I should add that all three models use exactly the same control variables: place and transport have different controls, but each set is the same across the three models.



                        • #13
                          In general, these commands should define the sample based on non-missingness of required variables. However, by default, cmp is greedy in defining the sample for each equation. If an observation is complete for one equation but not the other, it will include it in the data for the equation for which it is complete. The reported sample size is then the number of observations for which at least one equation has complete data. The return macros e(N1), e(N2),... give the sample sizes by equation. You can restrict the sample as you please using an "if" clause.
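                          The greedy sample definition is easy to see in a small simulation. In the sketch below all data are hypothetical: y1 and y2 are ordered outcomes, and x2, the regressor of the second equation only, is set to missing for about 20% of observations. The first cmp call uses the default sample; the second restricts both equations to observations complete for every variable, using Stata's mark and markout as one way to build the if clause.

```stata
* Hypothetical simulated data: equation 2's regressor x2 is missing for
* about 20% of observations, while equation 1 is complete everywhere.
* Assumes cmp is installed from SSC (ssc install cmp).
clear
set seed 2023
set obs 1000
generate x1 = rnormal()
generate x2 = rnormal()
generate lat1 = x1 + rnormal()
generate lat2 = x2 + rnormal()
generate y1 = cond(lat1 < 0, 0, cond(lat1 < 1, 1, 2))
generate y2 = cond(lat2 < 0, 0, cond(lat2 < 1, 1, 2))
* Knock out x2 after y2 is built, so only equation 2 loses observations
replace x2 = . if runiform() < 0.2

cmp setup
* Default (greedy) sample: each equation uses its own complete observations
cmp (y1 = x1) (y2 = x2), indicators($cmp_oprobit $cmp_oprobit)
display "N = " e(N) ", N1 = " e(N1) ", N2 = " e(N2)

* Common sample: flag observations complete for every variable in both equations
mark touse
markout touse y1 x1 y2 x2
cmp (y1 = x1) (y2 = x2) if touse, indicators($cmp_oprobit $cmp_oprobit)
display "N = " e(N) ", N1 = " e(N1) ", N2 = " e(N2)
```

                          For the models in this thread, the markout list would presumably contain modetransp_order, placesale_resources_hh, FHH_clean, year, hhdid and both sets of control variables; that list is my assumption about which variables matter.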



                          • #14
                            Thanks so much; that's very helpful. Now the number of observations for the estimates is the same for the ordered probit and cmp SUR models. The margins still have a different N between ordered probit and cmp SUR, probably because they are different estimators, but the numbers are close enough to be addressed in a footnote. Thanks again!

