Estimating marginal effects of a constructed variable

Saika Belal

Join Date: Apr 2016

Posts: 7
#1

27 Jul 2023, 21:10

Hi Statalist community,

I want to run a probit regression and estimate marginal effects of two variables that do not directly enter into the regression.

Here is a simplified version of my problem:

I want to calculate the marginal effects of continuous variables pay and length on a binary dependent variable taskaccept (which takes values 0 or 1).

However, taskaccept is a function of ln(pay)/ln(length), such that

\[
taskaccept = a + b * (ln(pay)/ln(length))
\]

I created a variable

\[
lnpl = (ln(pay)/ln(length))
\]

And then ran the regression

Code:

probit taskaccept lnpl

What can I do now to estimate separate marginal effects with respect to pay and length?

I cannot run:

Code:

margins, dydx(pay length)

because pay and length are not covariates.

I can use the chain rule to calculate d(taskaccept)/d(pay) and d(taskaccept)/d(length) from _b[lnpl]. However, this forces the marginal effects of pay and length to share the same standard errors etc., whereas the true relationship between taskaccept and pay and length is such that they would have separate marginal effects.
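
For reference, the chain-rule calculation mentioned above can be written out as follows. This is only a sketch, assuming the probit index is \(a + b\,\ln(pay)/\ln(length)\); the symbols \(z\), \(\Phi\) (standard normal CDF) and \(\phi\) (standard normal density) are notation introduced here and do not appear in the original post.

\[
\Pr(taskaccept = 1) = \Phi(z), \qquad z = a + b\,\frac{\ln(pay)}{\ln(length)}
\]

\[
\frac{\partial \Pr(taskaccept = 1)}{\partial pay} = \phi(z)\,\frac{b}{pay\,\ln(length)}, \qquad
\frac{\partial \Pr(taskaccept = 1)}{\partial length} = -\phi(z)\,\frac{b\,\ln(pay)}{length\,[\ln(length)]^{2}}
\]

Both derivatives are driven by the single coefficient \(b\), so under this specification they are not estimated independently of each other.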

Is there a way for me to run a regression while "reminding" Stata that one of the independent variables is actually a composite of other variables? Or is there some other way for me to approach this problem?

I’ll really appreciate your help. I’m a rookie to this forum, if that wasn’t completely clear from the question.

Last edited by Saika Belal; 27 Jul 2023, 21:11. Reason: Added tags
Tags: marginal effect, margins
Maarten Buis

Join Date: Mar 2014

Posts: 3467
#2

28 Jul 2023, 01:24

Can you use \(\ln(\frac{pay}{length})\) instead of \(\frac{\ln(pay)}{\ln(length)}\)? In that case the equation simplifies to \(\beta \ln(\frac{pay}{length}) = \beta (\ln(pay) -\ln(length)) = \beta \ln(pay) - \beta\ln(length)\). That means you can add both \(\ln(pay)\) and \(\ln(length)\) as covariates and constrain the coefficient of \(\ln(length)\) to be minus the coefficient of \(\ln(pay)\), using the constraints() option. After that, you can just use margins to get marginal effects and their standard errors.
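
A minimal sketch of how this constrained specification might look, using the auto data because the poster's data are not available. Here foreign, weight, and length are stand-ins for taskaccept, pay, and length, and the names lnw and lnl are made up for illustration.

```stata
* Sketch of the constrained-coefficient approach on the auto data.
* foreign, weight, and length stand in for taskaccept, pay, and length.
sysuse auto, clear
generate double lnw = ln(weight)
generate double lnl = ln(length)

* Force the coefficient on lnl to equal minus the coefficient on lnw
constraint define 1 lnw + lnl = 0
probit foreign lnw lnl, constraints(1) nolog

* Marginal effects with respect to the logged variables
margins, dydx(lnw lnl)
```

Note that margins here reports effects with respect to the logged variables, not the levels, because the logs were created with generate rather than inside the model.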

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------

John Mullahy

Join Date: Dec 2016
Posts: 753

28 Jul 2023, 05:56

You might try either gmm or nl. Here's an example with the auto data,

Code:

cap preserve
cap drop _all

sysuse auto

nl (price={alpha}+{beta}*log(weight)/log(length)), var(weight length)

margins, dydx(*) gen(m)

sum m1 m2

cap restore

Result:

Code:

. nl (price={alpha}+{beta}*log(weight)/log(length)), var(weight length)

Iteration 0:  residual SS =  4.83e+08
Iteration 1:  residual SS =  4.83e+08


      Source |      SS            df       MS
-------------+----------------------------------    Number of obs =         74
       Model |  1.525e+08          1   152478134    R-squared     =     0.2401
    Residual |  4.826e+08         72  6702600.87    Adj R-squared =     0.2295
-------------+----------------------------------    Root MSE      =   2588.938
       Total |  6.351e+08         73  8699525.97    Res. dev.     =   1371.108

------------------------------------------------------------------------------
       price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
      /alpha |  -99295.53   22113.07    -4.49   0.000    -143377.1   -55213.92
       /beta |   69129.61   14493.79     4.77   0.000     40236.77    98022.46
------------------------------------------------------------------------------
Note: Parameter alpha is used as a constant term during estimation.

.
. margins, dydx(*) gen(m)

Average marginal effects                                    Number of obs = 74
Model VCE: GNR

Expression: Fitted values, predict()
dy/dx wrt:  weight length

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
      weight |   4.723883   .9904146     4.77   0.000     2.782706     6.66506
      length |  -109.0759   22.86898    -4.77   0.000    -153.8983   -64.25354
------------------------------------------------------------------------------

.
. sum m1 m2

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
          m1 |         74    4.723883    1.398918   2.620231   7.849442
          m2 |         74   -109.0759    14.93345  -148.9029  -84.71975

.


Saika Belal

Join Date: Apr 2016

Posts: 7
#4

02 Aug 2023, 13:16

Thank you both very much for your help. I began by looking into the nl option that John (@John Mullahy) suggested, and I have a few follow-up questions.

Ultimately, I want to estimate marginal effects of pay and length at different levels of factor variables akin to running:

Code:

probit taskaccept mainVAR##i.factor1##i.factor2
margins i.factor1##i.factor2 , dydx(pay length)

where mainVAR = (ln(pay)/ln(length))

I expanded John's code to include interactions with one factor variable: "female" which takes values 0 and 1.

Code:

nl (taskaccept={a}+{b1}*log(pay)/log(length) + {b2}*female + {b3}*female*(log(pay)/log(length)) ) ///
if length>1 , var(pay length female) coeflegend

This part runs fine. (I use the condition "if length>1" to avoid log(pay)/log(length) equating to missing.)
I would then like to run:

Code:

margins i.female, dydx(taskksh tasksecs) gen(m)

But I get the error:

Code:

factor 'female' not found in list of covariates
r(322);

Do you have any suggestions for this?
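
One possible workaround, not confirmed anywhere in this thread: since nl does not record female as a factor variable, the levels can be fixed through the at() option of margins instead of listing i.female. The sketch below uses the auto data, with price, weight, length, and foreign standing in for taskaccept, pay, length, and female, and it has not been run on the poster's data.

```stata
* Sketch on the auto data: price, weight, length, and foreign stand in for
* taskaccept, pay, length, and female.
sysuse auto, clear
nl (price = {a} + {b1}*log(weight)/log(length) + {b2}*foreign ///
    + {b3}*foreign*log(weight)/log(length)), variables(weight length foreign)

* nl carries no factor-variable information, so fix foreign with at()
* instead of listing i.foreign as a margins variable
margins, dydx(weight length) at(foreign=(0 1))
```

Each at() scenario sets foreign to the stated value for every observation before averaging the derivatives.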

Additionally, if I want to add a set of covariate controls to my equation, will I need to extend the equation by adding a parameter for each control variable (see below)?

Code:

nl (taskaccept={a}+{b1}*log(pay)/log(length) + {b2}*female + {b3}*female*(log(pay)/log(length)) ///
+ {b4}*control1 + {b5}*control2 + {b6}*control3 + ... ) ///
if length>1 , var(pay length female) coeflegend
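
As a possible shortcut for the controls, nl's substitutable expressions accept a linear-combination term of the form {eqname: varlist}, which adds one coefficient per listed variable without writing each parameter by hand. A minimal sketch on the auto data, where mpg and turn are stand-ins for the control variables and the equation name xb is arbitrary:

```stata
* Sketch on the auto data: mpg and turn play the role of control1, control2.
sysuse auto, clear

* {xb: mpg turn} adds one coefficient for each listed variable
nl (price = {alpha} + {beta}*log(weight)/log(length) + {xb: mpg turn}), ///
    variables(weight length mpg turn)

margins, dydx(weight length)
```

The controls are also listed in variables() so that margins treats them as covariates of the model.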

Last edited by Saika Belal; 02 Aug 2023, 14:07. Reason: added @John Mullahy

FernandoRios

Join Date: Apr 2014
Posts: 2487

02 Aug 2023, 13:35

Hi Saika,
You could use "f_able" (see how it works here: https://journals.sagepub.com/doi/pdf...6867X211000005).

Here is a small example:

Code:

ssc install f_able
ssc install frause
frause oaxaca, clear
** Important, you need to use fgen to create the variables (so the "construction info" is left behind)
fgen age_educ= age/educ

** add the original variables. If not meant to be included in your model add them with "o."
reg lnwage age_educ o.age o.educ

** need to include the name of all "constructed variables"
f_able age_educ, auto

** because of numerical derivatives you may need "noestimcheck"
margins, dydx(age) atmeans noestimcheck
margins, dydx(educ) atmeans noestimcheck

margins, atmeans expression(_b[age_educ]/educ)
margins, atmeans expression(-_b[age_educ]*age/educ^2)

Note that you will need to include the original variables with "o.", and possibly add -noestimcheck-.
I'm also providing the "manual" version, obtained by deriving the exact marginal effect analytically.

You can also use this with other commands (like probit/logit), although it will take a bit of extra time because of the numerical derivatives.
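
The two expression() calls in the example follow from differentiating the constructed term with respect to each original variable. As a short worked derivation, writing \(\beta\) for _b[age_educ] (a symbol introduced here only for illustration):

\[
\frac{\partial}{\partial age}\left(\beta\,\frac{age}{educ}\right) = \frac{\beta}{educ}, \qquad
\frac{\partial}{\partial educ}\left(\beta\,\frac{age}{educ}\right) = -\beta\,\frac{age}{educ^{2}}
\]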

HTH
F

Just for fun, here is John Mullahy's example:

Code:

sysuse auto, clear
fgen lw_ll = log(weight)/log(length)
reg price lw_ll o.weight o.length
f_able lw_ll, auto
margins, dydx(*) noestimcheck

Last edited by FernandoRios; 02 Aug 2023, 13:41.


Saika Belal

Join Date: Apr 2016

Posts: 7
#6

03 Aug 2023, 14:33

Hi Fernando ( @FernandoRios ),

Thank you for the very helpful suggestion. f_able is pretty much exactly what I was looking for. I also see that you wrote the package, thank you, it was much needed.

I’ve been partially able to use your example and adjust it for my needs, with two issues. I have found a workaround for one of them but the other I cannot solve. Would love your suggestions.
I am using Stata MP 14.2.

Here is a simplified version of my code:

Code:

cap drop lnpl
fgen lnpl = ln(pay)/ln(length)
probit taskaccept c.lnpl##i.female o.pay o.length ///
, nolog vce(cluster phonenumber)
cap drop __lnpl // workaround to address error 1
f_able , nlvar(lnpl)
margins i.female , dydx(pay length) nochainrule noestimcheck numerical // error 2 pops up here
di "end of code" // this line does not run

Here are the errors I get:

Error 1 (Found a workaround):
variable __lnpl already defined
r(110);

This error shows up before running the line with the margins command.
I address this by inserting code (cap drop __lnpl). It’s not ideal, but it works.

Error 2 (Did not find a solution):
something required
r(100);

This second error shows up after running the margins command. It stops the rest of the do-file from running. I have tried out your code with the frause Oaxaca dataset, and the same issue arises. Stata runs this line (margins, dydx(age) atmeans noestimcheck) and then shows the error (r(100) with the same error message). Stata does not run the remaining lines of code (i.e., margins, dydx(educ) atmeans noestimcheck, etc.)

I am guessing it is not a coding error since the same error shows up with your example code. Is this a Stata version issue? Can you please suggest a workaround? Is there a way that I can override the error since it does not seem to stop it from running the line of code it is referring to?

Thanks for your help.

Last edited by Saika Belal; 03 Aug 2023, 14:47.

Saika Belal

Join Date: Apr 2016
Posts: 7

03 Aug 2023, 14:49

I may have found a solution/workaround: adding the option post to the margins command.

Code:

cap drop lnpl
fgen lnpl = ln(pay)/ln(length)
probit taskaccept c.lnpl##i.female o.pay o.length ///
, nolog vce(cluster phonenumber)
cap drop __lnpl // workaround to address error 1
f_able , nlvar(lnpl)
margins i.female , dydx(pay length) nochainrule noestimcheck numerical post // added option post
di "end of code" // now this line of code does run!


FernandoRios

Join Date: Apr 2014

Posts: 2487
#8

03 Aug 2023, 16:23

Are you using the version from the Stata Journal or from SSC?
Also, the syntax I used in my examples was an upgrade.
Try using that one.
Saika Belal

Join Date: Apr 2016

Posts: 7
#9

04 Aug 2023, 11:46

I am using it from ssc (ssc install f_able). Which part of your code was an upgrade? That way I will know what to adjust.
FernandoRios

Join Date: Apr 2014

Posts: 2487
#10

04 Aug 2023, 12:13

OK, I found the error.
It has something to do with some changes Stata 14 had with margins.

So please open the file f_able_epilog.ado
and replace its contents with this:

Code:

program f_able_epilog
    syntax [anything]
    if missing("`anything'") local anything `e(nldepvar)'
    foreach i of local anything {
        qui: replace `i' = __`i'
        qui: drop __`i'
    }
end

That should fix the problem.
