New on SSC: - prodest - module for production function estimation

Arian G.

Join Date: Oct 2015

Posts: 8
#16

02 May 2017, 09:30

Thanks for your work Gabriele. So to clarify, what does the first stage residual option produce? Technical efficiency?
Gabriele Rovigatti

Join Date: Sep 2016

Posts: 74
#17

02 May 2017, 17:18

Dear Arian,

thanks for the post.

The first stage residual option produces a new variable containing the residuals of the first stage. In particular, in OP terms, assuming the first stage like

Code:

y_i,t = alpha + w_i,t*beta + k_i,t*gamma + h(inv_i,t , k_i,t ) + epsilon_i,t

the FSresiduals option generates newvar like

Code:

newvar = y_i,t - \hat{y}_i,t = \hat{epsilon}_i,t

These have been used in the literature, for example by De Loecker and Warzynski, to correct the value added models.
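
As a rough illustration (a sketch, not taken from the help file): with the example data used elsewhere in this thread, a call along the following lines would store the first stage residuals in a new variable. The name fs_res is made up for the example, and the exact spelling of the option, assumed here to be fsresiduals(), should be checked against help prodest.

```stata
* OP value-added model; first stage residuals stored in fs_res (hypothetical name)
* NOTE: the option spelling fsresiduals() is an assumption; see help prodest
prodest log_y, free(log_lab1 log_lab2) state(log_k) proxy(log_investment) va met(op) poly(4) reps(40) id(id) t(year) fsresiduals(fs_res)

* inspect the stored first stage residuals
summarize fs_res
```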

I hope to have clarified.

Best,

Gabriele
Samira Barzin

Join Date: Apr 2016

Posts: 21
#18

22 May 2017, 15:02

Hey Gabriele,

I have one question about your package. How does one include fixed effects (factor variables as controls)? I was hoping to simply include them as i.state (e.g. a fixed effect for the location) within the control() option, but that generates an error.
Also, is there an option for error clustering?

Many thanks,

Samira

Last edited by Samira Barzin; 22 May 2017, 15:15.
Gabriele Rovigatti

Join Date: Sep 2016

Posts: 74
#19

23 May 2017, 01:45

Dear Samira,

unfortunately, at the moment the control() option does not support factor variables. However, it is easy to work around this limitation and proceed with the estimation.

Using the example data in prodest, just run:

Code:

* NOT WORKING: control() does not accept factor variables
prodest log_y, free(log_lab1 log_lab2) state(log_k) proxy(log_investment) va met(op) poly(4) reps(40) id(id) t(year) control(i.year)

* workaround: generate one dummy per year
qui tab year, gen(dy)
prodest log_y, free(log_lab1 log_lab2) state(log_k) proxy(log_investment) va met(op) poly(4) reps(40) id(id) t(year) control(dy*)

I acknowledge that it is memory-consuming and not really immediate; we will try to fix this in the next version of the command.

I hope to have clarified.

Best,

Gabriele
Samira Barzin

Join Date: Apr 2016

Posts: 21
#20

24 May 2017, 13:20

Hey Gabriele,
Many thanks for your insight, very helpful and looking forward to the next version also!
I have two more questions:
- is there any option for error clustering?
- I would like to arrive at the TFP (residual), but my sample consists of firms from different industries, so I would like to generate sector-specific coefficients and hence sector-specific residuals. The "predict" command is tricky here: if I run regressions on subsamples, I cannot replace the values of the stored coefficients and residuals. Do you have any idea for this?
Once again, thanks for taking the time to reply!
Gabriele Rovigatti

Join Date: Sep 2016

Posts: 74
#21

25 May 2017, 17:57

Dear Samira,

sorry for the late reply. Briefly:

1) There is no option to cluster the standard errors, but you do not really need it, given that SEs are calculated using cluster bootstrap, which "automatically" solves the issue.
2) If I interpreted correctly, you want to estimate models and collect the TFP from various subsamples in a single variable. I'd do something like

Code:

g tfp = .
qui su group
local G = r(max) // total number of groups, assuming group is coded 1, ..., G
forv g = 1/`G' {
    tempvar tfp`g'
    // change with your model
    prodest log_y if group == `g', free(log_lab1 log_lab2) state(log_k) proxy(log_materials) va met(lp) opt(dfp) reps(50) id(id) t(year)
    predict `tfp`g'', residuals
    replace tfp = `tfp`g'' if group == `g'
}

I acknowledge that it is suboptimal and wasteful, but if you do not have too many observations it may work well.

I hope to have clarified.

Best,

Gabriele
Pingyu He

Join Date: Feb 2016

Posts: 3
#22

25 May 2017, 19:43

Dear Gabriele,

Thank you for making this fantastic package available to all Stata users.
I encountered several difficulties when using prodest to estimate TFP:
It seems that ACF estimation with the translog function takes a lot of time. Once it was done, I was not able to predict TFP. The error message says "variable var_11 not found".

Is the attrition option available with ACF estimation? I tried to include the attrition option when using ACF, but got exactly the same coefficients. I'm pretty sure that attrition exists in my dataset.

I'm attaching the log file for your reference. I tried to upload the dataset as well, but somehow the system says it is invalid.
Attached Files

log_file.smcl (8.1 KB, 1 view)
Gabriele Rovigatti

Join Date: Sep 2016

Posts: 74
#23

26 May 2017, 04:18

Dear Pingyu,

I am glad that you found prodest useful.

1. I added the translog option to ACF recently, and it is not properly optimized, so it is certainly possible that the routine takes a while to complete. Let me stress, though, that ACF+translog models imply the estimation of several parameters in the second step (which is implemented by GMM in prodest), therefore I expect it to be slower than other methods, especially with 150k+ observations as in your case. Unfortunately, I have to confirm that predict is not going to work with the translog function: as I have previously written in some posts on prodest postestimation, predict is still a work in progress - indeed, it is still undocumented and users use it at their own risk :D . In particular, when using the translog option some temporary variables are created in the background to account for the polynomial terms in the translog production function - the var_## you see in the results table - and these are not stored after the estimation. Hence, predict raises the error you see when it searches for them based on the estimated parameters.

2. The attrition option is not available for ACF models, thanks for pointing that out (I will add a warning message when attempting to run ACF with attrition).

I hope to have clarified.

Best,

Gabriele
Pingyu He

Join Date: Feb 2016

Posts: 3
#24

26 May 2017, 08:26

Dear Gabriele,

Thank you for the clarification. It does help!

I guess we might be able to predict manually in the first situation by generating these variables ourselves and then calculating the residual using the estimated coefficients, right?
For the second one, is it on your agenda to add the attrition option to ACF models?

Also, I have one more question about the free variable. Should it be the log of the number of employees or the log of the total wage bill when we are estimating revenue-based TFP? I feel that the latter is more appropriate because all other variables are measured in dollars, but most of the literature seems to use the number of employees. Do you have any suggestions?

As for control variables, how do they enter the equations in your program? Are they part of g(.) and affect the choice of inputs/investment as in Amiti and Konings (2007), or do they simply serve as additional control variables for revenue?

I look forward to your suggestions!

Sincerely,
Pingyu
Samira Barzin

Join Date: Apr 2016

Posts: 21
#25

11 Jul 2017, 09:13

Hey Gabriele,

I have been working for a little bit with your command now, I quite like it.
I have stumbled upon 2 things though, maybe you know what the problem could be?

- I have noticed that when I estimate a production function with OLS, Levinsohn-Petrin (LP) and Levinsohn-Petrin with ACF (LP-ACF), OLS and LP-ACF often produce relatively similar results for the capital and intermediate input coefficients (LP produces slightly different results), while the labour coefficients are quite similar across all three. Have you observed this as well?
- When I compare the residuals across the three methods (OLS, LP, LP-ACF), I find that the OLS predicted residuals are a lot smaller, often nonzero only after a couple of decimal places, whereas the residuals from LP and LP-ACF are normally between 2 and 10. I am not sure what this could be due to. It is particularly strange given that the coefficients are relatively similar, so the residuals should not differ to this degree. Could this be because the constant is not included in LP and LP-ACF and hence still enters the residual under prodest? Any other ideas?

(Sorry I cannot post code here for privacy reasons)

Looking forward to your insight (or anyone else's who is interested in this).

Samira

Last edited by Samira Barzin; 11 Jul 2017, 09:15.
Gabriele Rovigatti

Join Date: Sep 2016

Posts: 74
#26

11 Jul 2017, 11:05

Dear Samira,

thanks for the feedback on prodest. I will try to answer the first question although I would need more elements (e.g., a logfile) to provide more concrete examples and explanations.

1) I don't know what you mean by "running a production function with OLS". I presume you mean running a linear regression of the form: log_y = log_l \beta_{l} + log_k \beta_{k} + log_m \beta_{m} + \epsilon. In that case, it is no surprise that the results are likely to be qualitatively similar to LP which, as you know, performs a very similar regression in the first stage in order to estimate the free variables' parameters.

2) I cannot answer this question without having at least a logfile to look at. Let me stress, though, that the constant does not enter the residuals in prodest, given that they are defined as residuals_{it} = log_y - \hat{\beta}_{l} log_l - \hat{\beta}_{k} log_k

I hope to have clarified.

Gabriele

Pingyu He
I apologize for having missed your post above: I will try to answer now, hoping not to be too late.

1) I actually added a couple of new features to the post-estimation commands in the latest prodest release. You will now be able to run predict <newvarname>, residual after the translog estimation. On top of that, if you are interested in the elasticities of the labor and capital inputs, you may want to try predict, parameters, which yields a table with the estimated elasticities.
2) At the moment it is not on the immediate agenda, but I will look into it as soon as possible.
3) My sense is that total wage bills would be a more appropriate choice, but as usual in empirical applications it all depends on the data you have available!
4) Control variables are just additional regressors
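
A minimal sketch of the two post-estimation calls described in 1), assuming the variable names from the earlier example in this thread (log_y, log_lab1, log_lab2, log_k, log_materials) and a made-up name tfp_tl for the new variable. The model line is only a placeholder to be replaced with your own specification, and option spellings should be checked against help prodest.

```stata
* ACF-corrected LP model with a translog production function (placeholder model)
prodest log_y, free(log_lab1 log_lab2) state(log_k) proxy(log_materials) va met(lp) acf translog reps(50) id(id) t(year)

* residuals after the translog estimation (tfp_tl is a hypothetical name)
predict tfp_tl, residuals

* table with the estimated input elasticities
predict, parameters
```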
Samira Barzin

Join Date: Apr 2016

Posts: 21
#27

11 Jul 2017, 13:15

Hey Gabriele,
Many thanks for your fast reply.

Let me clarify:
(1) yes I run basically this: logY=ß*logK+ßlogL+ßlogM (where ß is the different coefficient for each variable and M=materials); my point of confusion is basically that the OLS and LP results are somewhat different, but LP-ACF results are sometimes more similar to OLS than they are to LP results. So I was wondering if you also observed this somewhere previously?

(2)I run (copy from the do-file)
(OLS) reg lnY lnK lnL lnM
predict omega, resid
(LP) prodest lnY, free(lnL) proxy(lnM) state(lnK) poly(3) method(lp) reps(50)
predict delta, resid
(LP-ACF) prodest lnY, free(lnL) proxy(lnM) state(lnK) poly(3) method(lp) reps(50) acf
predict gamma, resid

So, as mentioned in (1), the coefficients are quite similar, but what I am more worried about is the dispersion of the residuals omega, delta and gamma. Given that the coefficients are similar, I would expect omega, delta and gamma to be similar; however, I find omega close to zero, delta around 10 and gamma around 4. So I wonder what is going on there? Would you know?

Samira
Thanks for your help!
Samira Barzin

Join Date: Apr 2016

Posts: 21
#28

11 Jul 2017, 15:51

Also, in cases where a constant is included, the model becomes:
lnY=ß(0)+ß(K)lnK+ß(L)lnL+ß(m)lnM+e
where ß(X) represents the coefficient with X=0, K, L, M where X=0 is the constant.
So then the residual should be: lnY-ß(K)lnK-ß(L)lnL-ß(M)lnM=ß(0)+e
am I wrong here?
Gabriele Rovigatti

Join Date: Sep 2016

Posts: 74
#29

12 Jul 2017, 04:12

Dear Samira,

I admit I am a bit confused here. There are a couple of points I'd like to stress in order to clarify:

1) Following your definition of omega, it must be a zero-mean variable. As you know, it comes from the very definition of OLS that the variable of fitted residuals is zero-mean. The same does not hold for two-step models, obviously.
2) As for the definition of residuals, in the OLS case and using the piece of code you copied, it amounts to \omega = lnY - ß(K) lnK - ß(L)lnL - ß(M) lnM - ß(0), while in the prodest case it is \delta = lnY - ß(K_LP) lnK - ß(L_LP) lnL - ß(M_LP) lnM and \gamma = lnY - ß(K_ACF) lnK - ß(L_ACF)lnL - ß(M_ACF) lnM. Let me stress, though, that the estimation routine takes care of the constant during the estimation.
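
Written side by side, the three residuals look as follows. This only restates the definitions in 2) in LaTeX notation; the superscripts and the symbol \hat{\beta}_0 for the OLS constant are notation introduced here for readability.

```latex
\begin{aligned}
\hat{\omega}_{it} &= \ln Y_{it} - \hat{\beta}_0 - \hat{\beta}_K^{OLS}\ln K_{it} - \hat{\beta}_L^{OLS}\ln L_{it} - \hat{\beta}_M^{OLS}\ln M_{it},
  \qquad \frac{1}{N}\sum_{i,t}\hat{\omega}_{it} = 0 \\
\hat{\delta}_{it} &= \ln Y_{it} - \hat{\beta}_K^{LP}\ln K_{it} - \hat{\beta}_L^{LP}\ln L_{it} - \hat{\beta}_M^{LP}\ln M_{it} \\
\hat{\gamma}_{it} &= \ln Y_{it} - \hat{\beta}_K^{ACF}\ln K_{it} - \hat{\beta}_L^{ACF}\ln L_{it} - \hat{\beta}_M^{ACF}\ln M_{it}
\end{aligned}
```

So even if the slope coefficients were identical across methods, \hat{\delta} and \hat{\gamma} would differ from \hat{\omega} in level by \hat{\beta}_0; the OLS quantity comparable to them is \hat{\omega}_{it} + \hat{\beta}_0.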

Hope to have clarified,

Gabriele
Samira Barzin

Join Date: Apr 2016

Posts: 21
#30

12 Jul 2017, 05:38

Hey Gabriele,
Thanks again for your help, I really appreciate this.

Yes, you are right regarding (1), absolutely. I was just confused because the coefficients were very similar but the residuals were not, which I thought should not happen. When I modify the OLS calculation to constant+residual, it correlates closely as expected, so I think the issue was here.

(2) I am still a bit puzzled by the similarity of OLS and LP-ACF, both in coefficients and residuals (including the constant under OLS as mentioned under (1)). For example, when I correlate the residuals I get a correlation of 0.9966 between OLS and LP-ACF, but only 0.4285 between LP and LP-ACF. I am puzzled that LP-ACF appears to be a lot closer to OLS than to LP, any idea?
(This appears to be driven predominantly by the coefficient on materials, which is a lot higher under LP-ACF than under LP.)
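
The check described under (1) can be sketched roughly as follows, using the variable names from the do-file excerpt in #27. The names omega_ols and omega_c are hypothetical, and delta and gamma are assumed to exist already from the two prodest runs.

```stata
* OLS production function and its residual (zero mean by construction)
reg lnY lnK lnL lnM
predict omega_ols, resid

* add the constant back so the level is comparable with the prodest residuals
gen omega_c = omega_ols + _b[_cons]

* compare levels and correlations across the three methods
summarize omega_c delta gamma
correlate omega_c delta gamma
```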

Last edited by Samira Barzin; 12 Jul 2017, 05:41.