New on SSC: - prodest - module for production function estimation

Gabriele Rovigatti

Join Date: Sep 2016

Posts: 73
#31

12 Jul 2017, 06:46

Dear Samira,

I don't have a definitive answer for that. The correlation appears to be very high, and it might be a concern, but I cannot provide you with an explanation for that without seeing the actual data the estimation is performed on. Moreover, to my knowledge there is no prior in the literature of LP parameters being more similar to OLS than ACF's, or the opposite, hence I do not find your result problematic from a statistical point of view.

Best,

Gabriele
Comment
Samira Barzin

Join Date: Apr 2016

Posts: 21
#32

12 Jul 2017, 06:54

Hey, thanks for this.
Sorry, unfortunately I cannot share the data. I think there is a misunderstanding: LP-ACF and OLS are close, while LP is slightly different, especially for materials.
In the literature, I have found that Van Beveren (2010) reports LP and OLS to be relatively close, with a correlation of 0.9262 between the residuals of OLS and LP (ACF-corrected results are not included in that paper).
So for me OLS and LP are not that close, with a 0.49 correlation, but OLS and LP-ACF are close, with 0.9966;
the most puzzling is the materials coefficient, which is around 0.1 under LP and around 0.7 under LP-ACF and OLS.
Comment
Gabriele Rovigatti

Join Date: Sep 2016

Posts: 73
#33

12 Jul 2017, 11:39

Dear Samira,

could you please provide a full reference to the paper as I would like to have a proper look at it? By the way, I am not sure that correlations between the residuals should be the test you'd want to use to assess the goodness of your estimates - unless you're trying to replicate van Beveren 2010's results.

Please note also that, as stressed in my working paper, the ACF routine depends strongly on its starting values; hence you may want to try different optimizers in order to assess the 'stability' of the estimates.
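
A minimal sketch of such a stability check, assuming the example dataset from help prodest is in memory (variables log_y, log_lab1, log_lab2, log_k, log_materials, id, year); the matrix name B is arbitrary. It re-estimates the same ACF-corrected LP model with each optimizer mentioned in this thread and stacks the coefficient vectors so they can be compared row by row:

```stata
* Sketch only: variable names follow the prodest example data (an assumption)
capture matrix drop B
foreach o in nm bfgs dfp nr {
    * same model each time, only the optimizer changes
    quietly prodest log_y, free(log_lab1 log_lab2) state(log_k) ///
        proxy(log_materials) va met(lp) acf opt(`o') reps(5) id(id) t(year)
    matrix B = nullmat(B) \ e(b)
}
matrix rownames B = nm bfgs dfp nr
matrix list B   // one row per optimizer; large gaps between rows signal instability
```

A low reps() keeps the run short, since only the point estimates are compared here.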

Best,

Gabriele
1 like
Comment
Samira Barzin

Join Date: Apr 2016

Posts: 21
#34

13 Jul 2017, 06:27

Hey Gabriele,
Thanks for your reply.
I basically tested the correlation to spot simple characteristics, especially since I had this issue with similar coefficients and dissimilar residuals, but that's sorted now.
The paper is: Van Beveren, I. (2012; first publ. 2010). Total Factor Productivity Estimation: A Practical Review. Journal of Economic Surveys 26(1), pp. 98 - 128.
DOI: 10.1111/j.1467-6419.2010.00631.x
Comment
Malick Diallo

Join Date: Jul 2017

Posts: 1
#35

22 Jul 2017, 08:24

Dear Gabriele,
I would like to know if it is possible to control for fixed effects with the structural estimators developed by Olley and Pakes (1996) and Levinsohn and Petrin (2003) while using prodest.

Best,
Comment
Gabriele Rovigatti

Join Date: Sep 2016

Posts: 73
#36

24 Jul 2017, 03:41

Dear Malick,

I am afraid you cannot in prodest, unless you generate a set of dummies - i.e., one for each firm in the dataset - and specify the dummies within the control() option in prodest. Let me stress, though, that this routine is extremely heavy and particularly slow. Below please find an example (I used the Stata command xi to generate the dummies) which you may replicate using the 'example' section of help prodest. By the way, I did not read it but apparently this paper deals with the issue you have raised.

After uploading the data:

Code:

keep if _n < 1000 // keep only the first 999 observations
xi: prodest log_y, free(log_lab1 log_lab2) state(log_k) proxy(log_investment) control(i.id) va met(op) poly(2) reps(5) id(id) t(year) maxiter(50) // please note that the xi command will generate a dummy per panel

I hope to have clarified,

Gabriele
Comment

Pingyu He

Join Date: Feb 2016
Posts: 3

#37

01 Sep 2017, 07:47

Originally posted by Gabriele Rovigatti

Code:

ssc install prodest

prodest is a new and comprehensive Stata module for production function estimation based on the control function approach. It includes Olley-Pakes (OP 1996), Levinsohn-Petrin (LP 2003), Wooldridge (WRDG 2009) and Ackerberg-Caves-Frazer (ACF 2015) estimation techniques, plus a brand new methodology (Mollisi-Rovigatti, MR forthcoming) in order to better deal with short panels.
Its basic usage is similar to that of existing modules like opreg or levpet, but adds many features to control the optimization procedures and address estimation issues - gross output vs. value added, endogenous variables, attrition in the data. Type

Code:

help prodest

for a complete overview of options and features of the program, plus some clickable examples.

prodest is an ongoing project and the current version (1.0.2) is not meant to be definitive. Therefore suggestions, impressions and bug reports are more than welcome.

Below are some examples of the program usage:

Code:

. insheet using https://raw.githubusercontent.com/GabBrock/prodest/master/prodest.csv, names clear
(8 vars, 1,758 obs)


. prodest log_y, free(log_lab1 log_lab2) state(log_k) proxy(log_investment) va met(op) poly(4) opt(
> bfgs) reps(40) id(id) t(year)
.........10.........20.........30.........40


op productivity estimator

Dependent variable: value added                 Number of obs      =      1758
Group variable (id): id                         Number of groups   =       386
Time variable (t): year
                                                Obs per group: min =         1
                                                               avg =       4.6
                                                               max =        12

------------------------------------------------------------------------------
       log_y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    log_lab1 |   .2602104   .0454651     5.72   0.000     .1711005    .3493204
    log_lab2 |   .1609835   .0492148     3.27   0.001     .0645242    .2574428
       log_k |   .2963724   .0824046     3.60   0.000     .1348624    .4578824
------------------------------------------------------------------------------

. prodest log_y, free(log_lab1 log_lab2) state(log_k) proxy(log_investment) va met(op) acf opt(nm)
> reps(50) id(id) t(year)
.........10.........20.........30.........40.........50


op productivity estimator
ACF corrected
Dependent variable: value added                 Number of obs      =      1758
Group variable (id): id                         Number of groups   =       386
Time variable (t): year
                                                Obs per group: min =         1
                                                               avg =       4.6
                                                               max =        12

------------------------------------------------------------------------------
       log_y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    log_lab1 |   .2161255   .0995254     2.17   0.030     .0210593    .4111917
    log_lab2 |   .2362255   .0637043     3.71   0.000     .1113673    .3610836
       log_k |   .4552823   .0930899     4.89   0.000     .2728295     .637735
------------------------------------------------------------------------------

. prodest log_y, free(log_lab1 log_lab2) state(log_k) proxy(log_materials) va met(lp) opt(dfp) reps
> (50) id(id) t(year)
.........10.........20.........30.........40.........50


lp productivity estimator

Dependent variable: value added                 Number of obs      =      1758
Group variable (id): id                         Number of groups   =       386
Time variable (t): year
                                                Obs per group: min =         1
                                                               avg =       4.6
                                                               max =        12

------------------------------------------------------------------------------
       log_y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    log_lab1 |   .2262627   .0386052     5.86   0.000     .1505978    .3019276
    log_lab2 |   .1518426   .0389601     3.90   0.000     .0754821     .228203
       log_k |   .2155473   .0409458     5.26   0.000     .1352951    .2957996
------------------------------------------------------------------------------

Dear Gabriele,

Thank you for writing this user-friendly command to estimate various versions of the production function.

I have a few questions related to the control variables and residuals. It is not clear to me how control variables enter the first-stage equation. I'm assuming that the control variables are part of the input choice function (if material inputs are used as the proxy variable), so that we can express productivity as a function of inputs, control variables and capital. Under this circumstance, the final TFP estimate under a revenue-based Cobb-Douglas function should be the difference between the first-stage fitted value and the coefficient vector times the (free, state, proxy) variables. That is, TFP = log(revenue) - fsresiduals - b_L*log(L) - b_K*log(K) - b_M*log(M). However, this doesn't yield the same result as the postestimation command "predict TFP, residual" does. I hope you can clarify this for me. Thank you very much!

For your reference, my command is as follows:

prodest log_revenue, free(log_L) proxy(log_materials) state(log_K) method(lp) act control(export_dm type_dummy) id(firm) t(year) reps(5) poly(3) seed(1) fsresiduals(epsilon)
predict TFP, residual
/* manually generate TFP */
matrix b = e(b)'
gen TFP_manual = log_revenue - epsilon - (b[1,1] * log_L + b[2, 1] *log_K + b[5, 1]*log_materials)
compare TFP TFP_manual

Thanks,
Pingyu

Comment

Gabriele Rovigatti

Join Date: Sep 2016

Posts: 73
#38

01 Sep 2017, 09:44

Dear Pingyu,

thanks for the message: questions like yours help us improve the command.

Control variables are included in both the first- and the second-stage estimation. In particular, they enter the first-stage regression linearly, because this option is primarily intended for discrete variables, which could not be easily managed as state variables.

To address the second query, let's define the first-stage residuals in the LP case - using your notation - as

Code:

\hat{\epsilon} = log_revenue - \hat{\beta}_{L} * log_L - \tilde{\beta}_{K} * log_K - \tilde{\beta}_{M} * log_materials - \tilde{B} * \Phi(log_K, log_materials) [- \tilde{\beta}_{ctrl} * controls]

where \hat{.} indicates "final" estimates (free variables), \tilde{.} indicates "temporary" estimates (the parameters for state and proxy variables are then estimated in the second stage) and \Phi(.) indicates an n^{th}-degree polynomial. Given that, log_revenue - \hat{\epsilon} becomes a function of second- and first-stage parameters for the state and proxy variables (and their interactions). Hence, it is no surprise that using this difference within the 'manual' version of the residuals yields a different result than predict, since the definition we use in predicting omega reads

Code:

\hat{\omega} = log_revenue - \hat{\beta}_{L} * log_L - \hat{\beta}_{K} * log_K - \hat{\beta}_{M} * log_materials - \hat{\beta}_{ctrl} * controls

The same reasoning - with the differences implied by the functional form of the first- and second-stage equations - applies to ACF models.
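
To make the gap explicit, one can subtract the two definitions above. This is only a rearrangement of those equations and of the 'manual' formula in the question, with y, l, k, m and c as shorthand (introduced here, not prodest notation) for log_revenue, log_L, log_K, log_materials and the controls:

```latex
\begin{aligned}
TFP_{manual} &= y - \hat{\epsilon} - \hat{\beta}_{L}\, l - \hat{\beta}_{K}\, k - \hat{\beta}_{M}\, m \\
\hat{\omega} &= y - \hat{\beta}_{L}\, l - \hat{\beta}_{K}\, k - \hat{\beta}_{M}\, m - \hat{\beta}_{ctrl}\, c \\
\hat{\omega} - TFP_{manual} &= \hat{\epsilon} - \hat{\beta}_{ctrl}\, c
\end{aligned}
```

Under these definitions the two series coincide only where the first-stage residual equals the estimated contribution of the controls, which will not hold in general.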

I hope to have clarified.

Best,

Gabriele
Comment
Agnieszka Matulska

Join Date: Oct 2017

Posts: 6
#39

14 Oct 2017, 07:25

Dear Gabriele,

I want to estimate TFP, and I applied the levpet and prodest met(lp) commands in Stata. The results differ from each other. Is that possible? The estimated coefficients (enclosed) are the same for the labour variable but different for materials usage and capital. I can also see that the numbers of enterprises (groups) differ: in prodest only enterprises with no zero values are taken into account, while levpet reports the number of all enterprises.

Thank you in advance for your help,
Agnieszka
Attached Files

SECTIONS CDE LEVPET PRODEST LP.docx (63.6 KB, 1 view)

Comment
Gabriele Rovigatti

Join Date: Sep 2016

Posts: 73
#40

14 Oct 2017, 10:44

Dear Agnieszka,

thanks for the message.

levpet and prodest differ in the estimation routine, since the former uses nl to perform the second-stage estimation, whereas we use a GMM routine. Moreover, as you remarked levpet reports the overall number of groups, while in prodest we just report the number of groups the estimation is performed on.
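
One rough way to see how many groups can contribute to the estimation is to count the firms with at least one observation that is non-missing in every model variable. This is a sketch under the assumption that observations with a missing value (for instance, the log of a zero) are dropped; the variable names are placeholders borrowed from the prodest example data, and nmiss, usable and anyfirm are hypothetical helper variables:

```stata
* Sketch: placeholder variable names; adjust to the variables in your model
egen nmiss = rowmiss(log_y log_lab1 log_lab2 log_k log_materials)
egen byte usable = tag(id) if nmiss == 0   // flags one complete row per firm
count if usable == 1                        // firms with at least one complete row
egen byte anyfirm = tag(id)                 // flags one row per firm, complete or not
count if anyfirm == 1                       // all firms in the data
```

The first count may still exceed the number of groups prodest reports, since the estimator can drop further observations (for example, those without a usable lag).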

Having said that, the differences in your case seem unlikely to be due to the - minimal - differences in optimization routines. I suspect that they could be due to the order of the polynomial approximation (you may want to try poly(2) as a benchmark result): in this case, I would recommend testing the stability of the results for higher-order polynomials - in prodest, you may use up to the 6th order. It will slow down the estimation routine, but will provide more precise estimates. It is less likely, but you may also want to try the prodest command with a different optimizer (dfp or nr) to see whether the results change.
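
A minimal sketch of that polynomial-order check, assuming the prodest example data is in memory (the variable names and the matrix name P are placeholders). It re-runs the LP model for poly(2) through poly(6) and collects the coefficients in one matrix:

```stata
* Sketch only: swap in your own dependent variable, inputs, id and time variables
capture matrix drop P
forvalues p = 2/6 {
    * same LP model each time, only the polynomial order changes
    quietly prodest log_y, free(log_lab1 log_lab2) state(log_k) ///
        proxy(log_materials) va met(lp) poly(`p') reps(5) id(id) t(year)
    matrix P = nullmat(P) \ e(b)
}
matrix rownames P = poly2 poly3 poly4 poly5 poly6
matrix list P   // coefficients that settle down as the order rises suggest stability
```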

I hope to have clarified,

Gabriele
Comment
Agnieszka Matulska

Join Date: Oct 2017

Posts: 6
#41

14 Oct 2017, 14:55

Dear Gabriele,

Thank you for your help. I will follow your recommendations tomorrow morning.

Best,
Agnieszka
Comment
Eduardo Salas

Join Date: Oct 2017

Posts: 1
#42

24 Oct 2017, 08:19

Dear Gabriele,

Thanks for your work.
I have been working with your command and I'm interested in estimating the elasticities of labor, capital and materials inputs under a translog PF with revenue as the dependent variable. I have tried using predict, parameters but that only reports elasticities of labor and capital. Do you have any idea how to get the elasticity of materials as well?

Thank you!
Comment
Gabriele Rovigatti

Join Date: Sep 2016

Posts: 73
#43

24 Oct 2017, 13:29

Dear Eduardo,

thanks for your message.

You found a bug in the prodest_p.ado file used for prodest postestimation: we did not account for gross output models in translog ACF estimation. Thanks for pointing that out!

We are in the process of issuing a new version of the command - hopefully in the next few weeks. In the meantime, I enclose a debugged version of the prodest_p.ado file (!CAREFUL, NOT SERIOUSLY TESTED!) in case you want to give it a try.

Let me know if that works for you.

Best,

Gabriele
Attached Files

prodest_p.ado (4.3 KB, 1 view)
Comment
Han Ng

Join Date: Dec 2016

Posts: 20
#44

14 Jan 2018, 19:53

Dear Gabriele,

Thanks for writing prodest.

I was recently doing some TFP estimation using the old opreg and levpet commands, and I was also trying out prodest in my project. My dataset worked with the old opreg and levpet commands; however, prodest says that my data file has insufficient observations.

Have you ever encountered such a problem, and do you have any suggestions? Could this be due to using an unbalanced panel?

Best,
Han
Comment
Gabriele Rovigatti

Join Date: Sep 2016

Posts: 73
#45

15 Jan 2018, 04:21

Dear Han,

thanks for the feedback.

Unfortunately, I am not able to provide an answer to such an issue without working on the data and/or looking at the logfile of the error. Could you provide a minimal working example or send me the logfile with the error reproduced? You can either post it here or send it to [email protected]
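
A minimal sketch of how such a logfile could be produced, assuming a panel identified by id and year; the log file name and the variable names are placeholders, not taken from your data:

```stata
* Sketch: file name and variable names are placeholders
log using prodest_error, text replace
xtset id year
xtdescribe                       // panel pattern, including gaps and short panels
capture noisily prodest log_y, free(log_lab1 log_lab2) state(log_k) ///
    proxy(log_investment) va met(op) reps(5) id(id) t(year)
display "return code: " _rc      // nonzero if prodest stopped with an error
log close
```

The capture noisily prefix keeps the error message in the log while letting the do-file run on to log close.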

Best,

Gabriele
Comment
