
  • New on SSC: -sfkk- module to estimate endogenous stochastic frontier models in the style of Karakaplan and Kutlu (2015)

    Thanks to the amazing Kit Baum, -sfkk- is now available on SSC. You can install -sfkk- from SSC by entering the following command in Stata:
    ssc install sfkk
    sfkk fits endogenous stochastic production or cost frontier models following the methodology of Karakaplan and Kutlu (2015). sfkk provides estimators for the parameters of a linear model whose disturbance is assumed to be a mixture of two components: a strictly nonnegative inefficiency term and a two-sided error term from a symmetric distribution. sfkk can handle endogenous variables in the frontier and/or in the inefficiency, and Karakaplan and Kutlu (2015) find that its estimates outperform standard frontier estimates that ignore endogeneity. See Karakaplan and Kutlu (2015) for a detailed explanation of the methodology and empirical analyses.
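In generic notation (the symbols are ours, not necessarily those of Karakaplan and Kutlu), the exogenous version of this model can be sketched as follows, assuming the half-normal inefficiency that -sfkk- uses:

```latex
\begin{aligned}
y_i &= x_i'\beta + v_i - s\,u_i, \qquad s = 1 \text{ (production)},\; s = -1 \text{ (cost)},\\
v_i &\sim N(0, \sigma_v^2), \qquad u_i \sim N^{+}(0, \sigma_{u,i}^2), \qquad \ln\sigma_{u,i}^2 = h_i'\varphi .
\end{aligned}
```

Here h_i holds the variables given in u() (z2 in the example below), which is why the output reports an equation for ln(sigmau²). The endogenous version additionally relates the endogenous variables to the instruments and adds a correction term whose coefficients are the eta terms reported in the output; see Karakaplan and Kutlu (2015) for the exact likelihood.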

    Karakaplan and Kutlu (2017) provides the econometric methodology that -sfkk- is based on; this paper is published in Economics Bulletin. Karakaplan (2017) presents the -sfkk- program itself with some examples; this paper is published in The Stata Journal.

    The -sfkk- help file provides several examples, which you can view by typing the following command in Stata after installing the -sfkk- package:
    help sfkk
    Below is an example of an -sfkk- output:
    . use, clear
    . sfkk y x1 x2 z1, prod u(z2) en(z1 z2) i(iv1 iv2) delve nicely header beep compare
    18 Jun 2015 12:09:18
    Dependent Variable: y
    Frontier Variable(s): x1 x2 z1
    U Variable(s): z2
    W Variable(s):
    Endogenous Variable(s): z1 z2
    Excluded Instrument(s): iv1 iv2
    Exogenous Variable(s): iv1 iv2 x1 x2
    Delving into the problem...
    initial:       log likelihood =  709.21899
    rescale:       log likelihood =  709.21899
    rescale eq:    log likelihood =  709.21899
    Iteration 0:   log likelihood =  709.21899  
    Iteration 1:   log likelihood =  713.90317  
    Iteration 2:   log likelihood =  713.98024  
    Iteration 3:   log likelihood =  713.98037  
    Iteration 4:   log likelihood =  713.98037  
    Analyzing the exogenous comparison model...
    Table: Estimation Results
                                  Model EX             Model EN    
    Dep.var: y                                                      
    Constant                  0.475***  (0.017)    0.631***  (0.032)
    x1                        0.215***  (0.019)    0.186***  (0.031)
    x2                        0.089***  (0.021)    0.132***  (0.033)
    z1                       -0.355***  (0.022)   -0.747***  (0.111)
    Dep.var: ln(sigmau²)                                            
    Constant                 -3.786***  (0.602)   -7.096***  (0.828)
    z2                      -19.715    (10.599)    8.207***  (1.468)
    Dep.var: ln(sigmav²)                                            
    Constant                 -4.236***  (0.073)                    
    Dep.var: ln(sigmaw²)                                            
    Constant                                      -4.819***  (0.177)
    eta1 (z1)                                      0.457***  (0.114)
    eta2 (z2)                                      0.664***  (0.057)
    eta Endogeneity Test                          X2=155.15  p=0.000
    Observations                    500                  500        
    Log Likelihood                 342.86               713.98      
    Mean Prod Efficiency           0.9821               0.9152      
    Median Prod Efficiency         0.9946               0.9364      
    Notes: Standard errors are in parentheses. Asterisks indicate
    significance at the 0.1% (***), 1% (**) and 5% (*) levels.

  • #2
    Thanks to the amazing Kit Baum, -sfkk- version 1.0.1 is now available on SSC.

    This version fixes a bug, related to the default matsize setting, that caused the -sfkk- help examples to abort with the error message "could not find feasible values r(491);". This version also includes a couple of minor updates; for example, the header option now shows whether constants are included in the model.


    • #3
      Thanks to the amazing Kit Baum, -sfkk- version 1.0.2 is now available on SSC.

      This version fixes a minor bug that caused -sfkk- to abort with an error message when the header option was specified and Uhet contained no variables other than the constant. The help file in this version includes an example of such a model.
      Last edited by Mustafa Ugur Karakaplan; 16 Sep 2015, 21:18.


      • #4
        Thanks to the amazing Kit Baum, -sfkk- version 1.0.4 is now available on SSC.

        You can update your -sfkk- package from SSC by entering the following command in Stata:
        ssc install sfkk, replace

        You can check the version of your -sfkk- package by entering the following command in Stata:
        sfkk, ver


        • #5
          Dear Dr. Karakaplan,
          First, let me thank you and your coauthors for your work, which is particularly relevant to my current research!
          I’m estimating a stochastic frontier (SF) model in which an endogenous variable (durat) affects inefficiency (u). My main interest is assessing the impact of this variable on the inefficiency of firms.
          I’m using the command
          sfkk o_ric i_empl i_kap i_rawm if year==2007 , prod u(durat) en(durat) i(iv_durat) delve nicely header beep compare

          and I obtain

                                        Model EX             Model EN
                                            none                 none
          Dep.var: o_ric
          Constant                  2.748***  (0.048)    2.821***  (0.038)
          i_empl                    0.353***  (0.008)    0.359***  (0.010)
          i_kap                     0.085***  (0.005)    0.086***  (0.006)
          i_rawm                    0.521***  (0.004)    0.515***  (0.006)
          Dep.var: ln(sigmau²)
          Constant                 -2.987***  (0.287)   -1.701***  (0.125)
          durat                    -0.080**   (0.028)   -0.142***  (0.014)
          Dep.var: ln(sigmav²)
          Constant                 -2.159***  (0.033)
          Dep.var: ln(sigmaw²)
          Constant                                      -2.276***  (0.032)
          eta1 (durat)                                  -0.008***  (0.001)
          eta Endogeneity Test                          X2=116.15  p=0.000
          Observations                   4278                 4268
          Log Likelihood             -1602.98             -1.7e+04
          Mean Prod Efficiency         0.8949               0.8546
          Median Prod Efficiency       0.8980               0.8659
          Notes: Standard errors are in parentheses. Asterisks indicate
          significance at the 0.1% (***), 1% (**) and 5% (*) levels.

          If I understand the output correctly, durat seems to affect the variance of the inefficiency negatively.
          My question is: can I retrieve its impact on the (mean) level of u, analogously to what is possible when estimating the Battese and Coelli (1995) model?

          Thank you,

          Last edited by Sabrina Ruberto; 20 Jul 2016, 01:04.


          • #6
            Dear Sabrina, thank you for posting this question. I have a similar problem in my research, and it would be great to have a way to retrieve the impact of X on the mean level of u. I hope that Dr. Karakaplan can show us how to get it.


            • #7
              Hello Sabrina and Nataly,

              Thank you so much for contacting me with this question. I am so happy to see that you are finding -sfkk- useful for your research. What you are asking for is a highly requested feature that I am working on. I have received so many emails about it that I will make sure this feature is part of the next released version of -sfkk-. It is possible to retrieve the impact of X on u with the current version of -sfkk-, but it is not automatic; it would require 4-5 lines of code. You may want to look at my Southern Economic Journal (SEJ) paper to see how to calculate the marginal effects of inefficiency variables on u:

              Gronberg, T. J., Jansen, D. W., Karakaplan, M. U. and Taylor, L. L. (2015), School district consolidation: Market concentration and the scale-efficiency tradeoff. Southern Economic Journal, 82: 580–597. doi: 10.1002/soej.12029

              The regression results in Sabrina's post are hard to read, so I am copying below the image she sent me by email. In her model, durat is endogenous, as the eta endogeneity test indicates. In Model EN, durat's effect on inefficiency is larger in absolute terms than in Model EX, and it is significant. By using the efficiency(effvar[, replace]) option of -sfkk-, firm-specific efficiency variables can be created for Model EN and Model EX. You can check the -sfkk- help file and its examples to learn more about the efficiency option. Once you create the efficiency variables, you would need to write 3-4 lines of code to retrieve the impact of durat on the mean level of u.
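One possible route is sketched below in Python rather than Stata, purely as an illustration; it is not the calculation in Gronberg et al. (2015) and not an -sfkk- feature. Instead of the efficiency variables, it uses the estimated ln(sigmau²) equation directly and assumes the half-normal specification, under which the unconditional mean of u is E(u) = sqrt(2/π)·σ_u. The function names are hypothetical, the coefficients are the Model EN estimates from Sabrina's output, and durat = 10 is a made-up value.

```python
import math


def mean_u(ln_sigma_u2: float) -> float:
    """Unconditional mean of a half-normal u ~ |N(0, sigma_u^2)|: sqrt(2/pi) * sigma_u."""
    sigma_u = math.exp(0.5 * ln_sigma_u2)
    return math.sqrt(2.0 / math.pi) * sigma_u


def marginal_effect(const: float, coef: float, z: float) -> float:
    """dE(u)/dz when ln(sigma_u^2) = const + coef * z (hypothetical helper).

    E(u) = sqrt(2/pi) * exp((const + coef * z) / 2), so dE(u)/dz = (coef / 2) * E(u).
    """
    return 0.5 * coef * mean_u(const + coef * z)


# Model EN estimates from the ln(sigmau²) equation; durat = 10 is a hypothetical value.
const_en, coef_en = -1.701, -0.142
durat = 10.0
print(f"E(u) at durat={durat}: {mean_u(const_en + coef_en * durat):.4f}")
print(f"dE(u)/d(durat) at durat={durat}: {marginal_effect(const_en, coef_en, durat):.4f}")
```

Evaluating marginal_effect at each firm's observed durat and averaging the results gives an average marginal effect; in Stata, the same per-observation formula could be computed with generate.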

              [Image: sfkk output for Sabrina's model (1E9356E438AD40E5A668C163A2C17D41.png)]


              • #8
                Dear Dr. Karakaplan,

                First of all, thank you so much for this great work; it makes it very easy to address the most difficult problem in stochastic frontier models. In estimating my stochastic frontier, I have the following questions, and I would be very grateful if you could answer them:

                1. I have 7 variables in the frontier, of which I think 6 are endogenous (a test also confirms it), so I have 6 IVs. The command works fine as long as I have up to three endogenous variables and therefore three IVs, but when I add the fourth, fifth and sixth endogenous variables and their respective IVs, it gives me an error (initial values are not feasible). I tried using the delve option and/or initial values but still get the same error message. I have checked my data and how the IVs are constructed again and again, but the error won't go away. The exogenous model with the same specification also works fine. Have you tried sfkk with 4+ endogenous variables?
                2. If I estimate a translog, are the squared and interaction terms of the endogenous variables also endogenous? That is, should I use the squares and interactions of the IVs too, just like the other variables in the frontier?

                Thank you very much in advance.
                Emal Jan


                • #9
                  Hello Emal,
                  Thank you very much for the questions. I am glad that you use sfkk for your research. To answer your questions:

                  1- Older versions of -sfkk- had no limit for the number of endogenous variables. The most recent version limits the number to 3 due to some core changes in the program. The next update will remove this limit to allow more endogenous variables. For now, you can use the previous version of sfkk. If you cannot find it, please send me an email and I will send you the package.

                  2- There are two sfkk options (exogenous and leaveout) that you can use to deal with endogenous squared and interaction terms. For example, if x and x2 (x squared) are endogenous, you can write ....., en(x) i(xiv) leave(x2) in your options to instrument for the main endogenous variable and leave the squared (or interaction) terms out of the list of included instruments. The help file contains examples of these two options.



                  • #10
                    Dear Dr. Karakaplan,

                    Thank you very much for your prompt response; I really appreciate your support.

                    When will sfkk be updated to allow more endogenous variables? I am really looking forward to it, as my research depends heavily on it. Yes, I will send you an email to request the older version of sfkk, and I look forward to the next update.

                    Thank you very much again!
                    My best regards,


                    • #11
                      Dear Dr. Karakaplan,
                      First, I wish to sincerely thank you for your work! I would be very grateful if you could answer the following question.
                      To estimate my stochastic production frontier, I run:
                      sfkk o_Y i_L i_K, prod u(x1 x2 x3) en(x1) i(iv_x1) delve nicely header compare

                      According to my results, the x1 coefficient is negative, so x1 seems to affect the variance of the inefficiency negatively.

                      Since the inefficiency u follows a half-normal distribution, E(u) = sqrt(2σ_u²/π). May I conclude that x1 also negatively affects the mean level of u?
                      Please consider that I'm not interested in the magnitude of such an effect, only in the sign.

                      Thank you very much in advance.


                      • #12
                        Hello Maria,

                        Your interpretation of the sign of x1 on the mean level of u seems correct.
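The reasoning behind this answer can be spelled out, assuming the half-normal specification in which ln(σ_u²) is linear in the u() variables (δ denotes those coefficients; the notation is ours):

```latex
\ln\sigma_{u,i}^2 = \delta_0 + \delta_1 x_{1i} + \dots, \qquad
E(u_i) = \sqrt{\frac{2\sigma_{u,i}^2}{\pi}} = \sqrt{\frac{2}{\pi}}\,\exp\!\Big(\tfrac{1}{2}\ln\sigma_{u,i}^2\Big), \qquad
\frac{\partial E(u_i)}{\partial x_{1i}} = \frac{\delta_1}{2}\,E(u_i).
```

Because E(u_i) > 0, the marginal effect always has the same sign as δ_1, so a negative x1 coefficient in the ln(sigmau²) equation implies a negative effect on the mean of u, while its magnitude varies across observations.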



                        • #13
                          Dear Dr Mustafa Karakaplan,

                          First, I would like to congratulate you on introducing sfkk in Stata and on your wonderful paper “Handling Endogeneity in Stochastic Frontier Analysis”.

                          I am currently using your method. I have a few quick questions, and I would really appreciate your help. Here they are:

                          I estimated my model and here are the results of the endogeneity test:

                          eta1(cdi) -0.380*** (0.076)
                          eta Endogeneity Test chi2=25.13 (0.000)

                          My questions:
                          1) What does eta1(cdi) mean? How do you interpret its coefficient, its sign, and its significance? Please note that cdi is my endogenous variable.
                          2) According to your paper (Karakaplan and Kutlu (2017) in Economics Bulletin), the endogeneity test is a test of the components of the eta (η) term. I read the paper, but I still can't figure out what exactly those components are. I would like to describe the test in detail in my paper to make it easier for readers.
                          3) Based on the results above, I concluded that the η test detects endogeneity in the model and that a correction is necessary. Is that correct? Does the null hypothesis state that there is no endogeneity, so that a rejection means endogeneity is detected?

                          4) For now, sfkk allows only the half-normal distribution and not yet the truncated normal. Is that true?

                          I would highly appreciate your help and guidance.

                          Many thanks and best wishes,


                          • #14
                            Hello Hayat,

                            I am glad that you find my methodology and Stata command useful for your research. Below are some of my papers that would help you understand what eta endogeneity test results mean. You'll find the description of the eta tests as well as plenty of examples in these papers:

                            Karakaplan, Mustafa U. and Kutlu, Levent (2017) "Handling Endogeneity in Stochastic Frontier Analysis." Economics Bulletin
                            Karakaplan, Mustafa U. (2017) "Fitting Endogenous Stochastic Frontier Models in Stata." The Stata Journal
                            Karakaplan, Mustafa U. and Kutlu, Levent (2018) "School District Consolidation Policies: Endogenous Cost Inefficiency and Saving Reversals." Empirical Economics
                            Karakaplan, Mustafa U. and Kutlu, Levent (2017) "Endogeneity in Panel Stochastic Frontier Models." Applied Economics

                            Answers to your questions:
                            1) eta1 (cdi) is part of the eta test outlined in Karakaplan and Kutlu (2017). eta enters the log likelihood function in equation 8 as the coefficient in front of the correction term. "The significance of the k'th component of eta indicates that x_ik (the k'th component of x_i) and v_i are correlated. Hence, a particular variable of interest is endogenous if the corresponding component of eta term is significant." An individual eta indicates the explanatory power of the correction term applied specifically to that variable with the IVs you chose. Your individual eta term for cdi is negative and significant at the 0.1% level. The sign indicates the direction in which the correction is applied to the model, and the significance indicates whether that specific correction term significantly identifies the endogeneity in the model.
                            2) The answer above answers this question too.
                            3) Yes, you are right. Your eta endogeneity test results show that there is endogeneity in your model and correction for the endogeneity of cdi is needed. In order to see in detail how well your excluded IVs are explaining cdi, you can remove the "nicely" option from your command line.
                            4) Yes, sfkk allows only the half-normal distribution. You can modify the sfkk program itself to include the truncated normal, but I would not expect a major change in your results from changing the distribution. There is research about that.
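For a single endogenous variable, the reported statistic can be roughly cross-checked by hand. The sketch below assumes, as our own simplification, that the eta endogeneity test then reduces to a one-degree-of-freedom Wald test of H0: η1 = 0 (no endogeneity); with several endogenous variables the test is joint and needs the estimated covariance of the eta terms, which the output table does not show. The function name is hypothetical. With the rounded values -0.380 and 0.076, it returns a chi-squared of 25.0, close to the reported 25.13; the gap comes from rounding in the displayed coefficient and standard error.

```python
import math
from typing import Tuple


def single_eta_wald(coef: float, se: float) -> Tuple[float, float]:
    """Wald chi-squared statistic (1 df) and p-value for H0: eta = 0 (hypothetical helper)."""
    z = coef / se
    chi2 = z * z
    # With 1 degree of freedom, P(chi2_1 > z^2) = P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return chi2, p_value


# Rounded eta1 (cdi) coefficient and standard error from Hayat's output.
chi2, p = single_eta_wald(-0.380, 0.076)
print(f"chi2 = {chi2:.2f}, p = {p:.2e}")
```

A small p-value rejects H0: η1 = 0, which matches the reading in answer 3 that correction for the endogeneity of cdi is needed.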

                            Last edited by Mustafa Ugur Karakaplan; 02 Aug 2018, 15:48.


                            • #15
                              Dear Dr Mustafa,

                              Thank you very much for sharing the papers and your detailed answers. This helps a lot.
                              Best wishes.