
  • Missing Values for Gravity Model Trade, PPML, Panel Regression

    Hello everyone!

    I am an undergraduate student completing my final thesis on "Static and Dynamic Effects of SADC & COMESA". I have prepared a dataset covering 25 countries over the period 1995 to 2015 for use in Stata (unfortunately, I have Stata version 10, which I have updated). However, my dataset has many missing values for trade.
    I was told these are 'zero trade flows', so I read the paper 'The Log of Gravity' by J. M. C. Santos Silva and Silvana Tenreyro (2006) and was very intrigued by the use of PPML. I plan to first run a panel regression, then use either Heckman or Tobit estimation to deal with the zero trade flows, and lastly run PPML, to provide a detailed explanation.

    My question is therefore: how do we account for the missing values in trade? Is it by filling them with '0' or '1'? If I fill them with '0', won't Stata drop them when I run the regression, or won't it cause other problems? The same question applies to filling them with '1'.
    Also, I am not well versed in Stata commands; can anyone help me find the correct commands to use?

    Thank you very much for your time and patience.
    Apologies for any inconvenience caused.

  • #2
    Urvashi:
    provided that you are 100% sure that the missing values are, indeed, zeros, you can try:
    Code:
    replace trade=0 if trade==.
    If that is not the case, you have to consider how to deal with them, especially if their missingness is informative (see -help ipolate- and -help mi-).

    As far as the regression model is concerned, please note that by default Stata applies listwise deletion to all observations with a missing value in any variable included in the model; hence, those observations are excluded from the estimation.
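This matters for the original question because ln(0) is missing in Stata: a log-linear OLS regression therefore silently drops zero flows along with genuinely missing ones, whereas Poisson (PPML) keeps the zeros. A minimal sketch on made-up data (the country codes, trade values and distances are all hypothetical):

```stata
* Hypothetical toy data: one zero flow (ZMB-MWI) and one missing flow (MOZ-ZMB)
clear
input str3 exporter str3 importer double trade double dist
"ZAF" "ZMB" 1500 1600
"ZAF" "MWI"  800 1900
"ZMB" "MWI"    0 2500
"MWI" "MOZ"  120 1100
"MOZ" "ZMB"    . 1400
"ZMB" "ZAF"  900 1600
end

generate double ln_trade = ln(trade)   // ln(0) and ln(.) are both missing
generate double ln_dist  = ln(dist)

* Log-linear OLS: listwise deletion drops the zero flow AND the missing flow
regress ln_trade ln_dist
display "OLS observations used:  " e(N)

* Poisson (PPML): the zero flow is used; only the missing flow is dropped
poisson trade ln_dist, vce(robust)
display "PPML observations used: " e(N)
```

Filling missing values with 1 instead would make ln(trade) equal to 0 for those pairs, which is an arbitrary value rather than a solution.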
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Dear Urvashi,

      The standard practice is to replace those missing observations with zeros. Not all of the missing observations are true zeros: sometimes they correspond to flows that are too small to be reported, and in other cases they may be truly missing observations. Anyway, as I said, the standard approach is to replace them with zeros.

      Of course, I recommend that you stick to the PPML approach because all the other methods are totally unreliable. To estimate by PPML, you can use the -ppml- command that is available from SSC. The help file contains examples of how to use it, and our webpage has the data and code used to replicate the results in the "Log of Gravity" paper.
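A minimal sketch of a PPML gravity regression on simulated data, assuming a balanced panel of 25 hypothetical countries over 1995-2015; the variable names (trade, ln_dist, rta, pair) and the coefficients used to generate the data are made up for illustration. It uses Stata's official -poisson- with cluster-robust standard errors; the SSC -ppml- command recommended above is shown commented out because it must be installed first. The factor-variable notation i.year requires Stata 11 or later.

```stata
* Simulated gravity panel: 25 hypothetical countries, all ordered pairs, 1995-2015
clear
set seed 2017
set obs 25
generate int exp_id = _n
expand 25
bysort exp_id: generate int imp_id = _n
drop if exp_id == imp_id
expand 21
bysort exp_id imp_id: generate int year = 1994 + _n
egen long pair = group(exp_id imp_id)

* Pair-level log distance (constant over time) and a made-up RTA dummy
generate double u = runiform()
bysort pair (year): generate double ln_dist = ln(500 + 4000*u[1])
generate byte rta = (exp_id <= 10 & imp_id <= 10 & year >= 2000)

* Trade drawn from a multiplicative model; rpoisson() produces genuine zeros
generate double trade = rpoisson(exp(8 - 0.9*ln_dist + 0.5*rta))
count if trade == 0

* PPML with official Stata: Poisson with standard errors clustered by pair
poisson trade ln_dist rta i.year, vce(cluster pair)

* Community-contributed alternative from SSC (install once; see -help ppml-):
* ssc install ppml
* ppml trade ln_dist rta
```

The zeros stay in the estimation sample, and the coefficients are read as (semi-)elasticities of the conditional mean of trade, not of log trade.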

      Best wishes,

      Joao



      • #4
        Hi,
        I am working on my master's thesis and am also new to Stata. I am working with the gravity model of trade, and since trade data are notorious for missing values, I want to drop "imports_var" (the dependent variable) if it appears fewer than 3 times. I found this command:

        bysort imports: drop if _N==2

        But I want the drop to be conditional on groups.
        I tried this command:

        sort importer exporter year commoditycode
        by importer exporter year commoditycode : drop imports if _N<3

        I want to drop imports if it appears fewer than three times within an importer-exporter-year-commoditycode group,
        but this does not work in my case.
        Can anyone help me please?

        Thank you very much



        • #5
          You cannot -drop- a variable under conditions of this kind. You can -drop- observations when there are fewer than three of them. Or, you can -replace- the values of a variable with missing values under such conditions. But a variable is either present or absent in a data set: if you -drop- it, it disappears. No variable can exist in some observations and not in others.

          So what do you want to do? Do you want to drop all observations for importer-exporter-year-commoditycode combinations that have fewer than 3 associated observations? Or do you want to set the variable imports to missing value in such observations? Or something else?
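A minimal sketch of the two options described above, on a hypothetical toy dataset that reuses the variable names from the question (importer, exporter, year, commoditycode, imports); the values are made up:

```stata
* Hypothetical toy data: one group with 3 observations, one with only 1
clear
input str3 importer str3 exporter int year str6 commoditycode double imports
"ZAF" "ZMB" 2010 "010101" 100
"ZAF" "ZMB" 2010 "010101" 120
"ZAF" "ZMB" 2010 "010101"  90
"ZAF" "MWI" 2010 "010101"  40
end

* Option A: drop every observation of groups with fewer than 3 observations
* (done on a preserved copy here so that Option B can also be shown)
preserve
bysort importer exporter year commoditycode: drop if _N < 3
count
restore

* Option B: keep the observations but set imports to missing in those groups
bysort importer exporter year commoditycode: replace imports = . if _N < 3
```

Under a by: prefix, _N is the number of observations in the current group, which is why the condition works group by group.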



          • #6
            Dear Sir Carlo Lazzaro,

            Thank you very much! Actually, since my thesis is on African trade, my supervisor told me there is some chance that the missing values are zero trade flows, because African trade volumes are quite low compared with other regions. I contacted the people responsible for the database I used, but so far there has been no reply. That is why I am quite confused.

            Nonetheless, thank you very much for the useful replies.



            • #7
              Dear Sir Santos Silva,

              I cannot believe I am actually getting replies from the author of the paper 'The Log of Gravity'. Thank you very much for your insightful replies. I wanted to do the analysis in two parts: first a panel regression estimated by OLS on the log-linearized model, then Heckman estimates and PPML. Do you think this is correct? Or will it be difficult, or lead to ambiguous conclusions? As a beginner in Stata doing my first thesis, I want the econometric analysis to be of a good standard, but I do not know how to interpret the results and draw conclusions.


              So basically, should I replace the missing values with zeros and start with panel OLS before comparing it with the PPML estimates? Or should I use PPML directly?


              Thank you very very much!
              Many thanks.
              Last edited by Urvashi Bolaky; 22 Mar 2017, 02:01.



              • #8
                Dear Urvashi Bolaky (and rasheed sufyan)

                For the missing data issue, I would recommend using the BACI database of CEPII. It is constructed by applying econometric techniques to UN COMTRADE data, which corrects reported trade flows and eliminates all missing trade data. You can find more information about the methodology in Gaulier & Zignago (2010); see also the BACI page on the CEPII website. Note that, to get access to the BACI database, you need a full subscription to UN Comtrade data. This is paid, but your university or supervisor may already subscribe to it.

                For the estimator, as shown in the Log of Gravity paper, OLS on the log-linearized equation produces biased and even inconsistent estimates when the errors are heteroskedastic. For Heckman-style estimators, see the 2015 paper by Joao and Silvana Tenreyro: in short, results obtained with the Heckman approach (as in the Helpman, Melitz and Rubinstein (HMR) paper) are unreliable. So, PPML is the way to go. After the paper by Head and Mayer (2014), some studies have also used multinomial PML (MPML) as a robustness check. You can find a broad discussion of the implementation of MPML in a paper released just 10 days ago. As you will see in that paper, MPML and PPML produce the same estimates, so I do not think it is worth doing.
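A small simulation may help show why log-linear OLS can be inconsistent: when the variance of the error depends on the regressors, E[ln y | x] no longer has the same slope as ln E[y | x]. The design below is a hypothetical illustration in the spirit of the Log of Gravity argument, not the paper's own simulation: the true slope is 1, and the variance of the log error equals x while its mean is -x/2, so that E[y | x] = exp(1 + x).

```stata
* Hypothetical design: E[y|x] = exp(1 + x), heteroskedastic log-normal error
clear
set seed 12345
set obs 10000
generate double x = 0.1 + 2*runiform()
* ln(eta) ~ N(-x/2, x), so E[eta|x] = 1 but E[ln(eta)|x] = -x/2 depends on x
generate double ln_eta = rnormal(-x/2, sqrt(x))
generate double y = exp(1 + 1*x + ln_eta)
generate double ln_y = ln(y)

* Log-linear OLS estimates the slope of E[ln y|x], which here is 1 - 0.5 = 0.5
regress ln_y x, vce(robust)

* PPML estimates the slope of ln E[y|x], which is the true value 1
poisson y x, vce(robust)
```

Stata will note that the dependent variable in -poisson- is not a count; PPML only requires the conditional mean to be correctly specified, so non-integer trade values are fine.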

                Hope this helps
                Dias

                References

                Gaulier, G. & Zignago, S. 2010. BACI: International trade database at the product-level (the 1994-2007 version).
                Head, K. & Mayer, T. 2014. Gravity equations: Workhorse, toolkit, and cookbook. Handbook of International Economics.
                Santos Silva, J. M. C. & Tenreyro, S. 2015. Trading partners and trading volumes: Implementing the Helpman–Melitz–Rubinstein model empirically. Oxford Bulletin of Economics and Statistics 77(1): 93-105.
                Last edited by Said Jafar; 22 Mar 2017, 08:07. Reason: reference



                • #9
                  Dear Sir Dias Rafaj

                  Thank you very much for the detailed explanations and really helpful materials! It means a lot. I will certainly contact my supervisor about my missing values.
                  Coming to the papers, could you recommend what I should analyse? I was set on using the PPML estimator for my gravity model of trade, but then I saw that many papers draw conclusions from both OLS and PPML. Now, with the recent developments concerning MPML, I really do not know whether to stick to PPML only or to perform OLS or MPML as well. Bear in mind that my objective is to analyse the static and dynamic effects of trade in SADC and COMESA, looking at trade creation, trade diversion, the level of intra-regional trade and dynamic effects.

                  Above all, I want you to know that I am very grateful for all the help.

                  Regards,
                  Urvashi.



                  • #10
                    Dear All,

                    The paper @Dias Rafaj refers to may be only 10 days old, but the result is very old; see

                    Guimaraes, P. (2004), "Understanding the multinomial-Poisson transformation," Stata Journal, 4(3), 265-273.

                    So, PPML continues to be the way to go.

                    Best of luck with your work, @Urvashi Bolaky.

                    Joao



                    • #11
                      Dear Joao Santos Silva,


                      Thanks for this. It is always nice to learn from you.




                      • #12
                        Dear Urvashi Bolaky,

                        In any case, the results from PPML are the most reliable. If you really want, you can estimate OLS and Heckman models as well, but do not forget that their estimates are not consistent. So base your discussion on PPML.

                        All the best,
                        Dias



                        • #13
                          Thanks, Said Jafar and Joao Santos Silva.

                          I have some general questions about the BACI database and other sources of bilateral trade data, such as the IMF and UN Comtrade via WITS. I apologize if my query is not in a suitable format or should be directed elsewhere, but I am considering purchasing a subscription to BACI, and this would be a substantial personal investment.

                          Originally posted by Said Jafar
                          Dear Urvashi Bolaky (and rasheed sufyan)
                          For the missing data issue, I would recommend using the BACI database of CEPII. It is constructed by applying econometric techniques to UN COMTRADE data, which corrects reported trade flows and eliminates all missing trade data. You can find more information about the methodology in Gaulier & Zignago (2010); see also the BACI page on the CEPII website. Note that, to get access to the BACI database, you need a full subscription to UN Comtrade data. This is paid, but your university or supervisor may already subscribe to it.
                          Context: I am estimating the trade creation and trade diversion effects (for a number of different trade agreements) using a gravity model estimated by PPML. I have carried out some database-building exercises in Stata; however, I have yet to experiment with merging BACI trade data with a gravity dataset or .

                          1. Do you think BACI can be used to consider total merchandise trade (e.g. by combining product lines)?

                          2. If so, do you think this bilateral trade data could be easily merged with the new USITC dynamic gravity dataset? (I recognize this is a rather specific question, but this new USITC dataset is quite exciting for me, as it even seems to be an improvement on the recently updated CEPII gravity dataset.)

                          3. If using raw UN Comtrade data (downloaded in bulk via WITS), any countries that have zero trade in a given year appear to be removed automatically (i.e. the bilateral trade data will not necessarily match a CEPII / USITC dataset; I imagine that one could simply ignore the zero-trade countries and use gravity variables only for the countries that actually trade, but then all the others would be excluded). My understanding is that, if using PPML, one would have to go back line by line and re-introduce those zero bilateral trade values, which could be very resource-intensive. Is that correct? Is there a simple way to address this issue?

                          4. (an extra question if you have the time as a matter of curiosity) In your opinion, is IMF Data or UN comtrade data more advantageous in gravity analyses (for total merchandise trade between countries)? I have noticed that when IMF data is used, authors will sometimes complete multiple analyses using the values reported by the importer and then the values reported by the exporter (to see if there is any substantial difference in estimates when the reported values do not match), so from a practical standpoint, using IMF data appears more resource intensive.


                          Regards,
                          A



                          • #14
                            Dear Andrew,

                            Welcome to the forum. I will provide short answers to your questions.

                            1) Yes, you can combine HS lines. You will need to define which products count as merchandise, though.

                            2) Yes, they can be merged using ISO country codes and other identifying characteristics.

                            3) First, I would recommend using BACI data, as it also accounts for re-exports; UN Comtrade does not.

                            In any case, you do not have to go line by line. There is a Stata command for this: -fillin-.

                            For example, you can -fillin- by exporter and importer, as below:

                            Code:
                            fillin ISO_exporter ISO_importer year     // with panel data, include year too; the added rows have trade_value = . (and _fillin = 1)
                            replace trade_value=0 if trade_value==.    // this will replace those unknown values with zero
                            drop if ISO_exporter==ISO_importer         // a country does not export to or import from itself, so these pairs are dropped
                            Please start another thread if you have questions on this methodology. I hope others will respond.

                            4) UN Comtrade specializes in trade data, so I would recommend UN Comtrade or BACI (which is based on UN Comtrade data). Note that IMF DOTS data are also partially based on UN Comtrade data.
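Points 1) to 3) above can be chained into one pipeline. The sketch below is a hypothetical illustration: the toy product-level rows only mimic a BACI-style layout (year, exporter, importer, HS6 product, value), and all variable names and numbers are assumptions, so check the BACI documentation for the actual names. It sums product lines into total trade, rectangularizes with -fillin-, sets only the added combinations to zero, drops self-pairs, and merges pair-level gravity variables by ISO code. Note that -fillin- only combines values that already appear in the data, so a country that never appears as an importer will not be added as one; merging onto a complete country list avoids that. The m:1 -merge- syntax requires Stata 11 or later.

```stata
* Hypothetical product-level flows in a BACI-like layout (values are made up)
clear
input int year str3 exporter str3 importer str6 hs6 double value
2014 "ZAF" "ZMB" "010121"  50
2014 "ZAF" "ZMB" "870323" 400
2014 "ZMB" "ZAF" "740311" 900
2014 "MWI" "ZAF" "240120"  70
2015 "ZAF" "MWI" "870323" 380
2015 "MWI" "ZAF" "240120"  75
end

* 1) Total trade per pair and year: sum over the selected product lines
collapse (sum) trade_value=value, by(year exporter importer)

* 3) Every exporter-importer-year combination; added rows get trade_value = .
fillin exporter importer year
replace trade_value = 0 if _fillin == 1   // zero only the rows fillin added
drop if exporter == importer              // a country does not trade with itself
drop _fillin

* 2) Merge pair-level gravity variables keyed on ISO codes (hypothetical distances)
tempfile gravity
preserve
clear
input str3 exporter str3 importer double dist
"ZAF" "ZMB" 1600
"ZMB" "ZAF" 1600
"ZAF" "MWI" 1900
"MWI" "ZAF" 1900
"ZMB" "MWI"  700
"MWI" "ZMB"  700
end
save `gravity'
restore
merge m:1 exporter importer using `gravity', assert(match) nogenerate
```

Using _fillin == 1 rather than trade_value == . keeps any genuinely missing values in the original data from being turned into zeros.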


                            Hope this helps,
                            Said





                            • #15
                              Thanks Said Jafar
                              That's very helpful - I just wanted to run one more thought past you if that's okay? Given that BACI starts at 1992, have you had any issues combining BACI data with data prior to this period?

