  • #16
    Dear Statalist,
    I am Giulia and this is my first post. I am working on a paper studying the effects of Non-Tariff Measures (SPS and TBT) on processed food exports. I have 149 countries, 13 years and 56 product lines, for a total of 16,053,856 observations (exporter * importer * year * product). I constructed a dummy variable for each of the SPSjikt and TBTjikt notifications. Each dummy takes the value 1 from year t onwards if the importing country i imposed at least one SPS/TBT measure on product line k in year t. The vector of other variables includes standard gravity covariates.
    I tried to use the ppml_panel_sg command by typing:
    ppml_panel_sg trade sps tbt, ex(iso_o) im(iso_d) y(year) ind(HS) sym cluster(id)
    where id = group(iso_o iso_d HS), the variable I used to set the panel dimension of the dataset.
    What Stata returns is the following:
    “Checking for possible non-existence issues...
    note: sps omitted because of collinearity over lhs>0 (creates possible existence issue)
    note: tbt omitted because of collinearity over lhs>0 (creates possible existence issue)
    Error: all main covariates appear to be collinear with the implied set of fixed effects”
    I have no clue regarding a possible solution to this problem. Alternatively, I was thinking about a two-stage procedure (à la Helpman-Melitz-Rubinstein) to take all the zeros into account: in particular, running a Probit estimation for the first stage and then using the areg command for the second stage, absorbing the exporter-time, importer-time and pair FE.
    Any suggestion would be greatly appreciated.
    Thanks in advance,
    Giulia


    • #17
      Thanks Joao.

      Tom: thank you for the suggestion to use poi2hdfe, I tried the following

      poi2hdfe y_cgt x_cgt _CG*, id1(id_GT) id2(id_CT) cluster(id_CG)

      I am explicitly estimating coefficients for the smallest FE as you suggest (the country-group FE), but after over an hour of waiting the estimator failed to converge. I'm at a loss with what I can do to move forward. Any other suggestions would be most appreciated.

      Thanks,
      Andrew


      • #18
        Dear Andrew,

        Unlike -ppml- and -ppml_panel_sg-, poi2hdfe is not guaranteed to converge. One option is to first use -ppml- to select the sample and variables to use and then run poi2hdfe. The -ppml- help file has an example of how to do this with a tobit, and it should be similar with -poi2hdfe-.

        Best wishes,

        Joao


        • #19
          Dear Giulia,

          If I understand it correctly, your variables of interest are just characteristics of the importer, and therefore are dropped when you include destination dummies. If that is the case, you simply cannot estimate their effect.
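
A quick way to check this kind of collinearity is to see whether the regressor varies at all within the cells defined by the fixed effects. The sketch below does this on made-up toy data; the variable names follow post #16, and it assumes (as post #24 describes) that the relevant cells are importer-product-year:

```stata
* Toy data (all values made up): 2 exporters x 2 importers x 2 products x 2 years
clear
set obs 16
gen byte iso_o = mod(_n - 1, 2) + 1
gen byte iso_d = mod(floor((_n - 1) / 2), 2) + 1
gen byte HS    = mod(floor((_n - 1) / 4), 2) + 1
gen int  year  = 2000 + floor((_n - 1) / 8)

* sps depends only on importer, product and year, as in the post
gen byte sps = (iso_d == 1 & HS == 2 & year >= 2001)

* Standard deviation of sps within each importer-product-year cell
egen long ikt = group(iso_d HS year)
bysort ikt: egen double sps_sd = sd(sps)
summarize sps_sd
```

If the maximum reported by summarize is 0, sps is constant within every cell and cannot be separated from those fixed effects.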

          You will have the same problem with the HMR approach. Anyway, I would strongly advise against that approach for reasons explained here.

          Best wishes,

          Joao


          • #20
            Hi Andrew,

            OK, sorry to hear that. I did have another suggestion, though, that (might?) work for you. Since your data does have a three-way interacted fixed effects structure (similar to panel gravity), is it possible to treat your "group" id as though it were an "importer"/"destination" in a gravity setup?

            If so, you could then run:

            ppml_panel_sg y_cgt x_cgt , ex(id_C) im(id_G) year(id_T) cluster(id_CG)

            which will give you "CT", "GT", and "CG" fixed effects (I think this is what you want, right?)

            I think this should work so long as (c,g,t) uniquely describes your data. If not, you may need to run collapse (sum) beforehand. Anyway, fingers crossed, but I think this should work...
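
A minimal sketch of that uniqueness check and the collapse step, on made-up toy data. The variable names follow the commands in this thread; taking the first value of x_cgt within each cell is an assumption that only makes sense if x_cgt does not vary within (c,g,t):

```stata
* Toy data (values made up): the cell (1, 1, 2000) appears twice
clear
input byte id_C byte id_G int id_T double y_cgt double x_cgt
1 1 2000 5 0.2
1 1 2000 3 0.2
1 2 2000 0 0.7
2 1 2001 4 0.1
end

* Report how many (c,g,t) cells are duplicated
duplicates report id_C id_G id_T

* Sum the dependent variable within each cell; keep one value of x_cgt
* (assumption: x_cgt is constant within a cell)
collapse (sum) y_cgt (first) x_cgt, by(id_C id_G id_T)

* isid exits with an error unless (c,g,t) now uniquely identifies the data
isid id_C id_G id_T
```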

            Regards,
            Tom


            • #21
              Hi Tom,

              Thanks for following through, your suggestion worked! I appreciate it.

              I have one last question: is there a reason the -keep- option is not keeping groups with all zeros in the dep var? I realize -ppml_panel_sg- drops these observations normally, but shouldn't -keep- allow for estimation with all observations? Any advice why this is happening would be helpful. Thanks.

              Regards,
              Andrew
              Last edited by Andrew Chan; 11 Jul 2017, 08:05.


              • #22
                Hi Andrew,

                Very happy to hear that it worked! As to your question, I think the dropped observations you are referring to must be the ones that are dropped because they belong to FE groups for which the LHS is always zero... an example would be if you have country-time fixed effects and you only observe zero values for a particular country and a particular year. It is not possible to include these observations as the FE that corresponds to them is not defined (technically it is negative infinity.)
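
The reason is visible in the Poisson first-order condition for a fixed effect: the sum of the fitted values in the group must equal the sum of the dependent variable, and a sum of exponentials can only equal zero in the limit as the fixed effect goes to negative infinity. A minimal sketch for flagging such observations before estimation, on made-up toy data (variable names follow the commands above; fe_undefined is a hypothetical helper name):

```stata
* Toy data (values made up): country 1 has only zeros in year 2000
clear
input byte id_C byte id_G int id_T double y_cgt
1 1 2000 0
1 2 2000 0
1 1 2001 3
1 2 2001 1
2 1 2000 2
2 2 2000 4
2 1 2001 0
2 2 2001 5
end

* Total of the dependent variable within each fixed-effect group
bysort id_C id_T: egen double tot_ct = total(y_cgt)
bysort id_G id_T: egen double tot_gt = total(y_cgt)
bysort id_C id_G: egen double tot_cg = total(y_cgt)

* An observation cannot be used if any of its groups is all zeros
gen byte fe_undefined = (tot_ct == 0) | (tot_gt == 0) | (tot_cg == 0)
count if fe_undefined
```

In this toy example only the two observations for country 1 in 2000 are flagged; the single zero for country 2 in 2001 stays in, because each of its groups also contains positive values.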

                Hope this helps.

                Tom

                PS: I got your original post where you said you reached the max number of iterations before convergence, so I had some suggestions for diagnostics. I see now from the edited version of your post this is no longer an issue. So for posterity I will just leave what I originally wrote here:



                In case ppml_panel_sg does not converge within the max number of iterations...

                - Have you tried running ppml_panel_sg with the "strict" option enabled? If so, do any of your main covariates drop? (This suggestion also applies to -ppml-, since this option is taken from the original -ppml- command.)

                - To check if it's actually converging, you can try running the following syntax:

                ppml_panel_sg y_cgt x_cgt , ex(id_C) im(id_G) year(id_T) cluster(id_CG) verb(25) noaccel tol(50000)

                where "verb(25)" will show output from every 25 iterations and tol(50000) can be used to toggle the max number of iterations. Does it look like it is converging?

                - To figure out whether there is a particular variable that may be causing a problem, you can run

                hdfe x_cgt if y_cgt>0, absorb(id_CT id_GT id_CG) gen(test)
                sum test* if y_cgt>0
                corr test* if y_cgt>0
                reg test* if y_cgt>0

                This will check the degree of collinearity in x_cgt over y_cgt>0 after netting out fixed effects. (You can also do the same without the if y_cgt>0 for a more general collinearity check.)

                Last edited by Tom Zylkin; 11 Jul 2017, 18:13.


                • #23
                  ^ Sorry - to amend the above "tol(50000)" should be "max(50000)"... early morning mistake there: the "tol()" option obviously refers to the tolerance.


                  • #24
                    Dear Joao,

                    Sorry for the late reply and thanks for your comment. Basically, when I have an "industry" dimension, my fixed effects become importer-industry-time FE, which absorb my variables of interest. I will have to find another empirical strategy then. Thank you again,
                    Giulia


                    • #25
                      Sorry for the slow reply Tom.

                      I appreciate your detailed suggestions. I have experimented a bit with max() and will continue to see what I can do to solve my convergence problems. At the moment I have no questions, but I will follow up if I have any other problems.

                      Thanks again!


                      • #26
                        Dear Santos-Silva,

                        I have read the post about the topic. I am getting the same error message after making my sample a bit smaller: “variance matrix is nonsymmetric or highly singular”.
                        I am estimating a gravity model for migration with time*origin, time*destination and country-pair fixed effects. There are 33 destination countries and initially 193 origins. I was able to estimate that model using ppml (it takes 6 hours on average).
                        The command eliminated observations and variables (fixed effects). I decided to delete some of these myself and make my sample more "balanced", but after that I can no longer reproduce my results. I get this message: “variance matrix is nonsymmetric or highly singular”. When I delete singletons, the problem is not solved.
                        Is ppml sensitive to sample size? Or what could be the issue?

                        Thank you so much in advance if you read my post.

                        Best,
                        Ana


                        • #27
                          Dear Ana,

                          The most likely cause is that you are including variables that should be dropped; Stata sometimes has trouble identifying these. It may be that with the full sample all the coefficients are identified, but that may not be the case with the smaller sample. Note that you are using a lot of fixed effects and therefore it will be difficult to identify the coefficients of other regressors. One thing you can do is to try the ppml_panel_sg command, which is better at dealing with the fixed effects.

                          Best wishes,

                          Joao


                          • #28
                            Dear Santos Silva,

                            I am running the ppml command on a cross-section gravity model of migration, but I get no standard errors; I get only dots instead.
                            I have 558 regions and used the code:

                            ppml migra ljaffe ldisteuclid contig uf do2-do558 dde2-dde558 , cluster(ldisteuclid)

                            migra is the number of migrations from region o to region d
                            ljaffe is a variable between 0 and 1 (in ln)
                            ldisteuclid is Euclidean distance (in ln)
                            contig is a dummy for sharing a common border
                            uf is a dummy for belonging to the same state
                            do2-do558 are dummy variables for origin
                            dde2-dde558 are dummy variables for destination

                            Stata dropped 785 regressors, which are origin and destination FE dummies, and returned the messages:

                            Warning: variance matrix is nonsymmetric or highly singular
                            WARNING: The model appears to overfit some observations with migra=0


                            Can I use the ppml_panel_sg command instead, even when working with a cross-section? And if I can, would the command below be correct?

                            ppml_panel_sg migra ljaffe ldisteuclid contig uf , ex(o) im(d) cluster(ldisteuclid) nopair

                            Please, I would appreciate any advice on how I can solve this problem.
                            Thank you very much

                            Best
                            Cirlene


                            • #29
                              Dear Cirlene,

                              I do not know about ppml_panel_sg but try the following:

                              - Run the model without the constant but including all the dummies (i.e., do not exclude the first category)
                              - Check for "singletons"

                              Best wishes,

                              Joao


                              • #30
                                Dear Santos Silva,

                                I will do that. Thank you for your advice.

                                Best
                                Cirlene
