  • Calculating simple and conditional probability

    Dear Stata community,

    I hope you can help me out with this. I have a large dataset of trade data per product per country, with the variables rca (revealed comparative advantage), product code (1,200 products), and country (150 countries).

    I want to calculate the conditional probability that a country exports a certain product X with rca>1, given that it exports product Y with rca>1. This is simply the number of countries that are specialized in both product X and Y (rca>1), divided by the number of countries specialized in product Y.

    Then I want to compare this to the simple probability that a country is specialized in a certain product X (the number of countries specialized in X divided by the total number of countries).
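    Here is a minimal sketch of the two definitions for a single pair of products. It assumes the long layout shown in #3 (one row per country and product, with export_rca, location_code, and hs_product_code) and uses only the rows for products 101 and 102 from that sample. The choice of X = 101 and Y = 102 is illustrative, and treating a missing export_rca as "not specialized" is an assumption. The denominator for the simple probability is the number of countries with a row for either product, which equals the total number of countries only in a balanced panel.

Code:
```stata
* Rows for products 101 and 102 from the sample in #3
clear
input float export_rca str3 location_code int hs_product_code
.0004864609 "CHN"  101
   3.174224 "ARG"  101
  2.0216224 "FRA"  101
  .13055277 "MEX"  101
  2.5231106 "LUX"  101
  .02927636 "ARG"  102
   5.830869 "FRA"  102
  3.0281174 "MEX"  102
  .05310592 "CHN"  102
   2.533745 "LUX"  102
end

* Illustrative pair: X = 101, Y = 102
local X 101
local Y 102

* Specialized means rca > 1; Stata treats missing as larger than any number,
* so missing values are excluded explicitly (assumption: missing = not specialized)
generate byte spec = export_rca > 1 & !missing(export_rca)

* One row per country, with spec101 and spec102 side by side
keep if inlist(hs_product_code, `X', `Y')
keep location_code hs_product_code spec
reshape wide spec, i(location_code) j(hs_product_code)

* Count countries: all, specialized in X, in Y, and in both
quietly count
local n_all = r(N)
quietly count if spec`X' == 1
local n_X = r(N)
quietly count if spec`Y' == 1
local n_Y = r(N)
quietly count if spec`X' == 1 & spec`Y' == 1
local n_XY = r(N)

display "P(X)   = " %4.2f `n_X'/`n_all'
display "P(X|Y) = " %4.2f `n_XY'/`n_Y'
```

    With this sample, three of the five countries are specialized in 101, three in 102, and two (FRA and LUX) in both, so P(X) is 3/5 and P(X|Y) is 2/3.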

    My previous experience only covers regression analysis, and this is a completely different type of problem. I think the simple probability should be possible with looping, but despite reading the foreach/forvalues manuals and watching YouTube tutorials, I have not been able to do it.
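    For the simple probability, a loop over products is one way to do it. The sketch below assumes the long layout from #3 and treats a missing export_rca as "not specialized"; the macro names (products, n_all, n_spec, p_simp_*) are hypothetical. It uses levelsof to collect the product codes and foreach to count, for each product, the countries with rca > 1 and the countries with any row for that product.

Code:
```stata
* Sample data from #3
clear
input float export_rca str3 location_code int hs_product_code
.0004864609 "CHN"  101
   3.174224 "ARG"  101
  2.0216224 "FRA"  101
  .13055277 "MEX"  101
  2.5231106 "LUX"  101
  .02927636 "ARG"  102
   5.830869 "FRA"  102
  3.0281174 "MEX"  102
  .05310592 "CHN"  102
   2.533745 "LUX"  102
  .05087423 "ARG" 2715
.0021011257 "MEX" 2715
   .2413325 "FRA" 2715
   .7990255 "LUX" 2715
  .02876595 "CHN" 2715
end

* Specialized means rca > 1, excluding missing values explicitly
generate byte spec = export_rca > 1 & !missing(export_rca)

* Collect the distinct product codes in a local macro
quietly levelsof hs_product_code, local(products)

foreach p of local products {
    * Countries with a row for this product
    quietly count if hs_product_code == `p'
    local n_all = r(N)
    * Countries specialized in this product
    quietly count if hs_product_code == `p' & spec == 1
    local n_spec = r(N)
    local p_simp_`p' = `n_spec'/`n_all'
    display "product `p': p_simp = " %4.2f `p_simp_`p''
}
```

    The same shares can be had without a loop: the mean of a 0/1 indicator is the proportion of ones, so `collapse (mean) p_simp = spec, by(hs_product_code)` returns one row per product with its simple probability.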

    I want to calculate the conditional probabilities for 50 products.

    I would be really grateful if someone could point me in the right direction.

  • #2
    Welcome to Statalist.

    Please review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question.

    Even the best descriptions of data are no substitute for an actual example of the data. Whoever helps you will quite likely want to be able to test their code in an example of your data.

    Please be sure to use the dataex command to show some example data, for three or four products, with enough observations to be able to do the calculations. If you are running version 15.1 or a fully updated version 14.2, dataex is already part of your official Stata installation. If not, run ssc install dataex to get it. Either way, run help dataex and read the simple instructions for using it. dataex will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use dataex.

    The more you help others understand your problem, the more likely others are to be able to help you solve your problem.



    • #3
      Sorry about that, here's a sample of the data:

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float export_rca str3 location_code int hs_product_code
      .0004864609 "CHN"  101
         3.174224 "ARG"  101
        2.0216224 "FRA"  101
        .13055277 "MEX"  101
        2.5231106 "LUX"  101
        .02927636 "ARG"  102
         5.830869 "FRA"  102
        3.0281174 "MEX"  102
        .05310592 "CHN"  102
         2.533745 "LUX"  102
        .05087423 "ARG" 2715
      .0021011257 "MEX" 2715
         .2413325 "FRA" 2715
         .7990255 "LUX" 2715
        .02876595 "CHN" 2715
      end
      Please let me know if you need more data.



      • #4
        Thanks for the sample of the data; it's a lot easier to show you code than to write an essay explaining what I did. You'll see below that I altered the value of export_rca for LUX product 102 (from 2.533745 to 0.533745) so that product 102 would have a different simple probability than product 101; that makes it easier to see how the numbers work out. I hope this points you in a useful direction, and that any mistakes I made are easy enough to figure out. It seemed good to me.

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input float export_rca str3 location_code int hs_product_code
        .0004864609 "CHN"  101
           3.174224 "ARG"  101
          2.0216224 "FRA"  101
          .13055277 "MEX"  101
          2.5231106 "LUX"  101
          .02927636 "ARG"  102
           5.830869 "FRA"  102
          3.0281174 "MEX"  102
          .05310592 "CHN"  102
           0.533745 "LUX"  102
          .05087423 "ARG" 2715
        .0021011257 "MEX" 2715
           .2413325 "FRA" 2715
           .7990255 "LUX" 2715
          .02876595 "CHN" 2715
        end
        
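        * Treat this copy of the data as the "X" side of each pair
        rename (export_rca hs_product_code) (rcaX prodX)
        * Specialized means rca > 1; Stata treats missing as larger than any number, so exclude it explicitly
        generate byte specX = rcaX>1 & !missing(rcaX)
        
        tempfile ds
        save `ds'
        
        * The same data, renamed, become the "Y" side
        rename (rcaX prodX specX) (rcaY prodY specY)
        
        * Pair every product with every other product within each country
        joinby location_code using `ds'
        order location_code prodX prodY specX specY rcaX rcaY
        sort prodX prodY location_code
        drop if prodX==prodY
        
        * 1 if the country is specialized in both X and Y
        generate specXY = specX & specY
        
        list, sepby(prodX prodY) noobs
        
        * Per pair: number of countries, and numbers specialized in X, in Y, and in both
        collapse (count) N=specX (sum) specX specY specXY, by(prodX prodY)
        generate p_cond = specXY/specY
        generate p_simp = specX/N
        format %9.2f p_cond p_simp
        
        list, clean noobs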
        Code:
        . list, sepby(prodX prodY) noobs
        
          +-------------------------------------------------------------------------+
          | locati~e   prodX   prodY   specX   specY       rcaX       rcaY   specXY |
          |-------------------------------------------------------------------------|
          |      ARG     101     102       1       0   3.174224   .0292764        0 |
          |      CHN     101     102       0       0   .0004865   .0531059        0 |
          |      FRA     101     102       1       1   2.021622   5.830869        1 |
          |      LUX     101     102       1       0   2.523111    .533745        0 |
          |      MEX     101     102       0       1   .1305528   3.028117        0 |
          |-------------------------------------------------------------------------|
          |      ARG     101    2715       1       0   3.174224   .0508742        0 |
          |      CHN     101    2715       0       0   .0004865    .028766        0 |
          |      FRA     101    2715       1       0   2.021622   .2413325        0 |
          |      LUX     101    2715       1       0   2.523111   .7990255        0 |
          |      MEX     101    2715       0       0   .1305528   .0021011        0 |
          |-------------------------------------------------------------------------|
          |      ARG     102     101       0       1   .0292764   3.174224        0 |
          |      CHN     102     101       0       0   .0531059   .0004865        0 |
          |      FRA     102     101       1       1   5.830869   2.021622        1 |
          |      LUX     102     101       0       1    .533745   2.523111        0 |
          |      MEX     102     101       1       0   3.028117   .1305528        0 |
          |-------------------------------------------------------------------------|
          |      ARG     102    2715       0       0   .0292764   .0508742        0 |
          |      CHN     102    2715       0       0   .0531059    .028766        0 |
          |      FRA     102    2715       1       0   5.830869   .2413325        0 |
          |      LUX     102    2715       0       0    .533745   .7990255        0 |
          |      MEX     102    2715       1       0   3.028117   .0021011        0 |
          |-------------------------------------------------------------------------|
          |      ARG    2715     101       0       1   .0508742   3.174224        0 |
          |      CHN    2715     101       0       0    .028766   .0004865        0 |
          |      FRA    2715     101       0       1   .2413325   2.021622        0 |
          |      LUX    2715     101       0       1   .7990255   2.523111        0 |
          |      MEX    2715     101       0       0   .0021011   .1305528        0 |
          |-------------------------------------------------------------------------|
          |      ARG    2715     102       0       0   .0508742   .0292764        0 |
          |      CHN    2715     102       0       0    .028766   .0531059        0 |
          |      FRA    2715     102       0       1   .2413325   5.830869        0 |
          |      LUX    2715     102       0       0   .7990255    .533745        0 |
          |      MEX    2715     102       0       1   .0021011   3.028117        0 |
          +-------------------------------------------------------------------------+
        Code:
        . list, clean noobs
        
            prodX   prodY   N   specX   specY   specXY   p_cond   p_simp  
              101     102   5       3       2        1     0.50     0.60  
              101    2715   5       3       0        0        .     0.60  
              102     101   5       2       3        1     0.33     0.40  
              102    2715   5       2       0        0        .     0.40  
             2715     101   5       0       3        0     0.00     0.00  
             2715     102   5       0       2        0     0.00     0.00



        • #5
          William Lisowski, thank you so much; this is exactly what I needed. It took me a while to get it to work because of memory problems with the joinby command (8 GB of RAM), but I finally managed it by running joinby on 10 countries at a time and then combining the results with the append command. I would never have been able to come up with this on my own, so I owe you a big thank you!
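          The chunked approach described above can be sketched as follows. This is a hedged variant, not the exact code used in the thread: it assumes the variable names from #4, assigns countries to groups of 10 through a hypothetical chunk variable, and collapses each chunk to one row per product pair before appending, so only the per-pair counts are kept in memory rather than the full joinby result. Because every country falls in exactly one chunk, the counts add up across chunks, and the probabilities are computed only after the final collapse. Dropping unwanted Y products right after the rename inside the loop (for example, keeping only the 50 products of interest) would shrink each chunk further; the sketch keeps all products.

Code:
```stata
* Sample data from #3 (unaltered)
clear
input float export_rca str3 location_code int hs_product_code
.0004864609 "CHN"  101
   3.174224 "ARG"  101
  2.0216224 "FRA"  101
  .13055277 "MEX"  101
  2.5231106 "LUX"  101
  .02927636 "ARG"  102
   5.830869 "FRA"  102
  3.0281174 "MEX"  102
  .05310592 "CHN"  102
   2.533745 "LUX"  102
  .05087423 "ARG" 2715
.0021011257 "MEX" 2715
   .2413325 "FRA" 2715
   .7990255 "LUX" 2715
  .02876595 "CHN" 2715
end

rename (export_rca hs_product_code) (rcaX prodX)
generate byte specX = rcaX > 1 & !missing(rcaX)

* Hypothetical chunk number: 10 countries per chunk
egen int country_id = group(location_code)
generate int chunk = ceil(country_id/10)
drop country_id

tempfile ds pairs
save `ds'

summarize chunk, meanonly
local n_chunks = r(max)

forvalues c = 1/`n_chunks' {
    * Y side: only the countries in this chunk
    use if chunk == `c' using `ds', clear
    rename (rcaX prodX specX) (rcaY prodY specY)
    * X side: joinby matches these countries' rows in the full file
    joinby location_code using `ds'
    drop if prodX == prodY
    generate byte specXY = specX & specY
    * Reduce the chunk to one row per product pair before storing it
    collapse (count) N=specX (sum) specX specY specXY, by(prodX prodY)
    if `c' > 1 {
        append using `pairs'
    }
    save `pairs', replace
}

* Counts are additive across chunks because each country is in exactly one chunk
use `pairs', clear
collapse (sum) N specX specY specXY, by(prodX prodY)
generate p_cond = specXY/specY
generate p_simp = specX/N
format %9.2f p_cond p_simp
list, clean noobs
```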
