Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Run test on subsamples of different length

    I'm using STATA/MP 13.0. I have an unbalanced panel data with 37 variables and 756 obs. set on 'Licit' and 'ID'. 'Licit' is the name of the procurement auction and 'ID' is the identifier of the bidding firm. I need to do the following:

    1) Generate all possible pair combinations of 'ID', say 'J' combinations.
    2) Identify, for every pair identified in 1), if both firms are present in the same 'Licit'. Note that the number of bidders in each Licit changes.
    3) Run 'J' spearman correlation tests, between var1 and var2, only on the subset of observations that satisfy 2).

    I leave an example table below for the purpose of clarity:
    Row Licit ID Var1 Var2
    1 10/2010 2 0.3245 0.14578
    2 10/2010 1 0.6521 0.4651
    3 10/2010 3 0.7894 0.2514
    4 10/2010 4 0.9548 0.3625
    5 18/2012 3 0.3411 0.1136
    6 18/2012 1 0.5987 0.4712
    7 18/2012 2 0.6987 0.7321
    8 C/41 1 0.2584 0.1024
    9 C/41 2 0.2901 0.0012
    10 C/41 3 0.1324 0.6325
    In this example I would do the correlation test (between Var1 and Var2) for pair 1,2 on the following observations:
    Row Licit ID Var1 Var2
    1 10/2010 2 0.3245 0.14578
    2 10/2010 1 0.6521 0.4651
    6 18/2012 1 0.5987 0.4712
    7 18/2012 2 0.6987 0.7321
    8 C/41 1 0.2584 0.1024
    9 C/41 2 0.2901 0.0012
    Thank you very much.
    Pablo



  • #2
    I'm confused by your example. Why are you not also interested in ID pairs 1,3; 1,4; 2,3; and 2;4. After all, ID's 1, 2, 3, and 4 are all in licit 10/2010. And there are many more pairs like that you have excluded but I don't understand why.

    Also, what is a J-Spearman correlation? I am familiar with Spearman correlations, but have never heard of the J-Spearman.

    Comment


    • #3
      Thanks for the response. Maybe the example does not help.
      Actually I'm also interested in pairs 1,3; 1,4; 2,3 and 2,4. I just wanted to point out (as an example) the observations that must be considered for pair 1,2.
      I defined 'J' as the number of all possible pair combinations of ID's. The test I want to run is Spearman correlation test.
      Thanks!

      Comment


      • #4
        So, if I understand what you want, this will do it:

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input byte row str7 licit byte id float(var1 var2)
         1 "10/2010" 2 .3245 .14578
         2 "10/2010" 1 .6521  .4651
         3 "10/2010" 3 .7894  .2514
         4 "10/2010" 4 .9548  .3625
         5 "18/2012" 3 .3411  .1136
         6 "18/2012" 1 .5987  .4712
         7 "18/2012" 2 .6987  .7321
         8 "C/41"    1 .2584  .1024
         9 "C/41"    2 .2901  .0012
        10 "C/41"    3 .1324  .6325
        end
        save iorio, replace
        
        //    MAKE A DATA SET WITH ALL PAIRS FOUND IN SAME LICIT
        keep licit id
        rename id id2
        tempfile copy
        save `copy'
        rename id2 id1
        joinby licit using `copy'
        // AVOID SELF-PAIRS & PAIRS IN REVERSE ORDER
        keep if id1 < id2
        //    REDUCE TO JUST PAIRS, REGARDLESS OF LICIT    
        drop licit
        duplicates drop
        //    AND RESHAPE LONG
        gen long pair_num = _n
        reshape long id, i(pair_num) j(_j)
        
        //    PROGRAM TO CALCULATE SPEARMAN'S CORRELATION FOR EACH PAIR
        capture program drop one_pair
        program define one_pair
            merge 1:m id using iorio, assert(match using) keep(match) nogenerate ///
                keepusing(var1 var2)
            spearman var1 var2
            gen spearman_N = r(N)
            gen spearman_rho = r(rho)
            gen spearman_p = r(p)
            keep id _j spearman_*
            gen byte one = 1
            duplicates drop
            reshape wide id, i(one) j(_j)
            drop one
            exit
        end
        
        runby one_pair, status by(pair_num)
        Note: requires -runby-, written by Robert Picard and me, and available from SSC.

        Added: In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

        When asking for help with code, always show example data. When showing example data, always use -dataex-.
        Last edited by Clyde Schechter; 31 May 2018, 11:53.

        Comment


        • #5
          Thank you very much Clyde for this help.

          I think I made a little, but not minor, mistake in the example.
          In order to calculate spearman's correlation test, I do not have two variables (in my example var1 and var2), I only have one, say 'var1'.
          This 'var1' computes the residuals of an OLS estimation, so I have obs of 'var1' for all the sample.

          Given this, how could the program be modified in order to calculate spearman's rho?

          Thanks again and apologies for my mistake.

          Comment


          • #6
            If you only have one variable, then there is nothing to calculate a spearman's rho with. By definition, spearman's rho applies to a pair of variables.

            Comment


            • #7
              Although I only have one variable (residuals of OLS estimation) I can take, for example, the residuals of firm 1 and firm 2 whenever they interact in the same 'Licit' and generate two variables (say resid1 and resid2). Once I have computed this two variables (and because they have the same length), I can run the correlation test between residuals of firm 1 and firm 2.

              Then I obviously want to extendthe procedure to all pair combinations of ID's.

              I hope this clarifies my point.

              Kind regards
              Last edited by Pablo Iorio; 31 May 2018, 14:48.

              Comment


              • #8
                I don't follow you. Let's look at IDs 3 and 4. They overlap in licit 10/2010, but nowhere else. So even when we pair up a single variable from those two firms in licit 10/2010, we have only a single observation, and so no correlation of any kind can be calculate. If we just try to pair up all observations of a variable with ID 3 with some observation of that variable from ID4, then we are just creating a data jumble and the results will be irreproducible, varying with the particular pairing. So this doesn't make sense to me.

                Comment


                • #9
                  I'll go further in details with an extended example from my database generated with dataex. The data is the following:
                  Code:
                  * Example generated by -dataex-. To install: ssc install dataex
                  clear
                  input str7 Licit byte ID float resid
                  "23/2014" 1   -.08863044
                  "33/2014" 1     .1764896
                  "34/2014" 1     .0672189
                  "5/2009"  1    -.1714207
                  "69/2012" 1    .01927018
                  "8/2009"  1   -.12780762
                  "C/102"   1  -.015892923
                  "C/59"    1   -.02353555
                  "C/60"    1    .00589478
                  "C/61"    1   -.02175379
                  "C/63"    1    -.0886798
                  "C/72"    1    .12476528
                  "C/73"    1 -.0086942315
                  "C/76"    1     .1484325
                  "C/88"    1  -.007280052
                  "C/91"    1  -.014392734
                  "C/96"    1   -.13664109
                  "C/99"    1   -.12697756
                  "M/43"    1   .009680748
                  "M/50"    1  -.066419005
                  "M/55"    1     -.108989
                  "M/71"    1   -.06064075
                  "M/72"    1    -.1232829
                  "23/2014" 2     -.195566
                  "69/2012" 2    -.0100981
                  "8/2015"  2  .0011730194
                  "C/102"   2  .0020121336
                  "C/44"    2   -.03775865
                  "C/48"    2   .006780803
                  "C/49"    2    .08162308
                  "C/59"    2  -.020061016
                  "C/60"    2  .0007587671
                  "C/61"    2    .04443848
                  "C/62"    2    .09426343
                  "C/63"    2   -.05003059
                  "C/64"    2  -.021544933
                  "C/65"    2   -.03327465
                  "C/72"    2    .02283168
                  "C/73"    2  -.007146239
                  "C/74"    2  -.017294884
                  "C/75"    2    .01878369
                  "C/76"    2   .013592482
                  "C/99"    2   -.05736911
                  "M/36"    2    .17096734
                  "M/43"    2    .09160209
                  "M/44"    2    .01725483
                  "M/46"    2    .02561784
                  "M/50"    2  -.067733884
                  "M/55"    2    .08373046
                  "M/56"    2   -.12251318
                  "M/69"    2   -.11111224
                  "M/70"    2   -.09072387
                  "M/71"    2    .14346284
                  "M/72"    2    .08696616
                  end
                  In this example I have the variables 'ID', 'Licit' and 'resid'. There are 54 obs. and only two ID's. ID 1 and ID 2 appear jointly only in 16 'Licit' (despite there are a total of 38 'Licit'). So, I could generate the following variable: 'resid1_12' that would contain the values reported on 'resid' for those 'Licit' in which ID 1 appear together with ID 2 (Note that the first number, in the proposed variables, allude to the ID and the second to the pair combination). This vector will be 16 rows long. I'd do the same for resid2_12 and, as both new variables will be 16 rows long, I can run a spearman test between them.

                  I want to extend this procedure to all pair combinations of ID's (in my dataset, 41). Note that other pair combination, say ID 1 and ID 3 will probably appear together in a different number of 'Licit', compared to pair 1-2.
                  Also note that there may be some pairs with no match in any Licit or only very few, so Spearman test might not be carried out. Dispite this, the procedure should go on with another pair and whenver is possible report the results of the test.

                  I hope this clarifies the point.
                  Tks!

                  Comment


                  • #10
                    OK, I think I get it now. The following code will, I believe, do the trick. For any pair where the calculation of a Spearman correlation leads to an error, that pair will simply be skipped over.

                    Code:
                    * Example generated by -dataex-. To install: ssc install dataex
                    clear
                    input str7 Licit byte ID float resid
                    "23/2014" 1   -.08863044
                    "33/2014" 1     .1764896
                    "34/2014" 1     .0672189
                    "5/2009"  1    -.1714207
                    "69/2012" 1    .01927018
                    "8/2009"  1   -.12780762
                    "C/102"   1  -.015892923
                    "C/59"    1   -.02353555
                    "C/60"    1    .00589478
                    "C/61"    1   -.02175379
                    "C/63"    1    -.0886798
                    "C/72"    1    .12476528
                    "C/73"    1 -.0086942315
                    "C/76"    1     .1484325
                    "C/88"    1  -.007280052
                    "C/91"    1  -.014392734
                    "C/96"    1   -.13664109
                    "C/99"    1   -.12697756
                    "M/43"    1   .009680748
                    "M/50"    1  -.066419005
                    "M/55"    1     -.108989
                    "M/71"    1   -.06064075
                    "M/72"    1    -.1232829
                    "23/2014" 2     -.195566
                    "69/2012" 2    -.0100981
                    "8/2015"  2  .0011730194
                    "C/102"   2  .0020121336
                    "C/44"    2   -.03775865
                    "C/48"    2   .006780803
                    "C/49"    2    .08162308
                    "C/59"    2  -.020061016
                    "C/60"    2  .0007587671
                    "C/61"    2    .04443848
                    "C/62"    2    .09426343
                    "C/63"    2   -.05003059
                    "C/64"    2  -.021544933
                    "C/65"    2   -.03327465
                    "C/72"    2    .02283168
                    "C/73"    2  -.007146239
                    "C/74"    2  -.017294884
                    "C/75"    2    .01878369
                    "C/76"    2   .013592482
                    "C/99"    2   -.05736911
                    "M/36"    2    .17096734
                    "M/43"    2    .09160209
                    "M/44"    2    .01725483
                    "M/46"    2    .02561784
                    "M/50"    2  -.067733884
                    "M/55"    2    .08373046
                    "M/56"    2   -.12251318
                    "M/69"    2   -.11111224
                    "M/70"    2   -.09072387
                    "M/71"    2    .14346284
                    "M/72"    2    .08696616
                    end
                    save iorio, replace
                    
                    //    MAKE A DATA SET WITH ALL PAIRS FOUND IN SAME Licit
                    keep Licit ID
                    tempfile copy
                    save `copy'
                    rangejoin ID 1 . using `copy', by(Licit)
                    drop if missing(ID_U)
                    rename ID ID1
                    rename ID_U ID2
                    //    AND RESHAPE LONG
                    by ID1 ID2, sort: gen long pair_num = _n == 1
                    replace pair_num = sum(pair_num)
                    reshape long ID, i(pair_num Licit) j(_j)
                    
                    //    PROGRAM TO CALCULATE SPEARMAN'S CORRELATION FOR EACH PAIR
                    capture program drop one_pair
                    program define one_pair
                        merge 1:m ID Licit using iorio, assert(match using) keep(match) nogenerate ///
                            keepusing(resid Licit)
                        reshape wide ID resid, i(Licit) j(_j)
                        spearman resid1 resid2
                        gen spearman_N = r(N)
                        gen spearman_rho = r(rho)
                        gen spearman_p = r(p)
                        keep ID* spearman_*
                        keep in L
                        exit
                    end
                    
                    runby one_pair, by(pair_num) status
                    As before, this requires -rangejoin-, by Robert Picard, and -runby- by Robert Picard and me, both available from SSC.

                    Note: The -status- option of -runby- just gives you a progress report on how much of the data has been processed and how much time is estimated to remain. It is only relevant if you have a very large data set where you might wonder for hours whether Stata is actually doing anything or is hung. If your data can be processed quickly, you can dispense with that option.

                    Comment


                    • #11
                      Thanks Clyde!

                      I run the code exactly as it is in your post. Although the code worked in every line, something went wrong. No variables were generated and the output reports an error. I paste the output below:

                      elapsed ----------- by-groups ---------- ------- observations ------ time
                      time count errors no-data processed saved remaining
                      ------------------------------------------------------------------------------------
                      00:00:00 1 1 0 32 0 00:00:00

                      --------------------------------------
                      Number of by-groups = 1
                      by-groups with errors = 1
                      by-groups with no data = 0
                      Observations processed = 32
                      Observations saved = 0
                      --------------------------------------

                      As there were 16 obs, the test should have been performed and spearman results reported. Any idea why this happened?

                      Comment


                      • #12
                        When I run the exact code from my post (with the exact data there) I get:

                        Code:
                        . runby one_pair, by(pair_num) status
                        
                          elapsed ----------- by-groups ----------    ------- observations ------       time
                             time      count     errors    no-data        processed         saved  remaining
                        ------------------------------------------------------------------------------------
                         00:00:00          1          0          0               32             1   00:00:00
                        
                        --------------------------------------
                        Number of by-groups    =             1
                        by-groups with errors  =             0
                        by-groups with no data =             0
                        Observations processed =            32
                        Observations saved     =             1
                        --------------------------------------
                        
                        . 
                        end of do-file
                        
                        . list
                        
                             +--------------------------------------------+
                             | ID1   ID2   spearm~N   spearm~o   spearm~p |
                             |--------------------------------------------|
                          1. |   1     2         16   .2176471   .4180868 |
                             +--------------------------------------------+
                        I don't know what the problem might be. Either the data are different or you modified the code in some way.

                        I suggest you add the -verbose- option to the -runby- command. That way you will get the output from program one_pair, including any error messages it throws, and you can start to troubleshoot the problem.

                        Comment


                        • #13
                          Perfect! I found the bug and the code does all the work.

                          One last comment. As I mentioned, I have more than ID 1 and ID 2, actually a total of 41. I think the code tested all possible pair combinations of ID's but it always reports, in output, ID 1 and ID 2 (the same happens with variable 'resid' and 'j'.
                          Can this be modified in order to identify which pair of ID's are being tested?

                          Thanks!

                          Comment


                          • #14
                            I can't fix this problem, because I can't replicate it. I created a new data set with more IDs and pairs, and the code behaves as expected correctly reporting the IDs and results.

                            Code:
                            * Example generated by -dataex-. To install: ssc install dataex
                            * Example generated by -dataex-. To install: ssc install dataex
                            clear
                            input str7 Licit byte ID float(resid obs_no rep)
                            "23/2014" 1   -.08863044  1 1
                            "23/2014" 3   -.06837456  1 2
                            "33/2014" 1     .1764896  2 1
                            "33/2014" 3     .1869159  2 2
                            "34/2014" 1     .0672189  3 1
                            "34/2014" 3   .070196025  3 2
                            "5/2009"  1    -.1714207  4 1
                            "5/2009"  3     -.188642  4 2
                            "69/2012" 1    .01927018  5 1
                            "69/2012" 3   .011978185  5 2
                            "8/2009"  1   -.12780762  6 1
                            "8/2009"  3   -.11918936  6 2
                            "C/102"   1  -.015892923  7 1
                            "C/102"   3  -.018286463  7 2
                            "C/59"    1   -.02353555  8 1
                            "C/59"    3   -.01837006  8 2
                            "C/60"    1    .00589478  9 1
                            "C/60"    3  -.012225377  9 2
                            "C/61"    1   -.02175379 10 1
                            "C/61"    3  -.031905033 10 2
                            "C/63"    1    -.0886798 11 1
                            "C/63"    3   -.07772613 11 2
                            "C/72"    1    .12476528 12 1
                            "C/72"    3    .13809177 12 2
                            "C/73"    1 -.0086942315 13 1
                            "C/73"    3  -.012301112 13 2
                            "C/76"    1     .1484325 14 1
                            "C/76"    3    .15578355 14 2
                            "C/88"    1  -.007280052 15 1
                            "C/88"    3   -.01689488 15 2
                            "C/91"    1  -.014392734 16 1
                            "C/91"    3  -.006871434 16 2
                            "C/96"    1   -.13664109 17 1
                            "C/96"    3   -.14130762 17 2
                            "C/99"    1   -.12697756 18 1
                            "C/99"    3   -.15149833 18 2
                            "M/43"    1   .009680748 19 1
                            "M/43"    3   -.01353931 19 2
                            "M/50"    1  -.066419005 20 1
                            "M/50"    3   -.07253471 20 2
                            "M/55"    1     -.108989 21 1
                            "M/55"    3   -.11360715 21 2
                            "M/71"    1   -.06064075 22 1
                            "M/71"    3  -.036277592 22 2
                            "M/72"    1    -.1232829 23 1
                            "M/72"    3   -.12540632 23 2
                            "23/2014" 2     -.195566 24 1
                            "23/2014" 4   -.20368057 24 2
                            "69/2012" 2    -.0100981 25 1
                            "69/2012" 4  -.009893788 25 2
                            "8/2015"  2  .0011730194 26 1
                            "8/2015"  4   .010262764 26 2
                            "C/102"   2  .0020121336 27 1
                            "C/102"   4    .01201681 27 2
                            "C/44"    2   -.03775865 28 1
                            "C/44"    4   -.03735859 28 2
                            "C/48"    2   .006780803 29 1
                            "C/48"    4    .02421416 29 2
                            "C/49"    2    .08162308 30 1
                            "C/49"    4    .07791443 30 2
                            "C/59"    2  -.020061016 31 1
                            "C/59"    4   -.02160858 31 2
                            "C/60"    2  .0007587671 32 1
                            "C/60"    4   -.01478582 32 2
                            "C/61"    2    .04443848 33 1
                            "C/61"    4    .03735816 33 2
                            "C/62"    2    .09426343 34 1
                            "C/62"    4    .09366146 34 2
                            "C/63"    2   -.05003059 35 1
                            "C/63"    4   -.05082072 35 2
                            "C/64"    2  -.021544933 36 1
                            "C/64"    4  -.015489371 36 2
                            "C/65"    2   -.03327465 37 1
                            "C/65"    4   -.04193979 37 2
                            "C/72"    2    .02283168 38 1
                            "C/72"    4    .02713548 38 2
                            "C/73"    2  -.007146239 39 1
                            "C/73"    4   -.01630671 39 2
                            "C/74"    2  -.017294884 40 1
                            "C/74"    4  -.008052998 40 2
                            "C/75"    2    .01878369 41 1
                            "C/75"    4    .02012406 41 2
                            "C/76"    2   .013592482 42 1
                            "C/76"    4   .023830246 42 2
                            "C/99"    2   -.05736911 43 1
                            "C/99"    4   -.05536223 43 2
                            "M/36"    2    .17096734 44 1
                            "M/36"    4    .16673024 44 2
                            "M/43"    2    .09160209 45 1
                            "M/43"    4    .09470539 45 2
                            "M/44"    2    .01725483 46 1
                            "M/44"    4    .01211233 46 2
                            "M/46"    2    .02561784 47 1
                            "M/46"    4   .018126957 47 2
                            "M/50"    2  -.067733884 48 1
                            "M/50"    4   -.08127552 48 2
                            "M/55"    2    .08373046 49 1
                            "M/55"    4    .07720452 49 2
                            "M/56"    2   -.12251318 50 1
                            "M/56"    4   -.12218434 50 2
                            end
                            
                            save iorio, replace
                            
                            //    MAKE A DATA SET WITH ALL PAIRS FOUND IN SAME Licit
                            keep Licit ID
                            tempfile copy
                            save `copy'
                            rangejoin ID 1 . using `copy', by(Licit)
                            drop if missing(ID_U)
                            rename ID ID1
                            rename ID_U ID2
                            //    AND RESHAPE LONG
                            by ID1 ID2, sort: gen long pair_num = _n == 1
                            replace pair_num = sum(pair_num)
                            reshape long ID, i(pair_num Licit) j(_j)
                            
                            //    PROGRAM TO CALCULATE SPEARMAN'S CORRELATION FOR EACH PAIR
                            capture program drop one_pair
                            program define one_pair
                                merge 1:m ID Licit using iorio, assert(match using) keep(match) nogenerate ///
                                    keepusing(resid Licit)
                                reshape wide ID resid, i(Licit) j(_j)
                                spearman resid1 resid2
                                gen spearman_N = r(N)
                                gen spearman_rho = r(rho)
                                gen spearman_p = r(p)
                                keep ID* spearman_*
                                keep in L
                                exit
                            end
                            
                            runby one_pair, by(pair_num) status
                            which produces

                            Code:
                              elapsed ----------- by-groups ----------    ------- observations ------       time
                                 time      count     errors    no-data        processed         saved  remaining
                            ------------------------------------------------------------------------------------
                             00:00:00          6          0          0              212             6   00:00:00
                            
                            --------------------------------------
                            Number of by-groups    =             6
                            by-groups with errors  =             0
                            by-groups with no data =             0
                            Observations processed =           212
                            Observations saved     =             6
                            --------------------------------------
                            
                            . list, noobs clean
                            
                                ID1   ID2   spearm~N   spearm~o   spearm~p  
                                  1     2         14   .4637363   .0948763  
                                  1     3         23   .9782609   7.88e-16  
                                  1     4         14   .4857143   .0782759  
                                  2     3         14   .3714286   .1910208  
                                  2     4         27    .976801   2.99e-18  
                                  3     4         14   .3846154   .1745088
                            I can't help thinking that you have modified the code in some way that is causing problems, or that there is something about your actual data that is materially different from your example. Please use my code exactly as given above. If it does not produce correct outputs, please post back with a -dataex- that demonstrates the problem you are having when used with that code.

                            Comment


                            • #15
                              Great! It works perfectly! I skipped the command:

                              . list, noobs clean

                              Thank you very much for your great patience and knowledge. As I am knew to the forum, maybe there's a better name to describe the post so others users, with a similar problem, can access it easily and benefit from the code you produced. If you feel like there's a better description I could try to edit the name.

                              Thanks again.
                              Kind regards!
                              Pablo
                              Last edited by Pablo Iorio; 01 Jun 2018, 16:53.

                              Comment

                              Working...
                              X