Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • A Problem with Collapse Command

    Dear friends,

    I want to make basic collapsing operation on STATA. I want to take the average of Norway's foreign aid flows to different regions after 2016. However, my output does not reflect the actual values.

    Dataex:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str60 recipient double year str12 newregion float norwaydac2a
    "Afghanistan" 2000 "Asia"     19.95
    "Afghanistan" 2001 "Asia"     63.36
    "Afghanistan" 2002 "Asia"     87.66
    "Afghanistan" 2003 "Asia"     85.37
    "Afghanistan" 2004 "Asia"     75.62
    "Afghanistan" 2005 "Asia"     58.83
    "Afghanistan" 2006 "Asia"     62.61
    "Afghanistan" 2007 "Asia"     75.25
    "Afghanistan" 2008 "Asia"     90.75
    "Afghanistan" 2009 "Asia"     94.62
    "Afghanistan" 2010 "Asia"     89.12
    "Afghanistan" 2011 "Asia"     89.35
    "Afghanistan" 2012 "Asia"     81.52
    "Afghanistan" 2013 "Asia"     80.74
    "Afghanistan" 2014 "Asia"     80.46
    "Afghanistan" 2015 "Asia"     76.05
    "Afghanistan" 2016 "Asia"     78.41
    "Afghanistan" 2017 "Asia"     67.52
    "Afghanistan" 2018 "Asia"     81.01
    "Afghanistan" 2019 "Asia"     71.62
    "Afghanistan" 2020 "Asia"     79.24
    "Afghanistan" 2021 "Asia"     63.61
    "Belize"      2000 "Americas"   .33
    "Belize"      2001 "Americas"   .05
    "Belize"      2002 "Americas"   .16
    "Belize"      2003 "Americas"   .09
    "Belize"      2004 "Americas"   .04
    "Belize"      2005 "Americas"     .
    "Belize"      2006 "Americas"     .
    "Belize"      2007 "Americas"     .
    "Belize"      2008 "Americas"     .
    "Belize"      2009 "Americas"     .
    "Belize"      2010 "Americas"     .
    "Belize"      2011 "Americas"     .
    "Belize"      2012 "Americas"     .
    "Belize"      2013 "Americas"   .11
    "Belize"      2014 "Americas"   .11
    "Belize"      2015 "Americas"     .
    "Belize"      2016 "Americas"     .
    "Belize"      2017 "Americas"     .
    "Belize"      2018 "Americas"     .
    "Belize"      2019 "Americas"     .
    "Belize"      2020 "Americas"     .
    "Belize"      2021 "Americas"     .
    "Albania"     2000 "Europe"    5.63
    "Albania"     2001 "Europe"     4.8
    "Albania"     2002 "Europe"    8.41
    "Albania"     2003 "Europe"    7.44
    "Albania"     2004 "Europe"    8.93
    "Albania"     2005 "Europe"    6.39
    "Albania"     2006 "Europe"     5.1
    "Albania"     2007 "Europe"    4.69
    "Albania"     2008 "Europe"    2.87
    "Albania"     2009 "Europe"     .85
    "Albania"     2010 "Europe"    1.86
    "Albania"     2011 "Europe"    1.78
    "Albania"     2012 "Europe"    1.01
    "Albania"     2013 "Europe"     1.1
    "Albania"     2014 "Europe"     .92
    "Albania"     2015 "Europe"    1.62
    "Albania"     2016 "Europe"     .54
    "Albania"     2017 "Europe"     .21
    "Albania"     2018 "Europe"     .67
    "Albania"     2019 "Europe"    1.22
    "Albania"     2020 "Europe"     1.4
    "Albania"     2021 "Europe"      .1
    "Argentina"   2000 "Americas"   .02
    "Argentina"   2001 "Americas"   .06
    "Argentina"   2002 "Americas"   .13
    "Argentina"   2003 "Americas"   .46
    "Argentina"   2004 "Americas"   .03
    "Argentina"   2005 "Americas"     .
    "Argentina"   2006 "Americas"   .02
    "Argentina"   2007 "Americas"   .03
    "Argentina"   2008 "Americas"   .08
    "Argentina"   2009 "Americas"   .04
    "Argentina"   2010 "Americas"   .07
    "Argentina"   2011 "Americas"   .03
    "Argentina"   2012 "Americas"   .03
    "Argentina"   2013 "Americas"     0
    "Argentina"   2014 "Americas"   .08
    "Argentina"   2015 "Americas"     .
    "Argentina"   2016 "Americas"     .
    "Argentina"   2017 "Americas"     .
    "Argentina"   2018 "Americas"     .
    "Argentina"   2019 "Americas"     .
    "Argentina"   2020 "Americas"     .
    "Argentina"   2021 "Americas"   .06
    "Armenia"     2000 "Asia"      3.35
    "Armenia"     2001 "Asia"      3.88
    "Armenia"     2002 "Asia"      4.72
    "Armenia"     2003 "Asia"      4.16
    "Armenia"     2004 "Asia"      3.73
    "Armenia"     2005 "Asia"      5.08
    "Armenia"     2006 "Asia"      4.69
    "Armenia"     2007 "Asia"      3.71
    "Armenia"     2008 "Asia"      2.33
    "Armenia"     2009 "Asia"      2.55
    "Armenia"     2010 "Asia"      2.65
    "Armenia"     2011 "Asia"      1.98
    end

    What I want to see:

    Code:
    total norwaydac2a if newregion=="Subsaharan" & year>2016
    disp  2662.92/5
    Output:

    Code:
    Total estimation                           Number of obs = 195
    
    --------------------------------------------------------------
                 |      Total   Std. err.     [95% conf. interval]
    -------------+------------------------------------------------
     norwaydac2a |    2662.92   267.8005      2134.746    3191.094
    --------------------------------------------------------------
    
    532.584
    Here, the code I used for the collapse:

    Code:
    collapse (mean) norwaydac2a, by(newregion), if year>2016
    The output, which is not correct. Please check subsaharan

    Code:
    newregion    norwaydac2a
    Americas    9.187467
    Asia    9.846723
    Europe    4.5318
    Middle East    42.89514
    North Africa    2.258333
    Ocenia    .7714286
    Subsaharan    13.656
    What should I fix to correct the output of the collapse command?

    Best,






  • #2
    I don't see that your data example lets us test anything in your code. But the answer may simply be that your collapse asks for a mean, which is equal to a total if and only if there is only one value. Or, other way round about a total compared with a mean.

    Comment


    • #3
      total norwaydac2a if newregion=="Subsaharan" & year>2016
      disp 2662.92/5
      The total command sums whereas the -mean()- function of egen calculates the arithmetic mean. So the divisor is the number of observations and it appears that there are more than 5 observations for Sub-Saharan Africa in your sample.

      Comment


      • #4
        Actually, I used total command to manually show what output I wanted to have. I first calculated the total amount after 2016. Then divided it by five. 532 is the output. I tried to do it by using mean function, which did not create the same result. It gives "13" instead of 532, which is not the correct output. I don't know what I it is the problem the command below

        Code:
        collapse (mean) norwaydac2a, by(newregion), if year>2016 & year<2022
        Last edited by Nihat Mugurtay; 13 Feb 2023, 22:52.

        Comment


        • #5
          This is as far as I can get with your data example.


          Code:
          . keep if inrange(year, 2017, 2021)
          (80 observations deleted)
          
          .
          . tabstat norwaydac2a , by(newregion) s(n sum mean)
          
          Summary for variables: norwaydac2a
          Group variable: newregion
          
             newregion |         N       Sum      Mean
          -------------+------------------------------
              Americas |         1       .06       .06
                  Asia |         5       363      72.6
                Europe |         5       3.6       .72
          -------------+------------------------------
                 Total |        11    366.66  33.33273
          --------------------------------------------
          
          .
          . collapse norwaydac2a, by(newregion)
          
          .
          . list
          
               +---------------------+
               | newreg~n   norwa~2a |
               |---------------------|
            1. | Americas        .06 |
            2. |     Asia       72.6 |
            3. |   Europe        .72 |
               +---------------------+
          I can't see that collapse is doing anything that isn't expected.

          The data example we need to look at this is presumably


          Code:
          dataex  norwaydac2a newregion if inrange(year, 2017, 2021)
          which may well be more than 100 observations, so look at the help for dataex.
          Last edited by Nick Cox; 14 Feb 2023, 01:30.

          Comment


          • #6
            I provide the dataex you mentioned. But, the numbers for the corresponding regions are not correct.

            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear
            input float Norwaydac2a str12 newregion
            67.52 "Asia"      
            81.01 "Asia"      
            71.62 "Asia"      
            79.24 "Asia"      
            63.61 "Asia"      
                . "Americas"  
                . "Americas"  
                . "Americas"  
                . "Americas"  
                . "Americas"  
              .21 "Europe"    
              .67 "Europe"    
             1.22 "Europe"    
              1.4 "Europe"    
               .1 "Europe"    
                . "Americas"  
                . "Americas"  
                . "Americas"  
                . "Americas"  
              .06 "Americas"  
              .11 "Asia"      
              .03 "Asia"      
                . "Asia"      
              .54 "Asia"      
              .03 "Asia"      
              .13 "Asia"      
              .03 "Asia"      
              .04 "Asia"      
              .55 "Asia"      
                . "Asia"      
             3.91 "Subsaharan"
             3.74 "Subsaharan"
             3.73 "Subsaharan"
             3.76 "Subsaharan"
              2.5 "Subsaharan"
                . "Subsaharan"
                . "Subsaharan"
              .28 "Subsaharan"
              .71 "Subsaharan"
              .53 "Subsaharan"
             2.45 "Subsaharan"
             2.46 "Subsaharan"
             5.34 "Subsaharan"
             7.55 "Subsaharan"
             6.17 "Subsaharan"
            11.53 "Asia"      
            12.08 "Asia"      
            12.82 "Asia"      
            14.95 "Asia"      
            14.17 "Asia"      
             3.97 "Europe"    
             4.42 "Europe"    
             5.29 "Europe"    
             4.66 "Europe"    
             3.17 "Europe"    
              .06 "Europe"    
              .07 "Europe"    
              .05 "Europe"    
              .57 "Europe"    
              .37 "Europe"    
             2.55 "Americas"  
              2.7 "Americas"  
              2.2 "Americas"  
                2 "Americas"  
             1.68 "Americas"  
            49.19 "Americas"  
            69.87 "Americas"  
            13.11 "Americas"  
            17.45 "Americas"  
            20.35 "Americas"  
              .16 "Asia"      
              .07 "Asia"      
              .06 "Asia"      
              .01 "Asia"      
              .05 "Asia"      
                . "Subsaharan"
                . "Subsaharan"
                0 "Subsaharan"
              .04 "Subsaharan"
              .04 "Subsaharan"
             8.59 "Subsaharan"
             7.08 "Subsaharan"
             6.62 "Subsaharan"
             4.55 "Subsaharan"
             5.78 "Subsaharan"
             9.06 "Asia"      
              9.6 "Asia"      
            10.14 "Asia"      
             9.07 "Asia"      
             7.04 "Asia"      
              .27 "Subsaharan"
              .31 "Subsaharan"
              .12 "Subsaharan"
              .11 "Subsaharan"
              .15 "Subsaharan"
             1.52 "Subsaharan"
             2.22 "Subsaharan"
             2.69 "Subsaharan"
             2.76 "Subsaharan"
             2.36 "Subsaharan"
            end
            Here, I also provide som of the actual numbers specifically goes to recipients.

            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear
            input str60 recipient str15 region double year float Norwaydac2a
            "Afghanistan"                  "Asia"     2017 67.52
            "Afghanistan"                  "Asia"     2018 81.01
            "Afghanistan"                  "Asia"     2019 71.62
            "Afghanistan"                  "Asia"     2020 79.24
            "Afghanistan"                  "Asia"     2021 63.61
            "Belize"                       "Americas" 2017     .
            "Belize"                       "Americas" 2018     .
            "Belize"                       "Americas" 2019     .
            "Belize"                       "Americas" 2020     .
            "Belize"                       "Americas" 2021     .
            "Albania"                      "Europe"   2017   .21
            "Albania"                      "Europe"   2018   .67
            "Albania"                      "Europe"   2019  1.22
            "Albania"                      "Europe"   2020   1.4
            "Albania"                      "Europe"   2021    .1
            "Argentina"                    "Americas" 2017     .
            "Argentina"                    "Americas" 2018     .
            "Argentina"                    "Americas" 2019     .
            "Argentina"                    "Americas" 2020     .
            "Argentina"                    "Americas" 2021   .06
            "Armenia"                      "Asia"     2017   .11
            "Armenia"                      "Asia"     2018   .03
            "Armenia"                      "Asia"     2019     .
            "Armenia"                      "Asia"     2020   .54
            "Armenia"                      "Asia"     2021   .03
            "Azerbaijan"                   "Asia"     2017   .13
            "Azerbaijan"                   "Asia"     2018   .03
            "Azerbaijan"                   "Asia"     2019   .04
            "Azerbaijan"                   "Asia"     2020   .55
            "Azerbaijan"                   "Asia"     2021     .
            "Burundi"                      "Africa"   2017  3.91
            "Burundi"                      "Africa"   2018  3.74
            "Burundi"                      "Africa"   2019  3.73
            "Burundi"                      "Africa"   2020  3.76
            "Burundi"                      "Africa"   2021   2.5
            "Benin"                        "Africa"   2017     .
            "Benin"                        "Africa"   2018     .
            "Benin"                        "Africa"   2019   .28
            "Benin"                        "Africa"   2020   .71
            "Benin"                        "Africa"   2021   .53
            "Burkina Faso"                 "Africa"   2017  2.45
            "Burkina Faso"                 "Africa"   2018  2.46
            "Burkina Faso"                 "Africa"   2019  5.34
            "Burkina Faso"                 "Africa"   2020  7.55
            "Burkina Faso"                 "Africa"   2021  6.17
            "Bangladesh"                   "Asia"     2017 11.53
            "Bangladesh"                   "Asia"     2018 12.08
            "Bangladesh"                   "Asia"     2019 12.82
            "Bangladesh"                   "Asia"     2020 14.95
            "Bangladesh"                   "Asia"     2021 14.17
            "Bosnia and Herzegovina"       "Europe"   2017  3.97
            "Bosnia and Herzegovina"       "Europe"   2018  4.42
            "Bosnia and Herzegovina"       "Europe"   2019  5.29
            "Bosnia and Herzegovina"       "Europe"   2020  4.66
            "Bosnia and Herzegovina"       "Europe"   2021  3.17
            "Belarus"                      "Europe"   2017   .06
            "Belarus"                      "Europe"   2018   .07
            "Belarus"                      "Europe"   2019   .05
            "Belarus"                      "Europe"   2020   .57
            "Belarus"                      "Europe"   2021   .37
            "Bolivia"                      "Americas" 2017  2.55
            "Bolivia"                      "Americas" 2018   2.7
            "Bolivia"                      "Americas" 2019   2.2
            "Bolivia"                      "Americas" 2020     2
            "Bolivia"                      "Americas" 2021  1.68
            "Brazil"                       "Americas" 2017 49.19
            "Brazil"                       "Americas" 2018 69.87
            "Brazil"                       "Americas" 2019 13.11
            "Brazil"                       "Americas" 2020 17.45
            "Brazil"                       "Americas" 2021 20.35
            "Bhutan"                       "Asia"     2017   .16
            "Bhutan"                       "Asia"     2018   .07
            "Bhutan"                       "Asia"     2019   .06
            "Bhutan"                       "Asia"     2020   .01
            "Bhutan"                       "Asia"     2021   .05
            "Botswana"                     "Africa"   2017     .
            "Botswana"                     "Africa"   2018     .
            "Botswana"                     "Africa"   2019     0
            "Botswana"                     "Africa"   2020   .04
            "Botswana"                     "Africa"   2021   .04
            "Central African Republic"     "Africa"   2017  8.59
            "Central African Republic"     "Africa"   2018  7.08
            "Central African Republic"     "Africa"   2019  6.62
            "Central African Republic"     "Africa"   2020  4.55
            "Central African Republic"     "Africa"   2021  5.78
            "China (People's Republic of)" "Asia"     2017  9.06
            "China (People's Republic of)" "Asia"     2018   9.6
            "China (People's Republic of)" "Asia"     2019 10.14
            "China (People's Republic of)" "Asia"     2020  9.07
            "China (People's Republic of)" "Asia"     2021  7.04
            "Côte d'Ivoire"               "Africa"   2017   .27
            "Côte d'Ivoire"               "Africa"   2018   .31
            "Côte d'Ivoire"               "Africa"   2019   .12
            "Côte d'Ivoire"               "Africa"   2020   .11
            "Côte d'Ivoire"               "Africa"   2021   .15
            "Cameroon"                     "Africa"   2017  1.52
            "Cameroon"                     "Africa"   2018  2.22
            "Cameroon"                     "Africa"   2019  2.69
            "Cameroon"                     "Africa"   2020  2.76
            "Cameroon"                     "Africa"   2021  2.36
            end
            I need such an instrument by which I can collapse all countries' last-five-year foreign aid flows (average) to specific regions actually:

            Code:
            collapse (mean) *dac2a, by(newregion) if year>2016

            Comment


            • #7
              Just saying that the results are "not correct" is no help to us, unfortunately. What precisely is wrong? What is puzzling? You need to give an example where we can see what is troubling you. I don't think you've done that yet.

              Note that

              Code:
               
               collapse (mean) *dac2a, by(newregion) if year>2016
              won't work as the if qualifier needs to go before the comma. Otherwise, nothing in your data example looks problematic to me. Note that collapse and tabstat here both ignore the missing values.

              Code:
               tabstat Norwaydac2a , by(newregion) s(n mean sum)
              
              Summary for variables: Norwaydac2a
              Group variable: newregion 
              
                 newregion |         N      Mean       Sum
              -------------+------------------------------
                  Americas |        11  16.46909    181.16
                      Asia |        28  16.97393    475.27
                    Europe |        15  1.748667     26.23
                Subsaharan |        31  2.849677     88.34
              -------------+------------------------------
                     Total |        85  9.070588       771
              --------------------------------------------
              
              . 
              . collapse (count) n=Norwaydac2a (mean) mean=Norwaydac2a (sum) sum=Norwaydac2a, by(newregion)
              
              . 
              . list
              
                   +-------------------------------------+
                   |  newregion    n       mean      sum |
                   |-------------------------------------|
                1. |   Americas   11   16.46909   181.16 |
                2. |       Asia   28   16.97393   475.27 |
                3. |     Europe   15   1.748667    26.23 |
                4. | Subsaharan   31   2.849678    88.34 |
                   +-------------------------------------+
              I guess I wasn't clear enough in #6 dataex won't by default give more than 100 observations. See its count() option. However, I don't sense that that is an issue.

              In essence your hypothesis seems to be that collapse is wrong, but it is usually better to assume that the misunderstanding is either about your data or about quite what collapse is doing.

              Comment

              Working...
              X