Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Creating a cross-sectional dataset from a raw data (includes a SQL code)

    Dear all,

    I often work with SQL to transform any raw material into a time-series or cross-sectional dataset. However, I need to do a lot of merging and other operations after exporting data from SQL. I am looking for a way to do similar operations in STATA 17 to save my time because I do all statistical and visualization operations in STATA.

    1. I have a raw material as below.
    2. I need to take the average of "grants" between 2010 and 2014 (2011, 2012, 2013).
    3. These values should be grouped by countries.
    4. The output will include the country (recipient) and a numerical value of grants between 2010 and 2014 (amountgrant)

    The raw data:


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str3 recommended str37 recipient int year str12 recipientregion double amountconstant
    "Yes" "Afghanistan"         2017 "Asia"    2015410.812
    "Yes" "Afghanistan"         2017 "Asia"              .
    "Yes" "Afghanistan"         2017 "Asia"        1000000
    "Yes" "Afghanistan"         2017 "Asia"    2663728.108
    "Yes" "Afghanistan"         2017 "Asia"              .
    "Yes" "Afghanistan"         2017 "Asia"              .
    "Yes" "Afghanistan"         2017 "Asia"        1000000
    "Yes" "Afghanistan"         2017 "Asia"    73992447.44
    "No"  "Afghanistan"         2017 "Asia"       50000000
    "Yes" "Afghanistan"         2017 "Asia"    501868.1629
    "Yes" "Afghanistan"         2017 "Asia"              .
    "Yes" "Afghanistan"         2017 "Asia"              .
    "Yes" "Afghanistan"         2017 "Asia"              .
    "Yes" "Afghanistan"         2017 "Asia"    7399244.744
    "Yes" "Africa, regional"    2017 "Africa"    2.500e+08
    "No"  "Africa, regional"    2017 "Africa"      4000000
    "Yes" "Albania"             2017 "Europe"      3000000
    "Yes" "Albania"             2017 "Europe"  371410.2382
    "Yes" "Albania"             2017 "Europe"  148464.4063
    "Yes" "Albania"             2017 "Europe"  148464.4063
    "Yes" "Albania"             2017 "Europe"  1485898.069
    "Yes" "Algeria"             2017 "Africa"  29596978.97
    "No"  "America, regional"   2017 "America"    30000000
    "Yes" "America, regional"   2017 "America"     5000000
    "Yes" "Angola"              2017 "Africa"     60000000
    "Yes" "Angola"              2017 "Africa"            .
    "Yes" "Angola"              2017 "Africa"    1.000e+09
    "Yes" "Angola"              2017 "Africa"    1.000e+09
    "Yes" "Angola"              2017 "Africa"    405053533
    "Yes" "Angola"              2017 "Africa"     65300000
    "Yes" "Angola"              2017 "Africa"     35300000
    "No"  "Angola"              2017 "Africa"  7399244.744
    "Yes" "Angola"              2017 "Africa"            .
    "Yes" "Angola"              2017 "Africa"            .
    "Yes" "Angola"              2017 "Africa"        30000
    "Yes" "Angola"              2017 "Africa"            .
    "Yes" "Angola"              2017 "Africa"            .
    "Yes" "Angola"              2017 "Africa"            .
    "Yes" "Angola"              2017 "Africa"            .
    "No"  "Angola"              2017 "Africa"            .
    "Yes" "Angola"              2017 "Africa"    3.760e+08
    "Yes" "Angola"              2017 "Africa"     38000000
    "Yes" "Angola"              2017 "Africa"     28000000
    "Yes" "Angola"              2017 "Africa"            .
    "Yes" "Antigua and Barbuda" 2017 "America"      175000
    "Yes" "Antigua and Barbuda" 2017 "America"           .
    "Yes" "Antigua and Barbuda" 2017 "America"           .
    "Yes" "Antigua and Barbuda" 2017 "America"           .
    "Yes" "Antigua and Barbuda" 2017 "America"           .
    "Yes" "Antigua and Barbuda" 2017 "America"     2000000
    "Yes" "Argentina"           2017 "America"   1.500e+08
    "Yes" "Argentina"           2017 "America"    50000000
    "Yes" "Argentina"           2017 "America"           .
    "Yes" "Argentina"           2017 "America"    36250000
    "No"  "Argentina"           2017 "America"   2.200e+09
    "Yes" "Argentina"           2017 "America"   3.315e+08
    "No"  "Argentina"           2017 "America"           .
    "Yes" "Argentina"           2017 "America"   3.000e+08
    "Yes" "Argentina"           2017 "America"           .
    "Yes" "Argentina"           2017 "America"   3.473e+08
    "Yes" "Argentina"           2017 "America"   166764178
    "Yes" "Argentina"           2017 "America"           .
    "Yes" "Argentina"           2017 "America"     9600000
    "Yes" "Argentina"           2017 "America"   1.000e+08
    "Yes" "Argentina"           2017 "America"    73330000
    "Yes" "Argentina"           2017 "America"    18000000
    "Yes" "Armenia"             2017 "Asia"    14798489.49
    "Yes" "Armenia"             2017 "Asia"              .
    "No"  "Armenia"             2017 "Asia"              .
    "No"  "Asia, regional"      2017 "Asia"              .
    "Yes" "Asia, regional"      2017 "Asia"        5150000
    "Yes" "Azerbaijan"          2017 "Asia"              .
    "Yes" "Azerbaijan"          2017 "Asia"              .
    "Yes" "Azerbaijan"          2017 "Asia"    3341498.926
    "No"  "Azerbaijan"          2017 "Asia"      4.600e+08
    "No"  "Azerbaijan"          2017 "Asia"              .
    "No"  "Azerbaijan"          2017 "Asia"              .
    "Yes" "Azerbaijan"          2017 "Asia"       21430000
    "Yes" "Bahamas"             2017 "America"           .
    "Yes" "Bahamas"             2017 "America"       50000
    "Yes" "Bangladesh"          2017 "Asia"    84301667.15
    "Yes" "Bangladesh"          2017 "Asia"              .
    "No"  "Bangladesh"          2017 "Asia"      752790000
    "Yes" "Bangladesh"          2017 "Asia"    73992447.44
    "Yes" "Bangladesh"          2017 "Asia"      467840000
    "Yes" "Bangladesh"          2017 "Asia"              .
    "Yes" "Bangladesh"          2017 "Asia"              .
    "Yes" "Bangladesh"          2017 "Asia"              .
    "Yes" "Bangladesh"          2017 "Asia"              .
    "Yes" "Bangladesh"          2017 "Asia"              .
    "Yes" "Bangladesh"          2017 "Asia"              .
    "No"  "Bangladesh"          2017 "Asia"    151548381.7
    "Yes" "Bangladesh"          2017 "Asia"      1.750e+09
    "Yes" "Bangladesh"          2017 "Asia"       59180000
    "Yes" "Bangladesh"          2017 "Asia"    82871.54113
    "Yes" "Bangladesh"          2017 "Asia"              .
    "Yes" "Bangladesh"          2017 "Asia"        4000000
    "Yes" "Barbados"            2017 "America"     3036500
    "Yes" "Barbados"            2017 "America"     4500000
    "Yes" "Barbados"            2017 "America"           .
    end
    ------------------ copy up to and including the previous line ------------------

    Listed 100 out of 5880 observations
    Use the count() option to list more


    The SQL Code:

    Code:
    SELECT Recipient, AVG(Amountconstant) AS amountgrant
    FROM campaign.stataicinexcelcindis2
    WHERE flowtype="Grant" AND recommended="Yes" AND Year BETWEEN 2010 AND 2014
    GROUP BY Recipient
    ORDER BY Recipient;
    The SQL Output:


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str37 recipient float amountgrant
    "Afghanistan"                             2693440
    "Africa, regional"                       17543278
    "Albania"                                211830.6
    "Algeria"                                19045524
    "Angola"                                4872694.5
    "Antigua and Barbuda"                     3005717
    "Argentina"                                     0
    "Armenia"                                 4756718
    "Asia, regional"                        1040752.6
    "Azerbaijan"                            1173111.3
    "Bahamas"                               145344.61
    "Bangladesh"                              5184668
    "Barbados"                              153018.23
    "Belarus"                                 6544690
    "Benin"                                 1701054.5
    "Bolivia"                               2481877.3
    "Bosnia and Herzegovina"                3526002.5
    "Botswana"                              1822016.6
    "Brazil"                                 11866.77
    "Brunei Darussalam"                             0
    "Bulgaria"                               5258.365
    "Burundi"                                 2857723
    "Cabo Verde"                              3632353
    "Cambodia"                              1516418.6
    "Cameroon"                                3352458
    "Central African Republic"              2100562.5
    "Chad"                                  605847.25
    "Chile"                                 2178389.5
    "Colombia"                              26306.467
    "Comoros"                               2940998.5
    "Congo"                                 1374411.3
    "Cook Islands"                          2133152.5
    "Costa Rica"                              9349430
    "Cote d'Ivoire"                         3001891.5
    "Cuba"                                   872077.1
    "Curacao"                                       0
    "Democratic People's Republic of Korea"  82094608
    "Democratic Republic of the Congo"      1535550.5
    "Djibouti"                                3961633
    "Dominica"                                3298041
    "Ecuador"                                 1539770
    "Egypt"                                 2193800.3
    "Equatorial Guinea"                      18328.46
    "Eritrea"                               4003123.5
    "Ethiopia"                              2211797.8
    "Fiji"                                  1360553.3
    "Gabon"                                  451582.8
    "Georgia"                                 3573198
    "Ghana"                                 3365985.5
    "Grenada"                                 2959077
    "Guinea"                                 560153.2
    "Guinea-Bissau"                         1070760.4
    "Guyana"                                4234969.5
    "Haiti"                                   1347462
    "India"                                  402877.7
    "Indonesia"                              636113.1
    "Iran"                                          0
    "Iraq"                                    4692445
    "Israel"                                        0
    "Jamaica"                               1226682.9
    "Jordan"                                1502868.5
    "Kazakhstan"                              9271880
    "Kenya"                                   7436292
    "Kyrgyz Republic"                         3493127
    "Lao People's Democratic Republic"      3407316.5
    "Lebanon"                               3491331.5
    "Lesotho"                                 1818740
    "Liberia"                               3959356.5
    "Libya"                                 1101399.6
    "Madagascar"                            1719735.6
    "Malawi"                                 845972.5
    "Malaysia"                               8245.264
    "Maldives"                                2518328
    "Mali"                                    4312894
    "Mauritania"                            2660211.5
    "Mauritius"                             3308985.5
    "Mexico"                                        0
    "Micronesia"                              2458902
    "Moldova"                                 1420208
    "Mongolia"                                3751241
    "Montenegro"                             513589.5
    "Morocco"                                989962.2
    "Mozambique"                              2733230
    "Multi-Region"                             331014
    "Myanmar"                               3155168.5
    "Namibia"                                 1669605
    "Nepal"                                   4764897
    "Niger"                                  647147.1
    "Nigeria"                               309181.75
    "Niue"                                  1266054.8
    "North Macedonia"                         1339298
    "Pakistan"                                6549959
    "Panama"                                        0
    "Papua New Guinea"                      2500288.5
    "Peru"                                  1302118.4
    "Philippines"                           525402.25
    "Romania"                                       0
    "Russia"                                 446379.4
    "Rwanda"                                  4662713
    "Samoa"                                 2593322.5
    end
    ------------------ copy up to and including the previous line ------------------

    Listed 100 out of 131 observations
    Use the count() option to list more





  • #2
    This is what collapse can do.

    Comment


    • #3
      Thank you Nick Cox, I applied collapse by the way below, and I think it worked.

      Code:
      collapse (mean) amountconstant , by(recipient), if year>2010 & year<2014 & recommended=="Yes" & flowtype=="Grant"

      Comment


      • #4
        Dear Nick Cox,

        How can we generate a time-series panel data grouped by recipient and year at the same time?

        There might be a solution as below:

        [CODE]
        collapse (mean) amountconstant , by(recipient) & by(year), if flowtype=="Grant"

        Comment


        • #5
          Originally posted by Nihat Mugurtay View Post
          How can we generate a time-series panel data grouped by recipient and year at the same time?

          There might be a solution as below:
          You might be looking for something more like the following.
          Code:
          set type double
          collapse (mean) amountgrant = amountconstant if flowtype == "Grant" & recommend == "Yes", by(recipient year)
          (I don't know whether collapse uses double-precision floating point datatype for its intermediate arithmetic, but you've got some fairly wide ranges in magnitude that you're averaging, and so it might be better to play it safe by increasing the default precision manually.)

          Comment


          • #6
            Dear Joseph Coveney,

            Thank you so much,

            I am very close to what I am looking for. I ran your command with "sum" option as below. However, when I check the STATA output with SQL, I realized that there are some missing years. For instance, STATA does not provide values for the years between 2000 and 2007 for Afghanistan. However, SQL provides all the output by the command below.


            STATA CODE

            Code:
            collapse (sum) amountgrant = amountconstant if flowtype == "Grant" & recommend == "Yes", by(recipient year)

            SQL CODE that I get all years:

            Code:
            SELECT Recipient, Year, SUM(Amountconstant) AS amountgrant
            FROM campaign.stataicinexcelcindis2
            WHERE recommended="Yes" AND flowtype="Grant"
            GROUP BY Recipient, Year
            ORDER BY Recipient, Year;
            Last edited by Nihat Mugurtay; 22 May 2022, 09:01.

            Comment


            • #7
              You don't have those years for Afghanistan in your Stata dataset. You can confirm that with
              Code:
              count if recipient == "Afghanistan" & recommended == "Yes" & inlist(year, 2000, 2007)
              // or
              tabulate year if recipient == "Afganistan" & recommended == "Yes"
              You'll need to fill in the DSN or connection information, but you can get at it with something like the following.
              Code:
              #delimit ;
              local sql_statement SELECT recipient, year, amountconstant
                  FROM campaign.stataicinexcelcindis2
                      WHERE recommended = "Yes" AND flowtype = "Grant";
              #delimit cr
              odbc load, exec("`sql_statement';") dsn("<you need to fill this in>") clear
              set type double
              collapse (mean) amountgrant = amountconstant, by(recipient year)
              and look for those two years for Afganistan.

              Comment


              • #8
                Thank you Joseph Coveney,

                It was great help.

                Best

                Comment


                • #9
                  Originally posted by Nihat Mugurtay View Post
                  Thank you

                  It was great help.
                  You're welcome. By the way, I think that usage of single quotation marks is conventional in SQL, at least in the dialects that I'm familiar with.

                  So:
                  Code:
                  local sql_statement SELECT recipient, year, amountconstant
                      FROM campaign.stataicinexcelcindis2
                          WHERE recommended = 'Yes' AND flowtype = 'Grant';

                  Comment


                  • #10
                    Thank you so much Joseph Coveney,

                    Yes, a single quotation is the usual one. But, it is a STATA convention.

                    Comment


                    • #11
                      Originally posted by Joseph Coveney View Post
                      You don't have those years for Afghanistan in your Stata dataset. You can confirm that with
                      Code:
                      count if recipient == "Afghanistan" & recommended == "Yes" & inlist(year, 2000, 2007)
                      // or
                      tabulate year if recipient == "Afganistan" & recommended == "Yes"
                      You'll need to fill in the DSN or connection information, but you can get at it with something like the following.
                      Code:
                      #delimit ;
                      local sql_statement SELECT recipient, year, amountconstant
                      FROM campaign.stataicinexcelcindis2
                      WHERE recommended = "Yes" AND flowtype = "Grant";
                      #delimit cr
                      odbc load, exec("`sql_statement';") dsn("<you need to fill this in>") clear
                      set type double
                      collapse (mean) amountgrant = amountconstant, by(recipient year)
                      and look for those two years for Afganistan.
                      Dear Joseph Coveney ,

                      I solved the problem. I realized that MySQL automatically used a limit for 10000 observations.
                      Your message helped me a lot. Therefore, I have my correct results now.

                      Best,

                      Comment


                      • #12
                        Dear Friends,

                        I am trying to use "collapse" command for a specific purpose. Here, I provide a data example of China's military support for specific countries. I want to create a cross-sectional data, the units are countires. Each country should correspond to the count of China's military aid/support for those recipients. Suppose, we tabulate "Recipient" variable below (sorting is not significant at this moments) There should be a code like this:

                        Code:
                        preserve
                        
                        collapse count(Recipient), by(Recipient)
                        
                        restore

                        Code:
                        * Example generated by -dataex-. For more info, type help dataex
                        clear
                        input str32 Recipient str11 RecipientRegion int CommitmentYear
                        "Philippines"                      "Asia"        2017
                        "Kyrgyz Republic"                  "Asia"        2017
                        "Afghanistan"                      "Asia"        2017
                        "Bahamas"                          "America"     2017
                        "Guyana"                           "America"     2017
                        "Central African Republic"         "Africa"      2017
                        "Lesotho"                          "Africa"      2017
                        "Liberia"                          "Africa"      2017
                        "Sudan"                            "Africa"      2017
                        "Mali"                             "Africa"      2017
                        "Niger"                            "Africa"      2017
                        "Trinidad and Tobago"              "America"     2017
                        "Democratic Republic of the Congo" "Africa"      2017
                        "Ghana"                            "Africa"      2017
                        "South Sudan"                      "Africa"      2017
                        "Fiji"                             "Oceania"     2017
                        "Philippines"                      "Asia"        2017
                        "Lao People's Democratic Republic" "Asia"        2017
                        "Papua New Guinea"                 "Oceania"     2017
                        "Lao People's Democratic Republic" "Asia"        2017
                        "Lebanon"                          "Middle East" 2017
                        "Thailand"                         "Asia"        2017
                        "Bosnia and Herzegovina"           "Europe"      2017
                        "Armenia"                          "Asia"        2017
                        "Belarus"                          "Europe"      2017
                        "Myanmar"                          "Asia"        2017
                        "Peru"                             "America"     2017
                        "Cote d'Ivoire"                    "Africa"      2017
                        "Bolivia"                          "America"     2017
                        "Bolivia"                          "America"     2017
                        "Sri Lanka"                        "Asia"        2017
                        "Bosnia and Herzegovina"           "Europe"      2017
                        "Mali"                             "Africa"      2017
                        "Ghana"                            "Africa"      2017
                        "Philippines"                      "Asia"        2016
                        "Guinea"                           "Africa"      2016
                        "Mozambique"                       "Africa"      2016
                        "Liberia"                          "Africa"      2016
                        "Tajikistan"                       "Asia"        2016
                        "Tajikistan"                       "Asia"        2016
                        "Afghanistan"                      "Asia"        2016
                        "Afghanistan"                      "Asia"        2016
                        "Barbados"                         "America"     2016
                        "South Sudan"                      "Africa"      2016
                        "Bolivia"                          "America"     2016
                        "Lao People's Democratic Republic" "Asia"        2016
                        "Iraq"                             "Middle East" 2016
                        "Philippines"                      "Asia"        2016
                        "Philippines"                      "Asia"        2016
                        "Philippines"                      "Asia"        2016
                        "Armenia"                          "Asia"        2016
                        "Serbia"                           "Europe"      2016
                        "Bolivia"                          "America"     2016
                        "Kenya"                            "Africa"      2016
                        "Guyana"                           "America"     2016
                        "Bolivia"                          "America"     2016
                        "Bolivia"                          "America"     2016
                        "Sri Lanka"                        "Asia"        2016
                        "Ghana"                            "Africa"      2016
                        "Mali"                             "Africa"      2016
                        "Afghanistan"                      "Asia"        2016
                        "Afghanistan"                      "Asia"        2016
                        "Papua New Guinea"                 "Oceania"     2016
                        "Africa, regional"                 "Africa"      2016
                        "Tajikistan"                       "Asia"        2016
                        "Cambodia"                         "Asia"        2015
                        "Guinea-Bissau"                    "Africa"      2015
                        "Liberia"                          "Africa"      2015
                        "Somalia"                          "Africa"      2015
                        "Uganda"                           "Africa"      2015
                        "Africa, regional"                 "Africa"      2015
                        "Armenia"                          "Asia"        2015
                        "Djibouti"                         "Africa"      2015
                        "Namibia"                          "Africa"      2015
                        "India"                            "Asia"        2015
                        "Afghanistan"                      "Asia"        2015
                        "Ghana"                            "Africa"      2015
                        "Jordan"                           "Middle East" 2015
                        "Lebanon"                          "Middle East" 2015
                        "Cote d'Ivoire"                    "Africa"      2015
                        "Iraq"                             "Middle East" 2015
                        "Nigeria"                          "Africa"      2015
                        "Cote d'Ivoire"                    "Africa"      2015
                        "Mongolia"                         "Asia"        2015
                        "Armenia"                          "Asia"        2015
                        "Serbia"                           "Europe"      2015
                        "Zambia"                           "Africa"      2015
                        "Pakistan"                         "Asia"        2015
                        "Peru"                             "America"     2015
                        "Mali"                             "Africa"      2015
                        "Bolivia"                          "America"     2015
                        "Liberia"                          "Africa"      2014
                        "Ecuador"                          "America"     2014
                        "Kenya"                            "Africa"      2014
                        "Guyana"                           "America"     2014
                        "Cambodia"                         "Asia"        2014
                        "Cambodia"                         "Asia"        2014
                        "South Sudan"                      "Africa"      2014
                        "Namibia"                          "Africa"      2014
                        "Mauritania"                       "Africa"      2014
                        end
                        ------------------ copy up to and including the previous line ------------------

                        Listed 100 out of 446 observations
                        Use the count() option to list more

                        Comment


                        • #13
                          Actually,

                          I couldn't solve the problem without creating a numerical "1" for each row. I created a variable and assigned a numerical "1" for each row, then used the collapse option, as below. In this way, I get the count of aid operations to specific recipients. But, if anyone offers another option (without creating a numerical), it will be the answer.

                          Best,

                          Code:
                          gen newnumerical=1
                          collapse (sum) newnumerical, by(Recipient)

                          Comment

                          Working...
                          X