Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Collapse (sum) while making a loop

    Hi

    I am working with a dataset that has 75 years of data. The countries are observations - so every country's name appears 75 times. There are indicator variables that I need to sum for each country.

    So I used the following command: collapse (sum) signed enforced terminated signed_high enforced_high, by (country1 year)

    /* All the variables above are indicator variables */

    However, I do not get the desired result. The summation takes place across all countries. For example, if the sum after the first country is 3, the sum for the second country starts from 4. I'd like to have individual sums for countries. I was wondering if I should be creating a loop for each country but not sure how to do that either.

    Here is a glimpse of the data I am currently working with: ​​​​​https://docs.google.com/spreadsheets...it?usp=sharing

    Thanks

  • #2
    Well, I don't know how you get the result you say you have, but I can tell you why you don't have the one desired:

    Using collapse is not appropriate here. If you have one observation per country per year, making something collapse (sum), by (country year) won't change anything (except dropping other variables.)

    You could use collapse ,by(country) to compute the sum by country (what you want), but this will damage your dataset since you'll have only one observation per country remaining.

    I think you should rather use the sum function, combined with bysort:
    Code:
    bysort country : gen sum_var1 =sum(var1)
    This new variable would be the sum of past values of var1 (up to the current year).
    If by sum you meant total (i.e. the total of all years) you can run

    Code:
    bysort country : egen tot_var1 =total(var1)
    Hope this helps,
    Best,
    Charlie

    Comment


    • #3
      I tried that. But Collapse command allows me to drop country2 and as well as have a single year entry for each country as opposed to having the same year appear multiple number of times. With the sum command I am left with having to drop random observations and the summation is over multiple sets of years for each year as opposed to a single set of years.

      Comment


      • #4
        Ok, taking a closer look to your data I see what wasn't right.
        Actually, you don't have 75 observations per country (whereas you said every country's name appears 75 times)
        You have bilateral data between country 1 and country 2, over 58 years (1959-2016), so each country (either in country1 and country2) appear much more than 75 times (depending on your countries sample).

        In the data you posted, you report only Afghanistan for country1, but I assume this changes in your whole data. Otherwise simply delete it, you know you're working on Afghan relationship. So in my further codes, I assume country1 to vary.

        The main problem here is that I didn't precisely understand what data structure you want. You have a bilateral panel data (country1-country2 as identifier, and year as time variable).

        You could want to have only bilateral data, so every bilateral relationship only to be observed once instead of 58 times, and the values of the bilateral variables to be summed over the 58 years.
        So your initial code was close enough, simply try:
        Code:
        collapse (sum) signed enforced terminated signed_high enforced_high, by (country1 country2)
        This will give you data like:
        (In the data you gave us, all kept variables are null, so the sum is also null)
        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str11 country1 str40 country2 double(signed enforced terminated signed_high enforced_high)
        "Afghanistan" "Albania"                                  0 0 0 0 0
        "Afghanistan" "Algeria"                                  0 0 0 0 0
        "Afghanistan" "Angola"                                   0 0 0 0 0
        "Afghanistan" "Antigua and Barbuda"                      0 0 0 0 0
        "Afghanistan" "Argentina"                                0 0 0 0 0
        "Afghanistan" "Armenia"                                  0 0 0 0 0
        "Afghanistan" "Australia"                                0 0 0 0 0
        "Afghanistan" "Austria"                                  0 0 0 0 0
        "Afghanistan" "Azerbaijan"                               0 0 0 0 0
        "Afghanistan" "BLEU (Belgium-Luxembourg Economic Union)" 0 0 0 0 0
        "Afghanistan" "Bahamas"                                  0 0 0 0 0
        "Afghanistan" "Bahrain"                                  0 0 0 0 0
        "Afghanistan" "Bangladesh"                               0 0 0 0 0
        "Afghanistan" "Barbados"                                 0 0 0 0 0
        "Afghanistan" "Belarus"                                  0 0 0 0 0
        "Afghanistan" "Belgium"                                  0 0 0 0 0
        "Afghanistan" "Belize"                                   0 0 0 0 0
        "Afghanistan" "Benin"                                    0 0 0 0 0
        "Afghanistan" "Bolivia"                                  0 0 0 0 0
        end
        But in your last post you tend to say that you want to get rid of country2.

        You could want you want simply panel data (country1-year), in this case your initial post is correct, and I don't understand what's wrong.


        Third solution : you want a simple panel data (country2-year)
        Code:
        collapse (sum) signed enforced terminated signed_high enforced_high, by (country2 year)
        Last solution : you want to keep the bilateral panel structure (country1-country2 ; year), then don't change anything, expect perhaps creating a bilateral identifier variable with the goup command


        What is precisely the data structure you're aiming to? And please post (using dataex) some sample of you dataset where the issue you reported in #1 appears, here we only have zeros, so I don't see any sum issues.

        Best,
        Charlie



        Comment

        Working...
        X