Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Transforming data to panel data.

    Hello Statalist

    I am currently working with the Global Sanction Database, which lists every country that has been hit by sanctions from 1950 to 2022. My issue is that I need this to be panel data. It currently lists one observation pr. instance of sanctions against a country (often multiple instances pr. year), and no observations in the years that the country has not been sanctioned. However I need to create observations for all years, for all countries, with a simple dummy variable (sanctioned/ not sanctioned). Not sanctioned is all the missing years for each country, and sanctioned is all the years where there is observations in the dataset.

    Does anyone have a way to to this, without having to do it manually? Ideally I would like to have an extra variable that counts how many instances of sanctions against a country, there has been in any given year.

    Thanks in advance.

    [CODE]
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str46 sanctioned_state int year byte(arms military trade financial travel other) long country_id
    "Afghanistan" 1979 0 1 0 1 0 0 1
    "Afghanistan" 1996 1 0 0 0 0 0 1
    "Afghanistan" 1996 1 0 0 0 0 0 1
    "Afghanistan" 1996 1 0 0 0 0 0 1
    "Afghanistan" 1996 1 0 0 0 0 0 1
    "Afghanistan" 1996 1 0 0 0 0 0 1
    "Afghanistan" 1996 1 0 0 0 0 0 1
    "Albania" 1950 0 0 1 0 0 0 3
    "Albania" 1950 0 0 1 0 0 0 3
    "Albania" 1950 0 0 1 0 0 0 3
    "Albania" 1950 0 0 1 0 0 0 3
    "Albania" 1950 0 0 1 0 0 0 3
    "Albania" 1950 0 0 1 0 0 0 3
    "Albania" 1950 0 0 1 0 0 0 3
    "Albania" 1950 0 0 1 0 0 0 3
    "Albania" 1950 0 0 1 0 0 0 3
    "Albania" 1950 0 0 1 0 0 0 3
    "Albania" 1950 0 0 1 0 0 0 3
    "Albania" 1950 0 0 1 0 0 0 3
    "Albania" 1950 0 0 1 0 0 0 3
    "Albania" 1950 0 0 1 0 0 0 3
    "Albania" 1950 0 0 1 0 0 0 3
    "Albania" 1950 0 0 1 0 0 0 3
    "Albania" 1951 0 0 1 0 0 0 3
    "Albania" 1951 0 0 1 0 0 0 3
    "Albania" 1951 0 0 1 0 0 0 3
    "Albania" 1951 0 0 1 0 0 0 3
    "Albania" 1951 0 0 1 0 0 0 3
    "Algeria" 1962 0 0 0 1 0 0 50
    "Algeria" 1965 0 0 1 1 0 0 50
    "Algeria" 1966 0 0 1 1 0 0 50
    "Algeria" 1967 0 0 1 1 0 0 50
    "Algeria" 1968 0 0 1 1 0 0 50
    "Algeria" 1969 0 0 1 1 0 0 50
    "Algeria" 1970 0 0 1 1 0 0 50
    "Algeria" 1971 0 0 1 1 0 0 50
    "Algeria" 1976 0 0 0 1 0 0 50
    "Algeria" 1977 0 0 0 1 0 0 50
    "Algeria" 1978 0 0 0 1 0 0 50
    "Algeria" 1979 0 0 0 1 0 0 50
    "Algeria" 1980 0 0 0 1 0 0 50
    "Algeria" 1981 0 0 0 1 0 0 50
    "Algeria" 1982 0 0 0 1 0 0 50
    "Algeria" 1983 0 0 0 1 0 0 50
    "Algeria" 1984 0 0 0 1 0 0 50
    "Algeria" 1985 0 0 0 1 0 0 50
    "Algeria" 1986 0 0 0 1 0 0 50
    "Algeria" 1987 0 0 0 1 0 0 50
    "Algeria" 1988 0 0 0 1 0 0 50
    "Algeria" 1989 0 0 0 1 0 0 50
    "Algeria" 1990 0 0 0 1 0 0 50



  • #2
    Well, the code would be like this:
    Code:
    fillin sanctioned_state year
    foreach v of varlist arms-other {
        replace `v' = 0 if _fillin
    }
    by sanctioned_state (country_id), sort: assert inlist(country_id, country_id[1], .)
    by sanctioned_state (country_id), sort: replace country_id = country_id[1]
    isid sanctioned_state year, sort
    But there is something wrong with your data. The example contains large numbers of completely duplicate observations. For panel data you will need to have only a single observation for each country in each year. You already have multiple observations for many such combinations, and the expansion will only make matters worse. Moreover, I notice that the surplus observations are exact duplicates--they agree on all variables. So they convey no additional information. While it is easy enough to remove these duplicates from the data set, their very presence usually indicates that errors were made in the data management that created the data set. And where one error is found, others may lurk. So before you proceed, you should carefully review the code that created this data set and identify and fix the errors you find. If you determine that the duplicates were already present in the source data from which you created this data set, then you should request whoever provided that data to fix the problem.

    Comment


    • #3
      Thank you for your response Clyde. The fault regarding the duplicate observations is mine entirely. I have not been thorough enough with my dataexample. The observations are not exact duplicates, as you will see in the new dataex; they represent different states which have sanctioned the country in a given year.

      [CODE]
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input str33 sanctioning_state str46 sanctioned_state int year byte(arms military trade financial travel) long(country_id ss_id)
      "United States" "Afghanistan" 1979 0 1 0 1 0 1 200
      "Kazakhstan" "Afghanistan" 1996 1 0 0 0 0 1 94
      "Tajikistan" "Afghanistan" 1996 1 0 0 0 0 1 185
      "Congo, Republic of the" "Afghanistan" 1996 1 0 0 0 0 1 43
      "Eritrea" "Afghanistan" 1996 1 0 0 0 0 1 62
      "Nicaragua" "Afghanistan" 1996 1 0 0 0 0 1 137
      "Bahrain" "Afghanistan" 1996 1 0 0 0 0 1 13
      "Micronesia, " "Afghanistan" 1996 1 0 0 0 0 1 122
      "Singapore" "Afghanistan" 1996 1 0 0 0 0 1 167
      "Grenada" "Afghanistan" 1996 1 0 0 0 0 1 75
      "Venezuela" "Afghanistan" 1996 1 0 0 0 0 1 206
      "Uganda" "Afghanistan" 1996 1 0 0 0 0 1 196
      "Rwanda" "Afghanistan" 1996 1 0 0 0 0 1 156
      "Burundi" "Afghanistan" 1996 1 0 0 0 0 1 29
      "Comoros" "Afghanistan" 1996 1 0 0 0 0 1 41
      "Estonia" "Afghanistan" 1996 1 0 0 0 0 1 63
      "Gambia, The" "Afghanistan" 1996 1 0 0 0 0 1 69

      Does this change anything with regards to the code you sent?

      Comment


      • #4
        For reference, my ideal data structure would look like this:

        sanctioned_state int year byte(sanctioned(yes/no) instances of sanctions arms military trade financial travel other)

        "Afghanistan" 1979 1 1 0 1 0 1 0 0
        "Afghanistan" 1980 0 0 . . . . . .
        "Afghanistan" 1981 0 0 . . . . . .
        "Afghanistan" 1982 0 0 . . . . . .
        "Afghanistan" 1983 0 0 . . . . . .
        "Afghanistan" 1984 0 0 . . . . . .
        "Afghanistan" 1985 0 0 . . . . . .
        "Afghanistan" 1986 0 0 . . . . . .
        "Afghanistan" 1987 0 0 . . . . . .
        "Afghanistan" 1988 0 0 . . . . . .
        "Afghanistan" 1989 0 0 . . . . . .
        "Afghanistan" 1990 0 0 . . . . . .
        "Afghanistan" 1991 0 0 . . . . . .
        "Afghanistan" 1992 0 0 . . . . . .
        "Afghanistan" 1993 0 0 . . . . . .
        "Afghanistan" 1994 0 0 . . . . . .
        "Afghanistan" 1995 0 0 . . . . . .
        "Afghanistan" 1996 1 6 1 0 0 0 0 0 1

        And then this in the full time period 1950-2022, for all countries in the existing data

        I hope this makes sense.

        Comment


        • #5
          It appears that in the result you want, as illustrated in #4, you do not want to distinguish observations by sanctioning_state. You are just interested in how things like to the sanctioned_state in each year. So I would also assume that if a sanctioned_state has a particular type of sanction, say arms, from some sanctioning states but not others, you would want a 1 for arms in that observation. In other words, you want to disregard the sanctioning state, and simply report the presence of a sanction in the category from any state (1) vs. no such sanction from any state at all (0).

          So if I have that right, it would be a slight modification to the original code, just adding a -collapse- at the beginning, and, as you apparently do not need the country_id variable in your result, eliminating the little dance with that variable at the end. So:
          Code:
          collapse (max) arms-travel, by(sanctioned_state year)
          
          fillin sanctioned_state year
          foreach v of varlist arms-travel {
              replace `v' = 0 if _fillin
          }
          Notes:
          1. This code assumes, but does not verify, that every year in the range of years of the data is instantiated in at least one country. If there is some year for which no country has any observations, that year will also be missing in the result.
          2. It is a bad practice to use missing value as a code for "no" in yes/no variables. It's a recipe for serious problems when you come to analyze the data. The coding of yes/no variables that works best in Stata is 1 = yes 0 = no, and that is what is implemented here.
          3. Your example in #4 also appears to contain a variable number_of_sanctions. I'm not entirely sure what that's supposed to represent, so I've ignored it. It could simply be the count of the number of 1's in the same observation--which can be calculated using -egen, rowtotal()-. But it might also want to count up the total number of sanctioning countries imposing any sanction--which would require a different approach. Or maybe it's something else entirely. If it's important, post back with an explanation. From the example shown, I couldn't figure out what it's supposed to be.

          Comment


          • #6
            That is entirely correct. Thank you. With regard to the "variable number_of_sanctions" my goal was to count the amount of states that have sanctioned a given state in a given year. This would enable me to differentiate between a state which was only sanctioned by one state in a year, and a sate (like Afghanistan in my dataex) which has been sanctioned by many states in ex. 1996. This would require counting the duplicates observation pr. country pr. year and adding a variable which describes this number of duplicates. This is what you describe as "count up the total number of sanctioning countries imposing any sanction"


            Your help is much appreciated.

            Comment


            • #7
              Thanks for the explanation of number_of_sanctions. So it just requires a small modification of the code:
              Code:
              encode sanctioning_state, gen(sanctioner)
              collapse (max) arms-travel (count) number_of_sanctions = sanctioner,  ///
                  by(sanctioned_state year)
              
              fillin sanctioned_state year
              foreach v of varlist arms-travel {
                  replace `v' = 0 if _fillin
              }

              Comment


              • #8
                Perfect. Thank you very much. Everything looks perfect now

                Comment

                Working...
                X