Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Gen. new variables based on values for multiple existing variables, TSCS data

    Dear members,

    I have put together a data-set with information about military alliances btw states in the post WW2 era. The data-set is organized by country year, going from 1946-2016.

    The data-set also includes information about which countries a country shares a land border with at any given year (variable border1 border2 etc, up to border14).

    For each alliance a country is a member of any given year, I have created variables with country-codes for members of that alliance (this means that a country may appear more than once as an ally for a given year).

    (Example: The variable a1mem1 shows the country-code for the first member for alliance 1, the variable a16mem8 shows the 8th member for alliance 16, and so on.)

    The maximum number of alliances a country has been a member of in the post WW2 era is 53. The maximum number of members for any alliance in the post WW2 era is 59. This means that I have many variables, where in most instances the value is missing (very few states are members of more than a couple of alliances, and most alliances involves only a relatively small number of states).

    To be precise, the data-set consists of 9720 observations (country-year), and 3196 variables. Of these, the variables with information about members for each alliance make up 3127 (53*59) of these variables (again, variables a1mem1 – a53mem59).

    Now, what I would like is to create new variables with the name alliance_b1 alliance_b2 alliance_b3 etc.

    The value for the variable alliance_b1 should be “1” if the country in this particular year has an alliance with the country that appears in the variable border1 (and 0 otherwise). Similarly, the value for the variable alliance_b2 should be “1” if a country has an alliance with the country that appear in the variable border2, etc.

    Put differently, I would like to run a code that goes through all the 3127 variables from a1mem1 to a1mem59, a2mem1 to a2mem59… up to a53mem59; checks if any of the country-codes there also appears in the variables border1-border14, and if so, codes the new variables alliance_b1, alliance_b2 etc accordingly.

    Below I have included a heavily reduced version of my data-set for illustrative purposes (both table and code).

    All advice and recommendations are deeply appreciated. Also, since I am new to the forum, any suggestions for how I can improve my questions are most welcome.

    Best,
    Magnus Åsblad


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double cowcode float(border1 border2 border3) double year int(a1mem1 a1mem2 a1mem3 a2mem1 a2mem2 a2mem3 a3mem1 a3mem2 a3mem3)
      2  20  70   . 1946 . 235   .   . 40 41   .   .   .
      2  20  70   . 1947 .  40  41   . 31 40   .   .   .
      2  20  70   . 1948 .  31  40   . 31 40   .   .   .
      2  20  70   . 1949 .  31  40   . 31 40   .  20 200
     20   2   .   . 1946 .   .   .   .  .  .   .   .   .
     20   2   .   . 1947 .   .   .   .  .  .   .   .   .
     20   2   .   . 1948 .   .   .   .  .  .   .   .   .
     20   2   .   . 1949 2   . 200   .  .  .   .   .   .
     70  80  90   2 1946 2  40  41   .  .  .   .   .   .
     70  80  90   2 1947 2  40  41   2 31 40   .   .   .
     70  80  90   2 1948 2  31  40   2 31 40   .   .   .
     70  80  90   2 1949 2  31  40   2 31 40   .   .   .
    220 221 211 232 1946 . 365   .   .  .  .   .   .   .
    220 221 211 232 1947 . 365   . 200  .  .   .   .   .
    220 221 211 232 1948 . 365   . 200  .  . 200 210 211
    220 221 211 232 1949 . 365   . 200  .  . 200 210 211
    end
    cowcode border1 border2 border3 year a1mem1 a1mem2 a1mem3 a2mem1 a2mem2 a2mem3 a3mem1 a3mem2 a3mem3
    2 20 70 1946 235 40 41
    2 20 70 1947 40 41 31 40
    2 20 70 1948 31 40 31 40
    2 20 70 1949 31 40 31 40 20 200
    20 2 1946
    20 2 1947
    20 2 1948
    20 2 1949 2 200
    70 80 90 2 1946 2 40 41
    70 80 90 2 1947 2 40 41 2 31 40
    70 80 90 2 1948 2 31 40 2 31 40
    70 80 90 2 1949 2 31 40 2 31 40
    220 221 211 232 1946 365
    220 221 211 232 1947 365 200
    220 221 211 232 1948 365 200 200 210 211
    220 221 211 232 1949 365 200 200 210 211

  • #2
    Welcome to Stata list. I appreciate that you provided data. It will also help if you provide some code you have tried to make solve the problem. You will also find that a shorter more precise question is far more likely to elicit an answer. You have a very long complicated posting which means is quite likely that whoever reads it won't interpret it the way you meant.

    It looks to me like you need to loop over the alliance variables and replace something. But I don't fully understand what you want. I think it is something like:
    forvalues b=1/3 {
    g a_`b'=.
    forvalues a=1/3 {
    forvalues mem=1/3 {
    replace a_`b'=1 if border`a'==a`a'mem`mem'
    }
    }
    }
    list


    Comment


    • #3
      Dear Phil,

      thank you so much for your reply. I will definitely try to be shorter and more precise when asking my questions in the future (no doubt my inability to pose a concise question is linked to my difficulty in solving the problem myself).

      Your code looks intuitively right: It creates new variables, it goes through the alliance_member variables, and then tells Stata to code the new variables a_1 etc "1" if the value for the border-variables correspond to the values found in the alliance_member variables.

      Unfortunately, it doesn't do what I want it to do. When running the code, I get the output below. In many instances, the value for the new variables a_1, a_2 and a_3 are clearly incorrect (given what I would like the code to do).

      For example, the observations with the cowcode 2 have two borders (cowcode 20 and cowcode 70). In the year 1946, 1947 and 1948, the values "20" or "70" does not appear anywhere in the variables a1mem1 - a3mem3. Thus, the value for a_1 and a_2 should be missing for those years (and not "1"). And since the value for border3 is missing for the country with cowcode "2", the value for the variable a_3 should be missing by default (now it is "1").

      Like I said, your code seem intuitively "right", so I'm not sure what's really going on.

      What I would like is a code that asks the question "for each observations, does the values that appear in the variable border1 appear in variable a1mem1, a1mem2 etc? If so, code a new variable a_1 "1"."

      And then the same thing for each border-variable... But this seem to be exactly what your code is set up to do (but doesn't).
      Nevertheless, I appreciate the help and will continue trying to solve the problem.

      Best wishes,
      Magnus




      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input double cowcode float(border1 border2 border3) double year int(a1mem1 a1mem2 a1mem3 a2mem1 a2mem2 a2mem3 a3mem1 a3mem2 a3mem3) float(a_1 a_2 a_3)
      2 20 70 . 1946 . 235 . . 40 41 . . . 1 1 1
      2 20 70 . 1947 . 40 41 . 31 40 . . . 1 1 1
      2 20 70 . 1948 . 31 40 . 31 40 . . . 1 1 1
      2 20 70 . 1949 . 31 40 . 31 40 . 20 200 1 1 1
      20 2 . . 1946 . . . . . . . . . 1 1 1
      20 2 . . 1947 . . . . . . . . . 1 1 1
      20 2 . . 1948 . . . . . . . . . 1 1 1
      20 2 . . 1949 2 . 200 . . . . . . 1 1 1
      70 80 90 2 1946 2 40 41 . . . . . . . . .
      70 80 90 2 1947 2 40 41 2 31 40 . . . . . .
      70 80 90 2 1948 2 31 40 2 31 40 . . . . . .
      70 80 90 2 1949 2 31 40 2 31 40 . . . . . .
      220 221 211 232 1946 . 365 . . . . . . . . . .
      220 221 211 232 1947 . 365 . 200 . . . . . . . .
      220 221 211 232 1948 . 365 . 200 . . 200 210 211 . . .
      220 221 211 232 1949 . 365 . 200 . . 200 210 211 . . .
      end
      cowcode border1 border2 border3 year a1mem1 a1mem2 a1mem3 a2mem1 a2mem2 a2mem3 a3mem1 a3mem2 a3mem3 a_1 a_2 a_3
      2 20 70 1946 235 40 41 1 1 1
      2 20 70 1947 40 41 31 40 1 1 1
      2 20 70 1948 31 40 31 40 1 1 1
      2 20 70 1949 31 40 31 40 20 200 1 1 1
      20 2 1946 1 1 1
      20 2 1947 1 1 1
      20 2 1948 1 1 1
      20 2 1949 2 200 1 1 1
      70 80 90 2 1946 2 40 41
      70 80 90 2 1947 2 40 41 2 31 40
      70 80 90 2 1948 2 31 40 2 31 40
      70 80 90 2 1949 2 31 40 2 31 40
      220 221 211 232 1946 365
      220 221 211 232 1947 365 200
      220 221 211 232 1948 365 200 200 210 211
      220 221 211 232 1949 365 200 200 210 211

      Comment


      • #4
        What I would like is a code that asks the question "for each observations, does the values that appear in the variable border1 appear in variable a1mem1, a1mem2 etc? If so, code a new variable a_1 "1"."
        See

        Code:
        help inlist()
        How you automate this depends on aspects such as your variable names, but this is possible in your current setup. Otherwise just understand how the -inlist()- function works and then proceed from there.

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input double cowcode float(border1 border2 border3) double year int(a1mem1 a1mem2 a1mem3 a2mem1 a2mem2 a2mem3 a3mem1 a3mem2 a3mem3)
          2  20  70   . 1946 . 235   .   . 40 41   .   .   .
          2  20  70   . 1947 .  40  41   . 31 40   .   .   .
          2  20  70   . 1948 .  31  40   . 31 40   .   .   .
          2  20  70   . 1949 .  31  40   . 31 40   .  20 200
         20   2   .   . 1946 .   .   .   .  .  .   .   .   .
         20   2   .   . 1947 .   .   .   .  .  .   .   .   .
         20   2   .   . 1948 .   .   .   .  .  .   .   .   .
         20   2   .   . 1949 2   . 200   .  .  .   .   .   .
         70  80  90   2 1946 2  40  41   .  .  .   .   .   .
         70  80  90   2 1947 2  40  41   2 31 40   .   .   .
         70  80  90   2 1948 2  31  40   2 31 40   .   .   .
         70  80  90   2 1949 2  31  40   2 31 40   .   .   .
        220 221 211 232 1946 . 365   .   .  .  .   .   .   .
        220 221 211 232 1947 . 365   . 200  .  .   .   .   .
        220 221 211 232 1948 . 365   . 200  .  . 200 210 211
        220 221 211 232 1949 . 365   . 200  .  . 200 210 211
        end
        
        foreach border of varlist border*{
            local code =substr("`border'", -1, 1)
            gen a_`code'= .
        }
        foreach border of varlist border*{
            local code =substr("`border'", -1, 1)
            ds a`code'mem*
            local list= trim(subinstr("`r(varlist)'", " ", ",",.))
            replace a_`code'= 1 if inlist(border`code', `list') & !missing(border`code')
        }
        Res.:

        Code:
        . l border1 a1mem1 a1mem2 a1mem3 year a_1
        
             +-------------------------------------------------+
             | border1   a1mem1   a1mem2   a1mem3   year   a_1 |
             |-------------------------------------------------|
          1. |      20        .      235        .   1946     . |
          2. |      20        .       40       41   1947     . |
          3. |      20        .       31       40   1948     . |
          4. |      20        .       31       40   1949     . |
          5. |       2        .        .        .   1946     . |
             |-------------------------------------------------|
          6. |       2        .        .        .   1947     . |
          7. |       2        .        .        .   1948     . |
          8. |       2        2        .      200   1949     1 |
          9. |      80        2       40       41   1946     . |
         10. |      80        2       40       41   1947     . |
             |-------------------------------------------------|
         11. |      80        2       31       40   1948     . |
         12. |      80        2       31       40   1949     . |
         13. |     221        .      365        .   1946     . |
         14. |     221        .      365        .   1947     . |
         15. |     221        .      365        .   1948     . |
             |-------------------------------------------------|
         16. |     221        .      365        .   1949     . |
             +-------------------------------------------------+

        Comment

        Working...
        X