  • Creating (many) dyads

    I have a time-series dataset with the following form:

    ID Year
    1 2000
    2 2000
    3 2000
    1 2001
    2 2001
    3 2001
    4 2001
    ...

    I need to create dyads so that the data look as follows:

    IDa IDb Year
    1 2 2000
    1 3 2000
    ...
    1 4 2001
    ...

    The original dataset contains more than 17,000 unit-years. I have tried using Ferguson's DYADS, but I have run into a couple of problems:
    • The dyads command cannot be used with either by or if, so I cannot create dyads within years. (This means dyads creates dyads containing, for example, (1,2000) and (1,2001), which I do not want.)
    • As a result, the computer attempts to create more than 224 million observations (dyads begins by creating N*N observations, then reduces that to N(N-1)/2 dyads); the number should be far smaller once years are taken into account.
    • I have tried using a loop with foreach, but I am unsure about my coding, as Stata still begins by creating all 224+ million observations.
    Any suggestions would be helpful, so thanks in advance. (For reference, I am starting with the Correlates of War system membership dataset. EUGene used to be my go-to source for this, but the program has not been updated in years, and I've converted to Apple, but EUGene only runs on Windows. Anyway, I would like to eventually have my own code that produces useful interstate data.)
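
    As a rough size check before building anything, the number of within-year dyads can be computed from the yearly unit counts alone. The following is a minimal sketch, assuming the variables are named ID and Year as in the example above and that each ID appears at most once per year; n_units and nondirected are hypothetical names introduced here for illustration.

    ```stata
    * Sketch only: count the dyad-years a within-year pairing would produce.
    * Assumes one observation per ID per Year.
    preserve
    * Collapse to one row per Year, with the number of units in n_units
    contract Year, freq(n_units)
    * Unordered pairs within a year: n(n-1)/2
    generate double nondirected = n_units * (n_units - 1) / 2
    quietly summarize nondirected
    display "Nondirected dyad-years: " %15.0fc r(sum)
    display "Directed dyad-years:    " %15.0fc 2 * r(sum)
    restore
    ```

    Because the count is a sum of n(n-1)/2 over years rather than N(N-1)/2 over all unit-years, it should come out far below the N*N figure mentioned above.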

  • #2
    I'm not entirely sure I understand what you want, but is it this:

    Code:
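    * Save a copy of the data in which the identifier is called IDb
    preserve
    rename ID IDb
    tempfile holding
    save "`holding'"
    restore
    rename ID IDa
    * Pair every IDa with every IDb within the same Year
    * (Year is capitalized as in the example data; Stata names are case-sensitive)
    joinby Year using "`holding'"
    * Keep each unordered pair once
    keep if IDb > IDa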



    • #3
      That did it! Thanks. I'd only add that this creates a nondirected dyad dataset and replacing the final line with
      Code:
      drop if IDb == IDa
      will leave it as a directed dyadic dataset.
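
      A quick way to check the two variants is to run the approach on the seven example rows from the first post. This is only a sketch on toy data, assuming the variables are named ID and Year as shown there.

      ```stata
      * Toy data: the seven rows shown in the first post
      clear
      input ID Year
      1 2000
      2 2000
      3 2000
      1 2001
      2 2001
      3 2001
      4 2001
      end

      * Same pairing approach as in #2
      preserve
      rename ID IDb
      tempfile holding
      save "`holding'"
      restore
      rename ID IDa
      joinby Year using "`holding'"

      * Directed dyads: drop only the self-pairs
      drop if IDb == IDa
      count                  // 3*2 + 4*3 = 18 directed dyad-years
      * Nondirected dyads are the subset with IDb > IDa
      count if IDb > IDa     // 3 + 6 = 9 nondirected dyad-years
      ```

      With 3 units in 2000 and 4 in 2001, the directed version has twice as many rows as the nondirected one, since each pair appears in both orders.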



      • #4
        Originally posted by Clyde Schechter
        I'm not entirely sure I understand what you want, but is it this:

        Code:
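        * Save a copy of the data in which the identifier is called IDb
        preserve
        rename ID IDb
        tempfile holding
        save "`holding'"
        restore
        rename ID IDa
        * Pair every IDa with every IDb within the same Year
        * (Year is capitalized as in the example data; Stata names are case-sensitive)
        joinby Year using "`holding'"
        * Keep each unordered pair once
        keep if IDb > IDa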

        Hi Clyde,
        I hope you see this reply.
        I have a similar issue to Jeremy's, but with a slight difference.
        It seems that the quoted code forms all possible pairs by matching not only on IDs but also on Years.
        The number of observations is then going to be i*i*t (the number of IDs * the number of IDs * the number of time periods).
        How do I use your code if I want to create pairs based on IDs only, so that I end up with i*i observations?

        Do I just joinby ID?


        Best,
        Ryan




        • #5
          Your question is unclear. Either I do not grasp what you want, or you do not understand what the code does.

          The code above pairs observations only within the same Year: each observation is matched with every observation in the holding file that has the same value of Year. The resulting data set is necessarily no bigger than one that pairs every observation with every other observation regardless of year, and in most applications it is much smaller than such a "year-free" pairing.

          What data do you have? Is there even a Year variable in your data set? If so, can a given ID have observations in more than one year? Do you want to reduce the original data set to just one observation per ID, and then do a pairing of the IDs? Or do you have something else in mind?

          Please respond showing an example of your data using the -dataex- command. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

          Also show what you want the final result to look like.

          When asking for help with code, always show example data. When showing example data, always use -dataex-.
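
          If the reading suggested above is the intended one (one observation per ID, then every ID paired with every other ID regardless of year), a minimal sketch might look like the following. It assumes the identifier is named ID as in the first post, and it swaps -joinby- for -cross-, which forms every pairwise combination of two data sets; the tempfile name ids is a hypothetical choice. This is one possible interpretation, not necessarily what was asked.

          ```stata
          * Sketch only: year-free ID x ID pairing.
          * Note: this replaces the data in memory; run it on a copy
          * or wrap it in preserve/restore if the original is still needed.

          * Reduce to one observation per ID
          keep ID
          duplicates drop

          * Save the list of IDs under the name IDb
          rename ID IDb
          tempfile ids
          save "`ids'"

          * The data in memory become IDa; cross forms all IDa x IDb pairs
          rename IDb IDa
          cross using "`ids'"

          * Optional: remove self-pairs, leaving i*(i-1) directed pairs
          drop if IDa == IDb
          ```

          Before the final line the result has i*i rows; replacing that line with -keep if IDb > IDa- would instead leave the i*(i-1)/2 unordered pairs.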
