Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Generate variable from map between pair of variables

    I have a dataset with variables cusip, cusip_, and sic. cusip and cusip_ have the same values, but are arranged so they can be put into a correlation matrix where each pair (cusip, cusip_) shows up once. Variable sic depends on the value in cusip_. A snapshot would look like:
    cusip cusip_ sic_ sic
    2 1 10 20
    1 1 10 10
    3 1 10 30
    ... ...
    1 2 20 10
    2 2 20 20
    3 2 20 30
    I want to generate a new variable sic (included above) which corresponds to cusip, not cusip_. I want to somehow label each of the 609 values of cusip_ by the corresponding sic_, and map the sic_ to the same values in cusip. And the match will not be perfect, as there are 810 values in cusip, some of which are not in cusip_, and in those cases I'd like the entry to be a missing observation. The data makes a 810 x 609 matrix of cusip x cusip_.

    Do you have any suggestions? Thank you in advance for the help! And if my question seems basic, I also apologize, as I'm relatively new to Stata and have not yet found a past question that addresses this.

  • #2
    I find your post confusing. After re-reading it several times, I think I know what you want, but this may not be it.

    I assume you are actually starting with a data set that contains cusip cusip_ and sic. In the code shown below I have created a new data set modeled on the table from your post, but corresponding to my assumptions. I assume, and verify in the code, that each cusip has a unique value of sic associated with it. That is, if two observations have the same value of cusip, then they also have the same value of sic. I also assume that each observation pairs a cusip with a value of cusip_, and that what you want to do is create a new variable, sic_ (not sic, which I assume you started with), which contains the value of sic that is associated with the value of cusip_ when it appears in cusip.

    If I have that right, then this will work:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte(cusip cusip_ sic)
    1 1 10
    1 2 10
    1 3 10
    2 1 20
    2 2 20
    2 3 20
    3 1 30
    3 2 30
    3 3 30
    end
    
    //    VERIFY CUSIP-SIC CORRESPONDENCE IS ONE TO ONE
    by cusip (sic), sort: assert sic[1] == sic[_N]
    
    //    PULL VALUE OF SIC CORRESPONDING TO CUSIP_
    rangestat (first) sic_ = sic, interval(cusip cusip_ cusip_)
    
    //    LOOK AT THE RESULTS
    sort cusip cusip_
    list, noobs clean
    Note: This requires the -rangestat- command, written by Robert Picard, Nick Cox, and Roberto Ferrer, available from SSC.

    In the future, when showing data examples, please use the -dataex- command to do so, as I have done in this response. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.



    When asking for help with code, always show example data. When showing example data, always use -dataex-.

    Comment


    • #3
      Dear Clyde, thank you so much for this! You have it exactly right - that is what I'm trying to do. This works perfectly. In the future I'll be sure to use the -dataex- command and show example data, as well as try to make the original question less confusing. Thanks again!

      Comment

      Working...
      X