
  • Distribute attribute to same IDs

    Hi all, I'm struggling with how to phrase this search, so please feel free to link to a solution if you think my question has already been answered.

    I have a simple dataset that looks something like this:
    ID Attribute 1 Attribute 2
    a y
    a
    b y
    b y
    c y
    c
    d y
    d
    For QA purposes, I'm trying to write code that copies Attributes 1 and 2 to every observation with the same ID, so that in the end the data look like this:
    ID Attribute 1 Attribute 2
    a y
    a y
    b y y
    b y y
    c y
    c y
    d y
    d y
    I know I did this years ago with a couple of very simple lines of code... I think an egen command was involved. But I can't find the do-file and can't remember the code (my brain is cheese). Any thoughts?

    Again, many thanks in advance for your help!

    ~Nereida

  • #2
    You are starting with "attribute" variables that apparently are coded y/missing. That is a bad idea in the first place. Attribute variables in Stata are much more useful if coded y/n, with missing values used only in situations where the value of the attribute is actually not known. Using y/missing attributes is going to set you up for mistakes when you start doing analyses involving these variables. You will really regret it.

    Next, your data display, a composed table, does not show whether your attribute variables are string variables or value-labeled numeric variables. This matters crucially: the code for what you want to do is different in the two cases. In the code below, I assume they are string variables. Do not use it if they are not: it will not work correctly, and it will wipe out much of your data. Your table also doesn't show the variable names, so substitute them for the placeholder variable names I use in the code.
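If the attributes turn out to be value-labeled numeric variables instead, the code has to change because Stata sorts numeric missing values after every nonmissing number, whereas an empty string sorts before every nonempty string. After sorting within an ID, the value worth copying therefore sits in the first observation rather than the last. Here is a minimal sketch under those assumptions; the toy data, the 1 = "y" coding and the label name yn are hypothetical and only there to make it runnable:

```stata
* Hypothetical toy data: numeric attributes coded 1 = "y", missing otherwise
clear
input str1 ID byte(attribute1 attribute2)
a 1 .
a . .
b 1 .
b . 1
end
label define yn 0 "n" 1 "y"
label values attribute1 attribute2 yn

* Numeric missing (.) sorts last, so after sorting within ID any
* nonmissing value is in observation 1 of the group: copy that one
foreach v of varlist attribute* {
    by ID (`v'), sort: replace `v' = `v'[1]
}
list, sepby(ID)
```

This relies on every nonmissing value within an ID being the same. If some observations already hold 0 ("n"), taking observation 1 would pick the 0, so the approach would need rethinking in that case.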

    Code:
    // Empty strings sort first and "y" last, so copy the last value to every observation of each ID
    foreach v of varlist attribute* {
        by ID (`v'), sort: replace `v' = `v'[_N]
    }
    With that done, you will have consistent values of the attribute variables within IDs. I strongly urge you to replace the missing values with n, except in those cases where the value is actually unknown. The data example shown (and perhaps the real data set itself) provides no information that distinguishes n from truly unknown, but perhaps you know that in some other way.
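Once the values are consistent within each ID, recoding blank string attributes to n could look like the sketch below. The toy data and variable names are hypothetical placeholders, and the replace should only touch observations where a blank genuinely means "no", which the data alone cannot tell you:

```stata
* Hypothetical toy data: string attributes already consistent within ID
clear
input str1 ID str1(attribute1 attribute2)
a "y" ""
a "y" ""
b "y" "y"
b "y" "y"
end

* Turn blanks into "n"; add an -if- condition to skip any observations
* where the attribute is genuinely unknown
foreach v of varlist attribute* {
    replace `v' = "n" if missing(`v')
}
list, sepby(ID)
```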

    In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
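As a quick illustration, the sketch below uses the auto dataset that ships with Stata purely as a stand-in for your own data, and prints ready-to-paste -input- code for three variables over the first five observations. With your own data you would list your own variable names instead:

```stata
* The auto dataset is used only so the example runs as-is
sysuse auto, clear

* Print a copy-and-paste data example for three variables, first five observations
dataex make price mpg in 1/5
```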




    • #3
      Thanks so much, Clyde! Appreciate your guidance.
