No announcement yet.
  • Filter
  • Time
  • Show
Clear All
new posts

  • Assigning values to 1 var based on values of another

    I'm new to the forum, so apologies in advance if I don't exactly follow proper protocol. I have a 3-year panel. The attached contrived example contains the varnames mentioned here. There is a var in the most recent year (2011) that assigns each household (hhid) to a geographic sub region (sregion). Many of the same households appear in the earlier two years, but the sregion variable appears only in 2011. I want to assign a value for sregion to observations where the same hhid in 2011 appears in the earlier years. For example, I want to assign sregion=6 for hhid=106 in the earlier years. Similarly, I want sregion=1 for hhid=101, and sregion=2 for hhid=102 in the earlier years. I want the other values of sregion to remain missing. NOTE: The actual hhid values are 11 digits long, and there are too many to use something like 'replace sregion=6 if hhid==106' for all the values.

    The sregion variable is crucial to my work. My goal here is to increase n by pulling in observations from earlier years, but they need an sregion value.

    Thanks in advance for any help you can provide.
    Attached Files
    Last edited by Bill Herrin; 08 Nov 2018, 18:15.

  • #2
    It's very easy to learn the "proper protocol" here. Just read the Forum FAQ (click on the black bar that says FAQ just under the Statalist banner.) No expertise needed. And it isn't a matter of protocol, really. We don't have anybody "enforcing the rules." But if you read the FAQ you will see that the advice there is designed to make the communications here effective and efficient. If you post a question in a clear way with an ready-to-use display of the relevant information, including data/code/results, you help those who want to help you and increase your chances of getting a timely and helpful response. That's what the FAQ are all about.

    With regard to example data, good that you tried to show it. But Excel spreadsheets are not a good way. Some people don't use Microsoft Office products. Many of us are unwilling to download anything from strangers. Most important, a spreadsheet is not a Stata data set and for many kinds of problem, the spreadsheet does not have all the information needed for a solution. Sometimes the metadata are very important. The solution to that, as you will learn from FAQ #12 is the -dataex- command.

    If you are running version 15.1 or a fully updated version 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use -dataex-.


    • #3

      Here is the data entered into Stata:
      * Example generated by -dataex-. To install: ssc install dataex
      input int(year hhid) byte(sregion x)
      2011 105 5 88
      2011 106 6 55
      2011 106 6 66
      2011 101 1 22
      2011 102 2 44
      2010 107 . 12
      2010 106 . 45
      2010 108 . 89
      2010 108 . 78
      2010 101 . 56
      2010 109 . 34
      2010 107 . 67
      2009 106 . 57
      2009 108 . 68
      2009 102 . 24
      You could do something like:
      sort hhid sregion year  // This puts the obs with sregion as 1st obs for the hhid
      by hhid: replace sregion = sregion[1] if sregion[1] !=.
      You could also create a master file that had one line per hhid with its sregion, and then merge that into file that has all the various years. If you specify -update- in the merge, it will replace the blank sregions with the good values. See the Statalist post here for a similar solution. (His question / context was different than yours, but the solution was very similar).

      Hope that helps!


      • #4
        Clyde, points well taken. David, thanks for your help.