  • Help with using expand>2 while replacing values in duplicates generated

    Hi,

    I am trying to use the expand command to create duplicates and then replace the value of one variable in each duplicated row.

    For example,
    expand 2 if state=="S" & district=="D" & year==2009, generate (new)
    Once the duplicate is created, I apply:
    replace district="D1" if district=="D" & state=="S" & year==2009 & new==1

    This works perfectly, but only with expand 2.
    Now I want to expand a row 9 times, and the replace command no longer works because all the new duplicates are assigned the value 1.

    To elucidate:
    expand 9 if state=="S" & district=="D" & year==2009, generate (new)
    This creates the necessary duplicate rows, but I cannot do the following:

    replace district="D1" if district=="D" & state=="S" & year==2009 & new==1
    replace district="D2" if district=="D" & state=="S" & year==2009 & new==1
    and so on.

    I tried generating a case id and replacing on it, but that requires me to check the generated ids manually. This is not feasible because I need to do this for various states and have a million rows of data.

    I am sure there is a better way of doing this which I am missing.

    Any help would be appreciated.

    Thank you
    Regards,
    Purnima





  • #2
    Lacking sample data to demonstrate on, here's simplified data and corresponding code that may show you a useful approach to take on your real data.
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float id str8 district
    101 "D1"
    102 "D1"
    103 "D1"
    end
    
    expand 3 if id==102, generate(new)
    sort id new 
    by id (new): replace district = "D2" if id==102 & _n==2
    by id (new): replace district = "D3" if id==102 & _n==3
    list, noobs sepby(id)
    Code:
    . list, noobs sepby(id)
    
      +----------------------+
      |  id   district   new |
      |----------------------|
      | 101         D1     0 |
      |----------------------|
      | 102         D1     0 |
      | 102         D2     1 |
      | 102         D3     1 |
      |----------------------|
      | 103         D1     0 |
      +----------------------+



    • #3
      Hi William,

      Thank you for your reply. Below are further details about my data and the exact code:

      About my data:
      My data has several years (in no particular pattern e.g. 1991, 1992, 1996, 1998, 1999 and so on), 35 states, 700 districts and in total around a million observations over 16 variables.

      What I am trying to do:
      The state of Delhi originally had only one district, named Delhi, but in 1998 it was divided into 9 parts. I wish to replace the district value Delhi with its 9 divisions, i.e. replace Delhi with (1) Central Delhi (2) North Delhi (3) East Delhi (4) North-East Delhi (5) North-West Delhi (6) New Delhi (7) South-West Delhi (8) South Delhi and (9) West Delhi.

      What I have done:
      I tried to use the code you provided, but I think the sorting is not happening correctly.

      I tried the following:
      expand 9 if state=="Delhi" & district=="Delhi" & year==1998, generate(new)

      sort district new

      by district (new): replace district="Central Delhi" if district=="Delhi" & state=="Delhi" & year==1998 & _n==2
      by district (new): replace district="North Delhi" if district=="Delhi" & state=="Delhi" & year==1998 & _n==3
      by district (new): replace district="East Delhi" if district=="Delhi" & state=="Delhi" & year==1998 & _n==4

      and so on for all 9 divisions which came into existence.

      Problem:

      While the expand command works perfectly, the replace commands change only one value, namely the one where _n==4 (East Delhi).

      Thank you for your time and help

      Regards,
      Purnima





      • #4
        Let me start with some general advice about getting the most benefit out of Statalist.

        Please take the time to review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you have used to create your posts. Note especially sections 9-12 on how to best pose your question.

        The more you help others understand your problem, the more likely others are to be able to help you solve your problem.

        In particular, even the best descriptions of data are no substitute for an actual example of the data. In order to get a helpful response, you usually need to help with some example data.

        Be sure to use the dataex command when providing example data. If you are running version 15.1 or a fully updated version 14.2, dataex is already part of your official Stata installation. If not, run ssc install dataex to get it. Either way, run help dataex and read the simple instructions for using it. dataex will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

        When asking for help with code, always show example data. When showing example data, always use dataex.

        With that said, my simplified code was meant as a learning example to be read, understood, and adapted to your actual data. I would have given actual code in post #2 if you had, as I hinted, given example data for it to be tested on. If I can't test code I'm not inclined to give untested code.

        In particular, your data has district, state, and year, while my example identifies each group with a single id variable. I sort by id new, which suggests that you should sort by district state year new, and that your by: prefix should be district state year (new), with the "new" enclosed in parentheses so that the code is run on each combination of district and state and year, for all values of new.
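
        As a sketch of that kind of adaptation on made-up data (the three toy rows stand in for the real dataset, and the variable names target, obs_id, and copy are hypothetical), one option is to tag each original row before expanding and then number its copies. Grouping on an identifier that replace does not modify avoids having to re-sort after each change to district.
        Code:
        * Toy data: assumed layout with string state/district and numeric year
        clear
        input str10 state str20 district int year
        "Delhi"  "Delhi"    1996
        "Delhi"  "Delhi"    1998
        "Punjab" "Amritsar" 1998
        end

        * Flag the rows to split and give every original row its own id
        generate byte target = state=="Delhi" & district=="Delhi" & year==1998
        generate long obs_id = _n

        * 9 rows in total per flagged row: the original plus 8 copies
        expand 9 if target, generate(new)

        * Number the rows sharing an original row from 1 to 9
        bysort obs_id (new): generate byte copy = _n

        * Assign the k-th name to the k-th row (run from a do-file; /// continues the line)
        local k = 0
        foreach name in "Central Delhi" "North Delhi" "East Delhi" ///
            "North-East Delhi" "North-West Delhi" "New Delhi" ///
            "South-West Delhi" "South Delhi" "West Delhi" {
            local ++k
            replace district = "`name'" if target & copy == `k'
        }

        list, noobs sepby(obs_id)

        Rows that are not flagged by target are left untouched, so the same pattern could be repeated for other states by changing the condition and the list of names.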



        • #5
          Hi William,

          Thank you for your reply.
          I did try the sorting, but I used year state district (new), which was wrong. My problem is solved now. Thank you for your help; I will keep your suggestions in mind when I post next. I am new to the forum, and I apologize for not providing a full example.

          Regards,
          Purnima
