  • Using a loop to generate a new variable based on other variable with many distinct values

    Hi,

    In my data I have a variable ID with values such as 220, 240, 470, 1120, 2840, 8710, etc. I want to generate a simpler identifying variable, say ID2, with values running from 1 to [number of distinct values in ID].

    I know I can do this by hand using a number of replace commands, but as my data contain 60+ distinct values of ID, I'd prefer to do it using a loop. Unfortunately, I am not very experienced with Stata, so I could use some help with this.

  • #2
    There is no need for any looping here.

    Go straight to the FAQ https://www.stata.com/support/faqs/d...p-identifiers/ to see why not.

    Code:
    egen ID2 = group(ID), label
    Our own Statalist FAQ Advice enjoins searching of the FAQs before posting.
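    As a sketch of what group() produces, here is a toy example with invented ID values. It also builds the same numbering with the by:/sum() idiom (flag the first observation of each distinct value, then take a running sum), which is one way to see why no loop is needed. ID3 is a hypothetical name used only for the comparison.

    ```stata
    clear
    input ID
    220
    8710
    240
    220
    470
    end

    * egen's group() numbers the distinct values of ID 1, 2, ... in ascending order
    egen ID2 = group(ID)

    * the same numbering without egen: flag the first observation
    * of each distinct ID, then cumulate the flags down the sorted data
    sort ID
    by ID: generate ID3 = (_n == 1)
    replace ID3 = sum(ID3)

    assert ID2 == ID3
    list ID ID2, sepby(ID) noobs
    ```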

    • #3
      Originally posted by Nick Cox
      There is no need for any looping here.

      Go straight to the FAQ https://www.stata.com/support/faqs/d...p-identifiers/ to see why not.

      Code:
      egen ID2 = group(ID), label
      Our own Statalist FAQ Advice enjoins searching of the FAQs before posting.
      Thank you, Nick, for your prompt reply.

      However, using that code it seems that ID2 simply copies ID. ID2 contains exactly the same values as ID, rather than 1, 2, 3, 4, etc.

      Edit: If I remove
      Code:
      , label
      the desired ID2 values appear. Thank you!
      Last edited by Dougie Jones; 16 Jul 2018, 06:11.

      • #4
        No; as you've realised, what you see are the value labels, but the underlying values are what you asked for. The labels are optional.
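        A toy illustration of that point, with invented values: with the label option, list displays the value labels (which repeat the original ID values), while the nolabel option reveals the underlying codes 1, 2, 3.

        ```stata
        clear
        input ID
        220
        240
        470
        220
        end

        * label attaches value labels that repeat the original ID values
        egen ID2 = group(ID), label

        * list shows the labels, so ID2 looks like a copy of ID
        list ID ID2, noobs

        * nolabel shows the underlying codes that Stata actually stores
        list ID ID2, noobs nolabel

        * the stored values run from 1 to the number of distinct IDs
        assert inlist(ID2, 1, 2, 3)
        ```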

        • #5
          An additional question:
          I have another ID variable called AdditionalID, which is populated by the same set of numbers as my original ID variable. Now I want to generate a simpler variable, SimpleAdditionalID. If an observation has a value of 210 for both ID and AdditionalID, and a value of 1 for ID2, SimpleAdditionalID should also take the value 1. And where ID and AdditionalID are not equal, ID2 and SimpleAdditionalID should also not be equal.

          I've tried and checked
          Code:
           egen SimpleAdditionalID = group(AdditionalID)
          Although this gets some values right, it does not get them all right. What is the proper way of doing this?
          Last edited by Dougie Jones; 16 Jul 2018, 06:45.

          • #6
            Code:
            gen SimpleAdditionalID = ID2 if ID == AdditionalID
            Otherwise, what's your rule for where they differ? Not being equal is not a rule I can translate into code.
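            A quick check of what that line does, on invented values: SimpleAdditionalID is copied from ID2 where the two identifiers agree and left missing where they differ, which is why a rule for the unequal cases is still needed.

            ```stata
            clear
            input ID AdditionalID
            210 210
            240 210
            470 470
            end

            egen ID2 = group(ID)
            gen SimpleAdditionalID = ID2 if ID == AdditionalID

            * observations where the two IDs differ are left missing
            assert missing(SimpleAdditionalID) == (ID != AdditionalID)
            list, noobs
            ```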

            • #7
              If I understand correctly, what you want to do is apply the same renumbering scheme to the values of both ID and AdditionalID, so if the original values are the same, the simple values will also be the same. If so, here is some code that may point you in a useful direction.
              Code:
              // create pretend master dataset in temporary file
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input float(id otherid)
              101 102
              102 104
              103 101
              end
              tempfile master
              save `master'
              list, clean noobs
              
              // start from here using your dataset
              use `master', clear
              keep id otherid
              rename (id otherid) (oldid1 oldid2)
              generate seq = _n
              reshape long oldid, i(seq) j(j)
              drop seq j
              list, clean noobs
              
              duplicates drop oldid, force
              sort oldid
              generate simple = _n
              tempfile ids
              save `ids'
              list, clean noobs
              
              use `master', clear
              rename id oldid
              merge 1:1 oldid using `ids', assert(match using) keep(match)
              drop _merge
              rename (oldid simple) (id simpleid)
              rename otherid oldid
              merge 1:1 oldid using `ids', assert(match using) keep(match)
              drop _merge
              rename (oldid simple) (otherid simpleotherid)
              
              sort id otherid
              list, clean noobs
              Code:
              . list, clean noobs
              
                   id   otherid   simpleid   simp~rid  
                  101       102          1          2  
                  102       104          2          4  
                  103       101          3          1

              • #8
                Originally posted by William Lisowski
                If I understand correctly, what you want to do is apply the same renumbering scheme to the values of both ID and AdditionalID, so if the original values are the same, the simple values will also be the same. If so, here is some code that may point you in a useful direction.
                Thank you William, you understood correctly what I want to do. However, the code is giving me an error at
                Code:
                use `master', clear
                The error I get is "invalid file specification". In my do-file I set the working directory first, of course.
                Last edited by Dougie Jones; 17 Jul 2018, 02:32.

                • #9
                  Try

                  Code:
                  save "`master'"
                  It's likely that your directory or folder for temporary files contains embedded spaces, so the double quotes are necessary.
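                  A minimal sketch of the quoting pattern, using the auto dataset shipped with Stata: display shows the full temporary filename, and wrapping each reference in double quotes keeps the path in one piece even if the temporary directory's name contains spaces. All lines need to be run together, in one go.

                  ```stata
                  sysuse auto, clear
                  tempfile master

                  * the local macro holds a full path inside Stata's temporary directory
                  display "`master'"

                  * double quotes keep a path with embedded spaces in one piece
                  save "`master'"
                  use "`master'", clear
                  ```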

                  On the original problem, how about

                  Code:
                  egen newid = group( ID AdditionalID) 
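                  For what it is worth, a sketch on invented data of what that line produces: group() with two variables numbers each distinct combination of ID and AdditionalID, so it yields one variable coding pairs rather than two variables that share one numbering.

                  ```stata
                  clear
                  input ID AdditionalID
                  101 102
                  102 104
                  103 101
                  101 102
                  end

                  * one code per distinct (ID, AdditionalID) pair, in sorted order of the pairs
                  egen newid = group(ID AdditionalID)
                  list, noobs
                  ```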

                  • #10
                    Originally posted by Nick Cox
                    Try

                    Code:
                    save "`master'"
                    It's likely that your directory or folder for temporary files contains embedded spaces, so the double quotes are necessary.
                    This does not fix the problem.
                    Originally posted by Nick Cox

                    On the original problem, how about

                    Code:
                    egen newid = group( ID AdditionalID) 
                    This generates one variable based on ID and AdditionalID, but I want two new separate variables that use the same numbering scheme. If it is any easier, creating value labels for the original ID variables is also fine.

                    • #11
                      Absent a real or realistic data example (FAQ Advice #12) I am still fuzzy on what your goal is. As for

                      This does not fix the problem.
                      that is not very informative. Do you get the same error message? Does the code fail at the same point? I took William's code and wrapped all the temporary filenames implied by local macros in double quotes, and the code then ran through without error on the Windows machine I am at. Can you test the same code?

                      Code:
                      // create pretend master dataset in temporary file
                      * Example generated by -dataex-. To install: ssc install dataex
                      clear
                      input float(id otherid)
                      101 102
                      102 104
                      103 101
                      end
                      tempfile master
                      save "`master'"
                      list, clean noobs
                      
                      // start from here using your dataset
                      use "`master'", clear
                      keep id otherid
                      rename (id otherid) (oldid1 oldid2)
                      generate seq = _n
                      reshape long oldid, i(seq) j(j)
                      drop seq j
                      list, clean noobs
                      
                      duplicates drop oldid, force
                      sort oldid
                      generate simple = _n
                      tempfile ids
                      save "`ids'"
                      list, clean noobs
                      
                      use "`master'", clear
                      rename id oldid
                      merge 1:1 oldid using "`ids'", assert(match using) keep(match)
                      drop _merge
                      rename (oldid simple) (id simpleid)
                      rename otherid oldid
                      merge 1:1 oldid using "`ids'", assert(match using) keep(match)
                      drop _merge
                      rename (oldid simple) (otherid simpleotherid)
                      
                      sort id otherid
                      list, clean noobs
                      Finally,

                      Code:
                      display "`c(tmpdir)'"
                      will show you where Stata is trying to open temporary files: is it somewhere you have write permission?
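                      A sketch of one way to test write permission directly, assuming it is fine to clear the data in memory: it tries to save a one-observation dataset under a temporary filename and reports the return code. The names probe and x are hypothetical, used only for this check.

                      ```stata
                      clear
                      * where Stata puts temporary files
                      display "`c(tmpdir)'"

                      * try to write a tiny dataset there
                      set obs 1
                      generate byte x = 1
                      tempfile probe
                      capture save "`probe'"

                      * a nonzero return code suggests Stata cannot write to that directory
                      if _rc {
                          display as error "could not write to `c(tmpdir)' (return code " _rc ")"
                      }
                      else display "temporary files can be written to `c(tmpdir)'"
                      ```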

                      Naturally, if you've modified the code you need to show us the code you tried.

                      Last edited by Nick Cox; 17 Jul 2018, 03:32.

                      • #12
                        I'm reading this as:
                        Code:
                        clear
                        input float(n idorig idother)
                        1 101 102
                        2 102 104
                        3 103 101
                        end
                        reshape long id, i(n) j(idtype) string
                        egen newid = group(id)
                        reshape wide id newid, i(n) j(idtype) string
                        Code:
                        . list
                        
                             +--------------------------------------------+
                             | n   idorig   newido~g   idother   newido~r |
                             |--------------------------------------------|
                          1. | 1      101          1       102          2 |
                          2. | 2      102          2       104          4 |
                          3. | 3      103          3       101          1 |
                             +--------------------------------------------+

                        As indicated, please use dataex to provide a data example for future questions. This makes it easier for others to answer your questions and takes away the need for a lot of guesswork. See the FAQ for more on why and how to use dataex: https://www.statalist.org/forums/help#stata

                        • #13
                          Originally posted by Nick Cox
                          Absent a real or realistic data example (FAQ Advice #12) I am still fuzzy on what your goal is. As for

                          This does not fix the problem.
                          that is not very informative. Do you get the same error message? Does the code fail at the same point?
                          My apologies. Indeed, I get the same error as before.
                          I took William's code and wrapped all the temporary filenames implied by local macros in double quotes, and the code then ran through without error on the Windows machine I am at. Can you test the same code?

                          This code works without a hitch, but when I add in another row of data like so:
                          Code:
                          input float(id otherid)
                          101 102
                          102 104
                          103 101
                          103 102
                          end
                          I get the following error:
                          Code:
                          variable oldid does not uniquely identify observations in the master data
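                          One way to see the cause before merging, sketched on the four-row example above: merge 1:1 requires the key to be unique in the master data, and duplicates report and isid both show that id is not, because 103 now appears twice.

                          ```stata
                          clear
                          input float(id otherid)
                          101 102
                          102 104
                          103 101
                          103 102
                          end

                          * how many observations share each id value?
                          duplicates report id

                          * isid exits with an error when id does not identify observations uniquely
                          capture isid id
                          display "isid return code: " _rc
                          ```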

                          Finally,

                          Code:
                          display "`c(tmpdir)'"
                          will show you where Stata is trying to open temporary files: is it somewhere you have write permission?

                          Naturally, if you've modified the code you need to show us the code you tried.
                          I am working on a remote desktop, and my guess is that I do not have write permission in the directory Stata uses for temporary files.

                          This is the code I used that resulted in the error mentioned above:
                          Code:
                          clear
                          * Import dataset
                          import delimited "G:\filepath\data.csv", delimiter(";") varnames(1)
                          
                          tempfile master
                          save "`master'"
                          list, clean noobs
                          
                          // start from here using your dataset
                          use "`master'", clear
                          keep id otherid
                          rename (id otherid) (oldid1 oldid2)
                          generate seq = _n
                          reshape long oldid, i(seq) j(j)
                          drop seq j
                          list, clean noobs
                          
                          duplicates drop oldid, force
                          sort oldid
                          generate simple = _n
                          tempfile ids
                          save "`ids'"
                          list, clean noobs
                          
                          use "`master'", clear
                          rename id oldid
                          merge 1:1 oldid using "`ids'", assert(match using) keep(match)
                          drop _merge
                          rename (oldid simple) (id simpleid)
                          rename otherid oldid
                          merge 1:1 oldid using "`ids'", assert(match using) keep(match)
                          drop _merge
                          rename (oldid simple) (otherid simpleotherid)
                          
                          sort id otherid
                          list, clean noobs

                          • #14
                            This works, also with repeated id values
                            Code:
                            clear
                            input float(id otherid)
                            101 102
                            102 104
                            103 101
                            103 102
                            end
                            gen n=_n
                            rename (id otherid) (idorig idother)
                            reshape long id, i(n) j(idtype) string
                            egen newid = group(id)
                            reshape wide id newid, i(n) j(idtype) string
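                            A sketch of a consistency check one could add to that code (the two bysort checks are additions, not part of the post): before reshaping back to wide, confirm that every original id maps to exactly one new code and every new code to exactly one original id.

                            ```stata
                            clear
                            input float(id otherid)
                            101 102
                            102 104
                            103 101
                            103 102
                            end
                            gen n = _n
                            rename (id otherid) (idorig idother)
                            reshape long id, i(n) j(idtype) string
                            egen newid = group(id)

                            * each original value has one code, and each code one original value
                            bysort id (newid): assert newid[1] == newid[_N]
                            bysort newid (id): assert id[1] == id[_N]

                            reshape wide id newid, i(n) j(idtype) string
                            list, noobs
                            ```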

                            • #15
                              Jorrit's approach in post #14 is preferable to mine. I started from the common situation where the identifiers exist in two separate datasets and modified that approach to having them in a single dataset.

                              I believe the problem in post #8 is that you copied the code into the do-file editor window, and then rather than running everything at once, you ran it by selecting a few lines and running them, then selecting the next few lines and running them, and so on.

                              Consider the following example. In the do-file editor window, I have a two-line program that I run in its entirety.
                              Code:
                              . do "/var/folders/xr/lm5ccr996k7dspxs35yqzyt80000gp/T//SD30770.000000"
                              
                              . local message Hello, world.
                              
                              . display "`message'"
                              Hello, world.
                              
                              .
                              end of do-file
                              Now I run the same two lines by selecting the first line and running it, then selecting the second line and running it.
                              Code:
                              . do "/var/folders/xr/lm5ccr996k7dspxs35yqzyt80000gp/T//SD30770.000000"
                              
                              . local message Hello, world.
                              
                              .
                              end of do-file
                              
                              . do "/var/folders/xr/lm5ccr996k7dspxs35yqzyt80000gp/T//SD30770.000000"
                              
                              . display "`message'"
                              
                              
                              .
                              end of do-file
                              The important thing to keep in mind is that local macros vanish when the do-file within which they were created ends. If you look carefully at the results above, you'll see that when I selected a single line to run, it was copied into a temporary do-file and run. So even though both lines are in the same window in the do-file editor, they are run as separate do-files: the local macro defined in the first line vanishes at the end of that do-file and is undefined when the second line is run.

                              So, when you ran
                              Code:
                              use `master', clear
                              Stata saw
                              Code:
                              . use , clear
                              invalid file specification
                              r(198);
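                              The same expansion can be reproduced directly. In this minimal sketch the local master is deliberately cleared to mimic a do-file in which it was never defined; Stata does not treat a reference to an undefined local macro as an error but substitutes an empty string.

                              ```stata
                              * clear the local to mimic running the line in a separate do-file
                              local master

                              * the reference expands to nothing, leaving "use , clear"
                              display "use `master', clear"
                              assert `"`master'"' == ""
                              ```

                              The help for tempfile also notes that files created under tempfile names are erased when the do-file that created them concludes, so running the lines separately loses the file as well as the macro.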
                              The problem with the merge in post #13 is solved by changing both occurrences of
                              Code:
                              merge 1:1 oldid using "`ids'", assert(match using) keep(match)
                              to
                              Code:
                              merge m:1 oldid using "`ids'", assert(match using) keep(match)
                              or better yet, by adopting Jorrit's approach.
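                              Putting those changes together, here is a sketch of the code from post #11 with both merges switched to m:1, run on the four-row example from post #13, in which 103 appears twice as id and 102 twice as otherid. Using nogenerate instead of the separate drop _merge lines is a stylistic choice of this sketch; the other steps are unchanged.

                              ```stata
                              clear
                              input float(id otherid)
                              101 102
                              102 104
                              103 101
                              103 102
                              end
                              tempfile master ids
                              save "`master'"

                              * crosswalk: every distinct value found in either variable, numbered 1, 2, ...
                              keep id otherid
                              rename (id otherid) (oldid1 oldid2)
                              generate seq = _n
                              reshape long oldid, i(seq) j(j)
                              drop seq j
                              duplicates drop oldid, force
                              sort oldid
                              generate simple = _n
                              save "`ids'"

                              * attach the crosswalk twice; m:1 allows repeated values in the master data
                              use "`master'", clear
                              rename id oldid
                              merge m:1 oldid using "`ids'", assert(match using) keep(match) nogenerate
                              rename (oldid simple) (id simpleid)
                              rename otherid oldid
                              merge m:1 oldid using "`ids'", assert(match using) keep(match) nogenerate
                              rename (oldid simple) (otherid simpleotherid)

                              sort id otherid
                              list, clean noobs
                              ```

                              Here the distinct values 101 to 104 map to 1 to 4, so both new variables use the same numbering.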
