
  • Create an id when merging data

    Hey Guys,

    I would like to merge Stata files. The observations I want to merge do not have an id yet. My intention is to assign each observation an id composed of the year, the id of the house and the id of the refrigerator.

    I have the variables year, houseid and refrigeratorid, and I would like to create a variable id of the form [year][houseid][refrigeratorid].

    I tried to do that with the generate command, but regrettably it does not work:

    gen id = "year""houseid""refrigeratorid"

    Stata does not recognize that year is a variable and therefore always gives the new variable the value "year".

    Moreover, I wonder whether there is a command in Stata that creates new folders. If I ask Stata to save a file to, or read from, a path that does not exist yet, it aborts my do-file.

    Could somebody please help me?

    Thank you very much in advance.


  • #2

    I would like to create a variable id [year][houseid][refrigeratorid]
    Generally speaking, I'd recommend the following:
    egen newid = group(year houseid refrigeratorid)
    BUT: if you want the same year/house/refrigerator combination to get the same id in two different datasets, this command won't work, because group() numbers the combinations separately within each dataset.

    You can concatenate variables as you tried to do, but let me point out two things.
    - Concatenation only works with string variables, so you will probably have to use tostring on the year variable (and maybe on the other two).
    - The command is
    gen newid = string_year + string_houseid + string_refrigeratorid
    where each string_var is the corresponding variable stored as a string.
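
    A minimal sketch of the tostring route described above, on a one-row toy dataset. The s_ variable names are made up for illustration, and the example assumes all three key variables are numeric, which the thread does not confirm.

    ```stata
    clear
    input year houseid refrigeratorid
    2000 101 3
    end

    * make string copies of the numeric keys, leaving the originals intact
    tostring year houseid refrigeratorid, generate(s_year s_houseid s_refrigeratorid)

    * + concatenates string variables
    gen newid = s_year + s_houseid + s_refrigeratorid
    list
    ```

    Any key that is already a string can go into the gen line directly, without tostring.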



    • #3
      What you say you typed is just illegal, so I wonder what you really typed. Double quotes " " are used in Stata for specifying literal strings, and only for that purpose.

      You can concatenate identifiers using egen, concat().

      Watch out for ambiguities. You don't say anything about your house and refrigerator identifiers, but any concatenation must be reversible. Using separators is a way to ensure that.
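
      A small sketch of the ambiguity problem and of one way around it, using the punct() option of egen, concat(). The two toy observations and the "-" separator are assumptions chosen for illustration, not values from the original data.

      ```stata
      clear
      input year houseid refrigeratorid
      2000 11 1
      2000 1 11
      end

      * without a separator, two different house/refrigerator pairs collide
      egen id_plain = concat(year houseid refrigeratorid)

      * with a separator, the concatenation stays reversible
      egen id_sep = concat(year houseid refrigeratorid), punct("-")
      list
      ```

      Padding each component to a fixed width would be another way to keep the id reversible. The separator is simply the least work when the identifiers have mixed types.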


      • #4
        Charlie's comment that "concatenation only works for string variables" is certainly correct in spirit, but some footnotes are possible.

        egen, concat() was written as a convenience egen function to allow numeric variables to be concatenated too. It doesn't violate that principle as string() is used internally when needed.

        More generally, tostring is itself a convenience command. It was made public originally because destring was public and people were asking for its inverse. (I'd written a rudimentary tostring as a programming exercise.) That request was a bit of a puzzle as string() already existed but tostring was made public too. Now tostring is more general (e.g. in allowing you to work on several numeric variables at once, which would otherwise require a loop). But I see unnecessary uses of tostring too. For example

        gen funnydate = string(year) + string(month, "%02.0f") + string(day, "%02.0f")
        would be one way to get string dates such as "20150330". For that kind of problem string() is much more direct than tostring. That might be true here too, but Thomas doesn't tell us anything about variable types.
        Last edited by Nick Cox; 30 Mar 2015, 05:12.


        • #5
          Sorry, I did not make clear enough that I don't want Stata to create a "random" id, but one in which you can already read which year the refrigerator is from.

          To illustrate the id I want, here is an example:

          year  houseid  refrigeratorid  newid
          2000  101      3               20001013
          2001  101      4               20011014
          2002  103      2               20021032


          • #6
            We get that point; this is why, instead of the convenient group() function, I recommended that you concatenate the three variables.
            Nick also gave you an elegant concatenation function that will do what you want.


            • #7
              There is a mkdir command in Stata.


              • #8
                OK, thank you.
                Concerning the variable formats and types:
                the bad thing is that the format and the type of the variables vary. The data have been imported from Excel.
                I have found these storage types and display formats:
                long int str5 str6 str4 str3
                %10.0g %10.0g %9s %9s %9s %9s


                • #9
                  Thomas: You have not said which of your key variables is of which type. But you should have enough information to solve your problem now.


                  • #10
                    Thank you, egen, concat() as well as mkdir work. As Nick mentioned above, one has to pay attention to ambiguities in the new id.
                    When merging some files I ran into this problem, and now I would like Stata to show me the observations that share the same id, so that I can see why there are ambiguous/equal ids.
                    Is there a command in Stata that shows the identical ids?

                    For instance, with list one can only set criteria on the variables' values, but cannot tell Stata to list only the observations with identical ids.
                    I'd appreciate it if you could help me once more.


                    • #11
                      It exists: -duplicates report- or -duplicates list- (-help duplicates- will help you to understand the differences).
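
                      A short sketch of how these commands could be used to inspect repeated ids. The toy data, the variable name newid and the flag variable dup are assumptions for illustration.

                      ```stata
                      clear
                      input str10 newid value
                      "2000-1-1" 5
                      "2000-1-1" 7
                      "2001-1-2" 3
                      end

                      * summary of how many ids occur once, twice, ...
                      duplicates report newid

                      * dup = number of other observations sharing the same id
                      duplicates tag newid, generate(dup)

                      * show only the observations whose id is not unique
                      list if dup > 0, sepby(newid)
                      ```

                      Before a merge, isid newid is a quick check: it exits with an error if newid does not uniquely identify the observations.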


                      • #12
                        Thank you very much, that helped me!
                        I am sorry for my questions, but it is quite difficult for me to look for a command when I do not know whether it exists and what it might be called.

                        I still have one question: can a do-file save itself to another path and then run from that new path?
                        I want to create a do-file which first downloads the data, then processes it, etc. In the third line one can choose the data path by means of local datapath.
                        However, I would like to save this do-file in that data path, in order to avoid using `datapath' every time I access a data file.

                        Hoping that my explanations are clear enough, I look forward to your answers.


                        • #13
                          I would think that the -cd- (change directory) command is what you need (as usual, see -help cd-).
                          This command tells Stata which directory to work in; subsequent use and save commands are then resolved relative to that directory. It doesn't prevent you from using subdirectories to organize your data, but you won't have to rewrite the whole path.
                          Example:
                          use "C:\download\temp\~~~verycomplicatedpath~~~~.dta"
                          cd "C:\User\Work\statastuff"
                          save "initialdata.dta"
                          * ... some operations ...
                          * relative path (no leading backslash); the myresults folder must already exist
                          save "myresults\outcome.dta"
                          Then you won't have to write the data path every time, and your initial dataset is saved in a clear directory, easy to use.


                          • #14
                            cd is definitely an option for what I intended to do, thank you!

                            Nick: mkdir creates a new path, which is quite cool, but when I run my do-file a second time Stata aborts because the path already exists, and mkdir has no option to overwrite or replace paths. Is there maybe another command that could help?


                            • #15
                              The capture command can be used to prevent Stata from stopping if the directory already exists.
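
                              A minimal sketch of that idea. The folder name myresults is hypothetical and is created relative to the current working directory.

                              ```stata
                              * first run: creates the folder
                              capture mkdir "myresults"

                              * later runs: mkdir fails because the folder exists,
                              * but capture suppresses the error and the do-file continues
                              capture mkdir "myresults"

                              * _rc holds the return code of the captured command (0 means success)
                              display _rc
                              ```

                              One caveat: capture also hides other failures, for example a missing parent directory, so a later cd or save into that folder may still stop the do-file.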