  • Rename active dataset in a program without saving it

    I am writing an ado file for data management tasks that generates two new variables. For different reasons the active dataset has to be saved in between, which is done using tempfile so that nothing changes in the user's data and environment. Because of this, the active dataset carries the tempfile's name and save location at the end of my ado file. But I want it to have the same name as it had when the user applied the ado file (much the same as if she had used the "generate" command). I do not want to implement a "save" command in the program, because I do not know if the user made changes before applying the ado file which are not to be saved under the original file name.
    I just want the data set to have the same name as before executing the ado file, leaving it to the user to save or not to save.
    I did not find any hint in the documentation, with findit, or on Google as to how this might be done. Did I just overlook a Stata command or option? Does anyone have an idea how this could be done?

  • #2
    To be honest, I do not fully get it. It sounds like you want to create (two) new variables and not make changes to the (user's) dataset at the same time, which is impossible. Have you looked at preserve?

    Could you tell us more about the "different reasons" the data file has to be saved?

    Also, you wrote:

    "because I do not know if the user made changes before applying the ado file"

    You can look at c(changed), which will (partly) answer this question.
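
    A minimal sketch of how c(changed) behaves, assuming the auto dataset that ships with Stata; the variable name flag is made up for illustration:

    ```stata
    sysuse auto, clear
    display c(changed)          // 0: nothing changed since the data were loaded
    generate byte flag = 1      // hypothetical new variable
    display c(changed)          // 1: data in memory now differ from the file
    ```

    An ado file could test this value on entry, for example with if c(changed) { ... }, before deciding how to proceed.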

    Best
    Daniel
    Last edited by daniel klein; 02 Mar 2015, 07:07.


    • #3
      This isn't really clear to me either (as for Daniel), but preserve ... restore leaves the dataset as it was.

      If you want to create new variables while leaving the original dataset unchanged, I would save to a different file while under the aegis of preserve, or shunt that stuff somewhere else using postfile. I would export identifiers or other variables allowing the new stuff to be merged back in.
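
      One possible reading of this suggestion, sketched with the auto dataset. The names id, results and logprice are illustrative assumptions, and ln(price) merely stands in for the real transformation. Because the results are merged into the restored data rather than the other way round, the dataset in memory is never loaded from the temporary file, so c(filename) should still show the original name at the end; this is worth verifying on your own Stata version.

      ```stata
      sysuse auto, clear
      display "`c(filename)'"              // name before the work starts

      tempvar id
      tempfile results
      generate long `id' = _n              // identifier for merging back

      preserve
      keep `id' price                      // work on the involved variables only
      generate double logprice = ln(price) // stand-in for the real transformation
      keep `id' logprice
      quietly save "`results'"
      restore                              // original data back in memory

      quietly merge 1:1 `id' using "`results'", nogenerate
      drop `id'
      display "`c(filename)'"              // compare with the first display
      ```

      The user is left with the original dataset plus the new variable and can decide whether to save it.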


      • #4
        Hello daniel,

        I know that I change the data file in the memory, but I want to leave it to the user, if she wants to save the changes including the new variables to the disk or not. Therefore I need a way to only change the name of the active dataset in memory back to the name that dataset had before my ado file was executed.
        One of the reasons for temporary data saving is that I want to use just the involved variables for the transformations, because this makes it quicker when using very large data sets.

        Greetings, Klaudia


        • #5
          "but I want to leave it to the user, if she wants to save the changes including the new variables to the disk or not"
          Well, if it is up to the user, then I really do not see why you opt against a save() option. Maybe a generate() option would do?

          "I need a way to only change the name of the active dataset in memory back to the name that dataset had before my ado file was executed"
          The name of the dataset is stored in c(filename).

          "One of the reasons for temporary data saving is that I want to use just the involved variables for the transformations, because this makes it quicker when using very large data sets."
          I would guess that this is unlikely to outweigh the time needed to save (or preserve) and re-load (or restore) the very large dataset.

          Anyway, I still do not understand what you are doing. Giving a little more detail and/or maybe some commented code snippets might help us help you.

          Best
          Daniel


          • #6
            Thanks for your answers, Nick and Daniel, but maybe I have to clarify my question.
            I am talking only about the dataset in memory.
            The situation now is as follows:
            The user opens his dataset, say mydata.dta.
            He applies my ado file --> the dataset in memory includes mydata.dta plus 2 new variables, but is named something like "ST0100002.tmp".
            I would like to make my ado file as user friendly as possible, which means I want the dataset in memory to still be named "mydata.dta", much the same as when you generate two new variables with "egen" or "generate".
            And, as with generate or egen, I don't want to implement a save command, so that the user can decide whether to save the changed dataset or not.


            • #7
              I don't get why you are using any temporary files in the first place.

              A standard design across hundreds of Stata commands is

              1. User has dataset in memory from named file.

              2. User runs a command that generates new variables in that dataset

              3. User decides whether to save dataset with new variables.

              If you are doing something different you may well have a good reason, but it's not clear what that good reason is.
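
              A minimal sketch of that standard design. The command name addrowmean, its generate() option and the row-mean calculation are hypothetical stand-ins, not the program discussed in this thread. The command only adds a variable to the data in memory and never calls save or use, so c(filename) is left alone.

              ```stata
              capture program drop addrowmean
              program define addrowmean
                  version 13
                  syntax varlist(numeric) [if] [in], GENerate(name)
                  confirm new variable `generate'      // refuse to overwrite
                  marksample touse, novarlist          // honor if and in
                  quietly egen double `generate' = rowmean(`varlist') if `touse'
              end

              sysuse auto, clear
              addrowmean price weight, generate(pw_mean)
              display "`c(filename)'"                  // still the file the user opened
              ```

              Whether to save the dataset with the new variable remains the user's decision (step 3).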


              • #8
                Here are the related code snippets. The save "`orgfile'" (which is a tempfile) causes the dataset to be named "ST0100002.tmp" (or similar).

                /* prepare the working file */

                if strpos("`md'", "m") == 0 { /* if mode n or s desired (unique entries), duplicates are dropped */
                    quietly save "`orgfile'"
                    keep `gr' `varlist'
                    by `gr' `varlist': gen byte `temp' = cond(_n == 1, 1, 0)
                    quietly drop if `temp' != 1
                    capture drop `temp'
                    /* used this syntax because "duplicates drop" is slower */
                }
                if strpos("`md'", "m") > 0 { /* if mode m desired (multiple entries), establish uniqueness by variable "temp" */
                    by `gr' `varlist': gen byte `temp' = _n
                    quietly save "`orgfile'"
                    keep `gr' `varlist' `temp'
                }


                .... (Program syntax)


                if strpos("`md'", "m") > 0 { /* if multiple entries were ordered */
                    quietly merge 1:1 `gr' `varlist' `temp' using "`orgfile'"
                    drop _merge
                }
                if strpos("`md'", "m") == 0 { /* if unique entries (numeric and strings) were ordered */
                    quietly merge 1:m `gr' `varlist' using "`orgfile'"
                    drop _merge
                }


                /* restore settings, sort and variable order */
                set varabbrev `varabr'
                capture confirm variable `con', exact
                if _rc == 0 { /* if `con' exists */
                    order `seq' `con' `costr'
                }
                else {
                    order `seq' `costr'
                }
                quietly describe, varlist
                local sortnow "`r(sortlist)'" /* copy the macro directly; "= r(sortlist)" would evaluate it as an expression */
                if "`sortnow'" != "`sortorig'" {
                    sort `sortorig'
                }


                And in the last part I would like to restore not only the original variable and sort order, but also the original file name!
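
                The effect described in this post can be reproduced in a few lines. This is only a sketch, assuming the auto dataset shipped with Stata in place of the user's file:

                ```stata
                sysuse auto, clear
                display "`c(filename)'"      // path of auto.dta
                tempfile orgfile
                quietly save "`orgfile'"
                display "`c(filename)'"      // now the temporary file's path
                ```

                The name changes at the save step, not at the later merge, which is why the dataset in memory ends up carrying the temporary name.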


                • #9
                  It seems you are actually still omitting the reason you think destroying the original dataset is necessary. Why do you not merge your created files to the original file?


                  Also some comments, probably not directly related to your question.

                  Without a preserve statement, you risk losing the original dataset altogether if your program fails before conclusion. This is dangerous.

                  You might also think about type long for your `temp' variable, as a byte variable can hold only 228 distinct values (the integers from -127 to 100) and will fail whenever the values in the defined groups exceed that range. This is likely to happen in a (very) large dataset.
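
                  A small sketch of the storage-type limit, using an artificial dataset of 150 observations; the variable names small and big are made up:

                  ```stata
                  clear
                  set obs 150
                  generate byte small = _n     // values above 100 do not fit in a byte
                  generate long big   = _n
                  count if missing(small)      // observations lost to the storage type
                  count if missing(big)        // long holds all values in this example
                  ```

                  Stata stores the out-of-range values as missing rather than stopping with an error, so the problem is easy to overlook.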

                  Best
                  Daniel
                  Last edited by daniel klein; 02 Mar 2015, 08:15.


                  • #10
                    Hello Daniel,

                    The answer to your question is: if the user has made changes to the original file (e.g. created a new variable with generate), these would be lost if I merged my files to the original file. Even if I used c(changed), I would have no solution to the problem if there were changes.

                    My comments on your comments:

                    1. Without a preserve statement, I risk losing the active dataset in memory, that is true. But the dataset on disk will still be available (I know this because I had several program failures while developing the syntax).

                    2. You were probably reading too quickly: the temp variable can only be 0 or 1, that's what is defined in the cond() function, no matter how big the dataset is.

                    I would like to remind anybody reading this of my original question: is there any possibility in Stata to rename the active dataset without actually saving it?

                    greetings, Klaudia


                    • #11
                      Hello Daniel, I am sorry, it was me who was reading too quickly. You surely were referring to
                      gen byte `temp' = _n
                      and you are right, I'll change it to long.

                      Greetings, Klaudia


                      • #12
                        On your bottom line: I believe the answer to be No. The name of a dataset is precisely that of its .dta file if there is one.


                        • #13
                          You may be right, but I still wonder... I think it is unusual that the property <name> of the object <active data set> cannot be changed by Stata programs.


                          • #14
                            "unusual": what does that mean?

                            What properties as described by describe or listed by creturn list are you thinking of?


                            • #15
                              I am thinking of c(filename). Can this be changed by Stata syntax?
