  • #16
    Originally posted by daniel klein
    Moreover, telling users not to keep ssc install commands in their code (source) might not be the best approach. Why not set up a (sub-)command that looks into the set ado-folder and merely does nothing when a community-contributed command is already found? Then tell users to use this instead of ssc install in their code. A basic version could look like
    I see two problems with this approach:
    1. Checking if a command is already installed and installing it if that's not the case does not solve the problem of having different versions of a command. While I agree that this is not the root of the problem of the replication crisis, it is an issue I have encountered more than once when checking if I could reproduce the results of a paper (it's happened on no more than 5% of the rep checks I've done, but still it did happen).
    2. In general, it is not good practice to install things on other people's machines, and it is definitely bad practice to do so without making it very clear to whoever is running any code that the code does so. This could bring about the very same issue that version-controlling packages is trying to prevent. There is some discussion about this in this thread, but this is one of the reasons why virtual environments and package managers were created in the first place.


    • #17
      Okay, I was not clear.


      Originally posted by Luiza Andrade
      Checking if a command is already installed and installing it if that's not the case does not solve the problem of having different versions of a command.
      I was not suggesting that as a complete solution. I was suggesting this to complement - improve, really - what seems to be the current workflow. I understand that the current workflow suggests reducing the ado-path to only BASE and PLUS, then changing PLUS to point to a project-specific folder. This is supposed to ensure that (a) community-contributed commands are installed into the project-specific folder and (b) no other community-contributed commands can be run. I have pointed out that (a) might fail because the place into which community-contributed commands are installed is ultimately controlled by net set ado - not by where the PLUS directory points to. Anyway, as I understand the current workflow, it suggests sprinkling in ssc install commands wherever needed in do-files but then removing those commands before saving/archiving/submitting/whatever the final "reproducible" package. This will fail; sooner or later you forget to remove one of those commands. As an alternative, I have suggested writing a simple wrapper that would replace ssc install in the do-files. When the ado-path is set up in the suggested way, there is no need to remove those commands from the final do-file(s) because they will never try to install another version of existing, i.e., already installed commands.
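      A minimal sketch of such a wrapper follows. The name ssc_require is hypothetical (it is not an existing command), and the sketch assumes that the command name is the same as the SSC package name, which is not true for every package. Where a missing command ends up is still governed by net set ado, as noted above.

      ```stata
      * Hypothetical wrapper; ssc_require is an illustrative name, not an existing command
      capture program drop ssc_require
      program define ssc_require
          version 14.1
          syntax name(name=pkg) [, NOIsily]
          * -which- returns a nonzero code when the command is not found along the ado-path;
          * with option noisily, its output (file location and version comments) is displayed
          capture `noisily' which `pkg'
          if (_rc) {
              * Assumption: the SSC package is named after the command
              ssc install `pkg'
          }
      end
      ```

      In a do-file, a line such as ssc_require estout , noisily would then replace ssc install estout and would do nothing beyond reporting the installed file when the command is already found.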


      Originally posted by Luiza Andrade
      In general, it is not good practice to install things on other people's machines, and it is definitely bad practice to do so without making it very clear to whoever is running any code that the code does so.
      I was not suggesting installing anything on anyone's machine, especially not without asking for permission. Arguably, throwing any number of community-contributed commands into a project-specific folder, then distributing it, and changing the ado-path to point exclusively to this folder is, technically, pretty much installing software on other people's machines. Are you going to tell them what exactly is in your project-specific folder? From an even more general point of view, if the argument is that replication requires using the very same software, then that implies installing the software, doesn't it?
      Last edited by daniel klein; 12 Apr 2023, 13:20.


      • #18
        I do not see how
        Code:
        ieboilstart , version(14.1)
        `r(version)'
        is an improvement over
        Code:
        version 14.1
        ieboilstart whatever_else_it_does
        Having to reference a local macro (technically, a global here) is error-prone and arguably makes the code harder to read.
        I think this is a fair critique, and I went back and forth about that myself. In the end, my reasoning was that both cases need a human action that we can't simplify by abstracting it away into a command. Including it in ieboilstart was an opportunity to remind users of the importance of doing this (for the sake of the argument, let's all agree that it is important).

        ieboilstart does not break anything for users who still prefer to do the following (apart from throwing an error if you omit the version() option):

        Code:
        version 14.1
        ieboilstart whatever_else_it_does
        The one drawback is that a user might be led to believe that "ieboilstart , version(14.1)" without the subsequent line is enough, but in my subjective opinion the pros outweigh the cons here.


        • #19
          Changing the PLUS directory, which is what the adopath() option does, will not suffice for installing community-contributed commands where you want them if net ado has been set to a different place. Unfortunately, there is no easy way of obtaining the directories to which net is set (yes, you could work around that with log files). This implies that you probably do not want to net set in ieboilstart or, if you do, explicitly tell users about that.
          Thank you for this feedback. This was exactly what I was hoping to get by sharing in this forum. I will dig into this and see what can be done. Perhaps some recommendation in the help files is the best that can be done here.
          Last edited by Kristoffer Bjarkefur; 13 Apr 2023, 02:18.


          • #20
            Responding to several things said here regarding workflow on when and where to use ssc install. For full disclosure, Luiza, Benjamin and I are colleagues so we know each other.

            First, I agree that sharing the "project ado-folder" (see the post shared in the original post for a definition) is sharing a folder where something has been pre-installed, and that this is in some sense the same as installing something on their computer. However, in one sense this is different. I consider "C:\ado\plus" (where my default PLUS folder is) to be my space that I want to manage. I do a lot of testing of community-contributed commands, so I want to know exactly what I have there. The "project ado-folder" is a project-managed folder where the team behind the project manages which commands are installed. And with "adopath()" it is possible to manage that in parallel with default PLUS locations like "C:\ado\plus". So I agree with Luiza that it is bad practice to install things in users' default location without prompting them, but I think it is fine to share a project-managed ado-folder where commands are pre-installed.

            Second, regarding sprinkling "ssc install" across code. We never suggested that. In the blog post we say:

            you install commands to the project ado-folder by using net install/ssc install in your main Stata window. Do not run net install/ssc install in the do-file editor as you should not include that in code you share.
            (Before anyone feels the need to correct me, yes, I know there is nothing called the "main Stata window". It is called the result window where you see output and where you write commands is called the command window, but when doing outreach we think it is more important to use the terminology that makes most sense for that specific target audience.)

            Installing commands as the need arises means adding them to the project ado-folder by running ssc install in the command window in a session set up in strict mode using the "adopath()" option, and then sharing the project ado-folder with the exact versions used.

            Third, your suggested wrapper. I agree that this is better than some practices, such as sprinkling "ssc install" across the code. We suggested something similar in our book published a few years back. But we are about to update this recommendation and recommend "adopath()" instead as we think version controlling community-contributed commands is important. (It is ok to disagree with that, but there are other threads where that can be discussed.)


            • #21
              Originally posted by Kristoffer Bjarkefur
              I consider "C:\ado\plus" (where my default PLUS folder is) to be my space [, t]he "project ado-folder" is a project managed folder [a]nd with "adopath()" it is possible to manage that in parallel

              I am not sure about that. Once set to what you call "strict mode", you would have to restart Stata to restore the original ado-path, wouldn't you? Switching between "strict" and "nostrict" modes does not seem seamless in that sense. That is another minor thing I would probably change. Before changing the ado-path, make a copy of global S_ADO so you can later more easily restore the original ado-path (i.e., the ado-path before invoking ieboilstart).

              If you feel comfortable with changing the ado-path and requiring users to restart to go back to the original state, then you might also feel comfortable merely changing the net setting in ieboilstart to net set ado PLUS, which should be the default whenever Stata is started (except, of course, someone says differently in their profile.do). That would probably be the easiest solution to the problem.
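              A rough sketch of both suggestions, run within a single do-file so that the local macros stay in scope. The folder name and the macro names old_ado, old_plus, and project_ado are assumptions for illustration; this is not how ieboilstart is implemented.

              ```stata
              * Keep copies of the current settings so they can be restored without restarting Stata
              local old_ado  `"$S_ADO"'
              local old_plus `"`c(sysdir_plus)'"'

              * Hypothetical project ado-folder; replace with the project's own folder
              local project_ado `"`c(pwd)'/ado"'
              capture mkdir `"`project_ado'"'

              * Point PLUS to the project folder and make net install/ssc install follow PLUS
              sysdir set PLUS `"`project_ado'"'
              net set ado PLUS

              * ... project code, possibly further changes to the ado-path ...

              * Restore the ado-path and the PLUS directory saved above
              global S_ADO `"`old_ado'"'
              sysdir set PLUS `"`old_plus'"'
              ```

              Because net set ado is left at PLUS, later installations go to the restored PLUS directory, which matches the default described above.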


              Originally posted by Kristoffer Bjarkefur
              Second, regarding sprinkling "ssc install" across code.

              Sorry, I must have misunderstood:

              Keep installing commands in the project-ado folder throughout the project
              (source)


              Originally posted by Kristoffer Bjarkefur
              Third, your suggested wrapper. I agree that this is better than some practices, such as sprinkling "ssc install" across the code. We suggested something similar in our book published a few years back. But we are about to update this recommendation and recommend "adopath()" instead as we think version controlling community-contributed commands is important.

              Once again, this was not suggested as an alternative but as a complement to your workflow. It does not interfere with (in fact, I believe it helps) version controlling community-contributed commands. If you add the noisily option in my code, it has the additional benefit of displaying, and perhaps capturing in a log file, the exact version that was used.


              • #22
                Originally posted by daniel klein

                If you feel comfortable with changing the ado-path and requiring users to restart to go back to the original state, then you might also feel comfortable merely changing the net setting in ieboilstart to net set ado PLUS, which should be the default whenever Stata is started (except, of course, someone says differently in their profile.do). That would probably be the easiest solution to the problem.
                This sounds promising. If you want, you can follow our work addressing this here: https://github.com/worldbank/ietoolkit/issues/337


                • #23
                  This is a recurring topic at Stata conferences.

                  The last discussion we had with StataCorp was about adding a returned result that stores the distribution date of the package, e.g., when using "which <packagename>". A distribution date is required for submitting to SSC, so the information is there.

                  This allows us to test if a newer version is installed or not. Something along the lines:
                  Code:
                  if r(distdate) < 20230413 di as err "Older <package> version detected. Please update!"
                  etc.


                  cc: KitBaum Hua Peng (StataCorp)


                  • #24
                    Originally posted by Asjad Naqvi
                    Last discussion we had with StataCorp was to add some return variable which stores the distribution date of the package. e.g. when using "which <packagename>". Distribution date is a requirement for submitting to SSC so the information is there.

                    This allows us to test if a newer version is installed or not. Something along the lines:
                    Code:
                    if r(distdate) < 20230413 di as err "Older <package> version detected. Please update!"
                    etc.
                    This is neat. For the packages I work on (for example ietoolkit that ieboilstart is part of) I always add a command with the same name as the package and return that information as well as the version number. See example here. But it would be awesome if this would be a standard across SSC.


                    • #25
                      Originally posted by Kristoffer Bjarkefur
                      But it would be awesome if this would be a standard across SSC.
                      It is already standard that packages on SSC have a distribution date. ado update uses it. But, the information is stored in a .trk file. I deliberately said "a .trk file" not "the .trk file" as there can be multiple versions of those on one machine. I would not pin my hopes on this solving the problem as presented here.
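                      To see what is recorded, the sketch below lists the distribution-date lines in the stata.trk file of the current PLUS directory only. It assumes that the date appears in package description lines containing the text "Distribution-Date"; any stata.trk files in other directories are not examined, which is exactly the limitation mentioned above.

                      ```stata
                      * Path to the tracking file in the current PLUS directory
                      local trk `"`c(sysdir_plus)'stata.trk"'

                      capture confirm file `"`trk'"'
                      if (_rc) {
                          display as text `"no stata.trk found in `c(sysdir_plus)'"'
                      }
                      else {
                          tempname fh
                          file open `fh' using `"`trk'"', read text
                          file read `fh' line
                          while (r(eof) == 0) {
                              * Assumption: distribution dates are stored on lines containing this text
                              if (strpos(`"`macval(line)'"', "Distribution-Date")) {
                                  display as result `"`macval(line)'"'
                              }
                              file read `fh' line
                          }
                          file close `fh'
                      }
                      ```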
                      Last edited by daniel klein; 13 Apr 2023, 09:28.


                      • #26
                        Asjad Naqvi , I will think about it and report back.
