This is a sticky topic.

  • I would like to see Stata support importing from and exporting to the OpenDocument Spreadsheet format (https://en.wikipedia.org/wiki/OpenDocument).

    • The top item in my wishlist would be for SSC to post multiple versions of packages. At the very least, it would need to be possible to have one package version per package name per Stata version, as this would enable users of obsolete Stata versions to use obsolete package versions. I was conferring earlier today with a Stata 15 user in Argentina, where Stata upgrades might be hard to come by owing to economic problems there.

      Fortunately, obsolete versions of all my packages (in the most recent package version compatible with the user's Stata version) can be downloaded from my website at www.rogernewsonresources.org, where I distribute do-files with names like instasisay_xy that download the latest version of all my packages compatible with Stata version xy. Unfortunately, obsolete package versions are not available on SSC, which is an issue for users in institutions or groups where SSC is whitelisted and my website isn't.

      This might be a job for SSC rather than for Stata 19, though.

      • Originally posted by Roger Newson
        This might be a job for SSC rather than for Stata 19, though.
        I would think so.

        Technically, you could simply put all versions into one package, using different names for the ado-files. You could then have the (main) ado-file call the appropriate version.

        Say your command is called foo. The package on SSC contains foo.ado. It might also contain foo_17.ado, foo_16.ado, etc. You set up foo.ado as

        Code:
        program foo
            
            // version intentionally omitted or lowest supported
            
            if (c(stata_version) < 18) {
                
                local version = floor(c(stata_version))
                foo_`version' `0'
                exit
                
            }
            
            version 18
            
            ...
            
        end
        With this setup, those using your commands could always type

        Code:
        ssc install foo
        foo ...
        They never need to change their (old) code, and they always get the most up-to-date version that is compatible with their current version of Stata. You could also ship help files for the prior versions, with the main help file linking to them.

        If you do not like having all versions in the up-to-date package, you could set up one more package, say, foo_old, that contains foo_17.ado, foo_16.ado, etc., and send it to Kit Baum for SSC. foo.ado, from the up-to-date package, would then (ask the user to) install the foo_old package if needed.
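The foo_old variant can be sketched as a small change to the dispatcher in the post. This is a minimal sketch: foo, foo_17.ado, and the foo_old package are hypothetical names, and the sketch asks the user to install foo_old rather than installing it silently.

```stata
* foo.ado (hypothetical): dispatcher that expects the older versions of foo
* in a separate package, foo_old, and asks the user to install it if needed
program foo
    // version intentionally omitted so that older Statas can run this part
    if (c(stata_version) < 18) {
        local version = floor(c(stata_version))
        // is the version-specific ado-file (e.g., foo_17.ado) on the adopath?
        capture which foo_`version'
        if (_rc) {
            display as error "foo_`version'.ado not found"
            display as error "type {bf:ssc install foo_old} and try again"
            exit 111
        }
        foo_`version' `0'
        exit
    }
    version 18
    // placeholder for the current implementation of foo
    display as text "foo: running the Stata 18 version"
end
```

Keeping the check inside foo.ado means users' code still only ever types foo; the cost is one extra which lookup on old Statas.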

        • Roger Newson I think there is a bit more to it than just archiving older versions. Version control systems can already handle that use case extremely well. For example, -git tag- can tag a release in a git repository, and the tag will persist as long as that repository does; hosting a remote copy of the repository on GitHub then makes tagged releases available to others in the same way. The broader difficulty, however, is dependency management. If someone needs an older version of a program that has dependencies, matters can become significantly more challenging, because compatible versions of those dependencies are needed as well. That said, if the Stata package specification allowed specifying the specific version to install, then it would just be a matter of ensuring that older versions of programs are saved so they can be served up as needed.

          • Restore legacy (i.e., pre-Stata 17) "replace" option for table command. I installed Stata 18 today and was stunned to discover that this functionality has disappeared entirely.

            • Originally posted by Dan Palmer
              Restore legacy (i.e., pre-Stata 17) "replace" option for table command. I installed Stata 18 today and was stunned to discover that this functionality has disappeared entirely.
              No need to wish for that. Simply type:

              Code:
              version 16 : table ... , replace
              and everything will work as it did.

              Read more about Stata's version control in

              Code:
              help version
              You get the help file for the old table command with

              Code:
              help table_16
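A self-contained illustration of the version-control route, using the auto data shipped with Stata. This is a sketch assuming the Stata 16 table syntax (contents() and replace): with replace, the data in memory are replaced by one observation per table cell, and the statistics are stored in new variables, which we assume carry the default names table1, table2, and so on.

```stata
* legacy table under version control, with the auto data shipped with Stata
sysuse auto, clear
* frequency and mean price for each foreign x rep78 cell; replace swaps the
* data in memory for one observation per cell (Stata 16 syntax)
version 16: table foreign rep78, contents(freq mean price) replace
describe
list
```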

              • Yes, version control would do it, but it seems preferable to have a solution that is native to the current version.

                I'm a bit disappointed in Stata for abandoning the legacy table command. I don't dispute the potential utility of the newer syntax for certain users, but I wish that functionality could have been developed under a new command name. My organization uses Stata for the production of a great many automated reports that rely on the prior table, replace functionality. It's so fundamental to our work that it leaves me feeling insecure about other current capabilities that may be discarded in the future.

                • Just to make sure there is no misunderstanding here. Nothing is discarded -- now or in the future. As long as your organization has declared the version they were using when setting up the automation, which they are supposed to, there is nothing to be insecure or worried about.

                  My guess is that StataCorp felt the name table is by far the best, i.e., most natural, word to describe what the command now does; hence, they did not choose another name. Such drastic changes to how a command works are very rare, though. I have been using Stata for a while now, and I cannot readily remember a similar change since the new graphics system was introduced (before I started using Stata). They kept the old command under the name graph7; perhaps they could include a table16 command that you could call directly; would that be acceptable?
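Until something like that exists, a user could write such a wrapper themselves. table16 below is a hypothetical user-written program, not an official command; a sketch assuming that, because it sets version 16 before calling table, the legacy syntax (including replace) is used, just as with the version 16: prefix.

```stata
* table16.ado: hypothetical user-written wrapper, not an official command
program table16
    // commands run inside this program are interpreted as Stata 16,
    // so table accepts the legacy syntax, including replace
    version 16
    table `0'
end

* usage: mean price by foreign, replacing the data with the results
sysuse auto, clear
table16 foreign, contents(mean price) replace
list
```

Automated report code could then call table16 without relying on a version setting declared elsewhere in the do-file.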

                  • Yes, I suppose something like that would be preferable to the current state of things. Agreed that a revision of this magnitude to the functioning of an existing command is atypical for Stata, which further contributes to my sense of surprise.

                    • I think Stata really needs to do something about how D.() gets applied to interactions. In my work, I always want to difference the interaction; I do not want the interaction of the differences, which is identically zero in staggered difference-in-differences applications. D.() should do exactly what one would get by creating the interactions manually. And I still can't figure out a good reason why D.() cannot be applied when using factor-variable notation, i. Again, this seems to be a misunderstanding about what differencing is supposed to achieve. I want to write a model in levels and then difference the entire equation to remove heterogeneity. In a DiD setting, the treatment indicators are dg#fs, where g is the treatment cohort and s is the time period. If I use D.(c.dg#c.fs), Stata replaces it with D.c.dg#D.c.fs, which is identically zero. If I use the more natural D.(i.dg#i.fs), Stata doesn't allow it. I assume the reason it doesn't is that it wants to use the interaction of the differences.
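The gap between the two expansions can be checked by building the interaction by hand. A minimal sketch with a simulated three-period panel; dg, fs, w, D_w, and Ddg_Dfs are hypothetical names. D_w is the difference of the interaction, which is what differencing the levels equation yields, while Ddg_Dfs is the interaction of the differences, which, as described in the post, is what Stata substitutes for D.(c.dg#c.fs). Because dg does not vary over time, D.dg is zero, and so is Ddg_Dfs.

```stata
* hypothetical balanced panel: 100 units, periods 1-3;
* cohort dummy dg (units 1-50) and period dummy fs (period 3)
clear
set obs 300
generate id = ceil(_n/3)
bysort id: generate year = _n
xtset id year
generate dg = (id <= 50)       // time-invariant cohort indicator
generate fs = (year == 3)      // period indicator
generate w = dg*fs             // treatment indicator in levels
* difference of the interaction: what the differenced levels equation needs
generate D_w = D.w
* interaction of the differences: zero because D.dg is zero
generate Ddg_Dfs = D.dg*D.fs
summarize D_w Ddg_Dfs
* with an outcome y, one would then fit, e.g., regress D.y D_w
```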

                      • I posted this in the general forum, but, at Stephen Jenkins's suggestion, I'm posting it here. I really think it's important enough to add to a future version of Stata:

                        This is something that has bothered me for many years about built-in Stata commands for panel data. For small T, large N, if I want an alternative to xtreg, re that allows for an unrestricted T x T within-unit variance-covariance matrix, I seem to be limited to xtgee. There are some important limitations of xtgee. For one, while one can specify corr(uns) to allow an unrestricted T x T correlation matrix, the variances are assumed constant across time. One can get around this by weighting all variables by the inverse of a time-period-specific standard deviation, but this is clunky. Even if one does that, the iterative estimation underlying GEE has its problems. When I run a simulation with high AR(1) serial correlation (say, rho = 0.75), xtgee can fail in as many as 20% of the replications. If GEE were implemented with a two-step feasible GLS procedure, it would almost always run, the exception being the extremely rare case where the estimated correlation matrix is not positive definite.

                        I don't think I've seen a user-written command that implements an unrestricted FGLS estimation for the small T, large N case (and allows for fully robust inference). I'd be happy to learn otherwise. Currently, xtgee is closest but doesn't do enough, and computationally, it's unnecessarily hard.

                        I think adding an option to xtgee, such as "fgls" (for feasible GLS) would be easy. It would be understood that the specified correlation matrix is to be used (such as unstructured) and, in addition, the variances are to be estimated separately in each time period. Then, these are used to construct the T x T variance-covariance matrix. The correlation matrix would be estimated using the usual pairwise sample correlations rather than the more complicated GEE nonlinear moment equations. I talk about this in the general nonlinear case in Chapter 12 of my MIT Press book.
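For the linear case, the two-step procedure described above can be sketched in Mata. This is a minimal sketch, assuming a balanced panel sorted by unit and then time, with a hypothetical function name fgls_uns(): it computes pooled OLS residuals, estimates the unrestricted T x T covariance from them (period-specific variances on the diagonal, pairwise covariances off it), and then applies GLS. It is not xtgee, it does not cover nonlinear models, and it reports point estimates only, without the fully robust standard errors the post asks for.

```stata
mata:
// fgls_uns() is a hypothetical name: two-step FGLS for a linear panel model
// with an unrestricted T x T error covariance. Assumes a balanced panel
// sorted by unit and then time, so rows (i-1)*T+1 ... i*T belong to unit i.
// Returns point estimates only (no robust standard errors).
real colvector fgls_uns(real colvector y, real matrix X, real scalar T)
{
    real scalar    N, K, i
    real colvector b_ols, u, yi
    real matrix    U, Omega, Oinv, XOX, XOy, Xi

    N = rows(y)/T
    K = cols(X)

    // step 1: pooled OLS and its residuals
    b_ols = invsym(cross(X, X))*cross(X, y)
    u     = y - X*b_ols

    // step 2: T x T covariance of the residuals; the diagonal holds the
    // period-specific variances, the off-diagonals the pairwise covariances
    U     = colshape(u, T)        // N x T, row i holds unit i's residuals
    Omega = cross(U, U)/N
    Oinv  = invsym(Omega)

    // step 3: GLS with the same estimated covariance for every unit
    XOX = J(K, K, 0)
    XOy = J(K, 1, 0)
    for (i = 1; i <= N; i++) {
        Xi  = X[|(i-1)*T+1, 1 \ i*T, K|]
        yi  = y[|(i-1)*T+1 \ i*T|]
        XOX = XOX + Xi'*Oinv*Xi
        XOy = XOy + Xi'*Oinv*yi
    }
    return(invsym(XOX)*XOy)
}
end

* usage sketch: simulated balanced panel, N = 500, T = 4, AR(1) errors (rho = 0.75)
clear
set seed 12345
set obs 500
generate id = _n
expand 4
bysort id: generate year = _n
generate x = rnormal()
generate u = rnormal()
bysort id (year): replace u = 0.75*u[_n-1] + rnormal() if _n > 1
generate y = 1 + 2*x + u
generate cons = 1
sort id year
putmata y X=(x cons), replace
mata: b = fgls_uns(y, X, 4)
mata: b
```

Because step 2 uses plain sample moments instead of iterating on GEE moment equations, the only way it can fail is a singular estimated covariance matrix.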

                        • Originally posted by Jeff Wooldridge
                          I think Stata really needs to do something about how D.() gets applied to interactions. In my work, I always want to difference the interaction; I do not want the interaction of the differences, which is identically zero in staggered difference-in-differences applications. D.() should do exactly what one would get by creating the interactions manually. And I still can't figure out a good reason why D.() cannot be applied when using factor-variable notation, i. Again, this seems to be a misunderstanding about what differencing is supposed to achieve. I want to write a model in levels and then difference the entire equation to remove heterogeneity. In a DiD setting, the treatment indicators are dg#fs, where g is the treatment cohort and s is the time period. If I use D.(c.dg#c.fs), Stata replaces it with D.c.dg#D.c.fs, which is identically zero. If I use the more natural D.(i.dg#i.fs), Stata doesn't allow it. I assume the reason it doesn't is that it wants to use the interaction of the differences.
                          My understanding is that the main purpose of factor variables in Stata is to enable all the things the margins command can do. If D.(i.dg#i.fs) were referring to the differenced interaction term, then margins would run into problems.
                          https://twitter.com/Kripfganz

                          • Sebastian: This is a good point, but there are many cases where I'm not actually interested in the marginal effects. For example, I want to control for unit-specific time trends, which I get by i.id#c.year. There should be some way to put that into an equation and estimate the equation by differencing. And I can't see much justification for replacing the difference of the interaction with the interaction of the difference, which completely changes the model. I know I can generate these, but it's cumbersome and time consuming in a simulation. I guess there must be tradeoffs.
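For unit-specific trends, differencing a_i + g_i*t gives g_i, so in the differenced equation the trends become unit effects that a fixed-effects regression on the differenced data can remove. A minimal sketch of that manual route with simulated data; all variable names and parameter values are assumptions for illustration, and the sketch sidesteps factor-variable notation entirely.

```stata
* simulated panel with unit-specific intercepts a_i and trends g_i:
*   y_it = a_i + g_i*t + 1.5*x_it + e_it
* differencing gives D.y = g_i + 1.5*D.x + D.e, so the trend becomes a unit effect
clear
set seed 2024
set obs 200
generate id = _n
generate a = rnormal()
generate g = rnormal()
expand 6
bysort id: generate year = _n
xtset id year
generate x = g*year/3 + rnormal()    // x is correlated with the unit trend
generate y = a + g*year + 1.5*x + rnormal()
* pooled OLS on the differences ignores g_i (biased here, since D.x depends on g_i)
regress D.y D.x
* fixed effects on the differenced data remove g_i
xtreg D.y D.x, fe
```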

                            • Originally posted by Rasmus Green
                              A built-in Stata code co-pilot GPT!

                              OpenAI recently released their GPT store (https://chat.openai.com/gpts). There is one GPT for Stata, "The Stata GPT" by Jose RA Sanchez Ameijeiras. I haven't tried it (I'm not a ChatGPT Plus user), so I cannot attest to its usability. But a GPT that is trained on everything relating to Stata (all commands, all help files, all Statalist posts, and every paper from the Stata Journal) and built into Stata 19 for code autocompletion, proofreading, etc., would be highly useful. VS Code has GitHub Copilot (https://github.com/features/copilot) if one is looking for inspiration.
                              I second this suggestion. I already use GitHub Copilot in VS Code, and although it works much better in Python due to the amount of training data, it is already incredibly helpful in Stata, for example with repetitive code sections, loops, autocompletion, macros, and so on.

                              I just tried out "The Stata GPT", very unimpressive performance, basically just suggests to check Stata's help files or search online.

                              My Prompt: Explain the "dsregress" command to me

                              Reply: The "dsregress" command doesn't match any standard command in Stata as of my last update. Stata commands for regression analysis typically start with prefixes like "regress" (for linear regression), "logit" (for logistic regression), "probit" (for probit regression), and so on. The command you're asking about might be a custom command developed by a user or part of a specific package that extends Stata's capabilities beyond its built-in commands.
                              However, given the prefix "ds" might suggest something related to "difference in differences" (DiD) estimations or data smoothing techniques, but without more context, it's hard to pinpoint exactly what "dsregress" does. It's also possible that there's a typo or a misunderstanding in the command name.
                              To find more information on a custom command like "dsregress" in Stata, you can try the following approaches:
                              1. Help Command: If "dsregress" is indeed a command installed in your Stata environment, you can type help dsregress in the Stata command window. This would bring up the documentation for the command if it exists.
                              2. Search Online: Often, user-written Stata commands are shared through forums, repositories, or the SSC Archive. You can search online for "Stata dsregress command" to see if there are any references or documentation available.
                              3. Stata Journal and SSC Archive: Some user-written commands are submitted to the Stata Journal or the SSC Archive. You can check these resources for more information on "dsregress".
                              4. Net Search: In Stata, you can use the command net search dsregress to search for packages related to the keyword. If "dsregress" is a user-written package available through the official channels, this command might help you find and install it.
                              If "dsregress" is specific to a particular field of study or a custom implementation for a specific analysis, it would be beneficial to refer to the documentation or the source where you encountered the command for more detailed information.

                              My Prompt: What are you trained on

                              Reply: The dsregress command in Stata is used for double-selection lasso linear regression. This technique involves a two-step procedure where the lasso (Least Absolute Shrinkage and Selection Operator) method is used twice: first to select controls (variables) and second to perform regression with the selected controls. This approach helps in addressing model selection biases and is particularly useful when dealing with high-dimensional data where the number of predictors (variables) is large in relation to the number of observations.
                              The dsregress command specifically refers to the linear regression model, implying that it is used to predict a continuous outcome variable based on a set of predictor variables. The double-selection aspect refers to the process of using lasso first to select variables that are related to the outcome and then again to select variables that are related to the treatment or exposure of interest. This double use of lasso helps in reducing bias in the estimation of the treatment effect, particularly in observational studies where the treatment is not randomly assigned.
                              The command is part of a suite of commands related to lasso techniques for variable selection and regularization in Stata. These commands are designed to handle various types of regression models including logistic regression (dslogit), Poisson regression (dspoisson), and others, providing a comprehensive toolset for dealing with complex, high-dimensional datasets in various contexts, including causal inference and prediction modeling.

                              My Prompt: Give a code example

                              Reply: The specific syntax for dsregress in Stata would depend on the options and variables you're working with. However, since dsregress is not a standard Stata command and appears to be a user-defined or package-specific command, I'll provide a generic example based on the typical use case for double-selection lasso regression. Please note, this example might not exactly match the syntax of dsregress if it comes from a specific package or implementation. You would need to consult the documentation or help files for dsregress in your Stata installation for precise syntax and options.
                              Assuming dsregress works similarly to other regression commands in Stata but incorporates double-selection lasso techniques, a hypothetical code example might look like this:
                              Code:
                              // Double-selection lasso regression of outcome Y on treatment T and controls X
                              dsregress Y T X, options
                              Where:
                              • Y is your outcome variable.
                              • T is the treatment or variable of interest.
                              • X represents a set of control variables.
                              • options could include various options for the lasso procedure, such as specifying cross-validation criteria, the penalty term, or other model-specific options.
                              Since this is a hypothetical example, the actual options and the way you specify control variables or groups of variables may differ. Always refer to the actual command documentation for accurate syntax and extended functionality.
                              If dsregress is part of a package you've installed, you might also need to include a command to install or load the package before using dsregress, something like:
                              Code:
                              // Example command to install a Stata package, replace `packagename` with the actual package name
                              ssc install packagename

                              // Then you could use dsregress as specified
                              dsregress Y T X, options
                              Remember, this example is speculative and intended to give you a starting point for understanding how the command might be used. For accurate syntax and options, please look up the command's documentation in your Stata environment or the source where you found the command.
                              Last edited by Alexander Koplenig; 12 Feb 2024, 01:05.
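For comparison, dsregress is an official Stata command, part of the lasso suite introduced in Stata 16. A minimal example with the shipped auto data; the variable of interest (mpg) and the list of candidate controls are arbitrary choices for illustration only.

```stata
sysuse auto, clear
* double-selection lasso: effect of mpg on price, with lasso selecting
* controls from the list in controls()
dsregress price mpg, controls(weight length turn displacement headroom trunk gear_ratio foreign)
```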

                              • Originally posted by Jeff Wooldridge
                                I think Stata really needs to do something about how D.() gets applied to interactions. In my work, I always want to difference the interaction; I do not want the interaction of the differences, which is identically zero in staggered difference-in-differences applications. D.() should do exactly what one would get by creating the interactions manually. And I still can't figure out a good reason why D.() cannot be applied when using factor-variable notation, i. Again, this seems to be a misunderstanding about what differencing is supposed to achieve. I want to write a model in levels and then difference the entire equation to remove heterogeneity. In a DiD setting, the treatment indicators are dg#fs, where g is the treatment cohort and s is the time period. If I use D.(c.dg#c.fs), Stata replaces it with D.c.dg#D.c.fs, which is identically zero. If I use the more natural D.(i.dg#i.fs), Stata doesn't allow it. I assume the reason it doesn't is that it wants to use the interaction of the differences.
                                Jeff Wooldridge, there's a very good reason for a difference operator not to work on nominal and ordinal scaled data: differences are not mathematically defined on those measurement scales. See McDonald, R. P. (1999). Test Theory: A Unified Treatment. Lawrence Erlbaum Associates, pp. 408-419.
