
  • Originally posted by JanDitzen
    I am thinking about a protected mata space, similar to the r() and e().
    You might need to elaborate on the meaning of the term 'protected'. Protected from what; and in which sense?



    • Re #328

      I can imagine that there are Mata objects other than matrices that one might wish to make available in a way that mata clear would not delete them, in the same way that clear preserves ("protects") r() and e() results. I can further imagine this to be possible for the results of python calculations.

      So I'd generalize this request to suggest somehow expanding the capabilities of the current return and ereturn to support the storage of essentially arbitrary data structures, that don't fit perfectly within the Stata data structures currently supported, of which mata matrices would be one such thing, mata structures would be another, and something from python (sorry, I don't speak the language to be able to provide a concrete example) would be yet another. I can imagine doing this with a new command with extended syntax, or with language-specific return commands, allowing the existing return infrastructure to be rewritten as special cases of the more general commands.



      • William Lisowski exactly what I meant!



        • I second the proposal by Jan and William. In my own estimation commands, I often generate Mata structures that contain model definitions and estimation results (not just in matrix form). It would be great if I could ereturn those Mata structures to make them available for postestimation commands, in particular to keep them available after mata clear or when someone saves estimation results and restores them at a later time.
          https://twitter.com/Kripfganz



          • A display format that adds a leading "+" to positive numbers. I frequently need to label change-scores in graphs with leading plus/minus signs. That's doable by creating a string version of the value, but would be much simpler with something like this:

            Code:
            . di %+6.2f 1.2
             +1.20
            
            . di %+6.2f -1.2
             -1.20
            Last edited by Nicholas Winter; 29 Aug 2020, 07:49.
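
            For comparison, a minimal sketch of the string-based workaround mentioned above. The variable names change and chg_lab are hypothetical, and the %9.2f format is only one possible choice.

            Code:
            * Toy data: one negative, one zero, one positive change score
            clear
            set obs 3
            generate change = -1.2 in 1
            replace change = 0 in 2
            replace change = 1.2 in 3
            * Prefix "+" for strictly positive, nonmissing values only
            * (missing values compare as larger than any number in Stata)
            generate chg_lab = cond(change > 0 & !missing(change), "+", "") + strtrim(string(change, "%9.2f"))
            list, clean
            A string variable such as chg_lab could then be supplied to mlabel() in a graph.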



            • A post I read earlier today caused me to look for a command that could produce a nice table of basic descriptive statistics that includes mean, median and mode. I was unable to find an official Stata command that includes the mode. Is there one? If not, I suggest adding mode as another statname to include in the statistics() option for -tabstat-.

              Code:
              help tabstat##statname
              PS- I am aware that one can use egen to compute the mode. I'm also aware of user-contributed commands like mmodes and modes.
              --
              Bruce Weaver
              Email: [email protected]
              Web: http://sites.google.com/a/lakeheadu.ca/bweaver/
              Version: Stata/MP 18.0 (Windows)



              • Bruce, how would you suggest handling instances with more than one mode? I suspect this is the reason it is missing from commands like -tabstat-.



                • Re #337, fair question, Leonardo. In SPSS, one can obtain a mode via the FREQUENCIES command. I just tried an example with two modes (3 and 4). It showed mode = 3 in the output, but included this warning below the table:

                  Code:
                  a. Multiple modes exist. The smallest value is shown.
                  That made me curious to see what -egen- does when there are multiple modes, so I tried this example:

                  Code:
                  . clear *
                  
                  . set obs 10
                  number of observations (_N) was 0, now 10
                  
                  . generate byte y = 3 if _n < 6
                  (5 missing values generated)
                  
                  . replace y = 4 if _n > 5
                  (5 real changes made)
                  
                  . egen ymode = mode(y)
                  Warning: multiple modes encountered.  Generating missing values for the mode.  Use the maxmode, minmode, or nummode() options to control this
                  behavior.
                  (10 missing values generated)
                  
                  . list, clean
                  
                         y   ymode  
                    1.   3       .  
                    2.   3       .  
                    3.   3       .  
                    4.   3       .  
                    5.   3       .  
                    6.   4       .  
                    7.   4       .  
                    8.   4       .  
                    9.   4       .  
                   10.   4       .

                  Perhaps maxmode, minmode, and nummode() options could be included for -tabstat-. Alternatively, a warning could be issued when there are multiple modes.
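
                  As a sketch of how the options named in the -egen- warning resolve the tie in the example above (the new variable names are made up for illustration):

                  Code:
                  * Same data as above: five 3s and five 4s
                  clear
                  set obs 10
                  generate byte y = cond(_n < 6, 3, 4)
                  * Pick the smallest of the tied modes
                  egen ymode_min = mode(y), minmode
                  * Pick the largest of the tied modes
                  egen ymode_max = mode(y), maxmode
                  * Ask for the second of the tied modes
                  egen ymode_2nd = mode(y), nummode(2)
                  list, clean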
                  --
                  Bruce Weaver
                  Email: [email protected]
                  Web: http://sites.google.com/a/lakeheadu.ca/bweaver/
                  Version: Stata/MP 18.0 (Windows)



                  • Hm... that's interesting to see egen's behaviour here. To my memory, I hadn't used it before with multiple modes.



                    • Unfortunately, I had to drop out early from the wishes and grumbles portion of the Stata UK conference, so I’ll mention these here.

                      Two things:

                      1) Spatial tobit
                      2) Seasonal adjustment (X-12-ARIMA or X-13-ARIMA)

                      I would like to see an official Stata command for seasonal adjustment. There was a menu-driven X-12-ARIMA seasonal adjustment Stata Journal article from 2012, but it would be nice to have an official command that’s more flexible (e.g. able to adjust multiple frequencies) and not menu-driven.

                      There’s tssmooth shwinters, but this isn’t how most seasonal adjustment is done (technically it does seasonal smoothing), and in my opinion it’s not easy to work with. Attempting to seasonally adjust a highly seasonal series can produce some weird results and fail to converge (worked example below).


                      Code:
                      * NSA Monthly Construction Employment
                      import delimited using "https://fred.stlouisfed.org/series/CEU2000000001/downloaddata/CEU2000000001.csv", clear
                      gen month = mofd(date(date, "YMD"))
                      format month %tm
                      tsset month, monthly
                      tssmooth shwinters shwinters_sa =value
                      Here's the example from the help file; the result doesn't look like a seasonally adjusted series. As noted, though, this isn't seasonal adjustment but seasonal smoothing, which to my knowledge is the closest thing official Stata has to seasonal adjustment.

                      Code:
                      webuse turksales, clear 
                      tssmooth shwinters shw1=sales
                      tsline shw1 sales

                      Wang, Qunyong, and Na Wu. "Menu-driven X-12-ARIMA seasonal adjustment in Stata." The Stata Journal 12, no. 2 (2012): 214-241.




                      • Some ideas for expansion of the -meta- suite.

                        1) Meta-analysis of single proportions -- a common exercise for trialists looking to understand/update placebo rates.

                        2) Diagnostic test accuracy models -- the two popular models are the so-called "bivariate" and HSROC methods. (Indeed, both are bivariate normal models and, under some conditions, are exactly the same.) These models are implemented in -metandi-, -diagma- and -midas-, but their implementations vary considerably in capability. Of these, -midas- has implemented the most features and -metandi- has postestimation commands, but they all use xtmelogit under the hood. Some of these packages are a little buggy, and my personal gripe is that they don't present results in the way expected from estimation commands or make it easy to save and manipulate graphics programmatically. Placing these models in the -meta- framework would go a long way toward making it easier to work with their estimates, model checking, derived graphics, etc.



                        • #336 #337 #338 #339 See also hsmode from SSC as posted at https://www.stata.com/statalist/arch.../msg00912.html



                          • Originally posted by Nick Cox
                            #336 #337 #338 #339 See also hsmode from SSC as posted at https://www.stata.com/statalist/arch.../msg00912.html
                            Thanks Nick. I had not heard of -hsmode-. For the example I gave in #338, it returns a mode of 3, but does not warn that there are two modes.
                            Code:
                            clear *
                            set obs 10
                            generate byte y = 3 if _n < 6
                            replace y = 4 if _n > 5
                            egen ymode = mode(y)
                            list, clean
                            hsmode y
                            Output from the last 3 commands:

                            Code:
                            . egen ymode = mode(y)
                            Warning: multiple modes encountered.  Generating missing values for the mode.  Use the maxmode, minmode, or
                            nummode() options to control this behavior.
                            (10 missing values generated)
                            
                            . list, clean
                            
                                   y   ymode  
                              1.   3       .  
                              2.   3       .  
                              3.   3       .  
                              4.   3       .  
                              5.   3       .  
                              6.   4       .  
                              7.   4       .  
                              8.   4       .  
                              9.   4       .  
                             10.   4       .  
                            
                            . hsmode y
                            
                            (n = 10)
                                   mode
                            -----------
                            y         3
                            But to be fair, the help for -hsmode- does include this sentence:

                            Moreover, if your interest is in the existence or extent of bimodality or multimodality, it will be best to look directly at suitably smoothed estimates of the density function.


                            --
                            Bruce Weaver
                            Email: [email protected]
                            Web: http://sites.google.com/a/lakeheadu.ca/bweaver/
                            Version: Stata/MP 18.0 (Windows)



                            • hsmode gives you its best guess at the mode.

                              There is a good reason why Stata doesn't report a mode as a standard summary. What would be a standard summary?

                              Get 3 statistically-minded people in a room and ask how to determine the mode, and you will get about 12 answers.

                              1. The mode is the single most common value in a dataset.

                              2. But if there are ties for that, it is not well determined, so you can't say what it is.

                              3. No, if there are ties for that you have bi- or multimodality and should report that.

                              4. No, if there are ties for that, nevertheless sometimes there is an easy answer: you can just average equally common values if they are adjacent.

                              5. No, in practice looking at frequencies of reported values is a poor method, especially with measured data, so you need to bin data first before looking at a histogram and choosing a modal class.

                              6. A mode is in practice a pronounced peak. If the leading mode is 10 times more frequent than the second mode, no-one experienced calls that bimodal. (Discuss!) But all that implies is this is a judgment call and not suitable for automated reporting.

                              7. All these methods are superseded for measured data by density estimation, including kernel density estimation. But best practice is to report those estimates and comment informally on modes if they are evident.

                              8. Density estimation allows a report of the mode. But it is contingent on which kernel and which bandwidth you use and on any other decisions made before or after density estimation.

                              9. Yet another approach is to use hsmode which follows a precise algorithm to give an estimate. It doesn't seem well known or widely used, but it is well defined.

                              I said 12 answers and I am naturally being a little flippant, but I doubt that I have remembered or know all the possible answers here.

                              A case in point is data 1, 2, 3, 4, 5 where every person who has passed a first course should be able to say that the mean and median are 3, but what is the mode?

                              I respect any answer that the mode is indeterminate here, so a Stata command should return missing, but I also respect the answer of hsmode (which returns 3!).

                              Underlying all this is an idea that equal frequencies for most common values are usually a quirk of small samples. Given more data, most distributions turn out to be unimodal with a definite peak. (Discuss!)
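
                              Answers 1 to 3 in the list above can at least be made concrete. The following is only a sketch, using the two-mode example from earlier in the thread, that counts how many distinct values share the highest frequency; the variable names freq, maxfreq and top are illustrative, and it says nothing about binning or density estimation.

                              Code:
                              * Five 3s and five 4s, as in the earlier example
                              clear
                              set obs 10
                              generate byte y = cond(_n < 6, 3, 4)
                              * Frequency of each distinct value
                              bysort y: generate long freq = _N
                              * Highest frequency in the data
                              egen long maxfreq = max(freq)
                              * Tag one observation per distinct value that attains it
                              egen byte top = tag(y) if freq == maxfreq
                              count if top
                              display "distinct values tied for most common: " r(N)
                              list y freq if top, clean
                              A count of 1 corresponds to answer 1; a count above 1 is the situation that answers 2 and 3 disagree about.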



                              • #343 Why hsmode reports mode 3 for 5 instances of 3 and 5 instances of 4 is discussed in the help under Ties.

