Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • dtable difference columns

    Hello everyone, I am new into Stata, so sorry if the question is at the elementary level. So it has been said, I want to add differences in means into my table, but I cant. I tried to do it with collect but I just cant extract it. Please if someone knows about it, help me.

    Here is my code

    dtable, continuous(TNA netreturn expense turnover fee_12b1 management_fee average_age weighted_age num_of_funds num_of_etfs share_etfs num_of_index_all share_index_all num_of_active_etf share_active_etf style_diversity_number style_diversity_weighted alpha4f flow, statistics(mean sd median min max)) by(active_etf_family, tests) export(summary.xlsx, replace)
    collect r(mu_1) r(mu_2) r(diff) r(p), tags(varname[TNA]) : ttest TNA, by(active_etf_family)
    collect r(mu_1) r(mu_2) r(diff) r(p), tags(varname[netreturn]) : ttest netreturn, by(active_etf_family)
    collect r(mu_1) r(mu_2) r(diff) r(p), tags(varname[expense]) : ttest expense, by(active_etf_family)
    collect r(mu_1) r(mu_2) r(diff) r(p), tags(varname[turnover]) : ttest turnover, by(active_etf_family)
    collect r(mu_1) r(mu_2) r(diff) r(p), tags(varname[fee_12b1]) : ttest fee_12b1, by(active_etf_family)
    collect r(mu_1) r(mu_2) r(diff) r(p), tags(varname[management_fee]) : ttest management_fee, by(active_etf_family)
    collect r(mu_1) r(mu_2) r(diff) r(p), tags(varname[average_age]) : ttest average_age, by(active_etf_family)
    collect r(mu_1) r(mu_2) r(diff) r(p), tags(varname[weighted_age]) : ttest weighted_age, by(active_etf_family)
    collect r(mu_1) r(mu_2) r(diff) r(p), tags(varname[num_of_funds]) : ttest num_of_funds, by(active_etf_family)
    collect r(mu_1) r(mu_2) r(diff) r(p), tags(varname[num_of_etfs]) : ttest num_of_etfs, by(active_etf_family)
    collect r(mu_1) r(mu_2) r(diff) r(p), tags(varname[share_etfs]) : ttest share_etfs, by(active_etf_family)
    collect r(mu_1) r(mu_2) r(diff) r(p), tags(varname[num_of_index_all]) : ttest num_of_index_all, by(active_etf_family)
    collect r(mu_1) r(mu_2) r(diff) r(p), tags(varname[share_index_all]) : ttest share_index_all, by(active_etf_family)
    collect r(mu_1) r(mu_2) r(diff) r(p), tags(varname[num_of_active_etf]) : ttest num_of_active_etf, by(active_etf_family)
    collect r(mu_1) r(mu_2) r(diff) r(p), tags(varname[share_etfs]) : ttest share_etfs, by(active_etf_family)
    collect r(mu_1) r(mu_2) r(diff) r(p), tags(varname[num_of_index_all]) : ttest num_of_index_all, by(active_etf_family)
    collect r(mu_1) r(mu_2) r(diff) r(p), tags(varname[share_index_all]) : ttest share_index_all, by(active_etf_family)
    collect r(mu_1) r(mu_2) r(diff) r(p), tags(varname[num_of_active_etf]) : ttest num_of_active_etf, by(active_etf_family)
    collect r(mu_1) r(mu_2) r(diff) r(p), tags(varname[share_active_etf]) : ttest share_active_etf, by(active_etf_family)
    collect r(mu_1) r(mu_2) r(diff) r(p), tags(varname[style_diversity_number]) : ttest style_diversity_number, by(active_etf_family)
    collect r(mu_1) r(mu_2) r(diff) r(p), tags(varname[style_diversity_weighted]) : ttest style_diversity_weighted, by(active_etf_family)
    collect r(mu_1) r(mu_2) r(diff) r(p), tags(varname[alpha4f]) : ttest alpha4f, by(active_etf_family)
    collect r(mu_1) r(mu_2) r(diff) r(p), tags(varname[flow]) : ttest flow, by(active_etf_family)
    collect levelsof result
    collect style header result, level(label)
    collect layout (var) (active_etf_family#result[mean sd median min max _dtable_test)
    collect export summary.xlsx, replace


    in the collect layout part, I just do not know how to do. Thanks in advance and for your attention,.

  • #2
    Welcome to Statalist. It would be helpful if you created a small example (possibly using one of Stata's datasets) to illustrate what you want. In this way, you increase your chances of obtaining a helpful reply. See FAQ Advice #12 for tips on this.

    Comment


    • #3
      Since you don't provide a data example, I am using an inbuilt Stata dataset.

      Consider:

      Code:
      sysuse auto, clear
      
      local tabvars price mpg rep78
      qui dtable, continuous(`tabvars', statistics(mean sd median min max)) by(foreign, tests nototal)
      
      foreach v of local tabvars {
          collect get diff = r(mu_1)-r(mu_2), tags(var[`v']): ttest `v', by(foreign)
      }
      
      collect label levels result mean "Mean" sd "Std. Dev." median "Median" min "Min" max "Max" diff "Difference", replace
      collect style header result, level(label)
      collect style cell result#cell_type[column-header], border(top) 
      collect layout (var) (foreign#result[mean sd median min max] result[diff p])
      which produces:
      Code:
      . collect preview
      
      ---------------------------------------------------------------------------------------------------------------------------------------------
                                                                         Car origin                                                Difference   p  
                                               Domestic                                              Foreign                                       
                         ---------------------------------------------------------------------------------------------------------                 
                            Mean    Std. Dev.    Median     Min        Max       Mean    Std. Dev.    Median     Min        Max                    
      ---------------------------------------------------------------------------------------------------------------------------------------------
      Price              6,072.423 (3,097.104) 4,782.500 3,291.000 15,906.000 6,384.682 (2,621.915) 5,759.000 3,748.000 12,990.000   -312.259 0.680
      Mileage (mpg)         19.827     (4.743)    19.000    12.000     34.000    24.773     (6.611)    24.500    14.000     41.000     -4.946 0.001
      Repair record 1978     3.021     (0.838)     3.000     1.000      5.000     4.286     (0.717)     4.000     3.000      5.000     -1.265 0.000
      ---------------------------------------------------------------------------------------------------------------------------------------------

      Comment


      • #4
        Incidentally, while coming up with this solution, I found something that appears to be a bug. Named expressions in collect get do not seem to work with spaces. With the same auto dataset:

        Code:
        . qui ttest price, by(foreign)
        
        . collect get diff = r(mu_1) - r(mu_2)
        invalid syntax
        r(198);
        
        . collect get diff = r(mu_1)-r(mu_2)
        The last command works fine, but not the middle one. The same happens with collect get used as a prefix. This behaviour is unexpected because expressions usually are unaffected by spaces, and there is no documentation that I am aware of, that says that named expressions are so affected. Marking Jeff Pitblado (StataCorp) for Stata Corp's attention.

        Comment


        • #5

          Typically, when a command accepts/expects a list of expressions, they must be bound in parentheses. For examples, see the documentation for postfile, frame post, bootstrap, jackknife, permute, simulate, ...

          collect get's parsing code relaxes this requirement since it also allows you to specify/identify individual result identifiers to also be added the list of automatic results. Consider the following
          Code:
          ttest price, by(foreign)
          collect get mean = r(mu_1) r(p)
          collect query autolevels result
          Your example simply needs parentheses to help the parsing code understand where your expression begins and ends.
          Code:
          qui ttest price, by(foreign)
          collect get diff = (r(mu_1) - r(mu_2))
          collect query autolevels result

          Comment


          • #6
            Jeff Pitblado (StataCorp) thanks for that explanation! I would still request that some guidance to this effect be included in the help and manual entries for collect get. The help for postfile and frame post explicitly shows the parentheses (I think because in those cases, they are required), while for bootstrap, jackknife etc, the help contains links to help exp_list, which once again shows the use of parentheses. For collect get however, there is nothing similar in the documentation, and so one would not guess that parentheses may be needed to parse the expression correctly.

            Comment


            • #7
              Thanks a lot for the solution!!!

              Comment


              • #8

                Documentation changes are rarely mentioned in help whatsnew, so I'm mentioning it here for those interested.

                In the Stata 19 update on 21may2025, the help file and PDF documentation for named expressions in collect get were updated to mention using parentheses when the expression contains spaces.

                named expressions are specified as name = exp, where name may be any valid Stata name and exp is an expression, typically an expression that involves one or more result identifiers. Expressions with spaces must be bound in parentheses. Examples of named expressions are diff = (r(mean_2) - r(mean_1)) and sd = sqrt(r(variance)).

                Comment

                Working...
                X