Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • sort on the number of observations

    Hi,

    Is there a command that can sort the order of variables based on the number of observations in each column?

    Cheers,
    Tom

  • #2
    Mata would probably be your best bet for something like that. I'm curious why you'd want to. Is if for saving paper in a data listing or something?

    Comment


    • #3
      Dataset is on stock prices, Im tying to sort firms by it's age since IPO.

      Comment


      • #4
        If I understand you correctly, then you've got a dataset where the variable names are firm names, and they're probably ordered alphabetically or something, and you'd like to order them by the total number of trading days. If that's the case, then you could do something like that below, which uses Stata alone. Start at the "Begin here" comment; what comes before it is just creating the toy dataset for illustration.

        .ÿversionÿ14.0

        .ÿ
        .ÿclearÿ*

        .ÿsetÿmoreÿoff

        .ÿsetÿseedÿ`=date("2015-08-13",ÿ"YMD")'

        .ÿ
        .ÿinputÿstr6ÿfirm_nameÿintÿtrading_day

        ÿÿÿÿÿfirm_nameÿÿtradin~y
        ÿÿ1.ÿZebcoÿ5
        ÿÿ2.ÿMelbacÿ2
        ÿÿ3.ÿAarlinÿ3
        ÿÿ4.ÿend

        .ÿ
        .ÿquietlyÿexpandÿtrading_day

        .ÿquietlyÿbysortÿfirm_name:ÿreplaceÿtrading_dayÿ=ÿ_n

        .ÿ
        .ÿgenerateÿdoubleÿstock_priceÿ=ÿruniform()

        .ÿformatÿstock_priceÿ%04.2f

        .ÿ
        .ÿquietlyÿreshapeÿwideÿstock_price,ÿi(trading_day)ÿj(firm_name)ÿstring

        .ÿrenameÿstock_price*ÿ*

        .ÿ
        .ÿorderÿtrading_day

        .ÿlist,ÿnoobsÿseparator(0)ÿabbreviate(20)

        ÿÿ+---------------------------------------+
        ÿÿ|ÿtrading_dayÿÿÿAarlinÿÿÿMelbacÿÿÿZebcoÿ|
        ÿÿ|---------------------------------------|
        ÿÿ|ÿÿÿÿÿÿÿÿÿÿÿ1ÿÿÿÿÿ0.92ÿÿÿÿÿ0.09ÿÿÿÿ0.06ÿ|
        ÿÿ|ÿÿÿÿÿÿÿÿÿÿÿ2ÿÿÿÿÿ0.53ÿÿÿÿÿ0.15ÿÿÿÿ0.55ÿ|
        ÿÿ|ÿÿÿÿÿÿÿÿÿÿÿ3ÿÿÿÿÿ0.87ÿÿÿÿÿÿÿÿ.ÿÿÿÿ0.43ÿ|
        ÿÿ|ÿÿÿÿÿÿÿÿÿÿÿ4ÿÿÿÿÿÿÿÿ.ÿÿÿÿÿÿÿÿ.ÿÿÿÿ0.71ÿ|
        ÿÿ|ÿÿÿÿÿÿÿÿÿÿÿ5ÿÿÿÿÿÿÿÿ.ÿÿÿÿÿÿÿÿ.ÿÿÿÿ0.97ÿ|
        ÿÿ+---------------------------------------+

        .ÿ
        .ÿ*
        .ÿ*ÿBeginÿhere
        .ÿ*
        .ÿ
        .ÿ//ÿNormalizeÿdata
        .ÿforeachÿvarÿofÿvarlistÿ_allÿ{
        ÿÿ2.ÿÿÿÿÿÿÿÿÿifÿ"`var'"ÿ==ÿ"trading_day"ÿcontinue
        ÿÿ3.ÿÿÿÿÿÿÿÿÿrenameÿ`var'ÿstock_price`var'
        ÿÿ4.ÿ}

        .ÿquietlyÿreshapeÿlongÿstock_price,ÿi(trading_day)ÿj(firm_name)ÿstring

        .ÿquietlyÿdropÿifÿmissing(stock_price)

        .ÿ
        .ÿ//ÿGetÿlistÿofÿfirm_namesÿinÿorderÿofÿlongevity
        .ÿpreserve

        .ÿcontractÿfirm_name,ÿfreq(day_tally)

        .ÿgsortÿ-day_tally

        .ÿforvaluesÿdayÿ=ÿ1/`=_N'ÿ{
        ÿÿ2.ÿÿÿÿÿÿÿÿÿlocalÿorderÿ`order'ÿ`=firm_name[`day']'
        ÿÿ3.ÿ}

        .ÿrestore

        .ÿ
        .ÿ//ÿGetÿfirmsÿasÿvariableÿnamesÿ(denormalize)
        .ÿquietlyÿreshapeÿwideÿstock_price,ÿi(trading_day)ÿj(firm_name)ÿstring

        .ÿrenameÿstock_price*ÿ*

        .ÿ
        .ÿ//ÿOrderÿfirmsÿinÿorderÿofÿlongevity
        .ÿorderÿtrading_dayÿ`order'

        .ÿ
        .ÿlist,ÿnoobsÿseparator(0)ÿabbreviate(20)

        ÿÿ+---------------------------------------+
        ÿÿ|ÿtrading_dayÿÿÿZebcoÿÿÿAarlinÿÿÿMelbacÿ|
        ÿÿ|---------------------------------------|
        ÿÿ|ÿÿÿÿÿÿÿÿÿÿÿ1ÿÿÿÿ0.06ÿÿÿÿÿ0.92ÿÿÿÿÿ0.09ÿ|
        ÿÿ|ÿÿÿÿÿÿÿÿÿÿÿ2ÿÿÿÿ0.55ÿÿÿÿÿ0.53ÿÿÿÿÿ0.15ÿ|
        ÿÿ|ÿÿÿÿÿÿÿÿÿÿÿ3ÿÿÿÿ0.43ÿÿÿÿÿ0.87ÿÿÿÿÿÿÿÿ.ÿ|
        ÿÿ|ÿÿÿÿÿÿÿÿÿÿÿ4ÿÿÿÿ0.71ÿÿÿÿÿÿÿÿ.ÿÿÿÿÿÿÿÿ.ÿ|
        ÿÿ|ÿÿÿÿÿÿÿÿÿÿÿ5ÿÿÿÿ0.97ÿÿÿÿÿÿÿÿ.ÿÿÿÿÿÿÿÿ.ÿ|
        ÿÿ+---------------------------------------+

        .ÿ
        .ÿexit

        endÿofÿdo-file


        .ÿ


        If you're comfortable with Mata, then you could use it to more elegantly return a local macro back to Stata for use in ordering the variables.

        Comment


        • #5
          Code:
          clear
          set obs 100
          * Some numbers of obs, not ordered
          glo numbers "10 40 26 76 24 49 96 12"
          * Generate variables with these numbers of observations
          foreach n in $numbers {
              gen x`n'=.
              replace x`n'=uniform() in 1/`n'
          }
          
          * Append the number of observations to the beginning of each variable name
          foreach x of varlist * {
              count if !mi(`x')
              glo n=r(N)
              ren `x' _$n`x'    
          }
          * Order alphabetically
          order _all, alphabetic
          * Remove prefixes
          rename _*x* x*
          Jorge Eduardo Pérez Pérez
          www.jorgeperezperez.com

          Comment


          • #6
            Same code reworked to match Joseph's example with arbitrary variable names

            Code:
            clear
            input trading_day Aarlin Melbac Zebco 
            1 0.92 0.09 0.06
            2 0.53 0.15 0.55
            3 0.87 . 0.43
            4 . . 0.71
            5 . . 0.97
            end
            
            * Append the number of observations to the beginning of each variable name
            glo vars ""
            foreach x of varlist Aarlin-Zebco {
                count if !mi(`x')
                glo n=r(N)
                * I use a double underscore here to avoid clashes with the single underscore of trading_day, you can use any character
                ren `x' a${n}__`x'    
            }
            * Order alphabetically
            order _all, alphabetic
            * Remove prefixes
            rename *__* *[2]
            Jorge Eduardo Pérez Pérez
            www.jorgeperezperez.com

            Comment


            • #7
              Thanks guys!

              Comment


              • #8
                Jorge, that looks great. I misunderstood the original poster and thought that he wanted the longest-lived firms first. I can't tell, but would a2049ABCo order after a365XYZInc in your code? If not, then maybe something like
                Code:
                global n : display %05.0f r(N)
                would help, if I've got that correct.

                Comment


                • #9
                  I agree, great correction. The leading zeros on the number of observations would be needed.

                  Code:
                  glo list = "a2049ABCo a365XYZInc"
                  glo olist : list sort global list
                  di "$olist"
                  glo list = "a2049ABCo a0365XYZInc"
                  glo olist : list sort global list
                   di "$olist"
                  So, to recap, the fixed code would be

                  Code:
                  clear
                  input trading_day Aarlin Melbac Zebco  
                  1 0.92 0.09 0.06
                  2 0.53 0.15 0.55
                  3 0.87 . 0.43
                  4 . . 0.71
                  5 . . 0.97
                  end
                  
                  * Append the number of observations to the beginning of each variable name
                  foreach x of varlist Aarlin-Zebco {
                      count if !mi(`x')
                      glo n: di %05.0f r(N)
                      * I use a double underscore here to avoid clashes with the single underscore of trading_day, you can use any character
                      ren `x' a${n}__`x'    
                  }
                  * Order alphabetically
                  order _all, alphabetic
                  * Remove prefixes
                  rename *__* *[2]
                  Last edited by Jorge Eduardo Perez Perez; 13 Aug 2015, 07:45.
                  Jorge Eduardo Pérez Pérez
                  www.jorgeperezperez.com

                  Comment


                  • #10
                    Pretty late to the game, but I set up a small program a couple of years ago, to do a somehow related thing in response to a question on old Statalist.

                    Anyway, here is a revised version of my code that I will also send to Kit Baum along with the help file

                    Code:
                    *! version 1.3.0 13aug2015 daniel klein
                    
                    pr vorter
                        vers 11.2
                        
                        gettoken sign 0 : 0 , p("+-")
                        if (!inlist(`"`sign'"', "+", "-")) {
                            loc 0 `sign' `0'
                            loc sign +
                        }
                        loc sign = (0 `sign' 1)
                        
                        gettoken stat : 0 , m(par)
                        if mi("`par'") {
                            loc stat // reset empty
                            loc inq [in/]
                        }
                        else {
                            gettoken stat 0 : 0 , m(par)
                            vorter_set_stat `stat'
                            loc num num
                        }
                        
                        syntax varlist(`num' min = 2) `inq' ///
                            [ , Return * ]
                        
                        cap conf numeric v `varlist'
                        loc isstr = _rc
                        if (`isstr') {
                            conf str v `varlist'
                        }
                        
                        if mi("`in'") {
                            if mi("`stat'") {
                                loc in 1
                            }
                            else {
                                loc in .
                            }
                        }
                        else {
                            numlist "`in'" , max(1)
                            loc in `r(numlist)'
                        }
                        
                        m : mf_vorter("varlist", `in', `sign')
                        
                        if mi("`return'") {
                            if (`"`macval(options)'"' != "") {
                                loc cma ,
                            }
                            order `r(varlist)' `cma' `options'
                        }
                    end
                    
                    pr vorter_set_stat
                        vers 11.2
                        
                        args stat void
                        
                        if mi(`"`stat'"') | (`"`void'"' != "") {
                            err 198
                        }
                        
                        loc 0 , `stat'
                        syntax ///
                        [ , ///
                            Mean     ///
                            COUnt     ///
                            N         /// not documented
                            MAx     ///
                            MIn     ///
                            SUm     ///
                            * ///
                        ]
                        
                        loc stat `mean' `count' `n' `max' `min' `sum'
                        if mi("`stat'") {
                            di as err `"`macval(options)' unknown {it:stat}"'
                            e 198
                        }
                        
                        c_local stat `stat'
                    end
                    
                    vers 11.2
                    
                    m :
                    
                    void mf_vorter(string rowvector vars,
                                    real scalar obsno,
                                    real scalar sign)
                    {
                        real scalar isstr
                        string scalar stat
                        transmorphic matrix sx
                        
                        vars     = tokens(st_local(vars))
                        isstr     = (st_local("isstr") == "1")
                        stat     = st_local("stat")
                        
                        /*
                            we add the variable indices
                            as the second/last row to sx
                        */
                        
                        if (isstr) {
                            sx = st_sdata(obsno, vars)
                            sx = sx\ strofreal(st_varindex(vars))
                        }
                        else {
                            sx = st_data(obsno, vars)
                            sx = sx\ st_varindex(vars)
                        }
                        
                        if (stat != "") {
                            sx = mf_vorter_add_stat(sx)
                        }
                        
                        sx = sort(sx', sign)'
                        
                        if (isstr) {
                            sx = strtoreal(sx)
                        }
                        
                        vars = st_varname((sort(st_varindex(vars)', 1)'))
                        
                        st_rclear()
                        st_global("r(oorder)", invtokens(vars))
                        st_global("r(varlist)", invtokens(st_varname(sx[2, .])))
                        
                        if (stat != "") {
                            st_matrix("r(" + stat + ")", sx[1, .])
                            st_matrixrowstripe("r(" + stat + ")", ("", stat))
                            st_matrixcolstripe("r(" + stat + ")", ///
                            (J(cols(vars), 1, ""), st_varname(sx[2, .])'))
                        }
                    }
                    
                    real matrix mf_vorter_add_stat(real matrix x)
                    {
                        real matrix xlast
                        string scalar stat
                        
                        /* strip variable indices */
                        
                        xlast = x[rows(x), .]
                        x = x[(1::rows(x) - 1), .]
                        
                        stat = st_local("stat")
                        
                        if (stat == "mean") {
                            x = (colsum(x) :/ colnonmissing(x))
                        }
                        else if (anyof(("count", "n"), stat)) {
                            x = colnonmissing(x)
                        }
                        else if (stat == "max") {
                            x = colmax(x)
                        }
                        else if (stat == "min") {
                            x = colmin(x)
                        }
                        else if (stat == "sum") {
                            x = colsum(x)
                        }
                        else {
                            assert(0)
                        }
                        
                        /* add variable indices */
                        
                        x = x\ xlast
                    
                        return(x)
                    }
                    
                    end
                    e
                    The results above are obtained with

                    Code:
                    vorter (count) Aarlin-Zebco
                    Best
                    Daniel

                    Comment

                    Working...
                    X