Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to create a simple table with my own specifications?

    Hi,

    I want to create my own table with column names and observations in stata. I have been searching online but everywhere i see the option of creating a table with prespecified options such as in table or tabulate. How can I just build out a simple table like in excel using my own column names and inserting observations into them?

  • #2
    Not easily. However, there are ways of doing so. The best way depends on what it is exactly what you want to do, and your skill level.

    You could look at tabdisp. The strategy here is to manipulate the dataset such that it contains your table (in long format), and use tabdisp to display it as a table

    Another option is matlist. The strategy here is to collect the things you want to display in a matrix, with row and column names and maybe even equation names, and use matlist to display that matrix.

    Finally, you can get maximum flexibility with _tab, but the price of flexibility is complexity
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------

    Comment


    • #3
      ... not forgettting list ...

      Note: in Stata an observation is, in other terms, an entire row or record in the dataset.

      If you mean a table with individual values of variables in the cells, then again that could just be an application of
      list.

      Generic comment: experienced users who answer many questions here often know little or nothing about MS Excel. Either way, a concrete example of what you want to do with data you show us would speed a specific answer to the question.

      Comment


      • #4
        Thanks Maarten and Nick. What I am trying to do is the below. I want to create a table such as the following in Stata. I am using the variablenames from my initial data set and trying to create a table which lists the startdate and enddate for each variable ie I am trying to see from what date to what date do I have non-missing observations for the below variables.
        VariableName StartDate EndDate
        Loan 1992q3 1994q1
        Deposit 1991q1 1995q3
        Since I am not being able to create this directly on Stata, I have found a workaround by creating it on Excel using Stata using the below code:

        Code:
        putexcel set missingdata2.xlsx, sheet(Sheet1) replace
        putexcel A1 = "Variable"
        putexcel B1 = "StartDate"
        putexcel C1 = "EndDate"
        
        //finding first and last date of observations for each variable
        foreach var of varlist _all{
        forvalues i=2/1088{
        drop if missing(`var')
        keep date `var'
        putexcel A`i' = "`var'"
        summdate date
        local min = r(min)
        format min %tq
        local max = r(max)
        format max %tq
        putexcel B`i' = min
        putexcel C`i' = max
        }
        }
        Also Nick Cox as an aside, I am using the command summdate (written by you) and it is giving me the results I want namely min and max date but I am unable to get r(min) or r(max) in the %tq format. So when I run summdate, I get the below results:
        Number of obs 960002
        Mean date (rounded) 2004q1
        Minimum date 1992q4
        Maximum date 2019q3

        SD 30.545

        Now I want to directly use the minimum date, maximum date as is outputted here but when I do return list, I get back the below:
        r(N) = 960002
        r(sum_w) = 960002
        r(mean) = 176.6939016793715
        r(Var) = 932.9810140593518
        r(sd) = 30.54473791112557
        r(min) = 131
        r(max) = 238

        r(sum) = 169626499

        Here, r(min) and r(max) are in numbers and not date format so I can't use it. I tried to convert them by storing them in a local macro and then formatting it but that is not working and is giving the below error:
        factor-variable and time-series operators not allowed

        I am happy to post the last part of my question on summdate as a separate post if that is more appropriate. Just thought I'd mention here because that is part of what I am trying to do and the original command was written by you.
        Last edited by Shreya Dutt; 23 Jan 2020, 01:53.

        Comment


        • #5
          That depends on what your data looks like. Can you give an extract? See help dataex
          ---------------------------------
          Maarten L. Buis
          University of Konstanz
          Department of history and sociology
          box 40
          78457 Konstanz
          Germany
          http://www.maartenbuis.nl
          ---------------------------------

          Comment


          • #6
            Maarten Buis Yes, Please see below an example with one variable named 'cert'. So from the below, I would want to make a new table such as the one I have mentioned before with variablename as cert, startdate as 1992q4 and enddate as 2019q3.

            P.S. I am facing this issue where the date is appearing as numbers and I don't know how to get it in normal Stata date format. My original date is in Stata date format %tq.

            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input long cert float date
              8 131
              9 131
             14 131
             28 131
             35 131
             39 131
             41 131
             43 131
             45 131
             46 131
             47 131
             49 131
             50 131
             51 131
             52 131
             53 131
             54 131
             56 131
             57 131
             58 131
             59 131
             60 131
             63 131
             64 131
             68 131
             70 131
             72 131
             73 131
             74 131
             77 131
             79 131
             80 131
             81 131
             82 131
             83 131
             84 131
             85 131
             86 131
             87 131
             88 131
             89 131
             90 131
             91 131
             92 131
             93 131
             94 131
             95 131
             96 131
             97 131
             99 131
            100 131
            101 131
            102 131
            103 131
            104 131
            105 131
            107 131
            108 131
            109 131
            110 131
            114 131
            127 131
            130 131
            131 131
            132 131
            133 131
            134 131
            138 131
            144 131
            145 131
            146 131
            147 131
            148 131
            149 131
            151 131
            152 131
            153 131
            155 131
            156 131
            157 131
            158 131
            159 131
            160 131
            161 131
            163 131
            166 131
            168 131
            169 131
            170 131
            172 131
            173 131
            175 131
            176 131
            178 131
            182 131
            183 131
            184 131
            186 131
            187 131
            188 131
            end
            format %tq date

            Comment


            • #7
              All the dates in #6 are the same.

              At a guess you want to show the first and last dates for each distinct cert. That could be just this at the simplest. More variables mean a loop over more variables. You should never need to loop over observations or even panels. .

              Code:
              clear
              input cert date
              1  131
              1  133
              1  134
              2  140
              2  141
              end
              
              bysort cert (date) : gen start = date[1]
              by cert : gen end = date[_N]
              format start end %tq
              
              list cert start end if date == start, noobs  
              
                +------------------------+
                | cert    start      end |
                |------------------------|
                |    1   1992q4   1993q3 |
                |    2   1995q1   1995q2 |
                +------------------------+

              Comment


              • #8
                Thanks Nick Cox I don't want to list the start date for every value of cert. I just want the start and end date for the entire cert time series ie when does the first non -missing value for the variable start and till when do we have non-missing observations available. Could you let me know how I must modify what you suggested to do that?
                I would want my list to show as column names 'variablename' 'start' 'end' and then the first row should be 'cert' '1992q4' 'endq3'. Then I would loop over all the variables to do the same thing and keep adding the information to the subsequent rows of the list.

                Comment


                • #9
                  I was able to solve the problems - both the creation of the table (by using excel) and recording the first and last date of the time series for each variable. Thanks to all that helped! Below is the code I finally wrote:

                  Code:
                  //assign variable labels to each variable name
                  foreach var of varlist _all{
                  label variable `var' "`var'"
                  }
                  
                  //convert variable name to sequential numbering so that we can use that for putexcel loop
                  local i = 2
                  foreach var of varlist _all{
                  char `var'[original_name] "`var'"
                  ren `var' v`i'
                  local ++i
                  }
                  
                  //finding first and last date of observations for each variable
                  forvalues i = 2/1088{
                  local lab`i': variable label v`i'
                  gen lab`i' = "`lab`i''"
                  putexcel A`i' = lab`i'
                  preserve
                  drop if missing(v`i')
                  keep v1088 //keep date column after dropping all missing observations for a variable
                  duplicates drop
                  sort v1088, stable
                  gen begin = string(v1088[1], "%tq")
                  gen last = string(v1088[_N], "%tq")
                  //gen begin = v1088[1]
                  //gen last = v1088[_N]
                  //format begin last %tq
                  putexcel B`i' = begin
                  sleep 5000
                  putexcel C`i' = last
                  sleep 5000
                  restore
                  }

                  Comment


                  • #10
                    A difference in time zones, I guess, meant that I was thinking about this late last night local time but did not get as far as posting code.

                    I now see better what you want although looking back at #3 there was a good hint. The data example in #6 had me jumping in the wrong direction, which was not your fault.

                    1. In essence you have about a thousand variables and you want the first and last dates from each or for each. I don't understand the structure of your data completely there. But this raises a serious question of whether your data structure is fit for purpose. I suspect that you need toreshape long.

                    Looking at your code, I have further strategic suggestions.

                    2. It's not obviously necessary to rename your variables or put their names in variable labels or characteristics. You can just use the names directly.

                    3. preserve and restore one thousand times, dropping missing values, and sorting to get the minimum and maximum are also quite unnecessary.

                    4. A cycle with 3000 putexcel operations is not needed either. Assemble your results in a dataset and then if you want to use MS Excel, export excel the whole lot in one go.


                    I made up a silly dataset, but code for your real dataset need not be much more complicated. I note that tabstat will do a fair job of showing the results you want, but perhaps not in practical form for the number of variables you have.

                    Code:
                    * this is just code for a sandbox dataset 
                    clear
                    set seed 2803  
                    set obs 5 
                    foreach v in frog toad newt { 
                        gen `v' = yq(2000, 1) + runiformint(1, 10) 
                    } 
                    
                    list 
                    
                    tabstat frog toad newt, s(min max) format(%tq) c(s) 
                    
                    
                        variable |       min       max
                    -------------+--------------------
                            frog |    2000q2    2001q3
                            toad |    2000q2    2001q4
                            newt |    2000q3    2002q3
                    ----------------------------------
                    This is the serious suggestion. Set up a loop over variables and post results to a dataset.


                    Code:
                     
                    tempname whatever
                    postfile `whatever' str80 which min max using shreya  
                    
                    foreach v of var * { 
                        su `v', meanonly 
                        post `whatever' ("`v'") (`r(min)') (`r(max)')  
                    }
                    
                    postclose `whatever' 
                    
                    use shreya, clear 
                    format min max %tq
                    
                    list
                    
                         +-------------------------+
                         | which      min      max |
                         |-------------------------|
                      1. |  frog   2000q2   2001q3 |
                      2. |  toad   2000q2   2001q4 |
                      3. |  newt   2000q3   2002q3 |
                         +-------------------------+
                    The above is what you have in a new dataset, which you can export as desired.

                    Comment


                    • #11
                      I see what you mean. Thank you for the suggestion. I will modify my code accordingly.

                      Comment

                      Working...
                      X