Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • counting for panels within panels

    I am hoping to produce a table which counts the number of fund ID’s (ie an ID is distinct to each fund) for different characteristics at the end of each year (ie the latest observed date for each year).
    I can count how many of each characteristic (eg how many funds where retail_fund=="Y") following an approach set out in
    Cox, N. J. (2007). Speaking Stata: Counting groups, especially panels. The Stata Journal, 7(4), 571-581.
    Here is a simple example

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long id float(month ff) str1 retail_fund
    2708 665   -.00494564 "N"
    2708 666  -.009506066 "N"
    2708 679  -.006455688 "N"
    2708 680  -.009019009 "N"
    2724 666 -.0045031635 "N"
    2724 667  -.009472995 "N"
    2724 684 -.0017794066 "N"
    2724 685  -.003388667 "N"
    2745 667 -.0045014257 "Y"
    2745 668 -.0032780494 "Y"
    2745 696   -.06255567 "Y"
    2745 697   -.01358385 "Y"
    end
    format %tmCCYY_Mon month
    Code:
    by id (month): gen byte tag = _n == 1
    count if tag
    count if retail_fund=="Y" & tag
    I intend saving each of the counts and then displaying them in a suitable table.
    However, I am having trouble counting the number of fund ID’s at the end of each year (and for each characteristic – eg if retail_fund == “Y”)
    I begin by producing a year() variable but am confused on how to set up the "by id (month): gen byte tag = _n == 1" command to count the last month for each ID in each year.
    Thank you very much, Dan

  • #2
    I don't fully understand this. The last month on record within a year (presumably you mean calendar year) is easy enough, but _n == 1 is not the right detail as it won't yield the last month unless there is only one.

    Code:
    clear
    input long id float(month ff) str1 retail_fund
    2708 665   -.00494564 "N"
    2708 666  -.009506066 "N"
    2708 679  -.006455688 "N"
    2708 680  -.009019009 "N"
    2724 666 -.0045031635 "N"
    2724 667  -.009472995 "N"
    2724 684 -.0017794066 "N"
    2724 685  -.003388667 "N"
    2745 667 -.0045014257 "Y"
    2745 668 -.0032780494 "Y"
    2745 696   -.06255567 "Y"
    2745 697   -.01358385 "Y"
    end
    format %tmCCYY_Mon month
    
    gen year = yofd(dofm(month)) 
    
    bysort id year (month) : gen last = _n == _N 
    
    list, sepby(id year) 
    
         +------------------------------------------------------+
         |   id      month          ff   retail~d   year   last |
         |------------------------------------------------------|
      1. | 2708   2015 Jun   -.0049456          N   2015      0 |
      2. | 2708   2015 Jul   -.0095061          N   2015      1 |
         |------------------------------------------------------|
      3. | 2708   2016 Aug   -.0064557          N   2016      0 |
      4. | 2708   2016 Sep    -.009019          N   2016      1 |
         |------------------------------------------------------|
      5. | 2724   2015 Jul   -.0045032          N   2015      0 |
      6. | 2724   2015 Aug    -.009473          N   2015      1 |
         |------------------------------------------------------|
      7. | 2724   2017 Jan   -.0017794          N   2017      0 |
      8. | 2724   2017 Feb   -.0033887          N   2017      1 |
         |------------------------------------------------------|
      9. | 2745   2015 Aug   -.0045014          Y   2015      0 |
     10. | 2745   2015 Sep    -.003278          Y   2015      1 |
         |------------------------------------------------------|
     11. | 2745   2018 Jan   -.0625557          Y   2018      0 |
     12. | 2745   2018 Feb   -.0135839          Y   2018      1 |
         +------------------------------------------------------+

    Comment


    • #3
      Thank you Nick, yes, sorry, that was meant to be the last (ie _n==_N) not the first! I hope you don't mind me posing an additional query?
      I'm attempting to save the counts for each year (and then I'll produce a table containing the counts). However, when I attempt the following commands I am getting a "type mismatch" error and am not sure if I need to clear the previous count first or if there is something wrong with using forval to create scalar names?

      Code:
      gen year = yofd(dofm(month))
      format year %tyCCYY
      bysort id year (month): gen byte tag = _n == _N
      forval y = 2015/2018 {
      count if year==`y' & tag
      scalar fund`y' = r(N)
      count if year==`y' & tag & retail_fund==1
      scalar retail`y' = r(N)
      }
      Thank you, Dan

      Comment


      • #4
        In #1 retail_fund is a string variable. All of a sudden you are treating it as numeric.

        Comment


        • #5
          Sorry Nic, yes more rooky errors! corrected the 3rd last line to - count if year==`y' & tag & retail_fund=="Y"
          I'm now attempting to produce a table of the scalars for each year for fund and retail_fund and I was planning to reading and follow Ian Watson’s tabout approach et
          https://www.ianwatson.com.au/stata/tabout_tutorial.pdf
          But have also seen svret and texsave at
          https://www.stata.com/meeting/boston...ton10_reif.pdf
          and estout at
          https://www.ssc.wisc.edu/sscc/pubs/stata_tables.htm
          Do you happen to have a preference for publishable presentations of data
          (in case it influences the choice, my next stage is to graph the counts over time). Thanks again for your help, Dan

          Comment


          • #6
            There are some very well thought-out programs out there by very able and experienced people. I don’t use any those you mention, and no reasons that bite for anyone else explain that. I tend to write my own Stata code and then finish off with Plain TeX mark-up. Preparing for tables in MS Excel or MS Word is not a real personal concern. So I am not the right person to answer that.

            More crucially, publishable presentation of results covers such an extraordinary range that I guess others would need more precision on what you want to give good advice.

            Comment


            • #7
              Thanks Nic, it looks like turning my scalar data into variables then enables efficient production of tables (and should also be easier to graph).
              Are you able to help me generate variables from the saved scalars?
              The original data was
              Code:
              clear
              input long id float(month ff) str1 retail_fund
              2708 665   -.00494564 "N"
              2708 666  -.009506066 "N"
              2708 679  -.006455688 "N"
              2708 680  -.009019009 "N"
              2724 666 -.0045031635 "N"
              2724 667  -.009472995 "N"
              2724 684 -.0017794066 "N"
              2724 685  -.003388667 "N"
              2745 667 -.0045014257 "Y"
              2745 668 -.0032780494 "Y"
              2745 696   -.06255567 "Y"
              2745 697   -.01358385 "Y"
              end
              format %tmCCYY_Mon month
              From which scalars stored the counted number of funds and retail funds per year using
              Code:
              gen year = yofd(dofm(month))
              format year %tyCCYY
              bysort id year (month): gen byte tag = _n == _N
              forval y = 2015/2018 {
              count if year==`y' & tag
              scalar fund`y' = r(N)
              count if year==`y' & tag & retail_fund=="Y"
              scalar retail`y' = r(N)
              }
              What I’m hoping to do is convert the scalar results into variables so that they look like this
              Code:
              clear
              input year retail fund
              2015 1 3
              2016 0 1
              2017 0 1
              2018 1 1
              end
              format %tyCCYY year
              In addition to making tables from these variables I’ll also graph the numbers through time. Sorry for the circular questions, Dan

              Comment


              • #8
                Note that retail fund is a typo for retail_fund

                Does this help?


                Code:
                clear
                input long id float(month ff) str1 retail_fund
                2708 665   -.00494564 "N"
                2708 666  -.009506066 "N"
                2708 679  -.006455688 "N"
                2708 680  -.009019009 "N"
                2724 666 -.0045031635 "N"
                2724 667  -.009472995 "N"
                2724 684 -.0017794066 "N"
                2724 685  -.003388667 "N"
                2745 667 -.0045014257 "Y"
                2745 668 -.0032780494 "Y"
                2745 696   -.06255567 "Y"
                2745 697   -.01358385 "Y"
                end
                
                format %tmCCYY_Mon month
                gen year = yofd(dofm(month))
                bysort id year (month): keep if _n == _N
                contract year retail_fund, zero
                Formatting yearly dates to show years doesn't do any harm, but it's not necessary. 2014 will show as such, and so forth.

                Comment


                • #9
                  Thank you Nick for the suggestion to an alternative pathway. However, the scalar looping approach looks like it will flexibly accommodate the twenty-odd other characteristics in the full data set (ie in addition to retail_fund - typo noted, thanks). The eventual tables will display counts across different combinations over time.
                  Applying the two code blocks from #7 above and then the
                  Code:
                  scalar list
                  command I have
                  Code:
                  retail2018 =          1
                    fund2018 =          1
                  retail2017 =          0
                    fund2017 =          1
                  retail2016 =          0
                    fund2016 =          1
                  retail2015 =          1
                    fund2015 =          3
                  Which I’m trying to get into variable form (no typo on the new variable name for retail)
                  Code:
                  clear
                  input year retail fund
                  2015 1 3
                  2016 0 1
                  2017 0 1
                  2018 1 1
                  end
                  To turn the scalars into variables, I’m trying to loop back over the scalar names and add them incrementally onto new variables – but having trouble looping through the scalar names - eg my failed attempt looks like this
                  Code:
                  clear
                  gen retail = .
                  scalar y=2014
                  * calculate length of variables
                  scalar length=(2018-2015+1)
                  forval i=1/length {
                                scalar y=(y+1)
                      replace retail = retail`y' in `i'
                  }
                  Sorry to be a pain, thank you, Dan

                  Comment


                  • #10
                    Pls ignore #9 I think I’m closer now - ie by saving the counts as variables instead.
                    For anyone following this conversation with a similar objective, I followed Clyde Schechter’s second code block in #3 of
                    https://www.statalist.org/forums/for...a-foreach-loop
                    and came up with
                    Code:
                    tempfile holding
                    capture postutil clear
                    postfile handle str32 year float retail float funds using `holding'
                     
                    gen year = yofd(dofm(month))
                    bysort id year (month): gen byte tag = _n == _N
                    forval y = 2015/2018 {
                    count if year==`y' & tag & retail_fund=="Y"
                    scalar number = r(N)
                    count if year==`y' & tag
                    post handle (`"`y'"') (number) (`r(N)')
                    }
                    postclose handle
                    clear
                    use `holding'
                    Thanks for your suggestions Nick,
                    I’ll see how I go producing publishable tables from this – but will re-post #5 as a general query of preferences (with examples) if I encounter more road blocks,
                    thanks, Dan

                    Comment


                    • #11
                      The counts of Y and N yielded by #8 are surely the same information as the counts of Y and (Y and N) yielded by #10.

                      Here is another approach that avoids any choreography with extra files:

                      Code:
                      clear
                      input long id float(month ff) str1 retail_fund
                      2708 665   -.00494564 "N"
                      2708 666  -.009506066 "N"
                      2708 679  -.006455688 "N"
                      2708 680  -.009019009 "N"
                      2724 666 -.0045031635 "N"
                      2724 667  -.009472995 "N"
                      2724 684 -.0017794066 "N"
                      2724 685  -.003388667 "N"
                      2745 667 -.0045014257 "Y"
                      2745 668 -.0032780494 "Y"
                      2745 696   -.06255567 "Y"
                      2745 697   -.01358385 "Y"
                      end
                      
                      format %tmCCYY_Mon month
                      gen year = yofd(dofm(month))
                      bysort id year (month): gen tag = _n == _N
                      
                      egen count_yes = total(retail_fund == "Y" & tag), by(year)
                      
                      egen count_all = total(tag), by(year)
                      
                      tabdisp year, c(count_yes count_all)
                      
                      ----------------------------------
                           year |  count_yes   count_all
                      ----------+-----------------------
                           2015 |          1           3
                           2016 |          0           1
                           2017 |          0           1
                           2018 |          1           1
                      ----------------------------------

                      Comment


                      • #12
                        Thanks for the follow up Nick. Yes, that's very neat. Also just read your "speaking stat: how to move step by: step" on a domestic flight. I appreciate your insight that I will miss the power of by: if I continue to apply a mental frame of using loops to solve problems. Thanks again, Dan

                        Comment

                        Working...
                        X