counting for panels within panels

Dan Daugaard

Join Date: Jul 2018

Posts: 61
#1

counting for panels within panels

14 Oct 2019, 17:29

I am hoping to produce a table which counts the number of fund ID’s (ie an ID is distinct to each fund) for different characteristics at the end of each year (ie the latest observed date for each year).
I can count how many of each characteristic (eg how many funds where retail_fund=="Y") following an approach set out in
Cox, N. J. (2007). Speaking Stata: Counting groups, especially panels. The Stata Journal, 7(4), 571-581.
Here is a simple example

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input long id float(month ff) str1 retail_fund 2708 665 -.00494564 "N" 2708 666 -.009506066 "N" 2708 679 -.006455688 "N" 2708 680 -.009019009 "N" 2724 666 -.0045031635 "N" 2724 667 -.009472995 "N" 2724 684 -.0017794066 "N" 2724 685 -.003388667 "N" 2745 667 -.0045014257 "Y" 2745 668 -.0032780494 "Y" 2745 696 -.06255567 "Y" 2745 697 -.01358385 "Y" end format %tmCCYY_Mon month

Code:

by id (month): gen byte tag = _n == 1 count if tag count if retail_fund=="Y" & tag

I intend saving each of the counts and then displaying them in a suitable table.
However, I am having trouble counting the number of fund ID’s at the end of each year (and for each characteristic – eg if retail_fund == “Y”)
I begin by producing a year() variable but am confused on how to set up the "by id (month): gen byte tag = _n == 1" command to count the last month for each ID in each year.
Thank you very much, Dan
Tags: None

Nick Cox

Join Date: Mar 2014
Posts: 35636

15 Oct 2019, 00:28

I don't fully understand this. The last month on record within a year (presumably you mean calendar year) is easy enough, but _n == 1 is not the right detail as it won't yield the last month unless there is only one.

Code:

clear
input long id float(month ff) str1 retail_fund
2708 665   -.00494564 "N"
2708 666  -.009506066 "N"
2708 679  -.006455688 "N"
2708 680  -.009019009 "N"
2724 666 -.0045031635 "N"
2724 667  -.009472995 "N"
2724 684 -.0017794066 "N"
2724 685  -.003388667 "N"
2745 667 -.0045014257 "Y"
2745 668 -.0032780494 "Y"
2745 696   -.06255567 "Y"
2745 697   -.01358385 "Y"
end
format %tmCCYY_Mon month

gen year = yofd(dofm(month)) 

bysort id year (month) : gen last = _n == _N 

list, sepby(id year) 

     +------------------------------------------------------+
     |   id      month          ff   retail~d   year   last |
     |------------------------------------------------------|
  1. | 2708   2015 Jun   -.0049456          N   2015      0 |
  2. | 2708   2015 Jul   -.0095061          N   2015      1 |
     |------------------------------------------------------|
  3. | 2708   2016 Aug   -.0064557          N   2016      0 |
  4. | 2708   2016 Sep    -.009019          N   2016      1 |
     |------------------------------------------------------|
  5. | 2724   2015 Jul   -.0045032          N   2015      0 |
  6. | 2724   2015 Aug    -.009473          N   2015      1 |
     |------------------------------------------------------|
  7. | 2724   2017 Jan   -.0017794          N   2017      0 |
  8. | 2724   2017 Feb   -.0033887          N   2017      1 |
     |------------------------------------------------------|
  9. | 2745   2015 Aug   -.0045014          Y   2015      0 |
 10. | 2745   2015 Sep    -.003278          Y   2015      1 |
     |------------------------------------------------------|
 11. | 2745   2018 Jan   -.0625557          Y   2018      0 |
 12. | 2745   2018 Feb   -.0135839          Y   2018      1 |
     +------------------------------------------------------+

Comment

Dan Daugaard

Join Date: Jul 2018

Posts: 61
#3

16 Oct 2019, 16:36

Thank you Nick, yes, sorry, that was meant to be the last (ie _n==_N) not the first! I hope you don't mind me posing an additional query?
I'm attempting to save the counts for each year (and then I'll produce a table containing the counts). However, when I attempt the following commands I am getting a "type mismatch" error and am not sure if I need to clear the previous count first or if there is something wrong with using forval to create scalar names?

Code:

gen year = yofd(dofm(month)) format year %tyCCYY bysort id year (month): gen byte tag = _n == _N forval y = 2015/2018 { count if year==`y' & tag scalar fund`y' = r(N) count if year==`y' & tag & retail_fund==1 scalar retail`y' = r(N) }

Thank you, Dan
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35636
#4

16 Oct 2019, 16:50

In #1 retail_fund is a string variable. All of a sudden you are treating it as numeric.
Comment
Dan Daugaard

Join Date: Jul 2018

Posts: 61
#5

18 Oct 2019, 19:22

Sorry Nic, yes more rooky errors! corrected the 3rd last line to - count if year==`y' & tag & retail_fund=="Y"
I'm now attempting to produce a table of the scalars for each year for fund and retail_fund and I was planning to reading and follow Ian Watson’s tabout approach et
https://www.ianwatson.com.au/stata/tabout_tutorial.pdf
But have also seen svret and texsave at
https://www.stata.com/meeting/boston...ton10_reif.pdf
and estout at
https://www.ssc.wisc.edu/sscc/pubs/stata_tables.htm
Do you happen to have a preference for publishable presentations of data
(in case it influences the choice, my next stage is to graph the counts over time). Thanks again for your help, Dan
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35636
#6

19 Oct 2019, 01:47

There are some very well thought-out programs out there by very able and experienced people. I don’t use any those you mention, and no reasons that bite for anyone else explain that. I tend to write my own Stata code and then finish off with Plain TeX mark-up. Preparing for tables in MS Excel or MS Word is not a real personal concern. So I am not the right person to answer that.

More crucially, publishable presentation of results covers such an extraordinary range that I guess others would need more precision on what you want to give good advice.
1 like
Comment

Dan Daugaard

Join Date: Jul 2018
Posts: 61

19 Oct 2019, 23:04

Thanks Nic, it looks like turning my scalar data into variables then enables efficient production of tables (and should also be easier to graph).
Are you able to help me generate variables from the saved scalars?
The original data was

Code:

clear
input long id float(month ff) str1 retail_fund
2708 665   -.00494564 "N"
2708 666  -.009506066 "N"
2708 679  -.006455688 "N"
2708 680  -.009019009 "N"
2724 666 -.0045031635 "N"
2724 667  -.009472995 "N"
2724 684 -.0017794066 "N"
2724 685  -.003388667 "N"
2745 667 -.0045014257 "Y"
2745 668 -.0032780494 "Y"
2745 696   -.06255567 "Y"
2745 697   -.01358385 "Y"
end
format %tmCCYY_Mon month

From which scalars stored the counted number of funds and retail funds per year using

Code:

gen year = yofd(dofm(month))
format year %tyCCYY
bysort id year (month): gen byte tag = _n == _N
forval y = 2015/2018 {
count if year==`y' & tag
scalar fund`y' = r(N)
count if year==`y' & tag & retail_fund=="Y"
scalar retail`y' = r(N)
}

What I’m hoping to do is convert the scalar results into variables so that they look like this

Code:

clear
input year retail fund
2015 1 3
2016 0 1
2017 0 1
2018 1 1
end
format %tyCCYY year

In addition to making tables from these variables I’ll also graph the numbers through time. Sorry for the circular questions, Dan

Comment

Nick Cox

Join Date: Mar 2014
Posts: 35636

20 Oct 2019, 01:59

Note that retail fund is a typo for retail_fund

Does this help?

Code:

clear
input long id float(month ff) str1 retail_fund
2708 665   -.00494564 "N"
2708 666  -.009506066 "N"
2708 679  -.006455688 "N"
2708 680  -.009019009 "N"
2724 666 -.0045031635 "N"
2724 667  -.009472995 "N"
2724 684 -.0017794066 "N"
2724 685  -.003388667 "N"
2745 667 -.0045014257 "Y"
2745 668 -.0032780494 "Y"
2745 696   -.06255567 "Y"
2745 697   -.01358385 "Y"
end

format %tmCCYY_Mon month
gen year = yofd(dofm(month))
bysort id year (month): keep if _n == _N
contract year retail_fund, zero

Formatting yearly dates to show years doesn't do any harm, but it's not necessary. 2014 will show as such, and so forth.

Comment

Dan Daugaard

Join Date: Jul 2018

Posts: 61
#9

20 Oct 2019, 15:41

Thank you Nick for the suggestion to an alternative pathway. However, the scalar looping approach looks like it will flexibly accommodate the twenty-odd other characteristics in the full data set (ie in addition to retail_fund - typo noted, thanks). The eventual tables will display counts across different combinations over time.
Applying the two code blocks from #7 above and then the

Code:

scalar list

command I have

Code:

retail2018 = 1 fund2018 = 1 retail2017 = 0 fund2017 = 1 retail2016 = 0 fund2016 = 1 retail2015 = 1 fund2015 = 3

Which I’m trying to get into variable form (no typo on the new variable name for retail)

Code:

clear input year retail fund 2015 1 3 2016 0 1 2017 0 1 2018 1 1 end

To turn the scalars into variables, I’m trying to loop back over the scalar names and add them incrementally onto new variables – but having trouble looping through the scalar names - eg my failed attempt looks like this

Code:

clear gen retail = . scalar y=2014 * calculate length of variables scalar length=(2018-2015+1) forval i=1/length { scalar y=(y+1) replace retail = retail`y' in `i' }

Sorry to be a pain, thank you, Dan
Comment
Dan Daugaard

Join Date: Jul 2018

Posts: 61
#10

20 Oct 2019, 21:10

Pls ignore #9 I think I’m closer now - ie by saving the counts as variables instead.
For anyone following this conversation with a similar objective, I followed Clyde Schechter’s second code block in #3 of
https://www.statalist.org/forums/for...a-foreach-loop
and came up with

Code:

tempfile holding capture postutil clear postfile handle str32 year float retail float funds using `holding' gen year = yofd(dofm(month)) bysort id year (month): gen byte tag = _n == _N forval y = 2015/2018 { count if year==`y' & tag & retail_fund=="Y" scalar number = r(N) count if year==`y' & tag post handle (`"`y'"') (number) (`r(N)') } postclose handle clear use `holding'

Thanks for your suggestions Nick,
I’ll see how I go producing publishable tables from this – but will re-post #5 as a general query of preferences (with examples) if I encounter more road blocks,
thanks, Dan
Comment

Nick Cox

Join Date: Mar 2014
Posts: 35636

#11

21 Oct 2019, 03:19

The counts of Y and N yielded by #8 are surely the same information as the counts of Y and (Y and N) yielded by #10.

Here is another approach that avoids any choreography with extra files:

Code:

clear
input long id float(month ff) str1 retail_fund
2708 665   -.00494564 "N"
2708 666  -.009506066 "N"
2708 679  -.006455688 "N"
2708 680  -.009019009 "N"
2724 666 -.0045031635 "N"
2724 667  -.009472995 "N"
2724 684 -.0017794066 "N"
2724 685  -.003388667 "N"
2745 667 -.0045014257 "Y"
2745 668 -.0032780494 "Y"
2745 696   -.06255567 "Y"
2745 697   -.01358385 "Y"
end

format %tmCCYY_Mon month
gen year = yofd(dofm(month))
bysort id year (month): gen tag = _n == _N

egen count_yes = total(retail_fund == "Y" & tag), by(year)

egen count_all = total(tag), by(year)

tabdisp year, c(count_yes count_all)

----------------------------------
     year |  count_yes   count_all
----------+-----------------------
     2015 |          1           3
     2016 |          0           1
     2017 |          0           1
     2018 |          1           1
----------------------------------

Comment

Dan Daugaard

Join Date: Jul 2018

Posts: 61
#12

23 Oct 2019, 04:30

Thanks for the follow up Nick. Yes, that's very neat. Also just read your "speaking stat: how to move step by: step" on a domestic flight. I appreciate your insight that I will miss the power of by: if I continue to apply a mental frame of using loops to solve problems. Thanks again, Dan
Comment

Announcement