Accessing the group currently active from "by"

Ariel Karlinsky

Join Date: Jun 2015

Posts: 491
#1


20 Feb 2017, 04:22

Since looping is slow, I'm trying to do some calculations with "by". The trouble is that I need to know which group the command is currently working on, and I could not find any such information in the help files. An example of what I would like to have is:

Code:

clear all
sysuse auto
bysort rep78: summ price if headroom <= `rep78'

where `rep78' would be the current group that bysort is working through; in this case it would take the values 1 2 3 4 5.
Is there any way to achieve this, or any work-around that behaves similarly and does not require a loop of this sort:

Code:

clear all
sysuse auto
levelsof rep78, clean local(levels)
foreach level of local levels {
    summ price if headroom <= `level' & rep78 == `level'
}
Jorrit Gosens

Join Date: Jan 2015

Posts: 1019
#2

20 Feb 2017, 04:33

Not sure I am following exactly, but I'd say the answer is simply:

Code:

bysort rep78: summ price if headroom <= rep78
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35698
#3

20 Feb 2017, 04:35

I don't know a way of avoiding a loop here.
Comment
Ariel Karlinsky

Join Date: Jun 2015

Posts: 491
#4

20 Feb 2017, 04:37

Jorrit: you are absolutely correct. It's a bad example on my part, then.
Say I have variables whose names contain the values of rep78, and I wish to perform my calculations based on conditions on those variables.
For example, I might have variables headroom1 headroom2 ... headroom5. I would like to have something like this:

Code:

clear all
sysuse auto
bysort rep78: summ price if headroom`rep78' <= 10
Comment

Nick Cox

Join Date: Mar 2014

Posts: 35698
#5

20 Feb 2017, 04:42

Jorrit:

Not the same.

Code:

. sysuse auto, clear
(1978 Automobile Data)

. bysort rep78: summ price if headroom <= rep78

-------------------------------------------------------------------------------------------
-> rep78 = 1

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       price |          0

-------------------------------------------------------------------------------------------
-> rep78 = 2

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       price |          3    4314.333    728.9968       3667       5104

-------------------------------------------------------------------------------------------
-> rep78 = 3

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       price |         14    6076.143    3771.154       3299      15906

-------------------------------------------------------------------------------------------
-> rep78 = 4

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       price |         18      6071.5    1709.608       3829       9735

-------------------------------------------------------------------------------------------
-> rep78 = 5

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       price |         11        5913    2615.763       3748      11995

-------------------------------------------------------------------------------------------
-> rep78 = .

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       price |          5      6430.4    3804.322       3799      12990


. forval j = 1/5 {
  2. su price if headroom <= `j' & rep78 <= `j'
  3. }

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       price |          0

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       price |          5      4414.4    593.9346       3667       5104

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       price |         19    5638.842    3303.747       3299      15906

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       price |         53    6318.906    3082.758       3291      15906

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       price |         69    6146.043     2912.44       3291      15906

Comment

Ariel Karlinsky

Join Date: Jun 2015

Posts: 491
#6

20 Feb 2017, 04:45

Nick - notice that in the loop version, the condition is

Code:

rep78==`level'
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35698
#7

20 Feb 2017, 04:54

Ariel, Jorrit: Yes indeed. Sorry about that.

But what you want in #4 just won't work that way. The local macro will be evaluated once, before the command is executed. There is no loop machinery associated with by:.
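
A minimal sketch of the point about macro evaluation, using the auto data from earlier in the thread. The macro name k is only an illustration. The macro is substituted when the command line is parsed, so by: receives a line that already contains the literal 3 and applies it to every group.

Code:

sysuse auto, clear
local k 3
* the macro k is expanded once, before by: starts working through the groups
display "what by: receives: summarize price if headroom <= `k'"
* every by-group is therefore compared against the same threshold, 3
bysort rep78: summarize price if headroom <= `k'

An undefined macro such as `rep78' expands to nothing, so the condition in #4 would refer to a variable named just headroom rather than headroom1, headroom2 and so on.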
Comment
Ariel Karlinsky

Join Date: Jun 2015

Posts: 491
#8

20 Feb 2017, 04:58

That's a shame. The calculations I wish to do would take ~35 hours to complete using nested loops. The solution is built upon Friedrich Huebler's comment here - could there be any other alternative?
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35698
#9

20 Feb 2017, 06:03

We might be able to give much better advice if you told us what they are!

But I don't regard loops as such as especially slow. You are being bitten by what you are doing within the loops. Perhaps there is a way to write that as a program you can call with by; but my prior is that the focus should be on speeding up the other stuff.
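
One way the "program you can call with by" idea might be sketched: a program declared byable(recall) is run once per by-group, and marksample restricts the touse marker to the current group, so the group's value can be read off inside the program. The program name bygroupsum is made up for illustration, and rep78 is hard-coded as the grouping variable. This is only a sketch under those assumptions; by: still calls the program once per group, so it is not necessarily any faster than an explicit loop.

Code:

capture program drop bygroupsum
program define bygroupsum, byable(recall)
    syntax varname(numeric) [if] [in]
    * under by:, touse marks only the observations in the current by-group
    marksample touse
    * read the value of the (hard-coded) grouping variable in this group
    quietly summarize rep78 if `touse', meanonly
    * skip the group in which rep78 is missing
    if r(N) == 0 exit
    local g = r(min)
    summarize `varlist' if `touse' & headroom <= `g'
end

sysuse auto, clear
bysort rep78: bygroupsum price

Inside the program the local macro g can also be used to build a variable name, such as headroom`g', which is what #4 asks for.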
Comment
Ariel Karlinsky

Join Date: Jun 2015

Posts: 491
#10

20 Feb 2017, 06:11

What I'm trying to do is what's written in Friedrich Huebler's comment in the thread I linked to in the previous comment. Here's the whole thread:
http://www.statalist.org/forums/foru...s-and-collumms

Huebler's solution worked fine and was "fast enough" when the data was small, with a small number of schools and of variables to sum over. Now it's a whole different story...
Comment
Jorrit Gosens

Join Date: Jan 2015

Posts: 1019
#11

20 Feb 2017, 06:27

If you have a ton of schools, it sounds like you might be better off splitting the dataset into two or more parts, making it essentially a relational database: one dataset where you have school codes, year and kids, and another where you have the distances between each school pair. You can then merge 1:m for schools within smaller distances, giving a limited number (2 or 3, judging from your example) of duplicate observations by school and year, rather than a list of variables for each school, of which there seem to be many.

Edit: I'm not 100% sure, but I do believe that you would also save time by keeping the data in the suggested distances dataset in long rather than wide format.
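
A rough sketch of the two-dataset layout on made-up toy data. All variable names (school, year, kids, neighbor, distance), the values, and the distance cutoff of 10 are hypothetical. It uses joinby rather than merge 1:m, because each school here appears in several years and with several neighbors, so neither file is uniquely identified by the key.

Code:

* dataset 1: kids per school and year, renamed so it can be attached to each neighbor
clear
input school year kids
1 2015 100
1 2016 110
2 2015 50
2 2016 55
3 2015 80
3 2016 70
end
rename (school kids) (neighbor neighbor_kids)
tempfile kids
save `kids'

* dataset 2: one row per school pair, in long format
clear
input school neighbor distance
1 2 3
1 3 12
2 1 3
2 3 5
3 1 12
3 2 5
end

* keep only nearby pairs, attach the neighbors' kids for every year, then sum
keep if distance <= 10
joinby neighbor using `kids'
collapse (sum) nearby_kids = neighbor_kids, by(school year)
list, sepby(school)

The pair file stays small because distant pairs are dropped before the join, and there is no loop over schools or over distance variables.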

Last edited by Jorrit Gosens; 20 Feb 2017, 06:31.
Comment
Ariel Karlinsky

Join Date: Jun 2015

Posts: 491
#12

20 Feb 2017, 06:43

The data is in long format except for the distance variables. I split the data by year, and it seems this alone speeds things up considerably. I'm not sure why, though. I would have thought that opening each yearly dataset and calculating for each school the sum of nearby kids would be pretty much identical to using the full data and iterating by year as in Huebler's solution...
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35698
#13

20 Feb 2017, 07:21

The if qualifier slows things down because every observation will be tested.
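
A small illustration on the auto data, offered as a sketch rather than a benchmark. With if, the condition is evaluated for every observation in memory; if the data are sorted so that the observations of interest form one contiguous block, in can restrict the command to that block instead. The macro name first is only illustrative.

Code:

sysuse auto, clear
sort foreign

* if: the condition is checked against every observation in the dataset
summarize price if foreign == 1

* in: after sorting, the foreign cars are the last block of rows
quietly count if foreign == 0
local first = r(N) + 1
summarize price in `first'/l

Both commands summarize the same cars. On a dataset this small the difference is invisible, but inside a loop over many groups on a large dataset it can add up, which may be part of why splitting the data by year helped.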
Comment
Ariel Karlinsky

Join Date: Jun 2015

Posts: 491
#14

20 Feb 2017, 10:28

I figured as much. So generally, the more conditions in the qualifier, the "harder" the computer needs to work. I get that.
But from ~36 hours (the original code) to less than 1.5 hours on the split data? That's quite a large difference...
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35698
#15

20 Feb 2017, 10:43

How many distinct years in your data?
Comment
