  • Accessing the group currently active from "by"

    Since looping is slow, I'm trying to do some calculations with "by". The trouble is that I need to know which group the command is currently working on, and I could not find any such information in the help files. An example of what I would like to have is:

    Code:
    clear all
    sysuse auto
    bysort rep78: summ price if headroom <= `rep78'
    where `rep78' would be the current group that bysort is working through. In this case it would take the values 1 2 3 4 5.
    Is there any way to achieve this, or any workaround that behaves similarly and does not require a loop of this sort:
    Code:
    clear all
    sysuse auto
    levelsof rep78, local(levels)
    foreach level of local levels {
        summ price if headroom <= `level' & rep78 == `level'
    }

  • #2
    Not sure I am following exactly, but I'd say the answer is simply:
    Code:
    bysort rep78: summ price if headroom <= rep78



    • #3
      I don't know a way of avoiding a loop here.



      • #4
        Jorrit: you are absolutely correct. It was a bad example on my part.
        Say I have variables that have the values of rep78 as part of their names, and I wish to perform my calculations based on conditions on those variables.
        For example, I might have variables headroom1 headroom2 ... headroom5. I would like to have something like this:
        Code:
        clear all
        sysuse auto
        bysort rep78: summ price if headroom`rep78' <= 10



        • #5
          Jorrit:

          Not the same.

          Code:
          . sysuse auto, clear
          (1978 Automobile Data)
          
          . bysort rep78: summ price if headroom <= rep78
          
          -------------------------------------------------------------------------------------------
          -> rep78 = 1
          
              Variable |        Obs        Mean    Std. Dev.       Min        Max
          -------------+---------------------------------------------------------
                 price |          0
          
          -------------------------------------------------------------------------------------------
          -> rep78 = 2
          
              Variable |        Obs        Mean    Std. Dev.       Min        Max
          -------------+---------------------------------------------------------
                 price |          3    4314.333    728.9968       3667       5104
          
          -------------------------------------------------------------------------------------------
          -> rep78 = 3
          
              Variable |        Obs        Mean    Std. Dev.       Min        Max
          -------------+---------------------------------------------------------
                 price |         14    6076.143    3771.154       3299      15906
          
          -------------------------------------------------------------------------------------------
          -> rep78 = 4
          
              Variable |        Obs        Mean    Std. Dev.       Min        Max
          -------------+---------------------------------------------------------
                 price |         18      6071.5    1709.608       3829       9735
          
          -------------------------------------------------------------------------------------------
          -> rep78 = 5
          
              Variable |        Obs        Mean    Std. Dev.       Min        Max
          -------------+---------------------------------------------------------
                 price |         11        5913    2615.763       3748      11995
          
          -------------------------------------------------------------------------------------------
          -> rep78 = .
          
              Variable |        Obs        Mean    Std. Dev.       Min        Max
          -------------+---------------------------------------------------------
                 price |          5      6430.4    3804.322       3799      12990
          
          
          . forval j = 1/5 {
            2. su price if headroom <= `j' & rep78 <= `j'
            3. }
          
              Variable |        Obs        Mean    Std. Dev.       Min        Max
          -------------+---------------------------------------------------------
                 price |          0
          
              Variable |        Obs        Mean    Std. Dev.       Min        Max
          -------------+---------------------------------------------------------
                 price |          5      4414.4    593.9346       3667       5104
          
              Variable |        Obs        Mean    Std. Dev.       Min        Max
          -------------+---------------------------------------------------------
                 price |         19    5638.842    3303.747       3299      15906
          
              Variable |        Obs        Mean    Std. Dev.       Min        Max
          -------------+---------------------------------------------------------
                 price |         53    6318.906    3082.758       3291      15906
          
              Variable |        Obs        Mean    Std. Dev.       Min        Max
          -------------+---------------------------------------------------------
                 price |         69    6146.043     2912.44       3291      15906



          • #6
            Nick - notice that in the loop version, the condition is
            Code:
            rep78==`level'



            • #7
              Ariel, Jorrit: Yes indeed. Sorry about that.

              But what you want in #4 just won't work that way. The local macro will be evaluated once, before the command is executed. There is no loop machinery associated with by:.
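
              A minimal sketch of one possible workaround, not a definitive solution: a program declared byable(recall) is called once per group, so inside it the group's value can be read into a local macro and used to build a variable name. The program name sumgroup and the variables headroom1 ... headroom5 are hypothetical and are created here only for illustration. Stata still calls the program once per group, so this is a loop in disguise and should not be expected to be faster than forvalues or foreach.
              Code:
              sysuse auto, clear
              * hypothetical variables headroom1 ... headroom5, for illustration only
              forvalues j = 1/5 {
                  generate headroom`j' = headroom * `j'
              }

              capture program drop sumgroup
              program define sumgroup, byable(recall)
                  syntax varname [if] [in]
                  * under by:, touse marks only the current group's observations
                  marksample touse
                  * `_byvars' holds the by-variable name(s); a single by-variable is assumed
                  summarize `_byvars' if `touse', meanonly
                  * skip the group in which the by-variable is missing
                  if r(N) == 0 {
                      exit
                  }
                  local g = r(min)
                  summarize `varlist' if `touse' & headroom`g' <= 10
              end

              bysort rep78: sumgroup price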



              • #8
                That's a shame. The calculations I wish to do would take ~35 hours to complete using nested loops. The solution is built upon Friedrich Huebler's comment here - could there be any other alternative?



                • #9
                  We might be able to give much better advice if you told us what they are!

                  But I don't regard loops as such as especially slow. You are being bitten by what you are doing within the loops. Perhaps there is a way to write that as a program you can call with by; but my prior is that the focus should be on speeding up the other stuff.



                  • #10
                    What I'm trying to do is what's described in Friedrich Huebler's comment in the thread I linked to in my previous post. Here's the whole thread:
                    http://www.statalist.org/forums/foru...s-and-collumms

                    Huebler's solution worked fine and was "fast enough" when the data was small, with a small number of schools and variables to sum over. Now it's a whole different story...



                    • #11
                      If you have a ton of schools, it sounds like you might be better off splitting the dataset into two or more parts, making it essentially a relational database: one dataset with school codes, year and kids, and another with the distances between each pair of schools. You can then merge 1:m for schools within smaller distances, giving a limited number (2 or 3, judging from your example) of duplicate observations by school and year, rather than a list of variables for each school, of which there seem to be many.

                      Edit: I'm not 100% sure, but I do believe that you would also save time by keeping the data in the suggested distances dataset in long rather than wide format.
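
                      A rough sketch of the two-dataset idea, under assumptions: all variable names (school_a, school_b, distance, school, year, kids, nearby_kids), the toy values and the 10-unit distance cutoff are invented for illustration and are not taken from the actual data. The pairs dataset is kept in long format, one row per pair of schools, and joinby is used here instead of merge 1:m to form the within-distance combinations. Run it as a do-file so that the tempfile macro stays defined.
                      Code:
                      * hypothetical long-format distances: one row per ordered pair of schools
                      clear
                      input school_a school_b distance
                      1 2 3
                      1 3 12
                      2 1 3
                      2 3 5
                      3 1 12
                      3 2 5
                      end
                      * keep only pairs within the assumed cutoff
                      keep if distance <= 10
                      rename school_b school
                      tempfile pairs
                      save `pairs'

                      * hypothetical school-year data
                      clear
                      input school year kids
                      1 2015 100
                      2 2015 200
                      3 2015 300
                      end
                      * attach each school's kids to every school that has it as a neighbour
                      joinby school using `pairs'
                      * total kids in nearby schools, per school and year
                      collapse (sum) nearby_kids = kids, by(school_a year)
                      list
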
                      Last edited by Jorrit Gosens; 20 Feb 2017, 06:31.



                      • #12
                        The data is in long format except for the distance variables. I split the data by year, and it seems this alone speeds things up considerably. I'm not sure why, though. I would have thought that opening each yearly dataset and calculating for each school the sum of nearby kids would be pretty much identical to using the full data and iterating by year as in Huebler's solution...



                        • #13
                          The if qualifier slows things down because every observation will be tested.
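
                          A small illustration of the point, assuming the data can be sorted so that the group of interest is contiguous: in restricts a command to a range of observation numbers, whereas if has its condition evaluated for every observation in memory. The two summarize commands below give the same results on the auto data; any speed difference would only show up with large datasets and many repetitions.
                          Code:
                          sysuse auto, clear
                          * put the foreign == 0 observations first
                          sort foreign
                          count if foreign == 0
                          local n0 = r(N)
                          * condition is tested on every observation
                          summarize price if foreign == 0
                          * only observations 1 to n0 are used
                          summarize price in 1/`n0'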



                          • #14
                            I figured as much. So generally, the more conditions in the qualifier, the "harder" the computer needs to work. I get that.
                            But from ~36 hours (the original code) to less than 1.5 hours on the split data? That's quite a large difference...



                            • #15
                              How many distinct years in your data?
