  • Old commands into new commands

    Hi there, I am looking at the code from Duflo, Dupas and Kremer's paper on Education, HIV, and Early Fertility. Below you'll find the code for their first table: Panel A, baseline characteristics, by treatment group. When I try to run the code, it does not work. Stata tells me there is a syntax mistake in the command "for"; when I use "help for", it shows me the new command "foreach". I tried to follow the instructions for that command, but it did not work, so I guess I am doing it wrong. Does someone know how to make the code run in the latest version of Stata?

    Thanks a lot!

    use school_info.dta, clear

    gen var=""
    for any mean_U sd_U mean_H sd_H mean_UH sd_UH mean_control sd_control p_Uonly p_Honly p_UH p_UUH p_HUH N: gen X=.
    gen urban=0
    replace urban=1 if situation<3

    gen sexratio_teachers=Nfemale/(TOTteachers-Nfemale)

    gen Honly=(HIVtreat==1) & (Utreat==0)
    gen Uonly=(HIVtreat==0) & (Utreat==1)
    gen UH=(HIVtreat==1) & (Utreat==1)

    local vars=10
    for kcpe2003 schsize ratio02 latrine_2004 urban total_2km
    TOTteachers meanage sexratio_teachers HIVtreat

    \ num 1/`vars':
    replace var="X" if _n==Y \
    reg X Uonly \
    replace N=e(N) if _n==Y \
    test Uonly=0 \
    replace p_Uonly=r(p) if _n==Y \
    reg X Honly \
    test Honly=0 \
    replace p_Honly=r(p) if _n==Y \
    reg X UH \
    test UH=0 \
    replace p_UH=r(p) if _n==Y \
    reg X Uonly Honly UH \
    test UH=Uonly \
    replace p_UUH=r(p) if _n==Y \
    test UH=Honly \
    replace p_HUH=r(p) if _n==Y \
    sum X if Utreat==1&HIVtreat==0 \
    replace mean_U=r(mean) if _n==Y \
    replace sd_U=r(sd) if _n==Y \
    sum X if Utreat==0&HIVtreat==1 \
    replace mean_H=r(mean) if _n==Y \
    replace sd_H=r(sd) if _n==Y \
    sum X if Utreat==1&HIVtreat==1 \
    replace mean_UH=r(mean) if _n==Y \
    replace sd_UH=r(sd) if _n==Y \
    sum X if Utreat==0 \
    replace mean_control=r(mean) if _n==Y \
    replace sd_control=r(sd) if _n==Y

    for var p_Uonly p_Honly p_UH p_UUH p_HUH mean* sd*: replace X=round(X, 0.001)
    outsheet var mean_U sd_U mean_H sd_H mean_UH sd_UH mean_control sd_control
    p_Uonly p_Honly p_UH p_UUH p_HUH N if _n<=`vars' using "table1a.xls",replace

  • #2
    That code is more than 20 years old in conception.

    No guarantees of anything but this is closer to modern Stata. I've commented out the calls to round() which are based on a profound misunderstanding.

    if _n == <integer>

    is a clumsy way to say

    in <integer>
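A small sketch of the equivalence, using a made-up five-observation dataset. The variable names a and b are purely illustrative and are not from the paper's data.

```stata
clear
set obs 5
generate a = .
generate b = .
* Qualifier form used in the old code: the condition is checked for every observation
replace a = 1 if _n == 3
* Range form: refers to observation 3 directly
replace b = 1 in 3
* Both variables now agree in every observation
assert a == b
list
```

Either form changes only observation 3; the in form simply says so more directly.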


    Code:
    use school_info.dta, clear
    
    gen var=""
    
    foreach v in mean_U sd_U mean_H sd_H mean_UH sd_UH mean_control sd_control p_Uonly p_Honly p_UH p_UUH p_HUH N {
         gen `v' = .
    }
    
    gen urban = situation<3
    gen sexratio_teachers=Nfemale/(TOTteachers-Nfemale)
    gen Honly=(HIVtreat==1) & (Utreat==0)
    gen Uonly=(HIVtreat==0) & (Utreat==1)
    gen UH=(HIVtreat==1) & (Utreat==1)
    
    local i = 1
    foreach v in kcpe2003 schsize ratio02 latrine_2004 urban total_2km ///
        TOTteachers meanage sexratio_teachers HIVtreat {
        
        replace var ="`v'" in `i'
        reg `v' Uonly
        replace N=e(N) in `i'
        test Uonly=0
        replace p_Uonly=r(p) in `i'
        
        reg `v' Honly
        test Honly=0
        replace p_Honly=r(p) in `i'
        
        reg `v' UH
        test UH=0
        replace p_UH=r(p) in `i'
        
        reg `v' Uonly Honly UH
        test UH=Uonly
        replace p_UUH=r(p) in `i'
        test UH=Honly
        replace p_HUH=r(p) in `i'
        
        sum `v' if Utreat==1&HIVtreat==0
        replace mean_U=r(mean) in `i'
        replace sd_U=r(sd) in `i'
        
        sum `v' if Utreat==0&HIVtreat==1
        replace mean_H=r(mean) in `i'
        replace sd_H=r(sd) in `i'
        
        sum `v' if Utreat==1&HIVtreat==1
        replace mean_UH=r(mean) in `i'  
        replace sd_UH=r(sd) in `i'  
        
        sum `v' if Utreat==0
        replace mean_control=r(mean) in `i'  
        replace sd_control=r(sd) in `i'
      
        local ++i
    }
    
    outsheet var mean_U sd_U mean_H sd_H mean_UH sd_UH mean_control sd_control ///
        p_Uonly p_Honly p_UH p_UUH p_HUH N in 1/10 using "table1a.xls", replace
    
    * for var p_Uonly p_Honly p_UH p_UUH p_HUH mean* sd*: replace  X=round(X, 0.001)
    * !!! would this even work? var is a string variable!
    Last edited by Nick Cox; 12 Apr 2021, 07:44.



    • #3
      Thanks Nick!

      Yes, the code is super old. I will have to see whether I continue working with it or whether starting over might be easier. But your code helped a lot, thanks.



      • #4
        The allusion appears to be to Duflo, Esther, Pascaline Dupas, and Michael Kremer. 2015. "Education, HIV, and Early Fertility: Experimental Evidence from Kenya." American Economic Review, 105 (9): 2757-97


        It's fairly exceptional that the for command is hidden from users, but it was considered superseded 20 years ago by forvalues and foreach, which did the job better, on the whole. There are users who recall it well enough and even prefer it, but Joro Kolev is perhaps the only vocal fan here.

        The old for was explicit in supporting parallel loops, which remain entirely possible, although users are often puzzled how to do them and even write nested loops instead. But users often got into extraordinary messes with it: some of the older FAQs at https://www.stata.com/support/faqs/programming/ are archaeological evidence.

        Reverting to my comments within the code at #2: var is a keyword for for as well as a variable name in this code so the code should "work". The problem is that rounding to 2 decimal places is not a good way to apply a display format. Stata can hold very few numbers as exact multiples of 0.01 as it works in binary.
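As a hedged illustration of that point, here is a minimal sketch that leaves the stored value alone and changes only how it is shown. The variable p and its value are invented for the example; whether a given export command honours the display format is something to check in its own help file.

```stata
clear
set obs 1
generate double p = 0.0123456
* Change only the display format; the stored value is untouched
format p %6.3f
list p
* The full-precision value is still available for later calculations
display %12.0g p[1]
```

The design choice is to keep presentation (format) separate from the data, rather than overwriting results with round().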

        Evidently Nobel prizes are not awarded according to Stata code, which is as it should be.

        Last edited by Nick Cox; 13 Apr 2021, 03:05.



        • #5
          You were harsh but fair with the Noble Prize winners, Nick :P.

          At some point I also got disappointed a bit in the olden but golden -for-, because it turned out that 1) it is very slow compared to the new foreach and forvalues, and that 2) it has some limits as to how many elements there could be in the list.
          Code:
          . for num 1/10000: qui dis X
          invalid numlist has too many elements
          r(123);
          I am still of the opinion that it was a mistake of epic proportions to exterminate the old -for- instead of simply fixing its perceived problems.
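For comparison, a sketch of the same range with forvalues. As far as I understand it, forvalues steps through the range rather than expanding a numlist first, so it should not hit that limit; treat that as an assumption to verify against the manual. The macro name total is just for illustration.

```stata
* Sum the integers 1 to 10000 with forvalues
local total = 0
forvalues i = 1/10000 {
    local total = `total' + `i'
}
display `total'
```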




          • #6
            The old for was (and is!) interpreted code, which even though the help file has long since been removed, is visible through viewsource for.ado

            There were problems on various levels. Those mentioned here can't be a complete list.

            1. Loops should be low-level fundamental features of any language that supports them, not interpreted add-ons. Speed is one reason, but the point is deeper and more general. (There have been languages that cut down on loops by using arrays wherever possible, and Stata is one too insofar as variables, vectors, matrices and so on can be operands.)

            2. Loops should play with other elements of Stata programming, local macros first of all, and anything else such as global macros that might be invoked. That wasn't really true of for, which instead had its own ad hoc set of constructs such as X in the above.

            3. for was written in the first instance to make simple loops as easy as they should be. That machinery inevitably was then often used to make complicated loops less complicated, but for doesn't scale well there. People new to this may not realise that in #1

            Code:
            for kcpe2003 schsize ratio02 latrine_2004 urban total_2km
            TOTteachers meanage sexratio_teachers HIVtreat
            \ num 1/`vars':
            replace var="X" if _n==Y \
            reg X Uonly \
            replace N=e(N) if _n==Y \
            test Uonly=0 \
            replace p_Uonly=r(p) if _n==Y \
            reg X Honly \
            test Honly=0 \
            replace p_Honly=r(p) if _n==Y \
            reg X UH \
            test UH=0 \
            replace p_UH=r(p) if _n==Y \
            reg X Uonly Honly UH \
            test UH=Uonly \
            replace p_UUH=r(p) if _n==Y \
            test UH=Honly \
            replace p_HUH=r(p) if _n==Y \
            sum X if Utreat==1&HIVtreat==0 \
            replace mean_U=r(mean) if _n==Y \
            replace sd_U=r(sd) if _n==Y \
            sum X if Utreat==0&HIVtreat==1 \
            replace mean_H=r(mean) if _n==Y \
            replace sd_H=r(sd) if _n==Y \
            sum X if Utreat==1&HIVtreat==1 \
            replace mean_UH=r(mean) if _n==Y \
            replace sd_UH=r(sd) if _n==Y \
            sum X if Utreat==0 \
            replace mean_control=r(mean) if _n==Y \
            replace sd_control=r(sd) if _n==Y


            is ONE statement. No doubt it worked as written, but it's hard to read and hard to revise. Code written like that is the programmers' responsibility, but equally Stata (the company and the official code) should provide constructs that make it easier to write code that is clear, correct and changeable according to need. As already mentioned, messes involving for were recurrent on Statalist. Unlike those contemporary threads in which the underlying theme is "indeed this seems complicated but Stata has a logic here which can be explained", the underlying theme was "indeed this seems complicated and this command really is awkward to work with".

            Recently I rewrote

            SJ-2-2 pr0005 . . . . . . Speaking Stata: How to face lists with fortitude
            . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
            Q2/02 SJ 2(2):202--222 (no commands)
            demonstrates the usefulness of for, foreach, forvalues, and
            local macros for interactive (non programming) tasks

            as

            SJ-20-4 pr0074 . . . . . . . . . . . . Speaking Stata: Loops, again and again
            . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
            Q4/20 SJ 20(4):999--1015 (no commands)
            provides updated guidance on using foreach and forvalues for
            looping through lists of values and repeating commands using
            members of those lists in turn

            Some readers may want to thank arrangements whereby the last -- as written by a British academic -- is accessible at https://journals.sagepub.com/doi/pdf...36867X20976340 and is not behind a paywall.

            I hope no one was shocked that a second edition seemed appropriate after 18 years. It was a pleasant duty cutting out large chunks of stuff on the old for which were long since redundant.

            A further comment I have seen -- usually outside Statalist -- is that Stata doesn't support parallel loops. That is nonsense, but there is also some fault in it not being clearer how to implement parallel loops. I addressed this back in


            SJ-3-2 pr0009 . . . . . . . . . . . . . Speaking Stata: Problems with lists
            . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
            Q2/03 SJ 3(2):185--202 (no commands)
            discusses ways of working through lists held in macros

            -- which again sorely and surely needs a rewrite some time, to cut out the old stuff that is now irrelevant and to add some key ideas (and a better title!).
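One way to sketch a parallel loop, offered as an illustration rather than as the method from those papers: loop over positions with forvalues and pick the matching word out of each list. The list contents and the macro names vars and stubs are invented for the example. The counter in the code at #2 (local i = 1 ... local ++i) is another form of the same idea.

```stata
* Two lists to be walked through in step, element by element
local vars  "price mpg weight"
local stubs "p m w"
local n : word count `vars'

forvalues i = 1/`n' {
    * Extract the i-th word of each list
    local v : word `i' of `vars'
    local s : word `i' of `stubs'
    display "`v' is paired with `s'"
}
```

A nested loop would instead visit every combination of the two lists, which is usually not what is wanted here.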


            NB. My comment on Nobel Prize winners was in context. This kind of very small stuff is a world apart from research tackling really big questions and aimed at improving people's lives in major ways.

            The Nobel/noble confusion allows various puns, as in the line about Lord Rayleigh, whose name is attached to a distribution.

            The noble Lord discovered a noble gas [argon], published a paper on it, and received a Nobel Prize. The days when that could happen are gone [!!!] forever.



            • #7
              Before this Noble-worthy research was done, I and a couple of billion other people were poor and miserable. After this Noble research was done, I and the other couple of billion people are all rich and prosperous, and rivers of milk and honey are flowing from the mountains down into the valleys where we all dance and sing in eternal happiness and joy. This is all the Nobel prize is about, bringing peace and happiness and prosperity to the world. If it were not for the Nobel prize, I do not know what would have become of me...

              As for the Noble/Nobel, Windsor/Winsor, it is funny at some level. But from my personal viewpoint, neither Nobel, nor Noble, nor Windsor, nor Winsor are my personal friends, and none of those guys owes me money, and none of those guys have done anything of consequence to me (first paragraph was an attempt at sarcasm). So, and it is a bit rude I agree, I care to the point of being able to suggest to the reader correctly what I mean, but I do not care that much whether it is spelled exactly and correctly as Noble, Noble, Noubleau, El Noblino or Dos Nobledor.

              I have taken well your point, Nick, that the old -for- can result in monstrosities which are non-troubleshootable after they have been created. I have never used -for- like this, in fact 99% of my use of -for- has been
              Code:
              for varlist "some varlist" : do this and that with reference to X \ then do this and that with reference to X, and the previous reference to X \ and then do this and that.
              and the other 1% has been the infamous parallel loops that you mentioned, which I indeed think were a lot more natural under the old -for- than now.

              The major disagreement is that you never mention, and do not seem to notice, that
              Code:
              for varlist somevarlist: references to X \ other references to X and previous derived from X
              is so much simpler than

              Code:
              foreach var of varlist somevarlist {
              references to `var'
              other references to `var' and previous results
              }
              because in the former we have one line statement without any special symbols, and in the latter we have so many { ` ' } and new lines. The first one is infinitely faster to write and think through.
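To make the foreach form concrete, here is a minimal sketch using the auto dataset that ships with Stata. The choice of variables and the z_ prefix are only for illustration.

```stata
sysuse auto, clear
foreach var of varlist price mpg weight {
    * First reference to the loop variable: get its mean and SD
    quietly summarize `var'
    * Second reference, using the previous results
    generate z_`var' = (`var' - r(mean)) / r(sd)
}
summarize z_price z_mpg z_weight
```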

              The second is all about how you see your point 2) as a problem, and I see it as a benefit.

              The fact of the matter is that the old -for- did not use locals, and yes, it did not play with the rest of Stata programming. But to me this is not a disadvantage, this is fantastic, because I can use -for- without knowing anything about Stata programming, and I can teach -for- to students without taking them into the depths of Stata's internal workings and Stata syntax.

              In fact I have a vision of "first stage Stata" where you do not need to know about explicit subscripting and macros and loops in Stata, and yet you can write Nobel prize papers just using -egen- and occasionally -for-.





              • #8
                I guess I digressed first, so blame that on me.

                I don't have anything deep(er) to say beyond underlining that whatever coding becomes familiar thereby becomes intuitive.

                (Other way round, whenever I see the word "Intuitive" I translate it as "familiar" and that usually fits. For example, "the user interface is intuitive" can be sales talk but what can be true is that "if you use the menu often enough, you half-remember where things are likely to be and that makes the other half easier to puzzle out".)

                I used the old for for a couple of years and often found it awkward, and I've used the newer foreach and forvalues commands for twenty years or so, so comparisons for me aren't easy, but moving from one to another wasn't painful and was even joyful. At last Stata let you loop in a way that was familiar (again).

                There seems to be a pattern that computer languages a bit more concise than, but not much more concise than, ordinary language do best, so where is COBOL (very long-winded) and where are APL and Forth (very concise)? The issue is not just how easy it is to write down code but how easy it is to read code, even if it is only your own code some time later.

                There aren't many absolutes here. It's a real joy if everything can be expressed in one line, except not so if that one line takes hours or days to work out (as I imagine just about everybody can confirm).

                Another real issue for Stata is that there's a major bonus if loops have similar syntax to that common in other languages, and I think the present looping commands score better than for in that respect.

                I hope Joro Kolev writes his own documentation for for for his students. (Note the fontwork there.)
                Last edited by Nick Cox; 14 Apr 2021, 07:12.
