
  • Generate a variable and recoding repeatedly

    I have a small question to ask you:
    • Is there a better option (in terms of time and programming effort) than the code below, which is also very error-prone?
    Code:
    gen             hesge_all = 1 if (inlist(ts,309,334,335,339,354,371) & ///
                    diploma_all == 3)
    *
    *
    recode            hesge_all (1=2) if (inlist(ts,334))
    recode            hesge_all (1=3) if (inlist(ts,335))
    recode            hesge_all (1=4) if (inlist(ts,339))
    recode            hesge_all (1=5) if (inlist(ts,354))
    recode            hesge_all (1=6) if (inlist(ts,371))
    Thanks a lot.

    Best,

    Michael

  • #2
    I have tried some kind of a loop, but it is not correct:

    Code:
    forvalues i = 2/6 {
        forvalues k = 334 335 339 354 371 {
            recode hesge_all (1=`i') if (inlist(ts,`k'))
        }
    }
    invalid syntax
    r(198);
    
    end of do-file
    Last edited by Michael Duarte Goncalves; 17 Nov 2022, 02:20.
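
A note on the error above: forvalues only accepts a range such as 2/6, so a free list like 334 335 339 354 371 gives r(198). Even with valid syntax, nesting the two loops would pair every value of i with every value of k, so every matching observation would end up recoded to 2. A minimal sketch of one way to walk the two lists together, assuming foreach with a numlist plus a running counter is acceptable; the toy data values are hypothetical and only there to make the example run on its own:

```stata
* Toy data (hypothetical values) so the sketch is self-contained
clear
input int ts byte diploma_all
309 3
334 3
335 3
339 3
354 3
371 3
334 1
end

generate hesge_all = 1 if inlist(ts, 309, 334, 335, 339, 354, 371) & diploma_all == 3

* foreach ... of numlist takes an arbitrary list of numbers;
* the counter j moves through 2, 3, 4, 5, 6 in step with the list
local j = 1
foreach k of numlist 334 335 339 354 371 {
    local ++j
    replace hesge_all = `j' if hesge_all == 1 & ts == `k'
}
```

This is no shorter than the five recode lines in #1, so whether it is an improvement is a matter of taste.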



    • #3
      I would see the main issue here as one of style. I could write you a loop

      Code:
      tokenize "GARBAGE 334 335 339 354 371" 
      
      forval j = 2/6 { 
            replace hesge_all = `j'  if hesge_all == 1 & ts == ``j'' 
      }
      So, I have rewritten 5 lines as 4, and no loop will be shorter than 3 lines. Even for programmers that will probably seem awkward and cryptic trickery as a way of looping over 2 3 4 5 6 (easy) and 334 335 339 354 371 (awkward) at the same time.

      I would never write such an idiosyncratic loop in my own code. Code needs to be readable as well as concise and "efficient" (an over-used and often ill-defined term in these contexts). The person most likely to read your code is you at a later date!

      Your code is more readable than that loop or any loop I can imagine, but naturally there may be other suggestions. I would tend to write statements more like this

      Code:
       
       replace hesge_all = 4  if hesge_all == 1 & ts == 339
      but anyone who prefers recode is not going to be wrong in any sense.

      Loosely, for most of the problems I see in Stataland, people's programming time is a far bigger deal than machine time. People for whom it is the other way round -- Stata is slow on their difficult problem (for their big dataset), and speed-ups are sorely and surely needed -- don't need telling that their experience is different.



      • #4
        Nick Cox:

        Thank you very much for taking the time to reply.

        It is still difficult for me to know when I should write a loop, or not. I thought loops were more of a rule than an exception.
        You have enlightened me a lot with your comments.

        In fact, I have a kind of admiration for all those people (you and of course Hemanshu Kumar, for example) who master these kinds of tools.

        I would love to be able to make such beautiful loops and other programming magic one day. The road is still very long!




        • #5
          When to use a loop and when not is often a tricky choice, and one on which experienced programmers disagree with each other too. I did once see on Statalist someone who had 500 lines or so that were all something like


          Code:
          replace x = 1 if y == 501 
          replace x = 2 if y == 502 
          replace x = 3 if y == 503
          and it is easy with experience but not for a beginner to see that a loop is a really good idea there -- except that no loop is needed at all either!
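
To illustrate why no loop is needed there: a sketch assuming the pattern really is x = y - 500 across all those lines. Only the first three lines are shown above, so the upper bound 1000 and the toy data are hypothetical.

```stata
* Toy data (hypothetical): y takes the values 501, 502, ..., 505
clear
set obs 5
generate y = 500 + _n
generate x = .

* One arithmetic statement stands in for the whole run of replace lines
replace x = y - 500 if inrange(y, 501, 1000)
```

The single replace works because the mapping from y to x is a formula; when the mapping is irregular, as with the ts codes in #1, there is no such shortcut.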

          I have sympathy (although it may not always seem so) for users whose previous computing experience is mostly a spreadsheet, a browser, a mailer and social media. It is easy to get to, say, graduate student level without ever having learned about loops at all.



          • #6
            Thank you for the very inspiring message!

            Michael
            Last edited by Michael Duarte Goncalves; 17 Nov 2022, 06:24.



            • #7
              P.S.: Nick Cox, I am curious: what does tokenize, mentioned above (#3), do?



              • #8
                The code
                Code:
                tokenize "GARBAGE 334 335 339 354 371"
                will take the string, separate it into words (i.e. portions separated by spaces), and store them in the local macros `1', `2', etc., so `1' will be GARBAGE, `2' will be 334, and so on.

                Inside the loop in #3, this is then used in the evaluation ts == ``j'' so that in the loop, when for instance `j' is 2, ``j'' becomes `2' which is 334.

                So basically, the first iteration of the loop becomes
                Code:
                replace hesge_all = 2 if hesge_all == 1 & ts == 334
                Last edited by Hemanshu Kumar; 17 Nov 2022, 06:39.
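
A small sketch of the macro expansion described in this post, runnable on its own from a do-file. The display lines are added for illustration and are not part of the code in #3.

```stata
* tokenize splits the string on spaces and fills the local macros 1, 2, 3, ...
tokenize "GARBAGE 334 335 339 354 371"

display "`1'"      // first word: GARBAGE (a placeholder so that 334 lands in `2')
display "`2'"      // second word: 334

* Nested quotes: `j' is expanded first, then the macro it names
local j = 2
display "``j''"    // `j' becomes 2, so this is `2', i.e. 334
```

The placeholder in position 1 is what lets the loop counter 2/6 double as the macro number.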



                • #9
                  Incidentally, the way I would have probably coded it is just this one line:

                  Code:
                  recode ts (309 = 1) (334 = 2) (335 = 3) (339 = 4) (354 = 5) (371 = 6), gen(hesge_all)
                  At least I find it very immediately understandable.



                  • #10
                    #9 The recode needed is of another variable, dependent on values of ts.



                    • #11
                      Thank you so much Hemanshu Kumar. I now understand tokenize.

                      Thank you also for the recode option.
                      But Nick is correct in #10.





                      Michael



                      • #12
                        Nick Cox: OP shows code in #1 that generates hesge_all, first by setting it equal to 1, then recoding it to other values based on the values of ts. I think #9 accomplishes the same thing.



                        • #13
                          So with this toy dataset and the code in #9:

                          Code:
                          clear
                          input int ts
                          309
                          371
                          339
                          334
                          335
                          354
                          end
                          
                          recode ts (309 = 1) (334 = 2) (335 = 3) (339 = 4) (354 = 5) (371 = 6), gen(hesge_all)
                          the result is:

                          Code:
                          . li, noobs sep(0) ab(10)
                          
                            +-----------------+
                            |  ts   hesge_all |
                            |-----------------|
                            | 309           1 |
                            | 371           6 |
                            | 339           4 |
                            | 334           2 |
                            | 335           3 |
                            | 354           5 |
                            +-----------------+
                          I think this is what OP wants?



                          • #14
                            The variable I wanted to generate also depends on diploma_all (see #1).



                            • #15
                              Sorry, yes: the gen() call was off out of sight because I didn’t scroll far enough.
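
For completeness, a hedged sketch of how the one-line recode from #9 might be adapted to respect the diploma_all == 3 condition from #1. Nobody in the thread proposed this; it relies on recode with gen() leaving the new variable missing outside the if sample, and on an (else = .) rule so that ts values not in the list are not copied across. The toy data values are hypothetical.

```stata
* Toy data (hypothetical values)
clear
input int ts byte diploma_all
309 3
334 3
371 3
339 1
400 3
end

* Observations failing the if condition get missing in hesge_all;
* (else = .) stops unlisted ts values such as 400 from being copied unchanged
recode ts (309 = 1) (334 = 2) (335 = 3) (339 = 4) (354 = 5) (371 = 6) (else = .) ///
    if diploma_all == 3, gen(hesge_all)
```

It is worth checking the result against the output of the code in #1 on the real data before relying on it.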
