  • regression using loop

    Dear forum members
    I am trying to run regressions in a loop. This post is to confirm that my code is correct.
    Code:
    // Define dependent variables
    local depvars sout prfood prcheck prchnedu
    
    // Define independent variable groups
    local group1 i.v025 i.v190
    local group2 i.v025 i.v190 i.caste i.religion
    local group3 i.v025 i.v190 i.caste i.religion i.v151 i.odfunimp
    
    // Loop over dependent variables
    foreach y of local depvars {
        
        // Loop over independent variable groups
        forvalues g = 1/3 {
            
            // Assign group of independent variables
            if `g' == 1 {
                local indepvars `group1'
            }
            else if `g' == 2 {
                local indepvars `group2'
            }
            else {
                local indepvars `group3'
            }
    
            // Run regression
            reg `y' `indepvars'
            
            // display separator
            di "===============================NEW Specification===============================" //
            di "==============================================================================="        
        }
    }
    In my understanding, and from what the Results window shows, it runs a regression for each of the depvars under three specifications. It first runs the three specifications for sout, and so on for the others.

    Is this the right way to loop, or are there more efficient and correct ways?

    Thank you
    Best regards,
    Mukesh

  • #2
    I would just make the loop as follows:
    Code:
    // Loop over dependent variables
    foreach y of local depvars {
        
        // Loop over independent variable groups
        forvalues g = 1/3 {
            
            // Run regression
            reg `y' `group`g''
            
            // display separator
            di "===============================NEW Specification===============================" //
            di "==============================================================================="        
        }
    }


    • #3
      My comments are more about style than efficiency, a term often used far too vaguely.

      Efficiency in terms of computation speed or memory use? Either of those can be crucial for large datasets or small machines, which in turn need to be defined more precisely. Evaluation of local macros I understand to have trivial implications for either.

      For most of what I do, and for much of what I suspect most people do, efficiency might fairly be stretched yet further to include whatever helps writing code that makes it easier

      * to get correct and desired results (!)

      * for the coder to understand what they are doing (that may seem self-evident as a goal, but many threads here suggest that it is often in doubt) [!!! it is often a real issue when code is produced by AI or is copied wholesale from someone else's work on a similar but not identical project]

      * for the coder to revisit later and (re-)discover or (re-)invent what they wanted to do (perhaps weeks, months, years later; it is common that you may need to re-visit a project after some delay when a paper submission or report draft has been reviewed, or you're running a variant on previous work)

      * for the coder to debug and to maintain (e.g. correct, extend or vary what the code does)

      * for anyone else to do any of that, which could mean someone less or as or more experienced in coding; reviewers of code; supervisors who need to know what is being done (and who often are less competent in coding); students learning the skills

      and so on.

      As is often underlined, code writing is in many ways like any other kind of writing, especially technical or non-fiction writing generally. Goals can include correctness, clarity and brevity. Perhaps the main exception is that humour plays a less important part in code writing. Otherwise, my main style suggestions here pivot on two slightly contentious principles:

      * brevity is itself a form of clarity (naturally, code that is so condensed that it has become cryptic by too much cleverness is self-defeating)

      * comments can just bloat code and add to the amount of stuff that must be waded through

      Books on how to write code well often make fun of this kind of writing

      Code:
      * this code adds 2 and 2
      local four = 2 + 2
      but many of us have seen comments equally inane or pointless in real code. "Who are the comments for?" and "Exactly why are they provided?" are always key questions, however.

      On your code. I have various more specific comments.

      I will just phrase what follows in terms of what I would be likely to do. It's for any reader to decide how far they agree.

      First, I note that the if / else if / else construction can be avoided. That change is double-edged, as it depends on the writer and reader understanding that local macro references can be nested. But your code already requires understanding of local macro evaluations, so let the code teach too.

      Code:
      // Define dependent variables
      local depvars sout prfood prcheck prchnedu
      
      // Define independent variable groups
      local group1 i.v025 i.v190
      local group2 i.v025 i.v190 i.caste i.religion
      local group3 i.v025 i.v190 i.caste i.religion i.v151 i.odfunimp
      
      // Loop over dependent variables
      foreach y of local depvars {
          
          // Loop over independent variable groups
          forvalues g = 1/3 {
              
              // Run regression
              reg `y' `group`g''
          }
      }
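      As a minimal sketch of how the nested reference resolves (the value 2 for g is chosen only for illustration): Stata expands the innermost macro first, so the reference to group followed by g becomes a reference to group2, which is then replaced by the contents of group2. The last line is the displayed result.

      Code:
      local group1 i.v025 i.v190
      local group2 i.v025 i.v190 i.caste i.religion
      local g = 2
      * the inner reference to g is expanded first, then the resulting reference to group2
      di "`group`g''"
      i.v025 i.v190 i.caste i.religion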
      Next, a loop over one, two or three cases can often be replaced by those cases. For four or more cases, the advantage in a loop is more evident. So why not cut the inner loop?

      Code:
      // Define dependent variables
      local depvars sout prfood prcheck prchnedu
      
      // Define independent variable groups
      local group1 i.v025 i.v190
      local group2 i.v025 i.v190 i.caste i.religion
      local group3 i.v025 i.v190 i.caste i.religion i.v151 i.odfunimp
      
      // Loop over dependent variables
      foreach y of local depvars {
              reg `y' `group1'
              reg `y' `group2'
              reg `y' `group3'
      }

      Next, I would cut all those comments. You've used macro names that make it clear what you're doing and anyone who can't understand what the code is doing directly is best advised not to mess with it and/or to do some reading to find out!

      Code:
      local depvars sout prfood prcheck prchnedu
      
      local group1 i.v025 i.v190
      local group2 i.v025 i.v190 i.caste i.religion
      local group3 i.v025 i.v190 i.caste i.religion i.v151 i.odfunimp
      
      foreach y of local depvars {
              reg `y' `group1'
              reg `y' `group2'
              reg `y' `group3'
      }
      (I don't like the terms dependent and independent variables, poor word choices for several decades now, but that is not a coding issue.)

      Next to consider is cutting out the local macro definitions for groups of predictor variables. I leave that as an exercise to consider.
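
      One possible version of that exercise, as a sketch only, writes each predictor list out in full:

      Code:
      local depvars sout prfood prcheck prchnedu

      foreach y of local depvars {
          reg `y' i.v025 i.v190
          reg `y' i.v025 i.v190 i.caste i.religion
          reg `y' i.v025 i.v190 i.caste i.religion i.v151 i.odfunimp
      }
      The trade-off is that any change to a shared predictor must then be made on every line where it appears.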

      I cut out the ugly separator code. I often want separator code but I would be more likely to use what is different in the text, not just to say in effect "Here are more results". If so, then there is more point to an inner loop.

      Code:
      local depvars sout prfood prcheck prchnedu
      
      local group1 i.v025 i.v190
      local group2 i.v025 i.v190 i.caste i.religion
      local group3 i.v025 i.v190 i.caste i.religion i.v151 i.odfunimp
      
      foreach y of local depvars {
          
          forvalues g = 1/3 {
                di "{title:regress `y' on `group`g''}"
                di    
                reg `y' `group`g''
                di
          }
      
      }


      • #4
        #3 was quite a long time in writing and #2 from Hemanshu Kumar was not visible until I posted. Still, it's evident that Hemanshu's suggestion and my first suggestion are exactly the same.


        • #5
          Orthogonal tip. If you want repeated characters, consider something like this:
          Code:
           di 80*"="
          ================================================================================
          while at the same time wondering whether you really need that, or your readers do. (Positive advice in #3.)
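
          Two other ways to get a separator, offered as alternatives and not as part of the tip above: the _dup() directive of display repeats the string that follows it, and the SMCL directive {hline} draws a horizontal rule of the stated width.

          Code:
          di _dup(80) "="
          di "{hline 80}"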


          • #6
            I would second Nick's suggestion to entirely avoid defining macros for the predictor variables, in this specific case. The gain in brevity in the regression commands is, to me, more than offset by the loss in terms of cognitive load, from having to look up the macro definitions.

            On comments in code, I find that practice changes as one's coding competence improves. Comments to explain to yourself (or others) what a chunk of code does can, over time, feel redundant as your skills improve or you repeatedly use similar code to accomplish similar tasks across projects. The comments I increasingly try to include in my code are not so much about what a code chunk does as about why I made certain choices, especially the seemingly small ones. The seemingly big choices are often documented in your paper, report, etc. ("we use a probit model because...") but when you return to your code after a while, it is often completely unclear why you included, say, one control variable but not another, or why you grouped castes in this way but not that, and so on.
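
            A sketch of a comment that records why rather than what. The variable caste3, the grouping and the reason given are all invented for illustration; none of them comes from this thread.

            Code:
            * caste is collapsed to three groups (caste3, hypothetical) because the
            * smallest original categories had too few observations to estimate separately
            logit prfood i.v025 i.v190 i.caste3 i.religion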


            • #7
              Dear Hemanshu Kumar and Nick Cox, thank you for your responses, with explanations and suggestions, and for making it simple.

              I tried to embed margins (v190#v025) and marginsplot in the loop, to run only after the regression on group3 for each outcome, and to export each graph under its respective name.


              Code:
              local depvars s562 prfood prcheck prchnedu
              
              local group1 i.v025 i.v190
              local group2 i.v025 i.v190 i.caste i.religion
              local group3 i.v025 i.v190 i.caste i.religion i.v151 i.odfunimp
              
              foreach y of local depvars {
                  
                  forvalues g = 1/3 {
                        di "{title:regress `y' on `group`g''}"
                        di    
                        logit `y' `group`g''
                        di
                        if `g' == 3 {
                            margins i.v190#i.v025
                            marginsplot, name(mplot_`y', replace) title("Predictive margin `y'")
                            graph export "$figure/mplot_`y'.emf", replace
                        }
                  }
              
              }
              Thank you
              Last edited by Mukesh Punia; 27 Jun 2025, 04:53.
              Best regards,
              Mukesh


              • #8
                Well, if your model fit command is really logit then the title should not mention regress.

                And why you want margins results only for the fullest model of three is your choice and not a matter of coding style.

                Otherwise, does this work? Necessarily we can't test this. Nor can we see the definition of the global macro.
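
                One way to keep the title and the fitted model in step, as a sketch: hold the command name in a local macro (cmd is a name chosen here for illustration) and use it in both places. The other definitions are copied from #7.

                Code:
                local depvars s562 prfood prcheck prchnedu

                local group1 i.v025 i.v190
                local group2 i.v025 i.v190 i.caste i.religion
                local group3 i.v025 i.v190 i.caste i.religion i.v151 i.odfunimp

                * cmd is a hypothetical macro name; change logit here and the title follows
                local cmd logit

                foreach y of local depvars {
                    forvalues g = 1/3 {
                        di "{title:`cmd' `y' on `group`g''}"
                        `cmd' `y' `group`g''
                    }
                }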


                • #9
                  On #6 I think Hemanshu Kumar encapsulates common attitudes as people's experience with coding increases.


                  • #10
                    Agreed with Hemanshu on #6

                    Yes, I changed the title to logit, as suggested in #8.

                    As for why margins only after the fullest model: I am still exploring this and have also posted separately (https://www.statalist.org/forums/for...ns#post1779239). I received valuable suggestions from members but, TBH, I am still not very clear about it; for example, about using weights, or adjusting for other predictors in margins beyond the preceding logit specification.

                    Yes, it worked. "Nor can we see the definition of the global macro." I did not understand what you mean here.

                    Thank you
                    Best regards,
                    Mukesh


                    • #11
                      In #7 your code refers to a global macro, figure, which is not defined in any of the code we can see. Hence we can't be certain that the code will work as you wish, but there isn't an obvious problem either.
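
                      A minimal sketch of what such a definition might look like, placed before the loop in #7. The folder name figures is an assumption; use whatever location suits the project. The last line is the displayed result, using prfood as an example outcome.

                      Code:
                      * hypothetical output folder, relative to the current working directory
                      global figure "figures"
                      * create the folder if it does not exist yet; capture suppresses the error if it does
                      capture mkdir "$figure"
                      di "$figure/mplot_prfood.emf"
                      figures/mplot_prfood.emf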


                      • #12
                        1. I wanted to like #6, but for some reason the system will not let me.

                        2. To the discussion about the use of comments I will add one more thing. When a block of code includes a complicated calculation, I find it helpful to include a comment that summarizes succinctly what the calculation is all about. I don't have a handy example in Stata right now, but here's a block of C++ code that is, I think, cleanly coded. But I would be interested to know if anybody here can, without benefit of an explanatory comment, recognize what it does:

                        Code:
                            for(i = 0; i < N_STAGES; ++i) {
                                cc[i][0][0] = 1.0;
                                for(k = 1; k < N_STAGES-i; ++k) {
                                    sum = 0;
                                    for(j = 0; j < k; ++j) {
                                        assert (i < N_STAGES && j < N_STAGES && k < N_STAGES);
                                        cc[i][k][j] = cc[i][k-1][j]*lambdas[i+k-1]/(lambdas[i+k]-lambdas[i+j]);
                                        sum += cc[i][k][j];
                                    }
                                    cc[i][k][k] = -sum;
                                }
                            }
                        
                            for (i = 0; i <= clinical_stage; ++i) {
                                sum = 0;
                                k = clinical_stage-i;
                                for (j = 0; j <= k; ++j) {
                                    sum += cc[i][k][j]*exp(-lambdas[i+j]*lead_time);
                                }
                                pjump[i] = sum;
                            }
