Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • For loop

    Hi guys,

    I would like to make a loop that includes several different variables for different databases. Let's say I have DB1, DB2, DB3 corresponding to 3 different aggregations (product, firm, market). For each level some of the variables I will create are similar and just differ for the names (e.g. gr_rate_product, gr_rate_firm, gr_rate_atc2) so that I can just change the name in the loop like this for instance: gr_rate_`i' . However some of the variables are specific for each level and are not included in other levels (e.g. for market database there is a variable called share_off_generics which does not appear neither in DB1 nor in DB2 and that cannot be created with the data contained in such DBs). My aim is to perform 3 regression with all the variables of the 3 DBs and include them in a single loop. So something like:

    reg gr_rate`i' var1_`i' var2 if DB = DB1
    reg gr_rate`i' var1_`i' var2 var3 if DB = DB2
    reg gr_rate`i' var1_`i' var4 if DB = DB3

    where var1_`i' denotes the generic variable in common and var2, var3, var4 are DB specific variables

    Is there a way to do it?

    Thanks a lot

  • #2
    if DB = DB1 is illegal.

    If you are testing for equality you always need == not =.

    Otherwise I cannot follow this. I have no idea whether DB is a variable name or a local macro or indeed anything else.

    It's not even clear that you need an if qualifier rather than an if command.

    In general, if what you want follows a pattern, you may be able to write a loop. The flavour of this is that what you want doesn't follow a pattern, so the prospects for a loop look bad.

    Abstraction is often the key to a solution in mathematics. In computing, keep all details concrete. "Let's say" is a bad sign. Tell us exactly what you have; don't hypothesise.

    Otherwise put: a better question would show us some code and ask whether it can easily be rewritten as a loop.
    Last edited by Nick Cox; 06 Nov 2019, 03:00.

    Comment


    • #3
      Ok. So the one above was a pseudo code.
      It's very long but I'll try. I have 3 databases called: DB_product_aggregation, DB_firm_aggregation and DB_atc2_aggregation containing different aggregation levels.
      So here is my code:

      Code:
      local recall "dummy_2" // will be improved with other dummies
      
      clear all
      clear
      macro drop _all
      set more off
      
      cd "/Users/federiconutarelli/Desktop/PhD/Lavoro_Riccaboni/DATi_USA/Pharma_project_2019 /abnormal_values"
      
      foreach rec of local recall{
      local aggr "product firm atc2"
      tokenize `aggr'
      foreach agg of local aggr{
      use "DB_`agg'_aggregation.dta", clear
      
      * STEP 1,
      egen sum_recalls = sum(`rec'), by(id`agg')
      drop if sum_recalls == 0 
      
      * STEP 2 
      sort idpr Year
      bysort id`agg' (Year): gen ritiri = cond(`rec', sum(`rec'), 0) 
      su ritiri, meanonly 
      
      quietly forval j = 1/`r(max)' { 
          egen when`j' = total(Year * (ritiri == `j')), by(id`agg') 
          replace when`j' = . if when`j' == 0  
          gen rescaled`j' = Year - when`j'
      }
      }
      }
      
      * Here I stopped the loop since the problems arise (please, see STEP 3)
      
      * STEP 3:
      eststo clear
      xtset id`agg' Year
      
      * Variables that are common to all 3 aggregations (databases):
      
      gen sales_lag = L.tot_sales 
      gen tasso_crescita_sales_`agg' = ln(tot_sales - sales_lag) 
      gen log_sales = ln(tot_sales)
      gen log_sales_lagged = L.log_sales
      gen recalls_sales = recalls_lagged*log_sales_lagged //fino a qui ok per tutte 3
      gen log_sales_lag = L.log_sales  
      gen dummy_2_lag1 = L.`rec'
      gen dummy_2_lag2 = L2.`rec'
      gen dummy_2_lag3 = L3.`rec'
      xi i.Year, pre(dum) 
      
      *variables that are common to Product and firm aggregations:
      
      replace age`agg' = age`agg' + 1 
      gen age`agg'_cat = .
      replace age`agg'_cat = 1 if age`agg' <= 5
      replace age`agg'_cat = 2 if age`agg' >= 6 & age`agg' <= 10
      replace age`agg'_cat = 3 if age`agg' >= 11 & age`agg' <= 30
      replace age`agg'_cat = 4 if age`agg' >= 31 & age`agg' < . 
      tab age`agg'_cat*, gen(age`agg'cat)
      egen mean_gr_rate_atc2 = mean(tasso_crescita_sales_`agg'), by(idatc2 Year) 
      *newmolfirm
      
      * Variables that are common to firm and atc2 aggregations
      gen lag_inflow_ratio = inflow/nprod_lag //inflow is already in both databases
      gen lag_outflow_ratio = outflow/nprod_lag
      *percentage_rx (already in the DBs)
      
      * FIRM SPECIFIC variables:
      gen lag_atcmain = L.atcmain //atcmain is a variable of firm aggregation database
      
      * ATC2 SPECIFIC: newmolatc
      gen mean_gr_rate_market = 
      gen share_generic_products = 
      gen mean_agefirm_byatc = 
      gen mean_agefirm_squared
      gen hhi = 
      gen newmolatc =
      Ideally, I would like to perform the following in a single nested loop on aggregation type (please see local aggr):

      Code:
      *For product aggregation 
      reg tasso_crescita_sales_prod log_sales_lag mean_gr_rate_atc2 dummy_2_lag1 dummy_2_lag2 dummy_2_lag3 mean_gr_rate_atc2 recalls_sales ageprodcat1 ageprodcat2 ageprodcat3 ageprodcat4 newmolfirm newmolmarket dumYear*
      
      *For Firm aggregation
      reg tasso_crescita_sales lag_log_tot_sales mean_gr_rate_atc2 dummy_2_lag1 dummy_2_lag2 dummy_2_lag3 recalls_sales percentage_rx lag_inflow_ratio lag_outflow_ratio agefirmcat* lag_atcmain newmolfirm newmolmarket dumYear*
      
      *For ATC2 aggregation
      reg tasso_crescita_sales_atc2 log_sales_lag dummy_2_lag1 dummy_2_lag2 dummy_2_lag3 mean_gr_rate_market recalls_sales percentage_rx share_generic mean_agefirm_byatc mean_agefirm_squared hhi lag_inflow_ratio lag_outflow_ratio newmolatc newmolmarket dumYear_*
      Thank you again a lot!


      Comment


      • #4
        Thanks for showing code. On your first block of code, I have only some small cosmetic comments, as below:


        Code:
        local recall "dummy_2" // will be improved with other dummies
        
        * NJC 1: You don't need -clear- if you have just issued -clear all- 
        
        clear all
        clear
        macro drop _all
        set more off
        
        cd "/Users/federiconutarelli/Desktop/PhD/Lavoro_Riccaboni/DATi_USA/Pharma_project_2019 /abnormal_values"
        
        * NJC 2: indent for clarity
        *        insert spaces and blank lines for clarity 
         
        foreach rec of local recall {
            local aggr "product firm atc2"
            tokenize `aggr'
        
            foreach agg of local aggr {
                use "DB_`agg'_aggregation.dta", clear
        
        * NJC 3 the -egen- function -sum()- is undocumented as of Stata 9 
        *       use -total()- instead 
        
                * STEP 1,
                egen sum_recalls = sum(`rec'), by(id`agg')
                drop if sum_recalls == 0 
        
        * NJC 4 use -bysort- to combine the next two 
                * STEP 2 
                sort idpr Year
                bysort id`agg' (Year): gen ritiri = cond(`rec', sum(`rec'), 0) 
        
                su ritiri, meanonly 
        
                quietly forval j = 1/`r(max)' { 
                    egen when`j' = total(Year * (ritiri == `j')), by(id`agg') 
                    replace when`j' = . if when`j' == 0  
                    gen rescaled`j' = Year - when`j'
                }
            }
        }
        
        * Here I stopped the loop since the problems arise (please, see STEP 3)
        
        * STEP 3:
        eststo clear
        xtset id`agg' Year
        
        * Variables that are common to all 3 aggregations (databases):
        
        gen sales_lag = L.tot_sales 
        gen tasso_crescita_sales_`agg' = ln(tot_sales - sales_lag) 
        gen log_sales = ln(tot_sales)
        gen log_sales_lagged = L.log_sales
        gen recalls_sales = recalls_lagged*log_sales_lagged //fino a qui ok per tutte 3
        gen log_sales_lag = L.log_sales  
        gen dummy_2_lag1 = L.`rec'
        gen dummy_2_lag2 = L2.`rec'
        gen dummy_2_lag3 = L3.`rec'
        
        * NJC 5 -xi- is obsolete unless you are using a very old version, 
        * in which case you should tell us what it is 
        
        xi i.Year, pre(dum) 
        
        *variables that are common to Product and firm aggregations:
        
        replace age`agg' = age`agg' + 1 
        gen age`agg'_cat = .
        replace age`agg'_cat = 1 if age`agg' <= 5
        replace age`agg'_cat = 2 if age`agg' >= 6 & age`agg' <= 10
        replace age`agg'_cat = 3 if age`agg' >= 11 & age`agg' <= 30
        replace age`agg'_cat = 4 if age`agg' >= 31 & age`agg' < . 
        tab age`agg'_cat*, gen(age`agg'cat)
        egen mean_gr_rate_atc2 = mean(tasso_crescita_sales_`agg'), by(idatc2 Year) 
        *newmolfirm
        
        * Variables that are common to firm and atc2 aggregations
        gen lag_inflow_ratio = inflow/nprod_lag //inflow is already in both databases
        gen lag_outflow_ratio = outflow/nprod_lag
        *percentage_rx (already in the DBs)
        
        * FIRM SPECIFIC variables:
        gen lag_atcmain = L.atcmain //atcmain is a variable of firm aggregation database
        
        * NJC 6 Just to flag that none of these will work.... 
        
        * ATC2 SPECIFIC: newmolatc
        gen mean_gr_rate_market = 
        gen share_generic_products = 
        gen mean_agefirm_byatc = 
        gen mean_agefirm_squared
        gen hhi = 
        gen newmolatc =

        On your regressions: it seems to me that no loop will make this clearer or more concise.


        Code:
        *For product aggregation 
        reg tasso_crescita_sales_prod log_sales_lag mean_gr_rate_atc2 dummy_2_lag1 ///
        dummy_2_lag2 dummy_2_lag3 mean_gr_rate_atc2 recalls_sales ageprodcat1      ///
        ageprodcat2 ageprodcat3 ageprodcat4 newmolfirm newmolmarket dumYear*
        
        *For Firm aggregation
        reg tasso_crescita_sales lag_log_tot_sales mean_gr_rate_atc2 dummy_2_lag1  ///
        dummy_2_lag2 dummy_2_lag3 recalls_sales percentage_rx lag_inflow_ratio     ///
        lag_outflow_ratio agefirmcat* lag_atcmain newmolfirm newmolmarket dumYear*
        
        *For ATC2 aggregation
        reg tasso_crescita_sales_atc2 log_sales_lag dummy_2_lag1 dummy_2_lag2      ///
        dummy_2_lag3 mean_gr_rate_market recalls_sales percentage_rx               ///
        share_generic mean_agefirm_byatc mean_agefirm_squared hhi lag_inflow_ratio ///
        lag_outflow_ratio newmolatc newmolmarket dumYear_*

        What you could do is define a local macro with predictors that the regressions have in common. That would help to show what is different.

        Comment


        • #5
          What you could do is define a local macro with predictors that the regressions have in common. That would help to show what is different.
          Thank you very much for the comments. Actually I was thinking about something like storing the number of loops and performing the regressions separately as the loop goes on (i.e. If number_loops == 1 then I do the first regression, if number_loops ==2 then I do the second and so on). Just I don't know if this is feasible in STATA

          Federico
          Last edited by Federico Nutarelli; 06 Nov 2019, 07:52.

          Comment


          • #6
            I am now lost, apart from knowing that Stata not STATA is the correct spelling, as FAQ Advice #18 does explain.

            There are no local macros in your regression commands, so if you are doing that in a loop it's not obvious. You may well have explained this, even more than once, but I may not be alone in being unable to read code as long as #3 and understand the totality of what is happening.

            Comment


            • #7
              No indeed. There are no local macros. What I was sayin is that, since I would like to specify different variables for different aggregations (please see code above under * STEP 3), a possibility could be that of counting and storing the number of the loop in which I am in and tell to generate variables according to the loop number. So I have:


              Code:
               
               local aggr "product firm atc2"     tokenize `aggr'      foreach agg of local aggr {       *define variables }
              what I was thinking was: specifying the variables that are common to product and firm aggregation means that the number of loops made (since we are looping through the local aggr that is "product firm atc2") are <= 2 (so that aggr took values "product" and "firm"); when instead the number of loops are >= 2 it means that aggr takes values firm and atc2 and hence in this way I could define the variables that are instead common to firm and atc2. I mean, this was just a question: is it possible/convenient to do something like this, i.e. create a macro or a variable (e.g. a counter) according to which an if loop (or something similar) can be build up and hence aggregation specific variables constructed?

              Thank you again

              Comment


              • #8
                I guess you're halfway to answering your own question. You're better qualified than I am to fill in the other half.

                Comment


                • #9
                  Ok thank you anyway.
                  Have a nice day!

                  Comment

                  Working...
                  X