Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Create multiple new variables using postfile

    Hello,

    I am trying to create a new dataset from an output using postfile command. Specifically, I want to create variables cat1_ave as the mean of cat1, cat2_ave as the mean of cat2, until cat10_ave. I can successfully create the first two columns (i.e., year and cat1_ave). However, the remaining columns do not work. They are still the average of the cat1. I wondered how to solve this problem? I cannot figure it out. Below is my syntax.

    capture postutil clear
    postfile ms_year1 year cat1_ave cat2_ave cat3_ave ///
    cat4_ave cat5_ave cat6_ave ///
    cat7_ave cat8_ave cat9_ave cat10_ave ///
    using ms_year1.dta, replace

    forvalues yr = 2010/2017 {
    sum cat1-cat10 if (year==`yr' & it=="ms")
    local var "`k'"
    post ms_year1 (`yr') (`r(mean)') (`r(mean)') (`r(mean)') ///
    (`r(mean)') (`r(mean)') (`r(mean)') (`r(mean)') (`r(mean)') (`r(mean)') (`r(mean)')
    }
    postclose ms_year1



    Many thanks,
    David

  • #2
    When you -summarize- a single variable, r(mean) contains the mean of that variable. When you -summarize- a whole list of variables, r(mean) contains the mean of only the last variable in the list. You have, in your code, set all of cat1_ave through cat10_ave to the mean value of cat1.

    So what you need to do is add one mean at a time in a loop, and then post the whole thing. Something like this:

    Code:
    forvalues yr = 2010/2017 {
        local topost (`yr')
        forvalues i = 1/10 {
            sum cat1`i' if (year==`yr' & it=="ms")
    //        local var "`k'" // YOU NEVER USE THIS AND IT DOESN"T APPEAR TO MAKE SENSE
                            // AS LOCAL MACRO k IS NEVER DEFINED IN THIS CODE
            local topost `topost' (`r(mean)')
        }
        post ms_year1 `topost'
    }
    As no sample data was provided, this code is not tested and may contain typos or other errors, but the gist of it is how you approach this kind of thing in general.

    That said, this seems an unnecessarily complicated approach. Why not just do this?

    Code:
    preserve
    collapse (mean) cat1-cat10, by(yr)
    rename cat* cat*_ave
    save ms_year1, replace
    restore
    In the future, when showing code, please wrap it inside code delimiters so that indentation and other formatting will be preserved. The resulting code will be much easier for people who want to help you to read. If you are not familiar with code delimiters, please read FAQ #12 for information about them.

    Comment


    • #3
      Thanks much, Clyde!

      The collapse approach is promising. I have two follow-up questions about this approach. (1) I also want to calculate the mean within each year by the variable "it", and I tried to add the second by(it), but it does not work. (2) some years have missing values. When using the collapse approach, a couple of years with missing values do not show up in the generated "ms_year1" dataset. I was wondering how to display every year even though there are missing values in that year?

      Regarding the forvalues and post approach, the codes look correct. However, the output says "invalid syntax post: above message corresponds to expression 2, variable cat1_ave". and no means were generated. I am not sure why it is now working?

      This is my first post at statalist.org. I will wrap it inside the code delimiters.

      sample data:
      code:
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str3 it year cat1-cat10
      ms    2010    0.278    0.236    0.750    0.072    0.385    0.704    0.932    0.742    0.884    0.591
      ms    2011    0.004    0.626    0.018    0.653    0.508    0.071    0.458    0.292    0.672    0.567
      ms    2012    0.311    0.675    0.271    0.239    0.238    0.339    0.261    0.575    0.820    0.046
      ms    2013    0.815    0.002    0.705    0.918    0.875    0.739    0.295    0.915    0.178    0.349
      ms    2014    0.817    0.825    0.450    0.963    0.027    0.968    0.274    0.245    0.843    0.007
      ms    2015    0.518    0.867    0.475    0.909    0.340    0.146    0.563    0.939    0.133    0.821
      ms    2016    0.554    0.910    0.451    0.754    0.090    0.110    0.598    0.015    0.720    0.914
      ms    2017    0.109    0.345    0.716    0.862    0.909    0.167    0.215    0.600    0.822    0.219
      mp    2010    0.737    0.397    0.038    .    .    0.461    0.111    .    .    .
      mp    2011    0.664    0.669    0.172    0.546    0.095    0.931    0.893    0.619    0.252    0.262
      mp    2012    0.649    0.495    0.567    0.354    0.954    0.029    0.256    0.144    0.836    0.932
      mp    2013    0.787    0.947    0.232    0.392    0.536    0.767    0.218    0.306    0.068    0.578
      mp    2014    0.143    0.747    0.787    0.890    0.815    0.558    0.936    0.754    0.838    0.988
      mp    2015    0.868    0.910    0.743    0.925    0.375    0.649    0.321    0.533    0.971    0.522
      mp    2016    0.138    0.935    0.777    0.822    0.046    0.886    0.325    0.888    0.539    0.044
      mp    2017    0.189    0.285    0.480    0.209    0.750    0.697    0.120    0.969    0.236    0.162
      mp    2012    0.260    0.607    0.747    0.767    0.894    0.459    0.471    0.016    0.275    0.637
      mc    2010    0.704    0.570    0.581    0.063    0.802    0.468    0.900    0.632    0.065    0.580
      ms    2011    0.531    0.519    0.706    .    0.833    0.869    0.963    .    0.477    .
      mc    2012    0.537    0.048    0.517    0.612    0.759    0.748    0.134    0.612    0.995    0.555
      mc    2013    0.490    0.742    0.354    0.529    0.587    0.693    0.141    0.247    0.634    0.627
      mc    2014    0.866    0.251    0.261    0.006    0.400    0.986    0.330    0.562    0.355    0.431
      mc    2015    0.669    0.898    0.898    0.366    0.924    0.368    0.001    0.905    0.879    0.632
      mc    2016    0.251    0.772    0.488    0.608    0.422    0.760    0.430    0.852    0.327    0.507
      mc    2017    0.543    0.716    0.129    0.968    0.553    0.083    0.197    0.277    0.550    0.099
      end
      Thanks,
      David

      Comment


      • #4
        David, thanks for the -dataex-. But you still need to ask your question more clearly. You don't show any code, so I have no way to know what you did. You don't explain what you mean by "does not work." There are thousands of ways code can "not work." You have to show what Stata actually gave you as output (including error messages). And, if it isn't obvious, you need to then explain how it's not what you were expecting.

        That said, here is some code using postfile that works with your most recent example data.

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str3 it year cat1-cat10
        ms    2010    0.278    0.236    0.750    0.072    0.385    0.704    0.932    0.742    0.884    0.591
        ms    2011    0.004    0.626    0.018    0.653    0.508    0.071    0.458    0.292    0.672    0.567
        ms    2012    0.311    0.675    0.271    0.239    0.238    0.339    0.261    0.575    0.820    0.046
        ms    2013    0.815    0.002    0.705    0.918    0.875    0.739    0.295    0.915    0.178    0.349
        ms    2014    0.817    0.825    0.450    0.963    0.027    0.968    0.274    0.245    0.843    0.007
        ms    2015    0.518    0.867    0.475    0.909    0.340    0.146    0.563    0.939    0.133    0.821
        ms    2016    0.554    0.910    0.451    0.754    0.090    0.110    0.598    0.015    0.720    0.914
        ms    2017    0.109    0.345    0.716    0.862    0.909    0.167    0.215    0.600    0.822    0.219
        mp    2010    0.737    0.397    0.038    .    .    0.461    0.111    .    .    .
        mp    2011    0.664    0.669    0.172    0.546    0.095    0.931    0.893    0.619    0.252    0.262
        mp    2012    0.649    0.495    0.567    0.354    0.954    0.029    0.256    0.144    0.836    0.932
        mp    2013    0.787    0.947    0.232    0.392    0.536    0.767    0.218    0.306    0.068    0.578
        mp    2014    0.143    0.747    0.787    0.890    0.815    0.558    0.936    0.754    0.838    0.988
        mp    2015    0.868    0.910    0.743    0.925    0.375    0.649    0.321    0.533    0.971    0.522
        mp    2016    0.138    0.935    0.777    0.822    0.046    0.886    0.325    0.888    0.539    0.044
        mp    2017    0.189    0.285    0.480    0.209    0.750    0.697    0.120    0.969    0.236    0.162
        mp    2012    0.260    0.607    0.747    0.767    0.894    0.459    0.471    0.016    0.275    0.637
        mc    2010    0.704    0.570    0.581    0.063    0.802    0.468    0.900    0.632    0.065    0.580
        ms    2011    0.531    0.519    0.706    .    0.833    0.869    0.963    .    0.477    .
        mc    2012    0.537    0.048    0.517    0.612    0.759    0.748    0.134    0.612    0.995    0.555
        mc    2013    0.490    0.742    0.354    0.529    0.587    0.693    0.141    0.247    0.634    0.627
        mc    2014    0.866    0.251    0.261    0.006    0.400    0.986    0.330    0.562    0.355    0.431
        mc    2015    0.669    0.898    0.898    0.366    0.924    0.368    0.001    0.905    0.879    0.632
        mc    2016    0.251    0.772    0.488    0.608    0.422    0.760    0.430    0.852    0.327    0.507
        mc    2017    0.543    0.716    0.129    0.968    0.553    0.083    0.197    0.277    0.550    0.099
        end
        tempfile results
        capture postutil clear
        postfile ms_year1 str3 it yr cat1_ave cat2_ave cat3_ave cat4_ave cat5_ave cat6_ave ///
            cat7_ave cat8_ave cat9_ave cat10_ave using `results'
        levelsof it, local(its)
        foreach it of local its {
            forvalues yr = 2010/2017 {
                local topost ("`it'") (`yr')
                forvalues i = 1/10 {
                    sum cat`i' if (year==`yr' & it=="ms")
            //        local var "`k'" // YOU NEVER USE THIS AND IT DOESN"T APPEAR TO MAKE SENSE
                                    // AS LOCAL MACRO k IS NEVER DEFINED IN THIS CODE
                    local topost `topost' (`r(mean)')
                }
                post ms_year1 `topost'
            }
        }
        postclose ms_year1
        use `results', clear
        If you want me to troubleshoot the -collapse- version, post back with the actual code you tried and show the actual results you got, and I'll give it a try.

        Comment


        • #5
          Thanks a lot, Clyde. I think I did not explain my question clearly. What I want to do is to list the average of cat1-cat10 by year and by it. Because my original codes writes "it=="ms", the generated outputs list only "ms". In other words, even though in the "it" and "yr" columns display "mc", "mp" and each year nicely, the observations from cat1 to cat10 are "ms"'s corresponding means. Thus, I tweaked that part a little bit. But it is not working. The error message says: "invalid syntax post: above message corresponds to expression 3, variable cat1_ave".

          Below are codes that are not working.

          Code:
          tempfile results
          capture postutil clear
          postfile results str3 it yr cat1_ave cat2_ave cat3_ave cat4_ave cat5_ave cat6_ave ///
              cat7_ave cat8_ave cat9_ave cat10_ave using "C:\Users\Desktop\results.dta"
          
          levelsof it, local(its)
          foreach it of local its {
              forvalues yr = 2010/2017 {
                  local topost ("`it'") (`yr')
                  forvalues i = 1/10 {
                      sum cat`i' if (year==`yr' & it=="`its'")
                      local topost `topost' (`r(mean)')
                  }
                  post results `topost'
              }
          }
          postclose results
          Here is the sample data:
          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input str3 it year cat1-cat10
          ms    2010    0.278    0.236    0.750    0.072    0.385    0.704    0.932    0.742    0.884    0.591
          ms    2010    0.209    0.891    0.467    0.882    0.209    0.651    0.227    0.596    0.395    0.597
          ms    2010    0.187    0.761    0.739    0.748    0.561    0.275    0.244    0.245    0.937    0.215
          ms    2011    0.004    0.626    0.018    0.653    0.508    0.071    0.458    0.292    0.672    0.567
          ms    2012    0.311    0.675    0.271    0.239    0.238    0.339    0.261    0.575    0.820    0.046
          ms    2013    0.815    0.002    0.705    0.918    0.875    0.739    0.295    0.915    0.178    0.349
          ms    2014    0.817    0.825    0.450    0.963    0.027    0.968    0.274    0.245    0.843    0.007
          ms    2015    0.518    0.867    0.475    0.909    0.340    0.146    0.563    0.939    0.133    0.821
          ms    2015    0.434    0.563    0.203    0.591    0.398    0.547    0.058    0.607    0.191    0.212
          ms    2016    0.554    0.910    0.451    0.754    0.090    0.110    0.598    0.015    0.720    0.914
          ms    2017    0.109    0.345    0.716    0.862    0.909    0.167    0.215    0.600    0.822    0.219
          ms    2017    0.717    0.026    0.008    0.534    0.246    0.896    0.915    0.827    0.326    0.591
          mp    2010    0.737    0.397    0.038    .    .    0.461    0.111    .    .    .
          mp    2011    0.664    0.669    0.172    0.546    0.095    0.931    0.893    0.619    0.252    0.262
          mp    2012    0.649    0.495    0.567    0.354    0.954    0.029    0.256    0.144    0.836    0.932
          mp    2013    0.787    0.947    0.232    0.392    0.536    0.767    0.218    0.306    0.068    0.578
          mp    2014    0.143    0.747    0.787    0.890    0.815    0.558    0.936    0.754    0.838    0.988
          mp    2015    0.868    0.910    0.743    0.925    0.375    0.649    0.321    0.533    0.971    0.522
          mp    2016    0.138    0.935    0.777    0.822    0.046    0.886    0.325    0.888    0.539    0.044
          mp    2017    0.189    0.285    0.480    0.209    0.750    0.697    0.120    0.969    0.236    0.162
          mp    2012    0.260    0.607    0.747    0.767    0.894    0.459    0.471    0.016    0.275    0.637
          mc    2010    0.704    0.570    0.581    0.063    0.802    0.468    0.900    0.632    0.065    0.580
          ms    2011    0.531    0.519    0.706    .    0.833    0.869    0.963    .    0.477    .
          mc    2012    0.537    0.048    0.517    0.612    0.759    0.748    0.134    0.612    0.995    0.555
          mc    2012    0.427    0.754    0.667    0.005    0.510    0.851    0.218    0.131    0.190    0.971
          mc    2012    0.828    0.208    0.032    0.655    .    0.628    0.536    0.056    0.702    0.045
          mc    2013    0.490    0.742    0.354    0.529    0.587    0.693    0.141    0.247    0.634    0.627
          mc    2014    0.866    0.251    0.261    0.006    0.400    0.986    0.330    0.562    0.355    0.431
          mc    2015    0.669    0.898    0.898    0.366    0.924    0.368    0.001    0.905    0.879    0.632
          mc    2016    0.251    0.772    0.488    0.608    0.422    0.760    0.430    0.852    0.327    0.507
          mc    2017    0.543    0.716    0.129    0.968    0.553    0.083    0.197    0.277    0.550    0.099
          end
          Regarding the -collapse- version, I really think it is a fantastic approach. Below is the code that I tried. Not sure how to break it down by "it" within each year.

          Code:
          preserve
          collapse (mean) cat1-cat10, by(yr) by(it)
          rename cat* cat*_ave
          save ms_collapse, replace
          restore
          Many thanks,
          David

          Comment


          • #6
            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input str3 it year cat1-cat10
            ms    2010    0.278    0.236    0.750    0.072    0.385    0.704    0.932    0.742    0.884    0.591
            ms    2010    0.209    0.891    0.467    0.882    0.209    0.651    0.227    0.596    0.395    0.597
            ms    2010    0.187    0.761    0.739    0.748    0.561    0.275    0.244    0.245    0.937    0.215
            ms    2011    0.004    0.626    0.018    0.653    0.508    0.071    0.458    0.292    0.672    0.567
            ms    2012    0.311    0.675    0.271    0.239    0.238    0.339    0.261    0.575    0.820    0.046
            ms    2013    0.815    0.002    0.705    0.918    0.875    0.739    0.295    0.915    0.178    0.349
            ms    2014    0.817    0.825    0.450    0.963    0.027    0.968    0.274    0.245    0.843    0.007
            ms    2015    0.518    0.867    0.475    0.909    0.340    0.146    0.563    0.939    0.133    0.821
            ms    2015    0.434    0.563    0.203    0.591    0.398    0.547    0.058    0.607    0.191    0.212
            ms    2016    0.554    0.910    0.451    0.754    0.090    0.110    0.598    0.015    0.720    0.914
            ms    2017    0.109    0.345    0.716    0.862    0.909    0.167    0.215    0.600    0.822    0.219
            ms    2017    0.717    0.026    0.008    0.534    0.246    0.896    0.915    0.827    0.326    0.591
            mp    2010    0.737    0.397    0.038    .    .    0.461    0.111    .    .    .
            mp    2011    0.664    0.669    0.172    0.546    0.095    0.931    0.893    0.619    0.252    0.262
            mp    2012    0.649    0.495    0.567    0.354    0.954    0.029    0.256    0.144    0.836    0.932
            mp    2013    0.787    0.947    0.232    0.392    0.536    0.767    0.218    0.306    0.068    0.578
            mp    2014    0.143    0.747    0.787    0.890    0.815    0.558    0.936    0.754    0.838    0.988
            mp    2015    0.868    0.910    0.743    0.925    0.375    0.649    0.321    0.533    0.971    0.522
            mp    2016    0.138    0.935    0.777    0.822    0.046    0.886    0.325    0.888    0.539    0.044
            mp    2017    0.189    0.285    0.480    0.209    0.750    0.697    0.120    0.969    0.236    0.162
            mp    2012    0.260    0.607    0.747    0.767    0.894    0.459    0.471    0.016    0.275    0.637
            mc    2010    0.704    0.570    0.581    0.063    0.802    0.468    0.900    0.632    0.065    0.580
            ms    2011    0.531    0.519    0.706    .    0.833    0.869    0.963    .    0.477    .
            mc    2012    0.537    0.048    0.517    0.612    0.759    0.748    0.134    0.612    0.995    0.555
            mc    2012    0.427    0.754    0.667    0.005    0.510    0.851    0.218    0.131    0.190    0.971
            mc    2012    0.828    0.208    0.032    0.655    .    0.628    0.536    0.056    0.702    0.045
            mc    2013    0.490    0.742    0.354    0.529    0.587    0.693    0.141    0.247    0.634    0.627
            mc    2014    0.866    0.251    0.261    0.006    0.400    0.986    0.330    0.562    0.355    0.431
            mc    2015    0.669    0.898    0.898    0.366    0.924    0.368    0.001    0.905    0.879    0.632
            mc    2016    0.251    0.772    0.488    0.608    0.422    0.760    0.430    0.852    0.327    0.507
            mc    2017    0.543    0.716    0.129    0.968    0.553    0.083    0.197    0.277    0.550    0.099
            end
            
            //    HERE'S HOW TO DO IT WITH A TEMPFILE
            tempfile results
            capture postutil clear
            postfile handle str3 it yr cat1_ave cat2_ave cat3_ave cat4_ave cat5_ave cat6_ave ///
                cat7_ave cat8_ave cat9_ave cat10_ave using `results'
            
            levelsof it, local(its)
            foreach it of local its {
                forvalues yr = 2010/2017 {
                    local topost ("`it'") (`yr')
                    forvalues i = 1/10 {
                        sum cat`i' if (year==`yr' & it=="`it'")
                        if `r(N)' {
                            local topost `topost' (`r(mean)')
                        }
                        else {
                            local topost `topost' (.)
                        }
                    }
                    post handle `topost'
                }
            }
            postclose handle
            
            //    HERE'S HOW TO DO IT WITH COLLAPSE
            preserve
            tempfile results
            collapse (mean) cat1-cat10, by(year it)
            rename cat* cat*_ave
            fillin it year
            save `results', replace
            restore
            
            use `results', clear
            isid it yr, sort
            So the above is how to do it both ways. (Do only one or the other, of course.) Changes to your code are in bold italics. I gather that you want a row of missing cat*_ave values for combinations of it and year that do not exist in your data. That's what that -fillin- command does.

            Comment


            • #7
              Thanks much, Clyde! Both -foreach- and -collapse- approach really works great! You have magic! I think the loop over is too complicated for me. I will stick to the -collapse- approach.

              Again, thank you, Clyde!

              Best,
              David

              Comment

              Working...
              X