
  • #16
    Thanks, Carlo. It's a good idea to begin programs with a -version #- statement, typically naming the version you are using, so that Stata knows the minimum version needed to run the program and can still run it as intended in future versions. There's nothing special here that requires version 17, so you can replace 17 with your version to get it to run (or simply omit the statement altogether for learning purposes, but not for serious work).

    Note that I also had a typo in the command; it is corrected in the code below (the correction was shown in red in the original post).

    Code:
    cap program drop myprog
    program define myprog, rclass
      version 17
      syntax varlist(max=1) [if] [in]
     
      unab v : `varlist'
      confirm numeric var `v'
      marksample touse
     
      tempname p10 p90 p10_mean p90_mean
      qui summ `v' if `touse', det
      scalar `p10' = r(p10)
      scalar `p90' = r(p90)
      return scalar p10 = `p10'
      return scalar p90 = `p90'
    
      qui mean `v' if `touse' & inrange(`v', ., `p10')
      scalar `p10_mean' = r(table)["b",1]
      return scalar p10_mean = `p10_mean'
     
      qui mean `v' if `touse' & inrange(`v', `p90', .)
      scalar `p90_mean' = r(table)["b",1]  
      return scalar p90_mean = `p90_mean'
    end



    • #17
      Hi Carlo,
      I have Stata 17 on my laptop, and the program works there when I copy the code. The problem (a mismatch) appears to be that I changed the text when I typed it into our secure server, where I plan to use this program; I still need to find the typo. I have tried to capture the bootstrapped confidence interval via e(ci_normal) or similar, because I need it in my do-files on the server. However, I am not sure all of these "stored" functions work after the customized program. Do you know whether the bootstrapped CIs can be captured via e(?)? Besides the CI for the mean, I also need to bootstrap the coefficient of variation (CV) and its CI (as was done for the mean). Is it easy to extend the program to do that as well? Thank you in advance for your kind help :-) It is very useful for me and probably for others; I think the problem is of a general nature.
      Best Troels



      • #18
        Hi Leonardo
        Thank you for pointing out the typo. I will adjust my version accordingly. When I responded to Carlo, I forgot that it was you who wrote the program. Do you think the program can be extended? I am not sure all of these "stored" functions work after the customized program. Do you know whether the bootstrapped CIs can be captured via e(?)? Besides the CI for the mean, I also need to bootstrap the coefficient of variation (CV) and its CI (as was done for the mean). Is it easy to extend the program to do that as well? (I do not think these programs are easy :-) Thank you in advance for your kind help :-) It is very useful for me and probably for others; as I mentioned to Carlo above, I think the problem is of a general nature.
        Best Troels



        • #19
          You're welcome.

          Originally posted by Troels Kristensen:
          "Do you think the program can be extended? Besides the mean CI I also need to bootstrap the Coefficient of variation(CV) and the related CI (like it was the case for the mean). Is it easy to extend the program to do that as well?"
          Yes, the program can be extended. In fact, it's a general purpose strategy to create such a custom program for use with -bootstrap- or -simulate- (among others) when the procedures you want are either multiple or require more than just a single command. You can also make more than one program, if that makes more sense. In your case, you could add the CV to the program in #16.

          "I am not sure all of these 'stored' functions work after the customized program."
          I'm not sure what you mean. The program persists as long as the Stata session is in existence, so you can run it multiple times.

          "Do you know whether the bootstrapped CIs can be captured via e(?)?"
          The output of -help bootstrap- will tell you where things are returned. The display table is returned in -r(table)- and most other statistics are returned in -e()-, such as -e(b_bs)- for the point estimates, and there are similar matrices for the CIs.
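
          As a minimal sketch of where those results end up, the following uses the built-in -summarize- on the auto data rather than -myprog-, so it runs on its own in a do-file. The seed, the number of replications, and the names mean_price, CI, lb, and ub are arbitrary choices for illustration. The matrix names are the ones I would expect to see listed under stored results in -help bootstrap-; run -matrix list- first to confirm the row order before indexing into them.

          ```stata
          sysuse auto, clear
          bootstrap mean_price=r(mean), reps(50) seed(12345): summarize price

          ereturn list                  // everything -bootstrap- left behind in e()
          matrix list e(b)              // observed point estimate(s)
          matrix list e(se)             // bootstrap standard error(s)
          matrix list e(ci_normal)      // normal-based CI
          matrix list e(ci_percentile)  // percentile CI

          * Copy a CI into scalars for later use in a do-file
          * (assumes row 1 holds the lower limit and row 2 the upper limit)
          matrix CI = e(ci_normal)
          scalar lb = CI[1,1]
          scalar ub = CI[2,1]
          display "95% normal-based CI: " lb " to " ub

          estat bootstrap, all          // normal, percentile, and bias-corrected CIs
          ```

          The same pattern should apply after bootstrapping -myprog-: each named expression becomes one column of these matrices.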

          "I do not think these programs are easy"
          It's quite normal to think so when starting out, and I certainly wouldn't fault you for thinking they were difficult. To learn how these programs are made, and later to tweak them or write your own, I recommend that you read the PDF User's Guide, paying particular attention to Chapter 18: Programming Stata. It will no doubt take some time, but it will pay dividends.



          • #20
            This is how you could add the CV estimation in the full sample to the same program that then takes the means of the tails.

            Code:
            cap program drop myprog
            program define myprog, rclass
              version 17
              syntax varlist(max=1) [if] [in]
             
              unab v : `varlist'
              confirm numeric var `v'
              marksample touse
             
              tempname mean var cv p10 p90 p10_mean p90_mean
              qui summ `v' if `touse', det
              scalar `p10' = r(p10)
              scalar `p90' = r(p90)
              scalar `mean' = r(mean)
              scalar `var' = r(Var)
              scalar `cv' = sqrt(`var') / `mean'
              return scalar p10 = `p10'
              return scalar p90 = `p90'
              return scalar cv = `cv'
            
              qui mean `v' if `touse' & inrange(`v', ., `p10')
              scalar `p10_mean' = r(table)["b",1]
              return scalar p10_mean = `p10_mean'
             
              qui mean `v' if `touse' & inrange(`v', `p90', .)
              scalar `p90_mean' = r(table)["b",1]  
              return scalar p90_mean = `p90_mean'
            end



            • #21

              Hi Leonardo,
              Thank you - this is useful. However, to describe the extreme groups, there should be two CVs and related CIs - one for each extreme group, above p90 and below p10 - if possible :-)

              Just a reflection question: how can we see that the bootstrap starts from the entire sample of prices (N=74), given that in the end the extreme groups comprise N=8 each based on the price data? I just want to be sure that the results differ from a bootstrap based on the subsample. Best Troels

              [Attached image: Captures.JPG]



              • #22
                Apparently, the "ereturn list" is working now (had problems using it after one of the previous versions). This means the results of the program can be captured in a do-file and processed further as I understand it - as required.

                For the CV and its confidence interval in the two extreme groups (above p90 and below p10), I tried the following:
                qui cv `v' if `touse' & inrange(`v', ., `p10')
                scalar `p10_cv' = r(table)["b",1]
                return scalar p10_cv = `p10_cv'

                but this does not work, probably because cv is not a command known to Stata and I do not know the programming language. I hope you can help :-)



                • #23
                  This is an edited version to give the CV in each tail to demonstrate how to do it.

                  Code:
                  cap program drop myprog
                  program define myprog, rclass
                    version 17
                    syntax varlist(max=1) [if] [in]
                   
                    unab v : `varlist'
                    confirm numeric var `v'
                    marksample touse
                   
                    tempname p10 p10_mean p10_sd p10_cv ///
                             p90 p90_mean p90_sd p90_cv
                    qui summ `v' if `touse', det
                    scalar `p10' = r(p10)
                    scalar `p90' = r(p90)
                    return scalar N = r(N)
                    return scalar p10 = `p10'
                    return scalar p90 = `p90'
                  
                    qui summ `v' if `touse' & inrange(`v', ., `p10'), detail
                    scalar `p10_mean' = r(mean)
                    scalar `p10_sd' = r(sd)
                    scalar `p10_cv' = `p10_sd' / `p10_mean'
                    return scalar p10_N = r(N)
                    return scalar p10_mean = `p10_mean'
                    return scalar p10_sd = `p10_sd'
                    return scalar p10_cv = `p10_cv'
                   
                    qui summ `v' if `touse' & inrange(`v', `p90', .), detail
                    scalar `p90_mean' = r(mean)
                    scalar `p90_sd' = r(sd)
                    scalar `p90_cv' = `p90_sd' / `p90_mean'
                    return scalar p90_N = r(N)
                    return scalar p90_mean = `p90_mean'
                    return scalar p90_sd = `p90_sd'
                    return scalar p90_cv = `p90_cv'
                  end
                  "Apparently, the 'ereturn list' is working now (had problems using it after one of the previous versions). This means the results of the program can be captured in a do-file and processed further as I understand it"
                  No, not quite. The program above returns its results in r(), not in e(). You can see this from the fact that it is declared as an r-class program (the rclass option on the -program define- line, highlighted in red in the original post). You can access results in r() or e() just as easily. After you run it repeatedly via -bootstrap-, the bootstrap results are stored in e(), because you have asked -bootstrap- to gather the results of -myprog- from r(). -bootstrap- computes the confidence intervals for you, based on the bootstrap samples.

                  I strongly recommend reading up more on how to program using Stata from the helpful documentation. I've done enough to show you how you can make such programs and you'll benefit greatly from a more solid foundation after reading the documentation.

                  "How can it be seen that we start bootstraping based on the entire sample of prices (N=74) - in the end the extreme groups comprise N=8 based on the price data. Just to be sure that the results are different than a bootstrap based on the subsample."
                  I added the estimation sample size to be returned in r(N), but you don't need it in this case. -bootstrap- works on the overall sample, so it will resample the whole sample if you run something like

                  Code:
                  bootstrap: myprog varname
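
                  To make the distinction concrete, here is a rough sketch that contrasts the two designs on the auto data. It assumes -myprog- from the code above is already defined in the session; the seed, the number of replications, and the local macro name cut are arbitrary choices for illustration, not part of the program.

                  ```stata
                  * Assumes -myprog- from the code above is already defined in this session
                  sysuse auto, clear

                  * (a) Resample all 74 cars; the p90 cutoff and the tail are recomputed in
                  *     every replicate, so the tail size r(p90_N) is itself allowed to vary
                  bootstrap p90_mean=r(p90_mean) p90_N=r(p90_N), reps(50) seed(12345): myprog price

                  * (b) For contrast: fix the tail once, then resample only those cars
                  quietly summarize price, detail
                  local cut = r(p90)
                  preserve
                  keep if price >= `cut'
                  bootstrap p90_mean=r(mean), reps(50) seed(12345): summarize price
                  restore
                  ```

                  In (a) the reported number of observations should be that of the full sample, while in (b) it should be the size of the tail only, which is one way to check which design was actually run.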



                  • #24
                    Hi Leonardo
                    Thank you and well done. From my point of view, we managed to build a useful application through our dialogue. At least it performs the calculations that I wanted. As stated above, I also think the ability to describe extreme groups of a sample is of general interest.

                    You are right that I need to study the programming materials to improve my skills :-) Anyway, I think your help and advice were very helpful! (I am impressed! How are your activities financed?)

                    . myprog price

                    . ret list

                    scalars:
                                  r(p90_cv) =  .1180803594571282
                                  r(p90_sd) =  1554.719812837211
                                r(p90_mean) =  13166.625
                                   r(p90_N) =  8
                                  r(p10_cv) =  .0648969476026494
                                  r(p10_sd) =  237.8959856744119
                                r(p10_mean) =  3665.75
                                   r(p10_N) =  8
                                     r(p90) =  11385
                                     r(p10) =  3895
                                       r(N) =  74

                    . bootstrap p10_mean=r(p10_mean) p90_mean=r(p90_mean) p10_cv=r(p10_cv) p90_cv=r(p90_cv), reps(50)
                    > : myprog price
                    (running myprog on estimation sample)

                    warning: myprog does not set e(sample), so no observations will be excluded from the resampling
                             because of missing values or other reasons. To exclude observations, press Break, save
                             the data, drop any observations that are to be excluded, and rerun bootstrap.

                    Bootstrap replications (50)
                    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
                    ..................................................    50

                    Bootstrap results                                          Number of obs = 74
                                                                               Replications  = 50

                          Command: myprog price
                         p10_mean: r(p10_mean)
                         p90_mean: r(p90_mean)
                           p10_cv: r(p10_cv)
                           p90_cv: r(p90_cv)

                    ------------------------------------------------------------------------------
                                 |   Observed   Bootstrap                         Normal-based
                                 | coefficient  std. err.      z    P>|z|     [95% conf. interval]
                    -------------+----------------------------------------------------------------
                        p10_mean |    3665.75   119.4233    30.70   0.000     3431.685    3899.815
                        p90_mean |   13166.63   801.2914    16.43   0.000     11596.12    14737.13
                          p10_cv |   .0648969   .0152537     4.25   0.000     .0350002    .0947937
                          p90_cv |   .1180804    .031155     3.79   0.000     .0570177     .179143
                    ------------------------------------------------------------------------------

                    .
                    end of do-file



                    • #25
                      Hi Leonardo,
                      I have realized that my challenge has not been fully solved. The program we have produced can report p10_mean, p90_mean, etc. for one specific variable, price, but it does not describe the other variables in the data set (such as weight, mpg, headroom, etc.) within the same bootstrap procedure. I think that is required if you want to describe the cars in the p10 and p90 groups for price. Can this be done? That is, besides p10_mean, p90_mean, etc. for price, the program would report the same descriptives for other variables in the same run. I think this is needed to describe the extreme groups (>p90 and <p10), defined by one variable such as price, as the result of a single bootstrapping procedure. At the moment it is only possible to describe the extreme groups by bootstrapping one variable at a time, and I do not think this gives the same result as doing everything in one procedure. In other words, I mean e.g. 1000 new samples that are used to calculate the statistics for all variables, rather than 1000 new samples for each variable.
                      Do you understand my challenge? (I am wondering what is most appropriate from a statistical point of view.)
                      Best, Troels



                      • #26
                        I think this is now a different problem, but in theory you should be able to extend the framework in the above program to do what you want with each bootstrap sample. Whether you need to do this, or whether it makes sense, I can't say in your case.
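
                        As one possible sketch of such an extension, and not a recommendation on whether it is statistically appropriate, the program below takes the tail-defining variable as an option and returns the tail means of every variable listed. The program name tailmeans and the option name tailvar() are made up for this illustration, as are the seed and the number of replications; only means are returned, but SDs or CVs could be added in the same loop.

                        ```stata
                        * Hypothetical sketch: tailmeans and tailvar() are made-up names
                        capture program drop tailmeans
                        program define tailmeans, rclass
                          version 17
                          syntax varlist(numeric) [if] [in], TAILvar(varname numeric)

                          marksample touse
                          markout `touse' `tailvar'

                          * Cutoffs are recomputed from tailvar() in whatever sample is in memory
                          tempname p10 p90
                          quietly summarize `tailvar' if `touse', detail
                          scalar `p10' = r(p10)
                          scalar `p90' = r(p90)

                          * Mean of each listed variable within each tail of tailvar()
                          foreach v of local varlist {
                            quietly summarize `v' if `touse' & inrange(`tailvar', ., `p10')
                            return scalar p10_`v' = r(mean)
                            quietly summarize `v' if `touse' & inrange(`tailvar', `p90', .)
                            return scalar p90_`v' = r(mean)
                          }
                        end

                        sysuse auto, clear
                        bootstrap p10_weight=r(p10_weight) p90_weight=r(p90_weight) ///
                                  p10_mpg=r(p10_mpg)       p90_mpg=r(p90_mpg),      ///
                                  reps(50) seed(12345): tailmeans weight mpg, tailvar(price)
                        ```

                        Because everything is computed inside one call, each bootstrap sample is used for all variables at once, rather than drawing a separate set of samples per variable.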



                        • #27
                          This is a good course to get started with programming Stata.
                          HTML Code:
                          https://www.stata.com/netcourse/writing-own-commands-nc251/

