  • How to calculate a confidence interval for multiple *.dta files using a loop

    Hi everybody!!

    I have downloaded 32 *.dta files of the Permanent Household Survey of Argentina, which is conducted quarterly.
    I have generated new variables to calculate the net enrollment rate (NER) for the 4th quarter of 2010, but I will have to do the same 31 more times. The NER is the proportion of primary-school-age children who are enrolled in primary school.

    I couldn't find an answer in the forum about what to do in this case.

    For the 4th quarter of 2010 only, I have done this:

    Code:
    gen primary=1 if ch12==2 & ch10==1
    replace primary=0 if ch12!=2
    replace primary=0 if ch12==2 & ch10==2
    ci primary [fw=pondera] if ch06>=6 & ch06<=11 & ch12!=9 & ch10!=0
    But I need to do the same with 31 more datasets. The files are all in the same directory, and the file names are like "individual_t110.dta" (first quarter of 2010), "individual210.dta" (second quarter of 2010), "individual310.dta" (third quarter), "individual410.dta" (4th quarter), and the same for 2011, 2012, 2013 ... 2018.

    I know that the solution is a loop, but I don't know where to start. Even if I knew how to write that loop, I would need a matrix of means as output.

    Thanks!!!
    Last edited by Ignacio Ibarra; 18 Sep 2019, 21:14.

  • #2
    It is possible to do this, but in Stata it is usually not a good idea to put information into a matrix unless you are going to actually do matrix algebra with it. For any other purpose, you are better off building a data set of results.

    So there are three steps. First create a local macro containing the names of all the files. Next create a temporary file to hold the results. Then loop over the filenames to do the calculations:

    Code:
    local filenames: dir "." files "individual*.dta"
    
    capture postutil clear
    tempfile results
    postfile handle str32 filename float ner using `results'
    
    foreach f of local filenames {
        use `f', clear
        display "Processing file `f'"
        quietly gen primary=1 if ch12==2 & ch10==1
        quietly replace primary=0 if ch12!=2
        quietly replace primary=0 if ch12==2 & ch10==2
        ci primary [fw=pondera] if ch06>=6 & ch06<=11 & ch12!=9 & ch10!=0 
        post handle ("`f'") (`r(mean)')
    }
    
    postclose handle
    use `results', clear
    At the end of this code, the confidence intervals will all have been displayed in the Results window, and a new data set containing the file names and their means will be in memory.

    By the way, I assume you are using a fairly old version of Stata because the syntax in your -ci- command is not modern.
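
    Since the original question asks for confidence intervals, the same loop could also post the interval bounds. The following is only a sketch under assumptions: it relies on -ci- leaving r(lb) and r(ub) behind alongside r(mean), and the column names lb and ub are hypothetical, chosen for illustration.

    ```stata
    * Sketch only: extends the loop above to keep the CI bounds as well.
    * The column names lb and ub are hypothetical.
    local filenames: dir "." files "individual*.dta"

    capture postutil clear
    tempfile results
    postfile handle str32 filename float ner float lb float ub using `results'

    foreach f of local filenames {
        use "`f'", clear
        display "Processing file `f'"
        quietly gen primary = 1 if ch12==2 & ch10==1
        quietly replace primary = 0 if ch12!=2
        quietly replace primary = 0 if ch12==2 & ch10==2
        * Same -ci- call as in the thread; on newer Stata the documented form is -ci means-.
        ci primary [fw=pondera] if ch06>=6 & ch06<=11 & ch12!=9 & ch10!=0
        * One row per file: name, mean, lower bound, upper bound.
        post handle ("`f'") (r(mean)) (r(lb)) (r(ub))
    }

    postclose handle
    use `results', clear
    list
    ```

    Each type in the -postfile- line is repeated per variable because a storage type applies only to the name that follows it.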


    • #3
      Hi, Clyde. You're a genius.

      Yes, I don't want to do matrix algebra, so the dataset of results is a better option.

      But I ran your code and, at the end, an error appears: "invalid file specification". Why? Is it possible that in the first line, where you put local filenames: dir "." files "individual*.dta", I have to put local filenames: dir "C:\Users\Asus\Desktop\Tesis\Datasets" files "individual*.dta"? Or is that not the problem?
      Last edited by Ignacio Ibarra; 19 Sep 2019, 11:16.


      • #4
        But I ran your code and, at the end, an error appears: "invalid file specification".
        Please show the exact output from Stata including the error message. I don't know what "at the end" means here. Without seeing exactly what happened, I can't troubleshoot it.

        Is it possible that in the first line, where you put local filenames: dir "." files "individual*.dta", I have to put local filenames: dir "C:\Users\Asus\Desktop\Tesis\Datasets" files "individual*.dta"?
        The code above assumes that the files individual* are in the current working directory. If that's not the case, then, not only do you need to change the -local filenames- command in the way you describe, but you also have to include the full pathname in every command that refers to `f', except for the -post handle- command. Generally speaking, it is simpler to change the current working directory before running the unmodified code than to make all those code changes.
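
        For illustration, the full-pathname alternative might look like the sketch below. It is not the recommended route, just a picture of the changes involved. The local macro name datadir is hypothetical, and the path is the one quoted in the question.

        ```stata
        * Sketch only: datadir is a hypothetical macro name; the path is the one quoted above.
        local datadir "C:\Users\Asus\Desktop\Tesis\Datasets"
        local filenames: dir "`datadir'" files "individual*.dta"

        foreach f of local filenames {
            * Use a forward slash before `f': a backslash directly before the
            * left quote would stop Stata from expanding the macro.
            use "`datadir'/`f'", clear
            display "Loaded `f' with " _N " observations"
        }
        ```

        The double quotes around the file specification also keep the -use- command working when the folder name contains spaces.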


        • #5
          Hi Clyde, the output is
          Code:
           local filenames: dir "C:\Users\Asus\Desktop\Tesis - Educacion Superior y Economia\EPH - Trimestral - 2019 - 2010\Bases" files "individual*.dta"
          
           capture postutil clear
          
           tempfile results
          
           postfile handle str32 filename float ner using `results'
          
          
           foreach f of local filenames {
            2.     use `f', clear
            3.     display "Processing file `f'"
            4.     quietly gen primary=1 if ch12==2 & ch10==1
            5.     quietly replace primary=0 if ch12!=2
            6.     quietly replace primary=0 if ch12==2 & ch10==2
            7.     ci primary [fw=pondera] if ch06>=6 & ch06<=11 & ch12!=9 & ch10!=0
            8.     post handle ("`f'") (`r(mean)')
            9. }
          file individual_t110.dta not found
          r(601);
          
          end of do-file
          
          r(601);
          The working directory that you refer to is the folder on my computer where I keep all the individual*.dta files, isn't it?

          Any suggestions?

          Regards


          • #6
            The working directory is the folder from which you launched Stata by double-clicking on a .dta, .do, or .smcl file. If you launched Stata in some other way, it is Stata's default working directory, which, assuming you accepted the default installation, would be, in Windows, C:\Program Files\Stata16 (or whatever version you are running). The way to find out what your working directory is, is to issue the command -cd-, and Stata will tell you. If it's not the directory that hosts those files, then use the -cd- command again to change it to the directory you want:

            Code:
            cd "C:\Users\Asus\Desktop\Tesis - Educacion Superior y Economia\EPH - Trimestral - 2019 - 2010\Bases"



            • #7
              Clyde, I am so grateful to you.

              Just one more question, please.

              In which part of the loop could I add a command that generates columns in the same result set with the means of other variables? I have to calculate the NER for different education levels: kindergarten, primary, secondary, university, etc.

              I have just tried to do that by running this code, adding only the kindergarten variable, but it doesn't work:

              Code:
              cd "C:\Users\Asus\Desktop\Tesis - Educacion Superior y Economia\EPH - Trimestral - 2019 - 2010\Bases"
              
              local filenames: dir "C:\Users\Asus\Desktop\Tesis - Educacion Superior y Economia\EPH - Trimestral - 2019 - 2010\Bases" files "individual*.dta"
              
              capture postutil clear
              tempfile results_2
              postfile handle str32 filename float ner using `results_2'
              
              foreach f of local filenames {
                  use `f', clear
                  display "Processing file `f'"
                  quietly gen primary=1 if ch12==2 & ch10==1
                  quietly replace primary=0 if ch12!=2
                  quietly replace primary=0 if ch12==2 & ch10==2
                  ci primary [fw=pondera] if ch06>=6 & ch06<=11 & ch12!=9 & ch10!=0
                  gen kindergarten=1 if ch12==1 & ch10==1
                  replace kindergarten=0 if ch12!=1   
                  replace kindergarten=0 if ch12==1 & ch10==2  
                  ci jardin [fw=pondera] if ch06>=4 & ch06<=5 & ch12!=9 & ch10!=0
                  post handle ("`f'") (`r(mean)')
              }
              
              postclose handle
              use `results_2', clear
              I thought that if I added these lines to the loop, it would generate another column for the "kindergarten" variable. But nothing changed.

              Code:
              gen kindergarten=1 if ch12==1 & ch10==1
              replace kindergarten=0 if ch12!=1   
              replace kindergarten=0 if ch12==1 & ch10==2  
              ci jardin [fw=pondera] if ch06>=4 & ch06<=5 & ch12!=9 & ch10!=0

              Thank you so much.
              Last edited by Ignacio Ibarra; 20 Sep 2019, 16:44.


              • #8
                Yes, I think that will do it. And, with that, you can also change the -local filenames- command back to the simpler:

                Code:
                local filenames: dir "." files "individual*.dta"
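
                On the question in #7 about adding a column for another education level, one possible sketch follows. It rests on three assumptions: the results file needs one declared column per level, each r(mean) has to be saved before the next -ci- overwrites it, and the second -ci- should refer to the kindergarten variable created in the loop rather than jardin. The column names ner_primary and ner_kinder and the local names m_primary and m_kinder are hypothetical.

                ```stata
                * Sketch only: column and local macro names are hypothetical.
                local filenames: dir "." files "individual*.dta"

                capture postutil clear
                tempfile results_2
                postfile handle str32 filename float ner_primary float ner_kinder using `results_2'

                foreach f of local filenames {
                    use "`f'", clear
                    display "Processing file `f'"

                    quietly gen primary = 1 if ch12==2 & ch10==1
                    quietly replace primary = 0 if ch12!=2
                    quietly replace primary = 0 if ch12==2 & ch10==2
                    ci primary [fw=pondera] if ch06>=6 & ch06<=11 & ch12!=9 & ch10!=0
                    * Save the mean now: the next -ci- overwrites r(mean).
                    local m_primary = r(mean)

                    quietly gen kindergarten = 1 if ch12==1 & ch10==1
                    quietly replace kindergarten = 0 if ch12!=1
                    quietly replace kindergarten = 0 if ch12==1 & ch10==2
                    * Refer to the variable created above (kindergarten, not jardin).
                    ci kindergarten [fw=pondera] if ch06>=4 & ch06<=5 & ch12!=9 & ch10!=0
                    local m_kinder = r(mean)

                    * One row per file, one column per education level.
                    post handle ("`f'") (`m_primary') (`m_kinder')
                }

                postclose handle
                use `results_2', clear
                ```

                Further levels (secondary, university, etc.) would follow the same pattern: one more column in the -postfile- line, one more saved local, and one more expression in the -post- line.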
