  • Find word in a do file with an inline command

    I would like to give Stata the command to find a word inside a .do file.
    I know I could use Ctrl+F, but I would like to do it with an inline command.
    Is there such a command?

    This is the reason for doing it.
    I have a very large dataset with many variables that I would like to share with someone; I will also share with this person the .do file that I use to run analyses on this dataset.
    To make the .dta file lighter, I would like to drop from it all the variables that I do not use in the .do file.
    So my idea is to write a command that loops over all the variables in the dataset, looks for each variable name as a word in the .do file, and drops the variable if Stata does not find its name.
    Any ideas on how to do it?

  • #2
    You could read in your do file as a dataset with a single string variable and then you can use list to find relevant lines in the file as particular observations in your dataset. Or just count matches. Before you do that you need to store your variable names in a local macro.

    Finding out what's a variable and what isn't could be a harder task. Perhaps you're self-disciplined enough never to use variable name abbreviations and won't be bothered by mismatches whenever a variable name matches part of other text in a do-file.
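
    A minimal sketch of this approach, under several assumptions: the file names mydata.dta and analysis.do are hypothetical, no line of the do-file is longer than 2,000 characters, and Stata 14 or later is available for ustrregexm(). It matches each variable name as a whole word, which reduces but does not remove false matches, since a name that appears only in a comment or a string still counts.

    ```stata
    * Hypothetical file names; adjust to your own paths
    use "mydata.dta", clear
    unab allvars : _all              // full variable names into a local macro

    preserve
        * Read the do-file as a dataset: one string variable, one line per observation
        infix str line 1-2000 using "analysis.do", clear

        local keepers
        foreach v of local allvars {
            * \b is a word boundary, so mpg is not matched inside mpg2
            quietly count if ustrregexm(line, "\b`v'\b")
            if r(N) > 0 local keepers `keepers' `v'
        }
    restore

    display "`keepers'"              // inspect the list before dropping anything
    keep `keepers'
    ```

    If no name is matched at all, the local macro is empty and keep stops with an error rather than dropping everything.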


    • #3
      Many thanks, Nick, I will try this option.


      • #4
        I should add another limitation to this approach. If your code uses any wildcard or range expressions to refer to groups of variables, then some or all of those variables will not be literally mentioned in the expression, and you will not find them. For example, in auto.dta, if a command refers to mpg-headroom, rep78 is implicitly included there, but not explicitly mentioned in the do-file text. Similarly, if a command refers to t*, turn and trunk are implied, but neither is explicitly mentioned.
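
        To see which variables such an expression covers, unab can expand it against the dataset in memory. A short illustration with auto.dta follows; it assumes that any such expression found in the do-file is expanded by hand in this way and the result added to the list of variables to keep.

        ```stata
        sysuse auto, clear

        * Expand a range of variables in dataset order
        unab range : mpg-headroom
        display "`range'"            // mpg rep78 headroom

        * Expand a wildcard
        unab tvars : t*
        display "`tvars'"            // trunk turn
        ```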


        • #5
          I found a possible solution based on PowerShell, which seems to work only if I am "self-disciplined" (as Nick put it).
          First, I save the .do file as a .txt file.
          Then, I run the following code from within Stata.

          Code:
              cap erase "mypath\txtcom.ps1" //powershell (PS) file that will be read by PS
              cap erase "mypath\first-file.txt" //text file that will be produced by PS
              
              set trace on //just to verify possible issues (such as typos) in the code
              
              local vars "var1 var2 xxxx3 var4" //four example variables over which I run the following loop, xxxx3 is a variable that does not appear in the original .do file
          
              foreach var of local vars{
                  
              cap file close notepad    
              file open notepad using "mypath\txtcom.ps1", text write append
              file write notepad _n "if (Select-String -Path mypath\doFILEsaveAStxt.txt -Pattern " `""`var'""' " -SimpleMatch) "
                  
              file write notepad _n " {  "
              file write notepad _n " echo " `"`var'"' "`t" " | Out-File -FilePath mypath\first-file.txt" " -Append -NoNewLine} "
              file write notepad _n " else  "
              file write notepad _n " {  "
              file write notepad _n " echo " `"Not Contains String"'
              file write notepad _n " }  " //this is the code that will appear in txtcom.ps1
              
              file close notepad
          
              }
              
              winexec powershell.exe "mypath\txtcom.ps1"
          This way, first-file.txt will contain a single line listing all the variables that appear in the original .do file, each followed by a tab character (written as `t in PowerShell).
          The output looks like:

          Code:
           var1     var2     var4
          Then, I open the original .dta file and, in the command box, type "keep " followed by the text copied from the .txt file, that is:

          Code:
           keep var1     var2     var4
          The caveats mentioned by Clyde apply, so one must, again, be self-disciplined and double-check the final dataset a few times.
          In fact, it is better to save the new, smaller dataset and run the entire original .do file on it to verify that the latter works with the former.
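
          That check can be sketched as follows. The file names mydata.dta, mydata_small.dta and analysis.do are hypothetical, and the sketch assumes the use command inside the do-file has been pointed at the smaller dataset.

          ```stata
          * Hypothetical file names and paths
          use "mydata.dta", clear
          keep var1 var2 var4
          save "mydata_small.dta", replace

          * Run the original do-file against the smaller dataset;
          * a "variable ... not found" error, r(111), signals a variable that was dropped but is still needed
          do "analysis.do"
          ```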

          Sources being used for this code:
          https://stackoverflow.com/questions/...file-from-text
          https://stackoverflow.com/questions/...ng-in-textfile
          https://learn.microsoft.com/en-us/po...powershell-7.3
          https://learn.microsoft.com/en-us/po...powershell-7.3
          https://learn.microsoft.com/en-us/po...powershell-7.3
          https://learn.microsoft.com/en-us/po...powershell-7.3
          Last edited by FLuca; 27 Oct 2023, 08:58. Reason: included the example variable that does not exist in the original .do file
