Error message during a loop "file XXX.txt not found" causing the loop to stop - STATA 14

Adrien Haidar

Join Date: Oct 2017
Posts: 14

#16

13 Oct 2017, 03:42

Alternatively, so as not to slow the code down too much, I have implemented the following:

Code:

di "https://www.youtube.com/watch?v=`video'"
cap copy "https://www.youtube.com/watch?v=`video'" "`video'.txt", replace

* Keep the return code in a local macro so that it survives -import delimited, clear-.
local rc = _rc
di `rc'

* If the copy fails, wait 2 seconds and try once more.
if `rc' != 0 {
    sleep 2000
    cap copy "https://www.youtube.com/watch?v=`video'" "`video'.txt", replace
    local rc = _rc
}

* Apply the remaining commands only if the copy succeeded (on the first or
* the second attempt); otherwise go on to the next video.
if `rc' == 0 {
    import delimited using "`video'.txt",  stringcols(_all) delimiter("þ") varn(noname) clear

    keep if strpos(v1, "watch-view-count")
    gen vues = substr(v1, strpos(v1,"watch-view-count"),.)

    drop v1
    keep vues
    gen video = "`video'"
    gen time = c(current_time)
    gen date = c(current_date)

    append using "VA_`c(current_date)'"
    order time video vues
    save "VA_`c(current_date)'", replace
}

I believe this might be the best version (assuming I haven't made any mistakes). I'm running this code alongside the previous one.


William Lisowski

Join Date: Dec 2014

Posts: 10150
#17

13 Oct 2017, 06:07

That looks like what I would have in mind. An alternative to

Code:

capture copy "https://www.youtube.com/watch?v=`video'" "`video'.txt", replace
display _rc

is

Code:

capture noisily copy "https://www.youtube.com/watch?v=`video'" "`video'.txt", replace

which will capture the return code in _rc, prevent a non-zero return code from causing the do-file to stop, but will not suppress the display of the return code, or any other output from the copy command. When the return code is 0, of course, nothing will be displayed.
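
To make the difference concrete, here is a minimal sketch. It assumes a file name (no_such_file.txt, purely hypothetical) that does not exist in the working directory, and it uses confirm file in place of copy so that it runs without network access.

```stata
* Hypothetical file name, assumed NOT to exist in the working directory.
local fname "no_such_file.txt"

* -capture- alone: the error message is suppressed and execution continues.
capture confirm file "`fname'"
local rc_quiet = _rc
display "capture only:    _rc = `rc_quiet'"

* -capture noisily-: same return code, but the error message is displayed.
capture noisily confirm file "`fname'"
local rc_noisy = _rc
display "capture noisily: _rc = `rc_noisy'"
```

In both cases the do-file keeps running and the return code is available for a later if test; only what appears in the Results window differs.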

Robert Picard

Join Date: Mar 2014
Posts: 1536

#18

13 Oct 2017, 09:43

Here's an alternative approach that uses runby, a brand new command available from SSC that will be announced later today. With runby, you run a program on by-group data subsets. For this task, each subset contains a single observation with the video identifier. The program performs all the functions needed to obtain and extract the desired information from Youtube's response. Once the program terminates for the current by-group, what's left in memory is considered results and is stored. At the end, when all by-groups have been processed, these stored results are combined and replace the initial data in memory.
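
As a rough illustration of that workflow on a built-in dataset, the sketch below assumes runby is installed (ssc install runby) and behaves as described above. The program name group_mean is hypothetical and only serves this example.

```stata
* Assumes runby has been installed from SSC: ssc install runby
sysuse auto, clear

* Hypothetical program: reduce the current by-group to a single
* observation holding the group's mean price.
capture program drop group_mean
program group_mean
    egen mean_price = mean(price)
    keep in 1
    keep foreign mean_price
end

* Run the program once per value of foreign; whatever each run leaves
* in memory is stored, and the stored pieces are combined at the end.
runby group_mean, by(foreign)
list
```

Whatever a run leaves in memory becomes that by-group's contribution, so the data in memory at the end should hold one observation per value of foreign.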

I'm using a temporary file to receive the response from Youtube. The example does not try to save it to a permanent file, but it could easily be adjusted to do that if needed (see this recent hint from Alan Riley on how to perform a true file rename using the undocumented _renamefile command).

I could not get an error with the video identifiers that I'm including in this example so the code to retry looks right but has not been tested.

Code:

clear all

* Example generated by -dataex-. To install: ssc install dataex
clear
input str13 vname
"2PXEUsz6wHs"    
"Am9pavV7q2g"    
"9Z4s-bktMrY"    
"-RC_f4oEzHc"
"dgA3PoNiwbY"    
"j6_Y77uWtGw"    
"970T1Sd1thc"    
"NbC3VOOo1mo"    
"IfMuhob6EAU"    
"9uNpkeMlQE0"    
"2Ut97j6-lsE"    
"caca"
end

program get_videos
    local videoid = vname[1]
    local url "https://www.youtube.com/watch?v=`videoid'"
    dis as res "`url'"
    
    local more 1
    local try 0
    while `more' {
        tempfile f
        cap noi copy "`url'" "`f'"
        if _rc {
            local ++try
            if `try' < 5 {
                dis "trying again in 2 seconds..."
                sleep 2000
            }
            else local more 0
        }
        else {
            import delimited using "`f'",  stringcols(_all) delimiter("þ") varn(noname) clear
            keep if strpos(v1, "watch-view-count")
            gen vname = "`videoid'"
            gen vues = regexs(1) if regexm(v1,"([0-9,]+) views")
            keep vname vues
            gen time = c(current_time)
            gen date = c(current_date)
            local more 0
        }
    }
end

runby get_videos, by(vname) verbose

and here are the results

Code:

. list

     +--------------------------------------------------+
     |       vname        vues       time          date |
     |--------------------------------------------------|
  1. | -RC_f4oEzHc   1,662,880   11:42:08   13 Oct 2017 |
  2. | 2PXEUsz6wHs   2,650,547   11:42:09   13 Oct 2017 |
  3. | 2Ut97j6-lsE     546,559   11:42:09   13 Oct 2017 |
  4. | 970T1Sd1thc     550,435   11:42:09   13 Oct 2017 |
  5. | 9Z4s-bktMrY   5,516,395   11:42:10   13 Oct 2017 |
     |--------------------------------------------------|
  6. | 9uNpkeMlQE0     793,893   11:42:10   13 Oct 2017 |
  7. | Am9pavV7q2g   1,092,619   11:42:11   13 Oct 2017 |
  8. | IfMuhob6EAU     205,846   11:42:11   13 Oct 2017 |
  9. | NbC3VOOo1mo     557,074   11:42:11   13 Oct 2017 |
 10. | dgA3PoNiwbY   1,364,231   11:42:12   13 Oct 2017 |
     |--------------------------------------------------|
 11. | j6_Y77uWtGw     428,987   11:42:12   13 Oct 2017 |
     +--------------------------------------------------+

Last edited by Robert Picard; 13 Oct 2017, 10:04. Reason: Added results
