
  • #16
    Alternatively, so as not to slow the code down too much, I have implemented the following:

    Code:
    di "https://www.youtube.com/watch?v=`video'"
    cap copy "https://www.youtube.com/watch?v=`video'" "`video'.txt", replace

    * Keep the return code in a local macro: a variable created with -gen-
    * would be lost as soon as -import delimited, clear- runs.
    local rc = _rc
    di `rc'

    * If the copy fails, wait 2 seconds and try once more.
    if `rc' != 0 {
        sleep 2000
        cap copy "https://www.youtube.com/watch?v=`video'" "`video'.txt", replace
        local rc = _rc
    }

    * Apply the remaining commands only if one of the two attempts succeeded;
    * if not, go on to the next video.
    if `rc' == 0 {
        import delimited using "`video'.txt", stringcols(_all) delimiter("þ") varn(noname) clear

        keep if strpos(v1, "watch-view-count")
        gen vues = substr(v1, strpos(v1, "watch-view-count"), .)

        keep vues
        gen video = "`video'"
        gen time = c(current_time)
        gen date = c(current_date)

        append using "VA_`c(current_date)'"
        order time video vues
        save "VA_`c(current_date)'", replace
    }
    I believe this may be the best version (assuming I haven't made any mistakes). I'm running this code alongside the previous one.



    • #17
      That looks like what I would have in mind. An alternative to
      Code:
      capture copy "https://www.youtube.com/watch?v=`video'" "`video'.txt", replace
      display _rc
      is
      Code:
      capture noisily copy "https://www.youtube.com/watch?v=`video'" "`video'.txt", replace
      which will capture the return code in _rc, prevent a non-zero return code from causing the do-file to stop, but will not suppress the display of the return code, or any other output from the copy command. When the return code is 0, of course, nothing will be displayed.
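
      A minimal sketch of the difference, assuming a source file that does not exist so that copy fails (the file name no_such_file_for_demo.txt is hypothetical, chosen only for illustration):

      ```stata
      * The source file below is assumed NOT to exist, so -copy- fails.
      tempfile target

      * Silent version: no output, the return code is only available in _rc.
      capture copy "no_such_file_for_demo.txt" "`target'"
      local rc = _rc          // store it before another -capture- overwrites it
      display "copy returned code `rc'"

      * Noisy version: same behavior, but copy's error message is displayed.
      capture noisily copy "no_such_file_for_demo.txt" "`target'"
      if _rc != 0 {
          display "copy failed, but the do-file keeps running"
      }
      ```

      Storing _rc in a local macro right away is a cautious habit, since the next capture replaces its value.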



      • #18
        Here's an alternative approach that uses runby, a brand new command available from SSC that will be announced later today. With runby, you run a program on by-group data subsets. For this task, each subset contains a single observation with the video identifier. The program performs all the functions needed to obtain and extract the desired information from Youtube's response. Once the program terminates for the current by-group, what's left in memory is considered results and is stored. At the end, when all by-groups have been processed, these stored results are combined and replace the initial data in memory.
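
        To illustrate the mechanics on a smaller task, here is a rough sketch that assumes runby is installed (ssc install runby) and uses the auto dataset shipped with Stata. The program name group_mean is hypothetical; it simply reduces each by-group to one observation holding the mean price.

        ```stata
        sysuse auto, clear

        * Hypothetical program: whatever it leaves in memory for the
        * current by-group is what runby stores as that group's results.
        capture program drop group_mean
        program group_mean
            collapse (mean) price, by(foreign)
        end

        * Run the program once per level of foreign, then combine the results.
        runby group_mean, by(foreign) verbose
        list
        ```

        With two levels of foreign, the combined results should contain one observation per level.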

        I'm using a temporary file to receive the response from YouTube. The example does not try to save it to a permanent file but it could easily be adjusted to do that if needed (see this recent hint from Alan Riley on how to perform a true file rename using the undocumented _renamefile command).

        I could not get an error with the video identifiers that I'm including in this example so the code to retry looks right but has not been tested.

        Code:
        clear all
        
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str13 vname
        "2PXEUsz6wHs"    
        "Am9pavV7q2g"    
        "9Z4s-bktMrY"    
        "-RC_f4oEzHc"
        "dgA3PoNiwbY"    
        "j6_Y77uWtGw"    
        "970T1Sd1thc"    
        "NbC3VOOo1mo"    
        "IfMuhob6EAU"    
        "9uNpkeMlQE0"    
        "2Ut97j6-lsE"    
        "caca"
        end
        
        program get_videos
            local videoid = vname[1]
            local url "https://www.youtube.com/watch?v=`videoid'"
            dis as res "`url'"
            
            local more 1
            local try 0
            while `more' {
                tempfile f
                cap noi copy "`url'" "`f'"
                if _rc {
                    local ++try
                    if `try' < 5 {
                        dis "trying again in 2 seconds..."
                        sleep 2000
                    }
                    else local more 0
                }
                else {
                    import delimited using "`f'",  stringcols(_all) delimiter("þ") varn(noname) clear
                    keep if strpos(v1, "watch-view-count")
                    gen vname = "`videoid'"
                    gen vues = regexs(1) if regexm(v1,"([0-9,]+) views")
                    keep vname vues
                    gen time = c(current_time)
                    gen date = c(current_date)
                    local more 0
                }
            }
        end
        
        runby get_videos, by(vname) verbose
        and here are the results
        Code:
        . list
        
             +--------------------------------------------------+
             |       vname        vues       time          date |
             |--------------------------------------------------|
          1. | -RC_f4oEzHc   1,662,880   11:42:08   13 Oct 2017 |
          2. | 2PXEUsz6wHs   2,650,547   11:42:09   13 Oct 2017 |
          3. | 2Ut97j6-lsE     546,559   11:42:09   13 Oct 2017 |
          4. | 970T1Sd1thc     550,435   11:42:09   13 Oct 2017 |
          5. | 9Z4s-bktMrY   5,516,395   11:42:10   13 Oct 2017 |
             |--------------------------------------------------|
          6. | 9uNpkeMlQE0     793,893   11:42:10   13 Oct 2017 |
          7. | Am9pavV7q2g   1,092,619   11:42:11   13 Oct 2017 |
          8. | IfMuhob6EAU     205,846   11:42:11   13 Oct 2017 |
          9. | NbC3VOOo1mo     557,074   11:42:11   13 Oct 2017 |
         10. | dgA3PoNiwbY   1,364,231   11:42:12   13 Oct 2017 |
             |--------------------------------------------------|
         11. | j6_Y77uWtGw     428,987   11:42:12   13 Oct 2017 |
             +--------------------------------------------------+
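
        Because vues is built with regexs(), it is stored as a string with thousands separators. If numeric counts are needed, one possible follow-up step (a sketch, not part of the code above) is destring with the ignore() option. The two observations are copied from the listing above so that the snippet runs on its own.

        ```stata
        clear
        input str11 vname str12 vues
        "-RC_f4oEzHc" "1,662,880"
        "2PXEUsz6wHs" "2,650,547"
        end

        * Strip the commas and convert the string to a numeric variable.
        destring vues, replace ignore(",")
        list
        ```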
        Last edited by Robert Picard; 13 Oct 2017, 10:04. Reason: Added results

