0.6.1: Change `create_list_for()` return, add features & improvements

@shailshouryya shailshouryya released this 07 Sep 03:30
· 396 commits to main since this release
f8ca4a6
  • BREAKING CHANGE

    • BEFORE:
      • create_list_for() returned a str containing the name of the file the program wrote to
    • NOW:
      • create_list_for() returns a tuple containing
        • a list of lists containing the video information found by the program for the current run
          • by default, returns dummy video data to avoid cluttering the output
          • to return the actual video data, set the video_data_returned ListCreator attribute to True
            • dummy data: [[0, '', '', '']]
        • a tuple containing a str with the name of the channel (taken from the channel's heading) and a str with the name of the file written to
          • ('The Channel Name', 'the_name_of_the_file')
          • ('The Channel Name', '') if the ListCreator attributes are txt=False, csv=False, md=False, AND video_data_returned=True
      • see the NEW FEATURES section below for more details about video_data_returned
    • access the full documentation for the updated create_list_for method with help(ListCreator.create_list_for) in the Python interpreter
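A minimal sketch of adapting calling code to the new return shape. The helper `unpack_result` is hypothetical (it is not part of the package) and assumes only the tuple layout described above; the sample value stands in for a real create_list_for() call so the snippet runs without a browser.

```python
DUMMY_VIDEO_DATA = [[0, '', '', '']]  # placeholder described in the notes above


def unpack_result(result):
    """Split the returned tuple into video data, channel name, and file name.

    `result` is assumed to look like (video_data, (channel_name, file_name)).
    """
    video_data, (channel_name, file_name) = result
    # Treat the documented dummy placeholder as "no video data returned".
    if video_data == DUMMY_VIDEO_DATA:
        video_data = []
    return video_data, channel_name, file_name


# Stand-in for what create_list_for() returns with the default settings.
sample = ([[0, '', '', '']], ('The Channel Name', 'the_name_of_the_file'))
videos, channel, written_file = unpack_result(sample)
print(channel, written_file, len(videos))
```

Code that previously used the returned str as the file name would now read it from the second element of the inner tuple, which is an empty str when no file is written.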
  • BUGFIX

    • fixes cookie_consent blocking logic for new HTML in GDPR regions
      • YouTube updated the HTML formatting for blocking cookie consent, and the previous cookie consent blocking logic broke
      • this release fixes the blocking logic to work with the new HTML formatting
  • NEW FEATURES

    • an overview of the new ListCreator attributes follows; run help(ListCreator) in the Python interpreter or read the "More API information" section in the Python README for the full documentation:
      • file_suffix allows more control over the file naming (True by default)
      • all_video_data_in_memory scrapes the ENTIRE YouTube channel's videos page, EVEN if files exist for the channel already (False by default)
        • must also set the video_data_returned attribute to True to actually get this information
      • video_data_returned returns the video data for all videos the program scraped (False by default)
        • data returned depends on a number of factors, see full documentation for more details
      • video_id_only saves only the video ID instead of the entire URL (False by default)
    • an overview of the updated file_name argument options in the create_list_for method follows; run help(ListCreator.create_list_for) in the Python interpreter for the full documentation:
      • file_name='auto' names the output file(s) using the name that shows up under the banner when you navigate to the channel's homepage (with spaces removed)
      • file_name='id' names the output file(s) using the identifier from the URL provided to the url argument
        • run help(ListCreator.create_list_for) for a comprehensive list of examples
        • using file_name='id' is very useful when multiple channels have the SAME channel name
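To make the two naming options concrete, here is a rough sketch. `sketch_file_name` is a hypothetical helper, not the package's code; the suffix is inferred from the file names in the benchmark logs, and treating the last URL path segment (ignoring a trailing /videos) as the identifier is an assumption. See help(ListCreator.create_list_for) for the actual rules.

```python
from urllib.parse import urlparse

# Assumed suffix, inferred from the file names shown in the benchmark logs.
SUFFIX = '_reverse_chronological_videos_list'


def sketch_file_name(channel_heading, url, file_name='auto'):
    """Hypothetical illustration of the 'auto' and 'id' naming options."""
    if file_name == 'auto':
        # Channel name as shown on the channel page, with spaces removed.
        base = channel_heading.replace(' ', '')
    elif file_name == 'id':
        # Assumption: the identifier is the last path segment of the URL,
        # ignoring a trailing '/videos'.
        segments = [part for part in urlparse(url).path.split('/') if part]
        if segments and segments[-1] == 'videos':
            segments.pop()
        if not segments:
            raise ValueError(f'no identifier found in {url!r}')
        base = segments[-1]
    else:
        raise ValueError("this sketch only covers 'auto' and 'id'")
    return base + SUFFIX


url = 'https://www.youtube.com/user/schafer5'
auto_name = sketch_file_name('Corey Schafer', url, 'auto')
id_name = sketch_file_name('Corey Schafer', url, 'id')
print(auto_name)
print(id_name)
```

Two channels that share a display name produce the same 'auto' name, while their URL identifiers differ, which is the case file_name='id' is meant for.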
  • PERFORMANCE IMPROVEMENTS

    • BEFORE:
      • the program pulled the video data from the selenium instance and wrote to the file(s) directly
    • NOW:
      • the program loads the video data from the selenium instance into memory, THEN writes the saved video data from memory to the file(s)
        • the performance improvement is more noticeable when writing more information
          • for example:
            • writing information for 200 videos to just a csv file: negligible performance difference between writing to csv file directly and loading to memory & THEN writing to csv file
            • writing information for 200 videos to csv, txt, md files: slight performance difference between writing to files directly and loading to memory & THEN writing to files, but still not much of a performance difference
            • writing information for 20000 videos to just a csv file: noticeable performance difference between writing to csv file directly and loading to memory & THEN writing to csv file
            • writing information for 20000 videos to csv, txt, md files: significant performance difference between writing to files directly and loading to memory & THEN writing to files
          • summary:
            • the performance difference between writing to ONE file directly and loading to memory & THEN writing to ONE file is barely noticeable for small jobs and more noticeable for larger jobs
            • the performance difference between writing to MULTIPLE files directly and loading to memory & THEN writing to MULTIPLE files is more noticeable for small jobs (compared to writing to only ONE file) and SIGNIFICANT for larger jobs
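The difference between the two strategies can be sketched with a toy model. Everything here is illustrative: `FakeBrowser` stands in for the selenium instance, and the assumption that the direct approach re-reads every video from the browser once per output file is a simplification, not a description of the package's actual code.

```python
import io


class FakeBrowser:
    """Stand-in for the selenium instance; counts how often it is queried."""

    def __init__(self, videos):
        self._videos = videos
        self.reads = 0

    def read_video(self, index):
        self.reads += 1  # each read models one slow browser lookup
        return self._videos[index]


def write_directly(browser, count, files):
    # Assumed BEFORE model: every file pulls each video from the browser.
    for handle in files:
        for i in range(count):
            handle.write(','.join(browser.read_video(i)) + '\n')


def load_then_write(browser, count, files):
    # NOW model: pull each video once, then write every file from memory.
    in_memory = [browser.read_video(i) for i in range(count)]
    for handle in files:
        for row in in_memory:
            handle.write(','.join(row) + '\n')


videos = [(f'title {i}', f'https://example.com/watch?v={i}') for i in range(200)]

before = FakeBrowser(videos)
before_files = [io.StringIO() for _ in range(3)]
write_directly(before, len(videos), before_files)

now = FakeBrowser(videos)
now_files = [io.StringIO() for _ in range(3)]
load_then_write(now, len(videos), now_files)

print('browser reads, direct:', before.reads)
print('browser reads, load then write:', now.reads)
```

In this model the number of slow browser reads grows with the number of output files for the direct approach but stays fixed for the in-memory approach, while both produce identical file contents.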
    • logs from tests used to benchmark performance included below:
for https://www.youtube.com/user/schafer5 (small channel, 230 videos)
writing to 1 file directly with csv=True, txt=False, md=False
  • to create the file:
It took 9.240757292005583            seconds to find 230 videos from https://www.youtube.com/user/schafer5/videos
It took 4.265756259999762            seconds to write all 230 videos to CoreySchafer_reverse_chronological_videos_list.csv
This program took 19.537945401003526 seconds to complete.
  • to update the file:
It took 0.8453300589972059          seconds to find 60 videos from https://www.youtube.com/user/schafer5/videos
It took 0.6392399440010195          seconds to write the 0 ***NEW*** videos to the pre-existing CoreySchafer_reverse_chronological_videos_list.csv
This program took 7.754261410002073 seconds to complete.
writing to 1 file by loading video information into memory THEN writing to files with csv=True, txt=False, md=False
  • to create the file:
It took 9.163404727999989            seconds to find 230 videos from https://www.youtube.com/user/schafer5/videos
It took 4.260267737000007            seconds to load information for 230 videos into memory
It took 0.002389371999996115         seconds to write all 230 videos to CoreySchafer_reverse_chronological_videos_list.csv
This program took 19.483281371000004 seconds to complete.
  • to update the file:
It took 0.8521808300000089          seconds to find 60 videos from https://www.youtube.com/user/schafer5/videos
It took 1.0964175420000117          seconds to load information for 60 videos into memory
It took 0.0015745449999826633       seconds to write the 0 ***NEW*** videos to the pre-existing CoreySchafer_reverse_chronological_videos_list.csv
This program took 7.985743492000012 seconds to complete.
writing to 3 files directly with csv=True, txt=True, md=True
  • to create the files:
It took 9.166668037003546            seconds to find 230 videos from https://www.youtube.com/user/schafer5/videos
It took 10.160974278995127           seconds to write all 230 videos to CoreySchafer_reverse_chronological_videos_list.txt
It took 10.164936708999448           seconds to write all 230 videos to CoreySchafer_reverse_chronological_videos_list.csv
It took 10.168633003995637           seconds to write all 230 videos to CoreySchafer_reverse_chronological_videos_list.md
This program took 25.594990328005224 seconds to complete.
  • to update the files:
It took 0.8503098270011833          seconds to find 60 videos from https://www.youtube.com/user/schafer5/videos
It took 1.5225159670007997          seconds to write the 0 ***NEW*** videos to the pre-existing CoreySchafer_reverse_chronological_videos_list.csv
It took 1.5322243859991431          seconds to write the 0 ***NEW*** videos to the pre-existing CoreySchafer_reverse_chronological_videos_list.txt
It took 1.5359413480036892          seconds to write the 0 ***NEW*** videos to the pre-existing CoreySchafer_reverse_chronological_videos_list.md
This program took 8.472728426997492 seconds to complete.
writing to 3 files by loading video information into memory THEN writing to files with csv=True, txt=True, md=True
  • to create the files:
It took 9.367390958000005      seconds to find 230 videos from https://www.youtube.com/user/schafer5/videos
It took 4.218187391999997      seconds to load information for 230 videos into memory
It took 0.003894963000000473   seconds to write all 230 videos to CoreySchafer_reverse_chronological_videos_list.md
It took 0.005060710999998719   seconds to write all 230 videos to CoreySchafer_reverse_chronological_videos_list.csv
It took 0.006283445999997639   seconds to write all 230 videos to CoreySchafer_reverse_chronological_videos_list.txt
This program took 18.754924324 seconds to complete.
  • to update the files:
It took 0.8672965029999986          seconds to find 60 videos from https://www.youtube.com/user/schafer5/videos
It took 1.0901944209999996          seconds to load information for 60 videos into memory
It took 0.005667658999996661        seconds to write the 0 ***NEW*** videos to the pre-existing CoreySchafer_reverse_chronological_videos_list.csv
It took 0.008393589000000645        seconds to write the 0 ***NEW*** videos to the pre-existing CoreySchafer_reverse_chronological_videos_list.txt
It took 0.008197031000001687        seconds to write the 0 ***NEW*** videos to the pre-existing CoreySchafer_reverse_chronological_videos_list.md
This program took 8.090583961999997 seconds to complete.
for https://www.youtube.com/c/KhanAcademy (medium channel, 8095 videos)
writing to 1 file directly with csv=True, txt=False, md=False
  • to create the file:
It took 322.72226654399856          seconds to find 8095 videos from https://www.youtube.com/c/KhanAcademy/videos
It took 256.63442500399833          seconds to write all 8095 videos to KhanAcademy_reverse_chronological_videos_list.csv
This program took 585.4076739919983 seconds to complete.
  • to update the file:
It took 0.8482559289986966          seconds to find 60 videos from https://www.youtube.com/c/KhanAcademy/videos
It took 0.5600300389996846          seconds to write the 0 ***NEW*** videos to the pre-existing KhanAcademy_reverse_chronological_videos_list.csv
This program took 7.653723870003887 seconds to complete.
writing to 1 file by loading video information into memory THEN writing to files with csv=True, txt=False, md=False
  • to create the file:
It took 316.9717323640002       seconds to find 8095 videos from https://www.youtube.com/c/KhanAcademy/videos
It took 248.92245618300012      seconds to load information for 8095 videos into memory
It took 0.07691853599999376     seconds to write all 8095 videos to KhanAcademy_reverse_chronological_videos_list.csv
This program took 572.114162118 seconds to complete.
  • to update the file:
It took 0.8459371520000332          seconds to find 60 videos from https://www.youtube.com/c/KhanAcademy/videos
It took 0.9670944140000302          seconds to load information for 60 videos into memory
It took 0.02941359300007207         seconds to write the 0 ***NEW*** videos to the pre-existing KhanAcademy_reverse_chronological_videos_list.csv
This program took 8.209143252000104 seconds to complete.
writing to 3 files directly with csv=True, txt=True, md=True
  • to create the files:
It took 314.01985485899786          seconds to find 8095 videos from https://www.youtube.com/c/KhanAcademy/videos
It took 519.1903085960002           seconds to write all 8095 videos to KhanAcademy_reverse_chronological_videos_list.txt
It took 519.1941804189992           seconds to write all 8095 videos to KhanAcademy_reverse_chronological_videos_list.csv
It took 519.197644068001            seconds to write all 8095 videos to KhanAcademy_reverse_chronological_videos_list.md
This program took 839.4073893879977 seconds to complete.
  • to update the files:
It took 0.8488957250010571          seconds to find 60 videos from https://www.youtube.com/c/KhanAcademy/videos
It took 1.580211615000735           seconds to write the 0 ***NEW*** videos to the pre-existing KhanAcademy_reverse_chronological_videos_list.csv
It took 1.681963879003888           seconds to write the 0 ***NEW*** videos to the pre-existing KhanAcademy_reverse_chronological_videos_list.txt
It took 1.6842712280049454          seconds to write the 0 ***NEW*** videos to the pre-existing KhanAcademy_reverse_chronological_videos_list.md
This program took 8.823843261001457 seconds to complete.
writing to 3 files by loading video information into memory THEN writing to files with csv=True, txt=True, md=True
  • to create the files:
It took 316.342601403           seconds to find 8095 videos from https://www.youtube.com/c/KhanAcademy/videos
It took 261.87072707100003      seconds to load information for 8095 videos into memory
It took 0.1363127509999913      seconds to write all 8095 videos to KhanAcademy_reverse_chronological_videos_list.csv
It took 0.1775351439999895      seconds to write all 8095 videos to KhanAcademy_reverse_chronological_videos_list.md
It took 0.18588107000005039     seconds to write all 8095 videos to KhanAcademy_reverse_chronological_videos_list.txt
This program took 584.703847726 seconds to complete.
  • to update the files:
It took 0.8483775499998956          seconds to find 60 videos from https://www.youtube.com/c/KhanAcademy/videos
It took 1.0671216570001434          seconds to load information for 60 videos into memory
It took 0.17331316700006028         seconds to write the 0 ***NEW*** videos to the pre-existing KhanAcademy_reverse_chronological_videos_list.csv
It took 0.22995445900005507         seconds to write the 0 ***NEW*** videos to the pre-existing KhanAcademy_reverse_chronological_videos_list.txt
It took 0.23345572800008085         seconds to write the 0 ***NEW*** videos to the pre-existing KhanAcademy_reverse_chronological_videos_list.md
This program took 8.503321469999833 seconds to complete.
for https://www.youtube.com/user/NBCNews/videos (large channel, ~32550 videos)
writing to 1 file directly with csv=True, txt=False, md=False
  • to create the file:
It took 3420.0639533489993          seconds to find 32347 videos from https://www.youtube.com/user/NBCNews/videos
It took 4988.648231769999           seconds to write all 32347 videos to NBCNews_reverse_chronological_videos_list.csv
This program took 8414.909623333002 seconds to complete.
  • to update the file:
# forgot to run this test :D
writing to 1 file by loading video information into memory THEN writing to files with csv=True, txt=False, md=False
  • to create the file:
It took 3367.386001154002       seconds to find 32357 videos from https://www.youtube.com/user/NBCNews/videos
It took 4880.191474030002       seconds to load information for 32357 videos into memory
It took 0.24478799300050014     seconds to write all 32357 videos to NBCNews_reverse_chronological_videos_list.csv
This program took 8253.73690525 seconds to complete.
  • to update the file:
It took 0.8474488579995523          seconds to find 60 videos from https://www.youtube.com/user/NBCNews/videos
It took 1.1012943870009622          seconds to load information for 60 videos into memory
It took 0.11654774600174278         seconds to write the 5 ***NEW*** videos to the pre-existing NBCNews_reverse_chronological_videos_list.csv
This program took 8.668505469999218 seconds to complete.
writing to 3 files directly with csv=True, txt=True, md=True
  • to create the files:
It took 3396.025502143               seconds to find 32347 videos from https://www.youtube.com/user/NBCNews/videos
It took 7683.585577874001            seconds to write all 32347 videos to NBCNews_reverse_chronological_videos_list.txt
It took 7683.592947972               seconds to write all 32347 videos to NBCNews_reverse_chronological_videos_list.md
It took 7684.030176524999            seconds to write all 32347 videos to NBCNews_reverse_chronological_videos_list.csv
This program took 11086.336240618999 seconds to complete.
  • to update the files:
It took 0.8738655359993572          seconds to find 60 videos from https://www.youtube.com/user/NBCNews/videos
It took 1.8775347520004289          seconds to write the 0 ***NEW*** videos to the pre-existing NBCNews_reverse_chronological_videos_list.csv
It took 2.120259861001614           seconds to write the 0 ***NEW*** videos to the pre-existing NBCNews_reverse_chronological_videos_list.txt
It took 2.132926509999379           seconds to write the 0 ***NEW*** videos to the pre-existing NBCNews_reverse_chronological_videos_list.md
This program took 9.435579917999348 seconds to complete.
writing to 3 files by loading video information into memory THEN writing to files with csv=True, txt=True, md=True
  • to create the files:
It took 3478.1540728540003          seconds to find 32353 videos from https://www.youtube.com/user/NBCNews/videos
It took 5022.493407319              seconds to load information for 32353 videos into memory
It took 0.5065521739998076          seconds to write the 6 ***NEW*** videos to the pre-existing NBCNews_reverse_chronological_videos_list.csv
It took 0.587243801997829           seconds to write all 32353 videos to NBCNews_reverse_chronological_videos_list.txt
It took 0.6058889249979984          seconds to write all 32353 videos to NBCNews_reverse_chronological_videos_list.md
This program took 8507.703900004002 seconds to complete.
  • to update the files:
It took 0.8569685050024418         seconds to find 60 videos from https://www.youtube.com/user/NBCNews/videos
It took 1.1060196290018212         seconds to load information for 60 videos into memory
It took 0.5880495099991094         seconds to write the 4 ***NEW*** videos to the pre-existing NBCNews_reverse_chronological_videos_list.csv
It took 0.8386826800015115         seconds to write the 4 ***NEW*** videos to the pre-existing NBCNews_reverse_chronological_videos_list.txt
It took 0.8496009250011411         seconds to write the 4 ***NEW*** videos to the pre-existing NBCNews_reverse_chronological_videos_list.md
This program took 9.45503293100046 seconds to complete.
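As a rough way to read the logs above, the snippet below computes how much total run time the in-memory approach saved on the create runs. The numbers are copied from the "This program took ..." lines; each configuration was run once and the video counts differ slightly between runs, so treat the percentages as indicative only.

```python
# (channel, files written) -> (direct seconds, load-into-memory seconds),
# copied from the "This program took ..." lines of the create runs above.
TOTALS = {
    ('schafer5', 1): (19.537945401003526, 19.483281371000004),
    ('schafer5', 3): (25.594990328005224, 18.754924324),
    ('KhanAcademy', 1): (585.4076739919983, 572.114162118),
    ('KhanAcademy', 3): (839.4073893879977, 584.703847726),
    ('NBCNews', 1): (8414.909623333002, 8253.73690525),
    ('NBCNews', 3): (11086.336240618999, 8507.703900004002),
}


def percent_saved(direct, in_memory):
    """Share of the direct run's wall-clock time that the in-memory run avoided."""
    return 100 * (direct - in_memory) / direct


savings = {key: percent_saved(*pair) for key, pair in TOTALS.items()}
for (channel, files), saved in savings.items():
    print(f'{channel:12} {files} file(s): {saved:5.1f}% less total time')
```

With one output file the saving stays within a few percent for every channel; with three output files it is roughly a quarter to a third of the total run time, which matches the summary in the PERFORMANCE IMPROVEMENTS section.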