0.6.1: Change `create_list_for()` return, add features & improvements
- BREAKING CHANGE
  - BEFORE: `create_list_for()` returned a `str` containing the name of the file the program wrote to
  - NOW: `create_list_for()` returns a `tuple` containing
    - a `list` of `list`s containing the video information found by the program for the current run
      - by default, returns dummy video data to avoid cluttering the output
      - to return the actual video data, set the `video_data_returned` ListCreator attribute to `True`
      - dummy data: `[[0, '', '', '']]`
    - a `tuple` containing a `str` with the name of the channel (taken from the channel's heading) and a `str` with the name of the file written to
      - `('The Channel Name', 'the_name_of_the_file')`
      - `('The Channel Name', '')` if the ListCreator attributes are `txt=False`, `csv=False`, `md=False`, AND `video_data_returned=True`
  - see the NEW FEATURES section below for more details about `video_data_returned`
  - access the full documentation for the updated `create_list_for` method with `help(ListCreator.create_list_for)` in the python interpreter
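
For callers upgrading from an earlier release, the new return value of `create_list_for()` can be unpacked roughly as sketched here. `stub_create_list_for` is a hypothetical stand-in, not part of the package: it only mimics the default return shape described in the BREAKING CHANGE notes so the snippet runs without a browser, and it uses the placeholder names from these notes.

```python
# Dummy data documented in these notes for the default video_data_returned=False.
DUMMY_VIDEO_DATA = [[0, '', '', '']]


def stub_create_list_for():
    """Hypothetical stand-in that mimics the default 0.6.1 return shape."""
    return DUMMY_VIDEO_DATA, ('The Channel Name', 'the_name_of_the_file')


# Before 0.6.1 the call returned only the file name (a str).
# From 0.6.1 on, unpack the outer tuple and the inner (channel, file) tuple.
video_data, (channel_name, file_name) = stub_create_list_for()

# With the default settings only the dummy data comes back.
only_dummy_data = video_data == DUMMY_VIDEO_DATA

print(channel_name, file_name, only_dummy_data)
```

Code that previously used the returned `str` directly would now read the file name from the second element of the inner tuple.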
- BUGFIX
  - fixes `cookie_consent` blocking logic for new HTML in GDPR regions
    - YouTube updated the HTML formatting for blocking cookie consent, and the previous cookie consent blocking logic broke
    - this release fixes the blocking logic to work with the new HTML formatting
- NEW FEATURES
  - overview for the new ListCreator attributes given here, but run `help(ListCreator)` in the python interpreter or read the "More API information" section in the python README to see the full documentation:
    - `file_suffix` allows more control over the file naming (`True` by default)
    - `all_video_data_in_memory` scrapes the ENTIRE YouTube channel's videos page, EVEN if files exist for the channel already (`False` by default)
      - must also set the `video_data_returned` attribute to `True` to actually get this information
    - `video_data_returned` returns the video data for all videos the program scraped (`False` by default)
      - data returned depends on a number of factors, see full documentation for more details
    - `video_id_only` saves only the video ID instead of the entire URL (`False` by default)
      - example: saves 'abcdefghijk' instead of 'https://www.youtube.com/watch?v=abcdefghijk'
  - overview for the updated `file_name` argument options in the `create_list_for` method given here, but run `help(ListCreator.create_list_for)` in the python interpreter to see the full documentation:
    - `file_name='auto'` names the output file(s) using the name that shows up under the banner when you navigate to the channel's homepage (with spaces removed)
    - `file_name='id'` names the output file(s) using the identifier from the URL provided to the `url` argument
      - run `help(ListCreator.create_list_for)` for a comprehensive list of examples
      - using `file_name='id'` is very useful when multiple channels have the SAME channel name
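
To make the effect of `video_id_only` and `file_name='auto'` concrete, here is a minimal sketch of the two transformations described in NEW FEATURES. The helper names `video_id_from_url` and `auto_file_name` are hypothetical and written only for illustration; this is not the package's own implementation, which may handle more URL shapes and naming rules.

```python
from urllib.parse import parse_qs, urlparse


def video_id_from_url(url: str) -> str:
    """Return the value of the `v` query parameter of a YouTube watch URL."""
    query = parse_qs(urlparse(url).query)
    return query['v'][0]


def auto_file_name(channel_heading: str) -> str:
    """Remove spaces from a channel heading, as described for file_name='auto'."""
    return channel_heading.replace(' ', '')


print(video_id_from_url('https://www.youtube.com/watch?v=abcdefghijk'))  # abcdefghijk
print(auto_file_name('The Channel Name'))  # TheChannelName
```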
- PERFORMANCE IMPROVEMENTS
  - BEFORE:
    - the program pulled the video data from the selenium instance and wrote to the file(s) directly
  - NOW:
    - the program loads the video data from the selenium instance into memory, THEN writes the saved video data from memory to the file(s)
      - the performance improvement is more noticeable when writing more information
        - for example:
          - writing information for 200 videos to just a csv file: negligible performance difference between writing to csv file directly and loading to memory & THEN writing to csv file
          - writing information for 200 videos to csv, txt, md files: slight performance difference between writing to files directly and loading to memory & THEN writing to files, but still not much of a performance difference
          - writing information for 20000 videos to just a csv file: noticeable performance difference between writing to csv file directly and loading to memory & THEN writing to csv file
          - writing information for 20000 videos to csv, txt, md files: significant performance difference between writing to files directly and loading to memory & THEN writing to files
        - summary:
          - the performance difference between writing to ONE file directly and loading to memory & THEN writing to ONE file is barely noticeable for small jobs and more noticeable for larger jobs
          - the performance difference between writing to MULTIPLE files directly and loading to memory & THEN writing to MULTIPLE files is more noticeable for small jobs (compared to writing to only ONE file) and SIGNIFICANT for larger jobs
  - logs from tests used to benchmark performance included below:
for https://www.youtube.com/user/schafer5 (small channel, 230 videos)
writing to 1 file directly with csv=True, txt=False, md=False
- to create the file:
It took 9.240757292005583 seconds to find 230 videos from https://www.youtube.com/user/schafer5/videos
It took 4.265756259999762 seconds to write all 230 videos to CoreySchafer_reverse_chronological_videos_list.csv
This program took 19.537945401003526 seconds to complete.
- to update the file:
It took 0.8453300589972059 seconds to find 60 videos from https://www.youtube.com/user/schafer5/videos
It took 0.6392399440010195 seconds to write the 0 ***NEW*** videos to the pre-existing CoreySchafer_reverse_chronological_videos_list.csv
This program took 7.754261410002073 seconds to complete.
writing to 1 file by loading video information into memory THEN writing to file with csv=True, txt=False, md=False
- to create the file:
It took 9.163404727999989 seconds to find 230 videos from https://www.youtube.com/user/schafer5/videos
It took 4.260267737000007 seconds to load information for 230 videos into memory
It took 0.002389371999996115 seconds to write all 230 videos to CoreySchafer_reverse_chronological_videos_list.csv
This program took 19.483281371000004 seconds to complete.
- to update the file:
It took 0.8521808300000089 seconds to find 60 videos from https://www.youtube.com/user/schafer5/videos
It took 1.0964175420000117 seconds to load information for 60 videos into memory
It took 0.0015745449999826633 seconds to write the 0 ***NEW*** videos to the pre-existing CoreySchafer_reverse_chronological_videos_list.csv
This program took 7.985743492000012 seconds to complete.
writing to 3 files directly with csv=True, txt=True, md=True
- to create the files:
It took 9.166668037003546 seconds to find 230 videos from https://www.youtube.com/user/schafer5/videos
It took 10.160974278995127 seconds to write all 230 videos to CoreySchafer_reverse_chronological_videos_list.txt
It took 10.164936708999448 seconds to write all 230 videos to CoreySchafer_reverse_chronological_videos_list.csv
It took 10.168633003995637 seconds to write all 230 videos to CoreySchafer_reverse_chronological_videos_list.md
This program took 25.594990328005224 seconds to complete.
- to update the files:
It took 0.8503098270011833 seconds to find 60 videos from https://www.youtube.com/user/schafer5/videos
It took 1.5225159670007997 seconds to write the 0 ***NEW*** videos to the pre-existing CoreySchafer_reverse_chronological_videos_list.csv
It took 1.5322243859991431 seconds to write the 0 ***NEW*** videos to the pre-existing CoreySchafer_reverse_chronological_videos_list.txt
It took 1.5359413480036892 seconds to write the 0 ***NEW*** videos to the pre-existing CoreySchafer_reverse_chronological_videos_list.md
This program took 8.472728426997492 seconds to complete.
writing to 3 files by loading video information into memory THEN writing to files with csv=True, txt=True, md=True
- to create the files:
It took 9.367390958000005 seconds to find 230 videos from https://www.youtube.com/user/schafer5/videos
It took 4.218187391999997 seconds to load information for 230 videos into memory
It took 0.003894963000000473 seconds to write all 230 videos to CoreySchafer_reverse_chronological_videos_list.md
It took 0.005060710999998719 seconds to write all 230 videos to CoreySchafer_reverse_chronological_videos_list.csv
It took 0.006283445999997639 seconds to write all 230 videos to CoreySchafer_reverse_chronological_videos_list.txt
This program took 18.754924324 seconds to complete.
- to update the files:
It took 0.8672965029999986 seconds to find 60 videos from https://www.youtube.com/user/schafer5/videos
It took 1.0901944209999996 seconds to load information for 60 videos into memory
It took 0.005667658999996661 seconds to write the 0 ***NEW*** videos to the pre-existing CoreySchafer_reverse_chronological_videos_list.csv
It took 0.008393589000000645 seconds to write the 0 ***NEW*** videos to the pre-existing CoreySchafer_reverse_chronological_videos_list.txt
It took 0.008197031000001687 seconds to write the 0 ***NEW*** videos to the pre-existing CoreySchafer_reverse_chronological_videos_list.md
This program took 8.090583961999997 seconds to complete.
for https://www.youtube.com/c/KhanAcademy (medium channel, 8095 videos)
writing to 1 file directly with csv=True, txt=False, md=False
- to create the file:
It took 322.72226654399856 seconds to find 8095 videos from https://www.youtube.com/c/KhanAcademy/videos
It took 256.63442500399833 seconds to write all 8095 videos to KhanAcademy_reverse_chronological_videos_list.csv
This program took 585.4076739919983 seconds to complete.
- to update the file:
It took 0.8482559289986966 seconds to find 60 videos from https://www.youtube.com/c/KhanAcademy/videos
It took 0.5600300389996846 seconds to write the 0 ***NEW*** videos to the pre-existing KhanAcademy_reverse_chronological_videos_list.csv
This program took 7.653723870003887 seconds to complete.
writing to 1 file by loading video information into memory THEN writing to file with csv=True, txt=False, md=False
- to create the file:
It took 316.9717323640002 seconds to find 8095 videos from https://www.youtube.com/c/KhanAcademy/videos
It took 248.92245618300012 seconds to load information for 8095 videos into memory
It took 0.07691853599999376 seconds to write all 8095 videos to KhanAcademy_reverse_chronological_videos_list.csv
This program took 572.114162118 seconds to complete.
- to update the file:
It took 0.8459371520000332 seconds to find 60 videos from https://www.youtube.com/c/KhanAcademy/videos
It took 0.9670944140000302 seconds to load information for 60 videos into memory
It took 0.02941359300007207 seconds to write the 0 ***NEW*** videos to the pre-existing KhanAcademy_reverse_chronological_videos_list.csv
This program took 8.209143252000104 seconds to complete.
writing to 3 files directly with csv=True, txt=True, md=True
- to create the files:
It took 314.01985485899786 seconds to find 8095 videos from https://www.youtube.com/c/KhanAcademy/videos
It took 519.1903085960002 seconds to write all 8095 videos to KhanAcademy_reverse_chronological_videos_list.txt
It took 519.1941804189992 seconds to write all 8095 videos to KhanAcademy_reverse_chronological_videos_list.csv
It took 519.197644068001 seconds to write all 8095 videos to KhanAcademy_reverse_chronological_videos_list.md
This program took 839.4073893879977 seconds to complete.
- to update the files:
It took 0.8488957250010571 seconds to find 60 videos from https://www.youtube.com/c/KhanAcademy/videos
It took 1.580211615000735 seconds to write the 0 ***NEW*** videos to the pre-existing KhanAcademy_reverse_chronological_videos_list.csv
It took 1.681963879003888 seconds to write the 0 ***NEW*** videos to the pre-existing KhanAcademy_reverse_chronological_videos_list.txt
It took 1.6842712280049454 seconds to write the 0 ***NEW*** videos to the pre-existing KhanAcademy_reverse_chronological_videos_list.md
This program took 8.823843261001457 seconds to complete.
writing to 3 files by loading video information into memory THEN writing to files with csv=True, txt=True, md=True
- to create the files:
It took 316.342601403 seconds to find 8095 videos from https://www.youtube.com/c/KhanAcademy/videos
It took 261.87072707100003 seconds to load information for 8095 videos into memory
It took 0.1363127509999913 seconds to write all 8095 videos to KhanAcademy_reverse_chronological_videos_list.csv
It took 0.1775351439999895 seconds to write all 8095 videos to KhanAcademy_reverse_chronological_videos_list.md
It took 0.18588107000005039 seconds to write all 8095 videos to KhanAcademy_reverse_chronological_videos_list.txt
This program took 584.703847726 seconds to complete.
- to update the files:
It took 0.8483775499998956 seconds to find 60 videos from https://www.youtube.com/c/KhanAcademy/videos
It took 1.0671216570001434 seconds to load information for 60 videos into memory
It took 0.17331316700006028 seconds to write the 0 ***NEW*** videos to the pre-existing KhanAcademy_reverse_chronological_videos_list.csv
It took 0.22995445900005507 seconds to write the 0 ***NEW*** videos to the pre-existing KhanAcademy_reverse_chronological_videos_list.txt
It took 0.23345572800008085 seconds to write the 0 ***NEW*** videos to the pre-existing KhanAcademy_reverse_chronological_videos_list.md
This program took 8.503321469999833 seconds to complete.
for https://www.youtube.com/user/NBCNews/videos (large channel, ~32550 videos)
writing to 1 file directly with csv=True, txt=False, md=False
- to create the file:
It took 3420.0639533489993 seconds to find 32347 videos from https://www.youtube.com/user/NBCNews/videos
It took 4988.648231769999 seconds to write all 32347 videos to NBCNews_reverse_chronological_videos_list.csv
This program took 8414.909623333002 seconds to complete.
- to update the file:
# forgot to run this test :D
writing to 1 file by loading video information into memory THEN writing to file with csv=True, txt=False, md=False
- to create the file:
It took 3367.386001154002 seconds to find 32357 videos from https://www.youtube.com/user/NBCNews/videos
It took 4880.191474030002 seconds to load information for 32357 videos into memory
It took 0.24478799300050014 seconds to write all 32357 videos to NBCNews_reverse_chronological_videos_list.csv
This program took 8253.73690525 seconds to complete.
- to update the file:
It took 0.8474488579995523 seconds to find 60 videos from https://www.youtube.com/user/NBCNews/videos
It took 1.1012943870009622 seconds to load information for 60 videos into memory
It took 0.11654774600174278 seconds to write the 5 ***NEW*** videos to the pre-existing NBCNews_reverse_chronological_videos_list.csv
This program took 8.668505469999218 seconds to complete.
writing to 3 files directly with csv=True, txt=True, md=True
- to create the files:
It took 3396.025502143 seconds to find 32347 videos from https://www.youtube.com/user/NBCNews/videos
It took 7683.585577874001 seconds to write all 32347 videos to NBCNews_reverse_chronological_videos_list.txt
It took 7683.592947972 seconds to write all 32347 videos to NBCNews_reverse_chronological_videos_list.md
It took 7684.030176524999 seconds to write all 32347 videos to NBCNews_reverse_chronological_videos_list.csv
This program took 11086.336240618999 seconds to complete.
- to update the files:
It took 0.8738655359993572 seconds to find 60 videos from https://www.youtube.com/user/NBCNews/videos
It took 1.8775347520004289 seconds to write the 0 ***NEW*** videos to the pre-existing NBCNews_reverse_chronological_videos_list.csv
It took 2.120259861001614 seconds to write the 0 ***NEW*** videos to the pre-existing NBCNews_reverse_chronological_videos_list.txt
It took 2.132926509999379 seconds to write the 0 ***NEW*** videos to the pre-existing NBCNews_reverse_chronological_videos_list.md
This program took 9.435579917999348 seconds to complete.
writing to 3 files by loading video information into memory THEN writing to files with csv=True, txt=True, md=True
- to create the files:
It took 3478.1540728540003 seconds to find 32353 videos from https://www.youtube.com/user/NBCNews/videos
It took 5022.493407319 seconds to load information for 32353 videos into memory
It took 0.5065521739998076 seconds to write the 6 ***NEW*** videos to the pre-existing NBCNews_reverse_chronological_videos_list.csv
It took 0.587243801997829 seconds to write all 32353 videos to NBCNews_reverse_chronological_videos_list.txt
It took 0.6058889249979984 seconds to write all 32353 videos to NBCNews_reverse_chronological_videos_list.md
This program took 8507.703900004002 seconds to complete.
- to update the files:
It took 0.8569685050024418 seconds to find 60 videos from https://www.youtube.com/user/NBCNews/videos
It took 1.1060196290018212 seconds to load information for 60 videos into memory
It took 0.5880495099991094 seconds to write the 4 ***NEW*** videos to the pre-existing NBCNews_reverse_chronological_videos_list.csv
It took 0.8386826800015115 seconds to write the 4 ***NEW*** videos to the pre-existing NBCNews_reverse_chronological_videos_list.txt
It took 0.8496009250011411 seconds to write the 4 ***NEW*** videos to the pre-existing NBCNews_reverse_chronological_videos_list.md
This program took 9.45503293100046 seconds to complete.
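
The change described under PERFORMANCE IMPROVEMENTS (load the video data into memory once, THEN write every file from memory) can be illustrated with the sketch here. It is not the program's actual code. `CountingSource` is a hypothetical stand-in for the selenium instance, the two rows are placeholder data, and the "direct" variant assumes each output file pulls its data from the source on its own, which these notes do not spell out.

```python
import io


class CountingSource:
    """Hypothetical stand-in for the selenium instance; counts every row it hands out."""

    def __init__(self, rows):
        self._rows = rows
        self.reads = 0

    def __iter__(self):
        for row in self._rows:
            self.reads += 1  # each read stands in for a comparatively slow browser lookup
            yield row


def write_rows(rows, out):
    """Write one comma-separated line per video to any file-like object."""
    for number, title, url in rows:
        out.write(f'{number},{title},{url}\n')


# Placeholder rows, made up for this sketch.
rows = [
    (1, 'First video', 'https://www.youtube.com/watch?v=abcdefghijk'),
    (2, 'Second video', 'https://www.youtube.com/watch?v=lmnopqrstuv'),
]

# BEFORE (as assumed here): every output file pulls from the source itself.
direct_source = CountingSource(rows)
direct_files = [io.StringIO() for _ in range(3)]
for out in direct_files:
    write_rows(direct_source, out)

# NOW: pull once into memory, then write every file from the in-memory list.
memory_source = CountingSource(rows)
video_data = list(memory_source)
memory_files = [io.StringIO() for _ in range(3)]
for out in memory_files:
    write_rows(video_data, out)

print(direct_source.reads, memory_source.reads)  # 6 2
```

Both variants produce identical file contents; only the number of reads from the slow source differs, which fits the pattern in the logs where the gap grows with the number of files and videos.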