0.6.0: Add verify_page_bottom_n_times, file_buffering, Video Duration

- if you are an existing user, skim through the **BREAKING CHANGE** and **NON-BREAKING CHANGES** sections below
- if you are a new user, you do not need to worry about these sections
  - just skip to the **NEW FEATURES** section at the bottom and read the Python README to get started
- **BREAKING CHANGE**
  - the program now extracts the video duration for every video uploaded by a channel
  - this will likely cause problems when updating pre-existing `csv` files, since
    - the video duration information goes in a new column
    - `csv` file renderers expect consistent column formatting throughout the file
    - BUT a pre-existing csv file will only have the `Video Number,Video Title,Video URL,Watched,Watch again later,Notes` columns
  - so updating a pre-existing `csv` file will result in newly extracted videos having the `Video Number,Video Title,Video Duration,Video URL,Watched,Watch again later,Notes` columns while the already extracted videos will only have the `Video Number,Video Title,Video URL,Watched,Watch again later,Notes` columns (no `Video Duration` column)
  - therefore, updating a pre-existing csv file will result in the newly extracted videos having 7 columns, while pre-existing videos will have only 6 columns
  - **if you want to continue using your pre-existing csv file and do NOT WANT TO INCLUDE the video duration** for previously extracted videos:
    - **if you have NOT yet updated the pre-existing csv file:**
      - APPROACH 1: use a csv file editor such as Excel, Google Sheets, Numbers, IDE extension, etc.
        - open the csv file
        - insert the `Video Duration` column between the `Video Title` and `Video URL` columns
        - save the file
        - the csv editor should automatically format the existing rows to include the `Video Duration` column
        - therefore, all rows should now have an empty cell for the `Video Duration` column
      - APPROACH 2: use a simple text editor/IDE
        - open the csv file
        - insert the `Video Duration` column between the `Video Title` and `Video URL` columns in the csv header
        - text editors will NOT automatically format the existing rows to include the `Video Duration` column
        - so you will need to manually format the existing rows to include the `Video Duration` column
        - the simplest way to do this would be to use a `Find and Replace` operation:
          - Find all occurrences of: `,https://`
          - Replace with: `,,https://`
        - **this assumes the only urls in the csv file are in the `Video URL` column!**
          - if you have manually added/modified parts of the file and this is no longer true, you will have to modify this approach slightly to meet your needs
    - **if you have ALREADY updated the pre-existing csv file:**
      - you will not be able to use APPROACH 1 from above
      - you will need to use APPROACH 2 with slight modifications:
        - Find all occurrences of (with regular expression mode enabled): `([^:][^\d]{2}),https://`
        - Replace with: `$1,,https://` (depending on your editor, you may need to substitute `$1` with `\1` or something else)
        - this looks for `,https://` where it is NOT preceded by `:\d\d`
          - since the most recently extracted videos will have the video duration but the already existing videos will not have the video duration
          - so this only adds a comma for previously extracted videos without the video duration
          - note that the pattern requires both characters directly before the comma to be non-digits, so a previously extracted row whose title has a digit in its last two characters (for example `Part 2`) will NOT be matched and needs to be fixed by hand
        - as with the unmodified APPROACH 2, **this also assumes the only urls in the csv file are in the `Video URL` column!**
          - if you have manually added/modified parts of the file and this is no longer true, you will also have to modify this approach slightly to meet your needs
      - if the file is a `chronological_videos_list` file (as opposed to a `reverse_chronological_videos_list` file):
        - you will ALSO need to insert the `Video Duration` column between the `Video Title` and `Video URL` columns in the csv header
        - since `chronological_videos_list` files use the csv header from the pre-existing csv file
        - NOTE the program updates the `reverse_chronological_videos_list` csv header every time the program looks for new videos when rerun on a previously scraped channel
          - but usually this csv header update is not noticeable since the header does not change
          - the csv header update is noticeable this time, however, since there is a new column (Video Duration)
        - for `chronological_videos_list` files, however, the program never updates the csv header
  - **if you want to continue using your pre-existing csv file and WANT TO INCLUDE the video duration** for previously extracted videos:
    - rerun the program for the channel (in a different directory)
    - copy over any notes you took in the pre-existing file to the new file with the video duration information
  - **if you do NOT want/care about using the pre-existing csv file**
    - just delete the pre-existing csv file and rerun the program on the channel again (or run the program on the same channel from a different directory)
  - NOTE that if the channel deleted a video OR unlisted a video between
    - the time the video information was originally scraped
    - and the time you rerun the program after installing release `0.6.0+`
  - the deleted/unlisted video(s) will not show up (no workaround for this - this is how YouTube displays videos)
- **NON-BREAKING CHANGES**
  - `txt` and `md` files now also include the video duration information
    - this is simply an extra line in the output file, and will not cause any rendering issues since `txt` and `md` files do not depend on consistent formatting the way `csv` files do
  - `txt` and `md` files now use slightly different formatting such as
    - fewer newlines
    - `md` files using `h3` headings for video information instead of bullet points (the bullet points were also improperly formatted previously, but since they are no longer used, this is not an issue)
  - NOTE that if you want these files to contain the video duration information, you will still need to rerun the program on the channel from scratch (either in a different directory, or after deleting the pre-existing files in the current directory)
- **NEW FEATURES**
  - `verify_page_bottom_n_times` attribute
    - for more information, see
      - commit a68f8f6
      - commit 5b361de
      - commit 916f050
      - commit 6a02bfe (documentation)
  - `file_buffering` attribute
    - for more information, see
      - commit 0730cdb
      - commit 38b8317 (documentation)
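
The find-and-replace fixes described in the **BREAKING CHANGE** section can also be done with a short script. The following is only a minimal sketch, not part of the program itself: the function name `add_duration_column` and the sample rows are made up for illustration, and it assumes the file is plain comma-separated text with exactly the column names listed above. It parses each row with Python's standard `csv` module and adds an empty `Video Duration` cell to any row that still has 6 columns, so it should not matter whether the file has already been updated or whether a URL appears in the `Notes` column.

```python
import csv
import io

# Column layout of csv files written before release 0.6.0 (taken from the notes above).
OLD_HEADER = ["Video Number", "Video Title", "Video URL",
              "Watched", "Watch again later", "Notes"]
# Index where the new column goes: between Video Title and Video URL.
DURATION_INDEX = 2


def add_duration_column(csv_text: str) -> str:
    """Hypothetical helper: pad every 6-column row with a Video Duration cell.

    Rows that already have 7 columns (videos extracted by 0.6.0+) are
    written back without adding a cell.
    """
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    output = io.StringIO(newline="")
    writer = csv.writer(output, lineterminator="\n")
    for row in reader:
        if len(row) == len(OLD_HEADER):
            # The header row gets the column name, data rows get an empty cell.
            cell = "Video Duration" if row == OLD_HEADER else ""
            row.insert(DURATION_INDEX, cell)
        writer.writerow(row)
    return output.getvalue()


# Made-up sample of a file that was already updated: one old row, one new row.
sample = (
    "Video Number,Video Title,Video URL,Watched,Watch again later,Notes\n"
    '1,"Old video, part 1",https://www.youtube.com/watch?v=aaa,,,\n'
    "2,New video,4:05,https://www.youtube.com/watch?v=bbb,,,\n"
)
upgraded = add_duration_column(sample)
print(upgraded)
```

To apply this to a real file, read its contents into a string, pass the string to the function, and write the result to a new file so the original stays available as a backup. Because the rows are parsed by column rather than matched as text, a title that ends in digits or contains a comma is handled like any other.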