0.6.0: Add verify_page_bottom_n_times, file_buffering, Video Duration

- if you are an existing user, skim through the **BREAKING CHANGE** and **NON-BREAKING CHANGES** sections below
  - if you are a new user, you do not need to worry about these sections - just skip to the **NEW FEATURES** section at the bottom and read the python README to get started
- **BREAKING CHANGE**
  - the program now extracts the video duration for every video uploaded by a channel
    - this will likely cause problems when updating pre-existing `csv` files, since
      - the video duration information goes in a new column
      - `csv` file renderers expect consistent column formatting throughout the file
        - BUT a pre-existing csv file will only have the `Video Number,Video Title,Video URL,Watched,Watch again later,Notes` columns
        - so updating a pre-existing `csv` file will result in newly extracted videos having the `Video Number,Video Title,Video Duration,Video URL,Watched,Watch again later,Notes` columns while the already extracted videos will only have the `Video Number,Video Title,Video URL,Watched,Watch again later,Notes` columns (no `Video Duration` column)
        - therefore, updating a pre-existing csv file will result in the newly extracted videos having 7 columns, while pre-existing videos will have only 6 columns
    - **if you want to continue using your pre-existing csv file and do NOT WANT TO INCLUDE the video duration** for previously extracted videos:
      - **if you have NOT yet updated the pre-existing csv file:**
        - APPROACH 1: use a csv file editor such as Excel, Google Sheets, Numbers, IDE extension, etc.
          - open the csv file
          - insert the `Video Duration` column between the `Video Title` and `Video URL` columns
          - save the file
            - the csv editor should automatically format the existing rows to include the `Video Duration` column
            - therefore, all rows should now have an empty cell for the `Video Duration` column
        - APPROACH 2: use a simple text editor/IDE
          - open the csv file
          - insert the `Video Duration` column between the `Video Title` and `Video URL` columns
          - text editors will NOT automatically format the existing rows to include the `Video Duration` column
            - so you will need to manually format the existing rows to include the `Video Duration` column
            - the simplest way to do this would be to use a `Find and Replace` operation:
              - Find all occurrences of:         `,https://`
              - Replace with:                    `,,https://`
                - **this assumes the only urls in the csv file are in the `Video URL` column!**
                  - if you have manually added/modified parts of the file and this is no longer true, you will have to modify this approach slightly to meet your needs
      - **if you have ALREADY updated the pre-existing csv file:**
        - you will not be able to use APPROACH 1 from above
        - you will need to use APPROACH 2 with slight modifications:
          - Find all occurrences of (with regular expression mode enabled): `([^:][^\d]{2}),https://`
          - Replace with:                                              `$1,,https://` (depending on your editor, you may need to substitute `$1` with `\1` or something else)
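            - looks for `,https://` where it is NOT preceded by the end of a video duration (`:\d\d`)
              - NOTE the pattern is stricter than that: it also skips any row where one of the two characters before the comma is a digit, or where the third character before the comma is a colon (for example, a title ending in `12`), so check such rows manually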
              - since the most recently extracted videos will have the video duration but the already existing videos will not have the video duration
              - so this only adds a comma for previously extracted videos without the video duration
              - as with the unmodified `Find and Replace` in APPROACH 2 above, **this also assumes the only urls in the csv file are in the `Video URL` column!**
                - if you have manually added/modified parts of the file and this is no longer true, you will also have to modify this approach slightly to meet your needs
          - if the file is a `chronological_videos_list` file (as opposed to a `reverse_chronological_videos_list` file):
            - you will ALSO need to insert the `Video Duration` column between the `Video Title` and `Video URL` columns in the csv header
              - since `chronological_videos_list` files use the csv header from the pre-existing csv file
                - NOTE the program updates the `reverse_chronological_videos_list` csv header every time the program looks for new videos when rerun on a previously scraped channel
                - but usually this csv header update is not noticeable since the header does not change
                - the csv header update is noticeable this time, however, since there is a new column (Video Duration)
                - for `chronological_videos_list` files, however, the program never updates the csv header
    - **if you want to continue using your pre-existing csv file and WANT TO INCLUDE the video duration** for previously extracted videos:
      - rerun the program for the channel (in a different directory)
      - copy over any notes you took in the pre-existing file to the new file with the video duration information
    - **if you do NOT want/care about using the pre-existing csv file**
      - just delete the pre-existing csv file and rerun the program on the channel again (or run the program on the same channel from a different directory)
        - NOTE that if the channel deleted a video OR unlisted a video between
          - the time the video information was originally scraped
          - and the time you rerun the program after installing release `0.6.0+`
          - the deleted/unlisted video(s) will not show up (no workaround for this - this is how YouTube displays videos)
- **NON-BREAKING CHANGES**
  - `txt` and `md` files now also include the video duration information
    - this is simply an extra line in the output file, and will not cause any rendering issues since `txt` and `md` files do not depend on consistent formatting the way `csv` files do
  - `txt` and `md` files now use slightly different formatting such as
    - fewer newlines
    - `md` files using `h3` headings for video information instead of bullet points (the bullet points were also improperly formatted previously, but since they are no longer used, this is not an issue)
  - NOTE that if you want these files to contain the video duration information, you will still need to rerun the program on the channel from scratch (either in a different directory, or after deleting the pre-existing files in the current directory)
- **NEW FEATURES**
  - `verify_page_bottom_n_times` attribute
    - for more information, see
      - commit a68f8f6
      - commit 5b361de
      - commit 916f050
      - commit 6a02bfe (documentation)
  - `file_buffering` attribute
    - for more information, see
      - commit 0730cdb
      - commit 38b8317 (documentation)
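
The `Find and Replace` steps described under **BREAKING CHANGE** can also be scripted. The following is a minimal sketch, not part of the program itself: the names `add_duration_cell` and `migrate` and the sample rows are hypothetical. It swaps the capture-group pattern from the commit message for a negative lookbehind, `(?<!:\d\d)`, which adds the extra comma to every row whose URL is not directly preceded by `:` and two digits, including rows whose title ends in a digit. It keeps the same assumption as the manual steps: the only urls in the csv file are in the `Video URL` column. It also assumes `\n` line endings and that no title ends in text that looks like a duration (for example `10:30`).

```python
import re

OLD_HEADER = "Video Number,Video Title,Video URL,Watched,Watch again later,Notes"
NEW_HEADER = "Video Number,Video Title,Video Duration,Video URL,Watched,Watch again later,Notes"

# Negative lookbehind: match ",https://" only when the three characters before
# it are not a colon followed by two digits (the tail of a duration like 12:34).
MISSING_DURATION = re.compile(r"(?<!:\d\d),https://")


def add_duration_cell(line: str) -> str:
    """Pad one csv line with an empty Video Duration cell when it lacks one."""
    if line == OLD_HEADER:
        # chronological_videos_list files keep the old header, so replace it too
        return NEW_HEADER
    return MISSING_DURATION.sub(",,https://", line)


def migrate(text: str) -> str:
    """Apply add_duration_cell to every line of a csv file's text."""
    return "\n".join(add_duration_cell(line) for line in text.split("\n"))


if __name__ == "__main__":
    # Hypothetical rows: one extracted before 0.6.0, one extracted after.
    sample = "\n".join([
        OLD_HEADER,
        "1,Old video,https://www.youtube.com/watch?v=aaa,,,",
        "2,New video,12:34,https://www.youtube.com/watch?v=bbb,,,",
    ])
    print(migrate(sample))
```

Because rows that already contain a duration are left untouched, the same function covers both the "NOT yet updated" and the "ALREADY updated" cases. Run it on a copy of the csv file first and compare the column counts before overwriting the original.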
shailshouryya committed Jul 19, 2021
1 parent 567d059 commit c8a9613
Showing 5 changed files with 5 additions and 5 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -1,7 +1,7 @@
# General Overview

#### See the [releases](https://github.com/Shail-Shouryya/yt-videos-list/releases) page to see new additions/modifications for each release!
-#### See this [comparison](https://github.com/Shail-Shouryya/yt-videos-list/compare/v0.5.9...main) page to see new additions/modifications that will be available in the NEXT release!
+#### See this [comparison](https://github.com/Shail-Shouryya/yt-videos-list/compare/v0.6.0...main) page to see new additions/modifications that will be available in the NEXT release!

<details>
<summary><b>See sister <a href="https://github.com/Shail-Shouryya/YouTube-Channels">YouTube-Channels</a> repository for a list of interesting channels!</b></summary></h3>
2 changes: 1 addition & 1 deletion python/README.md
@@ -1,7 +1,7 @@
# Python Quick Start

#### See the [releases](https://github.com/Shail-Shouryya/yt-videos-list/releases) page to see new additions/modifications for each release!
-#### See this [comparison](https://github.com/Shail-Shouryya/yt-videos-list/compare/v0.5.9...main) page to see new additions/modifications that will be available in the NEXT release!
+#### See this [comparison](https://github.com/Shail-Shouryya/yt-videos-list/compare/v0.6.0...main) page to see new additions/modifications that will be available in the NEXT release!

<details>
<summary><b>See sister <a href="https://github.com/Shail-Shouryya/YouTube-Channels">YouTube-Channels</a> repository for a list of interesting channels!</b></summary></h3>
2 changes: 1 addition & 1 deletion python/dev/__init__.py
@@ -15,7 +15,7 @@
from .custom_logger import log


-__version__ = '0.5.9'
+__version__ = '0.6.0'
__author__ = 'Shail-Shouryya'
__email__ = '[email protected]'
__development_status__ = '4 - Beta'
2 changes: 1 addition & 1 deletion python/setup.py
@@ -11,7 +11,7 @@

setup(
name = 'yt_videos_list',
-version = '0.5.9',
+version = '0.6.0',
description = 'YouTube bot to make a YouTube videos list (including all video titles and URLs uploaded by a channel) with end-to-end web scraping - no API tokens required. 🌟 Star this repo if you found it useful! 🌟',
long_description = long_description,
long_description_content_type = 'text/markdown',
2 changes: 1 addition & 1 deletion python/yt_videos_list/__init__.py
@@ -15,7 +15,7 @@
from .custom_logger import log


-__version__ = '0.5.9'
+__version__ = '0.6.0'
__author__ = 'Shail-Shouryya'
__email__ = '[email protected]'
__development_status__ = '4 - Beta'