Add/asyncio aiohttp acceleration #52
Changes from 7 commits
13e774b
79589f5
b170e31
753e799
13b5dcd
263dce8
1477482
9196853
595a5dc
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -1,6 +1,6 @@ | ||||||
<div style="text-align:center"><img src="https://raw.githubusercontent.com/urlstechie/urlchecker-python/master/docs/urlstechie.png"/></div> | ||||||
|
||||||
[![Build Status](https://travis-ci.com/urlstechie/urlchecker-python.svg?branch=master)](https://travis-ci.com/urlstechie/urlchecker-python) [![Documentation Status](https://readthedocs.org/projects/urlchecker-python/badge/?version=latest)](https://urlchecker-python.readthedocs.io/en/latest/?badge=latest) [![codecov](https://codecov.io/gh/urlstechie/urlchecker-python/branch/master/graph/badge.svg)](https://codecov.io/gh/urlstechie/urlchecker-python) [![Python](https://img.shields.io/badge/python-3.5%20%7C%203.6%20%7C%203.7-blue)](https://www.python.org/doc/versions/) [![CodeFactor](https://www.codefactor.io/repository/github/urlstechie/urlchecker-python/badge)](https://www.codefactor.io/repository/github/urlstechie/urlchecker-python) ![PyPI](https://img.shields.io/pypi/v/urlchecker) [![Downloads](https://pepy.tech/badge/urlchecker)](https://pepy.tech/project/urlchecker) [![License](https://img.shields.io/badge/license-MIT-brightgreen)](https://github.com/urlstechie/urlchecker-python/blob/master/LICENSE) | ||||||
[![Build Status](https://travis-ci.com/urlstechie/urlchecker-python.svg?branch=master)](https://travis-ci.com/urlstechie/urlchecker-python) [![Documentation Status](https://readthedocs.org/projects/urlchecker-python/badge/?version=latest)](https://urlchecker-python.readthedocs.io/en/latest/?badge=latest) [![codecov](https://codecov.io/gh/urlstechie/urlchecker-python/branch/master/graph/badge.svg)](https://codecov.io/gh/urlstechie/urlchecker-python) [![Python](https://img.shields.io/badge/python-3.5%20|%203.6%20|%203.7%20|%203.8%20|%203.9-blue)](https://www.python.org/doc/versions/) [![CodeFactor](https://www.codefactor.io/repository/github/urlstechie/urlchecker-python/badge)](https://www.codefactor.io/repository/github/urlstechie/urlchecker-python) [![PyPI version](https://badge.fury.io/py/urlchecker.svg)](https://badge.fury.io/py/urlchecker) [![Downloads](https://pepy.tech/badge/urlchecker)](https://pepy.tech/project/urlchecker) [![License](https://img.shields.io/badge/license-MIT-brightgreen)](https://github.com/urlstechie/urlchecker-python/blob/master/LICENSE) | ||||||
|
||||||
|
||||||
# urlchecker-python | ||||||
|
@@ -10,6 +10,11 @@ and then test for and report broken links. If you are interesting in using | |||||
this as a GitHub action, see [urlchecker-action](https://github.com/urlstechie/urlchecker-action). There are also container | ||||||
bases available on [quay.io/urlstechie/urlchecker](https://quay.io/repository/urlstechie/urlchecker?tab=tags). | ||||||
|
||||||
## Module Dependencies | ||||||
**Versions <= 0.0.22** are built around the [Requests](https://requests.readthedocs.io/en/master/) library whereas | ||||||
|
||||||
**versions >= 0.0.23** are built around the [asyncio](https://docs.python.org/3/library/asyncio.html) and the [AIOHTTP](https://docs.aiohttp.org/en/stable/) libraries. | ||||||
|
||||||
|
||||||
## Module Documentation | ||||||
|
||||||
A detailed documentation of the code is available under [urlchecker-python.readthedocs.io](https://urlchecker-python.readthedocs.io/en/latest/) | ||||||
|
@@ -88,7 +93,7 @@ optional arguments: | |||||
--save SAVE Path to a csv file to save results to. | ||||||
--retry-count RETRY_COUNT | ||||||
retry count upon failure (defaults to 2, one retry). | ||||||
--timeout TIMEOUT timeout (seconds) to provide to the requests library | ||||||
--timeout TIMEOUT timeout (minutes) to provide to the aiohttp library | ||||||
Review comment: We should not change the API here - if the user provides seconds, we should just convert to minutes for the library.
Reply: That makes sense. I tried to keep the numbers the same, which is why I only changed the unit, but I will fix this.
||||||
(defaults to 5) | ||||||
``` | ||||||
|
||||||
|
@@ -121,7 +126,7 @@ $ urlchecker check . | |||||
save: None | ||||||
timeout: 5 | ||||||
|
||||||
/tmp/urlchecker-action/README.md | ||||||
/tmp/urlchecker-action/README.md | ||||||
-------------------------------- | ||||||
https://github.com/urlstechie/urlchecker-action/blob/master/LICENSE | ||||||
https://github.com/r-hub/docs/blob/bc1eac71206f7cb96ca00148dcf3b46c6d25ada4/.github/workflows/pr.yml | ||||||
|
@@ -152,7 +157,7 @@ https://github.com/SuperKogito/Voice-based-gender-recognition/issues | |||||
https://github.com/buildtesters/buildtest/blob/v0.9.1/.github/workflows/urlchecker.yml | ||||||
https://github.com/berlin-hack-and-tell/berlinhackandtell.rocks/blob/master/.github/workflows/urlchecker-pr-label.yml | ||||||
|
||||||
/tmp/urlchecker-action/examples/README.md | ||||||
/tmp/urlchecker-action/examples/README.md | ||||||
----------------------------------------- | ||||||
https://github.com/urlstechie/urlchecker-action/releases | ||||||
https://github.com/urlstechie/urlchecker-action/issues | ||||||
|
@@ -184,7 +189,7 @@ $ urlchecker check --exclude-pattern SuperKogito . | |||||
save: None | ||||||
timeout: 5 | ||||||
|
||||||
/tmp/urlchecker-action/README.md | ||||||
/tmp/urlchecker-action/README.md | ||||||
-------------------------------- | ||||||
https://github.com/urlstechie/urlchecker-action/blob/master/LICENSE | ||||||
https://github.com/urlstechie/urlchecker-action/issues | ||||||
|
@@ -212,7 +217,7 @@ https://github.com/berlin-hack-and-tell/berlinhackandtell.rocks/actions?query=wo | |||||
https://github.com/USRSE/usrse.github.io | ||||||
https://github.com/rseng/awesome-rseng/blob/5f5cb78f8392cf10aec2f3952b305ae9611029c2/.github/workflows/urlchecker.yml | ||||||
|
||||||
/tmp/urlchecker-action/examples/README.md | ||||||
/tmp/urlchecker-action/examples/README.md | ||||||
----------------------------------------- | ||||||
https://help.github.com/en/actions/reference/events-that-trigger-workflows | ||||||
https://github.com/urlstechie/urlchecker-action/issues | ||||||
|
@@ -386,32 +391,32 @@ You can look at `checker.checks`, which is a dictionary of result objects, | |||||
organized by the filename: | ||||||
|
||||||
```python | ||||||
for file_name, result in checker.checks.items(): | ||||||
print() | ||||||
print(result) | ||||||
print("Total Results: %s " % result.count) | ||||||
print("Total Failed: %s" % len(result.failed)) | ||||||
print("Total Passed: %s" % len(result.passed)) | ||||||
for file_name, result in checker.checks.items(): | ||||||
print() | ||||||
print(result) | ||||||
print("Total Results: %s " % result.count) | ||||||
print("Total Failed: %s" % len(result.failed)) | ||||||
print("Total Passed: %s" % len(result.passed)) | ||||||
|
||||||
... | ||||||
|
||||||
UrlCheck:/home/vanessa/Desktop/Code/urlstechie/urlchecker-python/tests/test_files/sample_test_file.md | ||||||
Total Results: 26 | ||||||
Total Results: 26 | ||||||
Total Failed: 6 | ||||||
Total Passed: 20 | ||||||
|
||||||
UrlCheck:/home/vanessa/Desktop/Code/urlstechie/urlchecker-python/.pytest_cache/README.md | ||||||
Total Results: 1 | ||||||
Total Results: 1 | ||||||
Total Failed: 0 | ||||||
Total Passed: 1 | ||||||
|
||||||
UrlCheck:/home/vanessa/Desktop/Code/urlstechie/urlchecker-python/.eggs/pytest_runner-5.2-py3.7.egg/ptr.py | ||||||
Total Results: 0 | ||||||
Total Results: 0 | ||||||
Total Failed: 0 | ||||||
Total Passed: 0 | ||||||
|
||||||
UrlCheck:/home/vanessa/Desktop/Code/urlstechie/urlchecker-python/docs/source/conf.py | ||||||
Total Results: 3 | ||||||
Total Results: 3 | ||||||
Total Failed: 0 | ||||||
Total Passed: 3 | ||||||
``` | ||||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
asyncio==3.4.3 | ||
aiohttp==3.7.3 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,5 @@ | ||
[DEFAULT] | ||
git_path_test_value = https://github.com/urlstechie/urlchecker-test-repo | ||
file_types_test_values = .md,.py,.c,.txt | ||
file_types_test_values = .md,.c,.txt | ||
exclude_test_urls = https://github.com/SuperKogito/URLs-checker/issues/2,https://github.com/SuperKogito/URLs-checker/issues/3 | ||
exclude_test_patterns = https://github.com/SuperKogito/Voice-based-gender-recognition/issues,https://img.shields.io/ |
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -31,9 +31,9 @@ def test_check_file_type(file_path, file_types): | |||||
|
||||||
@pytest.mark.parametrize( | ||||||
"file_path", | ||||||
["tests/test_files/sample_test_file.md", "tests/test_files/sample_test_file.py"], | ||||||
["tests/test_files/sample_test_file.txt", "tests/test_files/sample_test_file.py"], | ||||||
) | ||||||
@pytest.mark.parametrize("file_types", [[".md", ".py"]]) | ||||||
@pytest.mark.parametrize("file_types", [[".txt", ".py"]]) | ||||||
def test_check_file_type(file_path, file_types): | ||||||
""" | ||||||
test check file types | ||||||
|
@@ -53,18 +53,18 @@ def test_check_file_type(file_path, file_types): | |||||
["tests/test_files/sample_test_file.md", "tests/test_files/sample_test_file.py"], | ||||||
) | ||||||
@pytest.mark.parametrize( | ||||||
"white_list_patterns", [["[.py]"], ["[.md]"], ["tests/test_file"]] | ||||||
"exclude_patterns", [["[.py]"], ["[.md]"], ["tests/test_file"]] | ||||||
Review comment: Suggested change
Reply: Can you explain this change to me please?
Reply: Sure! So your current regular expressions are matching exactly .py. We usually only need brackets when we want an exact match of a character that could be a regular expression (in the example above, the period). So I moved the brackets around just the periods, left the letters as is, and added a $ to indicate we only want to match the end of the line (e.g., we wouldn't want to match filename.python-bindings or
||||||
) | ||||||
def test_include_files(file_path, white_list_patterns): | ||||||
def test_include_files(file_path, exclude_patterns): | ||||||
""" | ||||||
test if a file should be included based on patterns (using extension for test) | ||||||
""" | ||||||
_, extension = os.path.splitext(file_path) | ||||||
expected = not extension in file_path | ||||||
result = include_file(file_path, white_list_patterns) | ||||||
result = include_file(file_path, exclude_patterns) | ||||||
|
||||||
# No files should be included for a global path pattern | ||||||
if "tests/test_file" in white_list_patterns: | ||||||
if "tests/test_file" in exclude_patterns: | ||||||
if result: | ||||||
raise AssertionError | ||||||
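
To make the regex discussion in the review thread above concrete, here is a small sketch using Python's standard `re` module. It assumes patterns are applied with `re.search` (how `include_file` actually matches is not shown in this diff), the file names are made up for illustration, and `[.]py$` is only a pattern of the shape the reply describes, not necessarily the exact suggested change.

```python
import re

# Hypothetical file names, for illustration only.
names = ["script.py", "notes.md", "filename.python-bindings", "happy"]

# "[.py]" is a character class: it matches any ONE of ".", "p" or "y".
loose = re.compile(r"[.py]")

# "[.]py$" matches a literal period followed by "py" at the end of the string.
strict = re.compile(r"[.]py$")

loose_hits = [n for n in names if loose.search(n)]
strict_hits = [n for n in names if strict.search(n)]

print(loose_hits)   # every name contains ".", "p" or "y" somewhere
print(strict_hits)  # only names that end in ".py"
```

Bracketing only the period keeps it literal, and the trailing `$` stops the pattern from matching an extension that merely starts with `.py`.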
|
||||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
https://github.com/urlstechie/urlchecker-action | ||
https://github.com/urlstechie/urlchecker-python | ||
https://github.com/urlstechie/urlstechie.github.io | ||
https://urlstechie.github.io/ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
# This is a test file | ||
include<stdio.h> | ||
|
||
int main() { | ||
printf("https://www.google.com/"); | ||
SuperKogito marked this conversation as resolved.
|
||
printf("https://www.youtube.com/"); | ||
printf("https://stackoverflow.com/"); | ||
printf("https://github.com/"); | ||
return 0; | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
# This is a test file | ||
https://github.com/urlstechie | ||
https://github.com/urlstechie/urlchecker-python | ||
https://urlstechie.github.io/ | ||
https://superkogito.github.io/blog/urlstechie.html | ||
https://twitter.com | ||
https://anaconda.org/conda-forge/urlchecker | ||
https://urlchecker-python.readthedocs.io/en/latest/ |
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -136,7 +136,7 @@ def get_parser(): | |||||
|
||||||
check.add_argument( | ||||||
"--timeout", | ||||||
help="timeout (seconds) to provide to the requests library (defaults to 5)", | ||||||
help="timeout (minutes) to provide to the aiohttp library (defaults to 5)", | ||||||
|
||||||
type=int, | ||||||
default=5, | ||||||
) | ||||||
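
One way to honor the review request to keep `--timeout` in seconds for the user is sketched below with only the standard library. This is an illustration under stated assumptions, not the project's code: `build_parser` is a stand-in for `get_parser()`, and `slow_check` is a hypothetical placeholder for a URL request. `asyncio.wait_for` takes its timeout in seconds, so the user-facing unit does not need to change.

```python
import argparse
import asyncio


def build_parser():
    # Stand-in parser (hypothetical); only the --timeout option is modeled.
    parser = argparse.ArgumentParser(prog="urlchecker-sketch")
    parser.add_argument(
        "--timeout",
        help="timeout (seconds) for each request (defaults to 5)",
        type=int,
        default=5,
    )
    return parser


async def slow_check(delay):
    # Hypothetical placeholder for a network request.
    await asyncio.sleep(delay)
    return "ok"


async def check_with_timeout(delay, timeout_seconds):
    # asyncio.wait_for expects the timeout in seconds.
    try:
        return await asyncio.wait_for(slow_check(delay), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        return "timeout"


args = build_parser().parse_args(["--timeout", "1"])
# asyncio.run requires Python 3.7 or newer.
fast = asyncio.run(check_with_timeout(0.01, args.timeout))
slow = asyncio.run(check_with_timeout(2, args.timeout))
print(fast, slow)
```

Any conversion the underlying HTTP library might need would then live in one internal place rather than in the help text.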
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -48,7 +48,11 @@ def main(args, extra): | |
sys.exit("Error %s does not exist." % path) | ||
|
||
# Parse file types, and excluded urls and files (includes absolute and patterns) | ||
file_types = args.file_types.split(",") | ||
file_types = [] | ||
Review comment: Why is this necessary?
Reply: Also, if the user puts a single character extension, this wouldn't be included since it's == 1.
Reply: true, but as far as I know the shortest file type is 2 characters like
Reply: Try doing
Reply: yes but doing
Reply: This issue seems separate from asyncio - should it be a separate PR?
Reply: Yes, I can add it in a separate one. I only did it here because we were waiting on the asyncio code, so I thought I would ship them together.
||
if "," in args.file_types: | ||
file_types = [ft for ft in args.file_types.split(",") if len(ft) > 1] | ||
else: | ||
file_types.append(args.file_types) | ||
exclude_urls = remove_empty(args.exclude_urls.split(",")) | ||
exclude_patterns = remove_empty(args.exclude_patterns.split(",")) | ||
exclude_files = remove_empty(args.exclude_files.split(",")) | ||
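
The thread above turns on how `str.split` treats empty input and on the `len(ft) > 1` filter. The sketch below shows both behaviours, plus one possible alternative that drops only empty entries. `parse_file_types` is a hypothetical helper written for this illustration, not a function from this repository.

```python
def parse_file_types(raw):
    # Hypothetical helper: split on commas, strip whitespace, and drop
    # only empty entries, so short file types survive.
    return [ft.strip() for ft in raw.split(",") if ft.strip()]


# Splitting an empty string yields one empty entry, not an empty list.
empty_split = "".split(",")
print(empty_split)

# The len(ft) > 1 filter from the diff also drops one-character entries.
filtered = [ft for ft in ".md,c,,.py".split(",") if len(ft) > 1]
print(filtered)

print(parse_file_types(".md,c,,.py"))
print(parse_file_types(".md"))
print(parse_file_types(""))
```

Filtering on emptiness rather than length handles the single-value and comma-separated cases with the same code path.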
|
Review comment: This should be a version bump, up to 0.1.0, since we would be fundamentally changing the core library and making it unusable for older Python versions.