Add/asyncio aiohttp acceleration #52

Open
wants to merge 9 commits into
base: master

6 changes: 4 additions & 2 deletions .travis.yml
@@ -1,12 +1,13 @@
branches:
only:
only:
- master

language: python
python:
- "3.5"
- "3.6"
- "3.7"
- "3.8"
- "3.9"

cache:
- pip
@@ -23,6 +24,7 @@ before_script:
- pip3 install codecov
- pip3 install coveralls
- pip3 install codacy-coverage
- pip3 install -r requirements.txt
- sudo apt-get update

# command to run tests
5 changes: 3 additions & 2 deletions CHANGELOG.md
@@ -1,7 +1,7 @@
# CHANGELOG

This is a manually generated log to track changes to the repository for each release.
Each section should include general headers such as **Implemented enhancements**
This is a manually generated log to track changes to the repository for each release.
Each section should include general headers such as **Implemented enhancements**
and **Merged pull requests**. Critical items to know are:

- renamed commands
@@ -12,6 +12,7 @@ and **Merged pull requests**. Critical items to know are:
Referenced versions in headers are tagged on Github, in parentheses are for pypi.

## [vxx](https://github.com/urlstechie/urlschecker-python/tree/master) (master)
- accelerate code using asyncio and aiohttp (0.0.23)
Collaborator

This should be a version bump up to 0.1.0, since we would be fundamentally changing the core library and making it unusable for older Python versions.

- updating "whitelist" arguments to exclude (0.0.22)
- adding support for dotfiles for a file type (0.0.21)
- final regexp needs to again parse away { or } (0.0.20)
37 changes: 21 additions & 16 deletions README.md
@@ -1,6 +1,6 @@
<div style="text-align:center"><img src="https://raw.githubusercontent.com/urlstechie/urlchecker-python/master/docs/urlstechie.png"/></div>

[![Build Status](https://travis-ci.com/urlstechie/urlchecker-python.svg?branch=master)](https://travis-ci.com/urlstechie/urlchecker-python) [![Documentation Status](https://readthedocs.org/projects/urlchecker-python/badge/?version=latest)](https://urlchecker-python.readthedocs.io/en/latest/?badge=latest) [![codecov](https://codecov.io/gh/urlstechie/urlchecker-python/branch/master/graph/badge.svg)](https://codecov.io/gh/urlstechie/urlchecker-python) [![Python](https://img.shields.io/badge/python-3.5%20%7C%203.6%20%7C%203.7-blue)](https://www.python.org/doc/versions/) [![CodeFactor](https://www.codefactor.io/repository/github/urlstechie/urlchecker-python/badge)](https://www.codefactor.io/repository/github/urlstechie/urlchecker-python) ![PyPI](https://img.shields.io/pypi/v/urlchecker) [![Downloads](https://pepy.tech/badge/urlchecker)](https://pepy.tech/project/urlchecker) [![License](https://img.shields.io/badge/license-MIT-brightgreen)](https://github.com/urlstechie/urlchecker-python/blob/master/LICENSE)
[![Build Status](https://travis-ci.com/urlstechie/urlchecker-python.svg?branch=master)](https://travis-ci.com/urlstechie/urlchecker-python) [![Documentation Status](https://readthedocs.org/projects/urlchecker-python/badge/?version=latest)](https://urlchecker-python.readthedocs.io/en/latest/?badge=latest) [![codecov](https://codecov.io/gh/urlstechie/urlchecker-python/branch/master/graph/badge.svg)](https://codecov.io/gh/urlstechie/urlchecker-python) [![Python](https://img.shields.io/badge/python-3.5%20|%203.6%20|%203.7%20|%203.8%20|%203.9-blue)](https://www.python.org/doc/versions/) [![CodeFactor](https://www.codefactor.io/repository/github/urlstechie/urlchecker-python/badge)](https://www.codefactor.io/repository/github/urlstechie/urlchecker-python) [![PyPI version](https://badge.fury.io/py/urlchecker.svg)](https://badge.fury.io/py/urlchecker) [![Downloads](https://pepy.tech/badge/urlchecker)](https://pepy.tech/project/urlchecker) [![License](https://img.shields.io/badge/license-MIT-brightgreen)](https://github.com/urlstechie/urlchecker-python/blob/master/LICENSE)


# urlchecker-python
@@ -10,6 +10,11 @@ and then test for and report broken links. If you are interested in using
this as a GitHub action, see [urlchecker-action](https://github.com/urlstechie/urlchecker-action). There are also container
bases available on [quay.io/urlstechie/urlchecker](https://quay.io/repository/urlstechie/urlchecker?tab=tags).

## Module Dependencies
**Versions <= 0.0.22** are built around the [Requests](https://requests.readthedocs.io/en/master/) library whereas
Collaborator

Suggested change
**Versions <= 0.0.22** are built around the [Requests](https://requests.readthedocs.io/en/master/) library whereas
**versions <= 0.0.22** are built around the [Requests](https://requests.readthedocs.io/en/master/) library whereas

**versions >= 0.0.23** are built around the [asyncio](https://docs.python.org/3/library/asyncio.html) and the [AIOHTTP](https://docs.aiohttp.org/en/stable/) libraries.
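
For readers unfamiliar with the asyncio model this change moves to, the following is a minimal sketch of checking several URLs concurrently. It uses only the standard library: `fake_check` is a hypothetical stand-in that sleeps instead of making an HTTP request, and `check_all`, `limit`, and the example URLs are likewise assumptions for illustration, not part of urlchecker.

```python
import asyncio


async def fake_check(url, delay=0.01):
    """Hypothetical stand-in for an HTTP request (no network access)."""
    await asyncio.sleep(delay)
    # Demo rule only: URLs containing "broken" are reported as failed.
    return url, "broken" not in url


async def check_all(urls, limit=5):
    """Check all URLs concurrently, with at most `limit` checks in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url):
        async with semaphore:
            return await fake_check(url)

    # gather() schedules every coroutine and returns results in input order.
    pairs = await asyncio.gather(*(bounded(u) for u in urls))
    return dict(pairs)


urls = ["https://example.com/ok", "https://example.com/broken"]
results = asyncio.run(check_all(urls))
```

A real checker would replace `fake_check` with a call to an asynchronous HTTP client; the semaphore is one common way to keep the number of simultaneous requests bounded.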


## Module Documentation

A detailed documentation of the code is available under [urlchecker-python.readthedocs.io](https://urlchecker-python.readthedocs.io/en/latest/)
@@ -88,7 +93,7 @@ optional arguments:
--save SAVE Path to a csv file to save results to.
--retry-count RETRY_COUNT
retry count upon failure (defaults to 2, one retry).
--timeout TIMEOUT timeout (seconds) to provide to the requests library
--timeout TIMEOUT timeout (minutes) to provide to the aiohttp library
Collaborator

We should not change the API here - if the user provides seconds, we should just convert to minutes for the library.

Member Author

That makes sense. I tried to keep the numbers the same, that's why I only changed the unit, but I will fix this.

(defaults to 5)
```

@@ -121,7 +126,7 @@ $ urlchecker check .
save: None
timeout: 5

/tmp/urlchecker-action/README.md
/tmp/urlchecker-action/README.md
--------------------------------
https://github.com/urlstechie/urlchecker-action/blob/master/LICENSE
https://github.com/r-hub/docs/blob/bc1eac71206f7cb96ca00148dcf3b46c6d25ada4/.github/workflows/pr.yml
@@ -152,7 +157,7 @@ https://github.com/SuperKogito/Voice-based-gender-recognition/issues
https://github.com/buildtesters/buildtest/blob/v0.9.1/.github/workflows/urlchecker.yml
https://github.com/berlin-hack-and-tell/berlinhackandtell.rocks/blob/master/.github/workflows/urlchecker-pr-label.yml

/tmp/urlchecker-action/examples/README.md
/tmp/urlchecker-action/examples/README.md
-----------------------------------------
https://github.com/urlstechie/urlchecker-action/releases
https://github.com/urlstechie/urlchecker-action/issues
@@ -184,7 +189,7 @@ $ urlchecker check --exclude-pattern SuperKogito .
save: None
timeout: 5

/tmp/urlchecker-action/README.md
/tmp/urlchecker-action/README.md
--------------------------------
https://github.com/urlstechie/urlchecker-action/blob/master/LICENSE
https://github.com/urlstechie/urlchecker-action/issues
@@ -212,7 +217,7 @@ https://github.com/berlin-hack-and-tell/berlinhackandtell.rocks/actions?query=wo
https://github.com/USRSE/usrse.github.io
https://github.com/rseng/awesome-rseng/blob/5f5cb78f8392cf10aec2f3952b305ae9611029c2/.github/workflows/urlchecker.yml

/tmp/urlchecker-action/examples/README.md
/tmp/urlchecker-action/examples/README.md
-----------------------------------------
https://help.github.com/en/actions/reference/events-that-trigger-workflows
https://github.com/urlstechie/urlchecker-action/issues
@@ -386,32 +391,32 @@ You can look at `checker.checks`, which is a dictionary of result objects,
organized by the filename:

```python
for file_name, result in checker.checks.items():
print()
print(result)
print("Total Results: %s " % result.count)
print("Total Failed: %s" % len(result.failed))
print("Total Passed: %s" % len(result.passed))
for file_name, result in checker.checks.items():
print()
print(result)
print("Total Results: %s " % result.count)
print("Total Failed: %s" % len(result.failed))
print("Total Passed: %s" % len(result.passed))

...

UrlCheck:/home/vanessa/Desktop/Code/urlstechie/urlchecker-python/tests/test_files/sample_test_file.md
Total Results: 26
Total Results: 26
Total Failed: 6
Total Passed: 20

UrlCheck:/home/vanessa/Desktop/Code/urlstechie/urlchecker-python/.pytest_cache/README.md
Total Results: 1
Total Results: 1
Total Failed: 0
Total Passed: 1

UrlCheck:/home/vanessa/Desktop/Code/urlstechie/urlchecker-python/.eggs/pytest_runner-5.2-py3.7.egg/ptr.py
Total Results: 0
Total Results: 0
Total Failed: 0
Total Passed: 0

UrlCheck:/home/vanessa/Desktop/Code/urlstechie/urlchecker-python/docs/source/conf.py
Total Results: 3
Total Results: 3
Total Failed: 0
Total Passed: 3
```
2 changes: 2 additions & 0 deletions requirements.txt
@@ -0,0 +1,2 @@
asyncio==3.4.3
aiohttp==3.7.3
2 changes: 1 addition & 1 deletion tests/_local_test_config.conf
@@ -1,5 +1,5 @@
[DEFAULT]
git_path_test_value = https://github.com/urlstechie/urlchecker-test-repo
file_types_test_values = .md,.py,.c,.txt
file_types_test_values = .md,.c,.txt
exclude_test_urls = https://github.com/SuperKogito/URLs-checker/issues/2,https://github.com/SuperKogito/URLs-checker/issues/3
exclude_test_patterns = https://github.com/SuperKogito/Voice-based-gender-recognition/issues,https://img.shields.io/
62 changes: 59 additions & 3 deletions tests/test_client_check.py
@@ -1,17 +1,37 @@
import os
import pytest
import subprocess
import argparse
import tempfile
import subprocess
import configparser
from urlchecker.client import check


def test_client_general():
# execute scripts
pipe = subprocess.run(
["urlchecker", "-h"], stdout=subprocess.PIPE, stderr=subprocess.PIPE
)
assert pipe.stderr.decode("utf-8") == ""

pipe = subprocess.run(
["urlchecker", "--help"], stdout=subprocess.PIPE, stderr=subprocess.PIPE
)
assert pipe.stderr.decode("utf-8") == ""

pipe = subprocess.run(
["urlchecker", "--version"], stdout=subprocess.PIPE, stderr=subprocess.PIPE
)
assert pipe.stderr.decode("utf-8") == ""


@pytest.mark.parametrize("config_fname", ["./tests/_local_test_config.conf"])
@pytest.mark.parametrize("cleanup", [False, True])
@pytest.mark.parametrize("print_all", [False, True])
@pytest.mark.parametrize("force_pass", [False, True])
@pytest.mark.parametrize("rcount", [1, 3])
@pytest.mark.parametrize("timeout", [3, 5])
def test_client_general(config_fname, cleanup, print_all, force_pass, rcount, timeout):
@pytest.mark.parametrize("timeout", [5, 7])
def test_client_check(config_fname, cleanup, print_all, force_pass, rcount, timeout):

# init config parser
config = configparser.ConfigParser()
@@ -101,3 +121,39 @@ def test_client_save(save):
if save:
if not os.path.exists(output_csv.name):
raise AssertionError


@pytest.mark.parametrize("config_fname", ["./tests/_local_test_config.conf"])
def test_client_check_main(config_fname):

# init config parser
config = configparser.ConfigParser()
config.read(config_fname)

# init env variables
path = config["DEFAULT"]["git_path_test_value"]
file_types = config["DEFAULT"]["file_types_test_values"]
exclude_urls = config["DEFAULT"]["exclude_test_urls"]
exclude_patterns = config["DEFAULT"]["exclude_test_patterns"]

# init args
args = argparse.Namespace()
args.path = path
args.branch = "master"
args.subfolder = "test_files"
args.cleanup = True
args.force_pass = True
args.no_print = True
args.file_types = file_types
args.files = ""
args.exclude_urls = ""
args.exclude_patterns = ""
args.exclude_files = ""
args.save = ""
args.retry_count = 1
args.timeout = 5

# execute script
with pytest.raises(SystemExit) as e:
check.main(args=args, extra=[])
assert e.value.code == 0
2 changes: 1 addition & 1 deletion tests/test_core_check.py
@@ -12,7 +12,7 @@
"file_paths",
[
["tests/test_files/sample_test_file.md"],
["tests/test_files/sample_test_file.py"],
["tests/test_files/sample_test_file.c"],
["tests/test_files/sample_test_file.rst"],
],
)
12 changes: 6 additions & 6 deletions tests/test_core_fileproc.py
@@ -31,9 +31,9 @@ def test_check_file_type(file_path, file_types):

@pytest.mark.parametrize(
"file_path",
["tests/test_files/sample_test_file.md", "tests/test_files/sample_test_file.py"],
["tests/test_files/sample_test_file.txt", "tests/test_files/sample_test_file.py"],
)
@pytest.mark.parametrize("file_types", [[".md", ".py"]])
@pytest.mark.parametrize("file_types", [[".txt", ".py"]])
def test_check_file_type(file_path, file_types):
"""
test check file types
@@ -53,18 +53,18 @@ def test_check_file_type(file_path, file_types):
["tests/test_files/sample_test_file.md", "tests/test_files/sample_test_file.py"],
)
@pytest.mark.parametrize(
"white_list_patterns", [["[.py]"], ["[.md]"], ["tests/test_file"]]
"exclude_patterns", [["[.py]"], ["[.md]"], ["tests/test_file"]]
Collaborator

Suggested change
"exclude_patterns", [["[.py]"], ["[.md]"], ["tests/test_file"]]
"exclude_patterns", [["[.]py$"], ["[.]md$"], ["tests/test_file"]]

Member Author

Can you explain this change to me, please?

Collaborator

Sure! Your current regular expressions use a character class, so `[.py]` matches any single `.`, `p`, or `y` rather than the literal `.py`. We usually only need brackets when we want an exact match of a character that would otherwise be treated as a regular expression metacharacter (in the example above, the period). So I moved the brackets around just the periods, left the letters as is, and added a `$` to indicate we only want to match the end of the line (e.g., we wouldn't want to match filename.python-bindings or .mdl).
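
To make the difference between the two patterns concrete, here is a small sketch using Python's `re` module. It assumes the patterns are applied with `re.search`, which this diff does not show, and the file names in `names` are made up for the demonstration.

```python
import re

old_pattern = "[.py]"   # character class: any single '.', 'p', or 'y'
new_pattern = "[.]py$"  # a literal '.', then 'py', anchored at the end

# Hypothetical file names chosen to show where the patterns differ.
names = ["script.py", "filename.python-bindings", "notes.md", "happy"]

old_matches = [n for n in names if re.search(old_pattern, n)]
new_matches = [n for n in names if re.search(new_pattern, n)]

print("old:", old_matches)
print("new:", new_matches)
```

The old pattern matches every name in the list, because each contains at least one of the three characters; the new pattern matches only `script.py`.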

)
def test_include_files(file_path, white_list_patterns):
def test_include_files(file_path, exclude_patterns):
"""
test if a file should be included based on patterns (using extension for test)
"""
_, extension = os.path.splitext(file_path)
expected = not extension in file_path
result = include_file(file_path, white_list_patterns)
result = include_file(file_path, exclude_patterns)

# No files should be included for a global path pattern
if "tests/test_file" in white_list_patterns:
if "tests/test_file" in exclude_patterns:
if result:
raise AssertionError

4 changes: 2 additions & 2 deletions tests/test_core_urlproc.py
@@ -61,10 +61,10 @@ def test_get_user_agent():

def test_check_response_status_code():
class failedResponse:
status_code = 500
status = 500

class successResponse:
status_code = 200
status = 200

# Any failure returns True (indicating a retry is needed)
assert not check_response_status_code(
4 changes: 4 additions & 0 deletions tests/test_files/.dotfile
@@ -0,0 +1,4 @@
https://github.com/urlstechie/urlchecker-action
https://github.com/urlstechie/urlchecker-python
https://github.com/urlstechie/urlstechie.github.io
https://urlstechie.github.io/
10 changes: 10 additions & 0 deletions tests/test_files/sample_test_file.c
@@ -0,0 +1,10 @@
# This is a test file
include<stdio.h>

int main() {
printf("https://www.google.com/");
printf("https://www.youtube.com/");
printf("https://stackoverflow.com/");
printf("https://github.com/");
return 0;
}
8 changes: 4 additions & 4 deletions tests/test_files/sample_test_file.md
@@ -2,14 +2,14 @@
The following is a list of test urls to extract.
- [test url 1](https://www.google.com/)
- [test url 2](https://github.com/SuperKogito)
- [test url 3](https://github.com/SuperKogito/URLs-checker)
- [test url 3](https://github.com/vsoch)
- [test url 4](https://github.com/SuperKogito/URLs-checker/blob/master/README.md)
- [test url 5](https://github.com/SuperKogito/URLs-checker/issues)
- [test url 6](https://github.com/SuperKogito/URLs-checker/issues/4)
- [test url 6](https://travis-ci.com/github/urlstechie)

- [test url 7](https://github.com/SuperKogito/spafe/)
- [test url 8](https://github.com/SuperKogito/spafe/issues)
- [test url 9](https://github.com/SuperKogito/spafe/issues/1)
- [test url 8](https://codecov.io/gh/urlstechie)
- [test url 9](https://github.com/urlstechie/urlchecker-action)

- [test url 10](https://github.com/SuperKogito/Voice-based-gender-recognition)
- [test url 11](https://github.com/SuperKogito/Voice-based-gender-recognition/issues)
5 changes: 3 additions & 2 deletions tests/test_files/sample_test_file.py
@@ -8,6 +8,7 @@
print("This is a test file with some URLs")
url1 = "https://www.google.com/"
url2 = "https://github.com/SuperKogito"
url3 = "https://github.com/SuperKogito/URLs-checker/README.md"
url3 = {"url": "https://github.com/SuperKogito/URLs-checker/README.md"}
url3 = "https://github.com/vsoch"
url4 = "https://github.com/SuperKogito/URLs-checker/README.md"
url5 = {"url": "https://github.com/SuperKogito/URLs-checker/README.md"}
print("Done.")
8 changes: 8 additions & 0 deletions tests/test_files/sample_test_file.txt
@@ -0,0 +1,8 @@
# This is a test file
https://github.com/urlstechie
https://github.com/urlstechie/urlchecker-python
https://urlstechie.github.io/
https://superkogito.github.io/blog/urlstechie.html
https://twitter.com
https://anaconda.org/conda-forge/urlchecker
https://urlchecker-python.readthedocs.io/en/latest/
2 changes: 1 addition & 1 deletion urlchecker/client/__init__.py
@@ -136,7 +136,7 @@ def get_parser():

check.add_argument(
"--timeout",
help="timeout (seconds) to provide to the requests library (defaults to 5)",
help="timeout (minutes) to provide to the aiohttp library (defaults to 5)",
Collaborator

Suggested change
help="timeout (minutes) to provide to the aiohttp library (defaults to 5)",
help="timeout (seconds) to provide to the aiohttp library (defaults to 5)",

type=int,
default=5,
)
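
As a sketch of keeping `--timeout` in seconds end to end, the example below parses the flag with argparse and applies it with `asyncio.wait_for`, which takes its timeout in seconds. `slow_request` and `fetch_with_timeout` are hypothetical helpers, and the program name is made up. To my knowledge `aiohttp.ClientTimeout` also expresses its limits in seconds, so a conversion to minutes may not be needed at all, but that is worth confirming against the aiohttp documentation.

```python
import argparse
import asyncio


def get_parser():
    # Stripped-down parser for illustration; not the project's real parser.
    parser = argparse.ArgumentParser(prog="urlchecker-sketch")
    parser.add_argument(
        "--timeout",
        help="timeout (seconds) for each request (defaults to 5)",
        type=int,
        default=5,
    )
    return parser


async def slow_request(delay):
    """Hypothetical stand-in for a network call taking `delay` seconds."""
    await asyncio.sleep(delay)
    return "ok"


async def fetch_with_timeout(delay, timeout_seconds):
    """Return the result, or None if the call exceeds the timeout."""
    try:
        return await asyncio.wait_for(slow_request(delay), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        return None


args = get_parser().parse_args(["--timeout", "1"])
fast = asyncio.run(fetch_with_timeout(0.01, args.timeout))  # finishes in time
slow = asyncio.run(fetch_with_timeout(0.05, 0.01))          # exceeds the limit
```

Keeping the user-facing unit fixed and converting, if ever needed, at the single call site means the CLI contract does not change when the underlying HTTP library does.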
6 changes: 5 additions & 1 deletion urlchecker/client/check.py
@@ -48,7 +48,11 @@ def main(args, extra):
sys.exit("Error %s does not exist." % path)

# Parse file types, and excluded urls and files (includes absolute and patterns)
file_types = args.file_types.split(",")
file_types = []
Collaborator

Why is this necessary?

Collaborator

Also, if the user provides a single-character expansion, it wouldn't be included, since its length is == 1.

Member Author

True, but as far as I know the shortest file type is 2 characters, like .c or .*. What would a one-character file type look like?

Collaborator

Try doing ls . and you'll see all files in the present working directory.

Member Author

Yes, but doing `--file-types .` doesn't seem correct; `.` is not a file type. Of course the user can use `*`, but do we really need to account for that? By the way, the reason for this change is, in my opinion, related to urlstechie/urlchecker-action#76

Collaborator

This issue seems separate from asyncio - should it be a separate PR?

Member Author

Yes, I can add it in a separate one. I only did it here because we were waiting on the asyncio code, so I thought I would ship them together.
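
To illustrate the trade-off discussed in this thread, the sketch below compares the parsing logic proposed in the diff with an alternative that drops only empty entries. `pr_style` mirrors the diff; `parse_file_types` is a hypothetical helper, not the project's `remove_empty`, whose implementation is not shown here.

```python
def pr_style(raw):
    """Mirror of the logic proposed in this diff."""
    if "," in raw:
        # Drops empty strings, but also any one-character entry such as ".".
        return [ft for ft in raw.split(",") if len(ft) > 1]
    return [raw]


def parse_file_types(raw):
    """Hypothetical alternative: strip whitespace and drop only empty entries."""
    return [ft.strip() for ft in raw.split(",") if ft.strip()]


for raw in [".md,.py,", ".md,.", ""]:
    print(repr(raw), pr_style(raw), parse_file_types(raw))
```

Both versions tolerate a trailing comma. They differ on one-character entries, which `pr_style` silently discards, and on an empty input string, for which `pr_style` returns a list containing one empty string.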

if "," in args.file_types:
file_types = [ft for ft in args.file_types.split(",") if len(ft) > 1]
else:
file_types.append(args.file_types)
exclude_urls = remove_empty(args.exclude_urls.split(","))
exclude_patterns = remove_empty(args.exclude_patterns.split(","))
exclude_files = remove_empty(args.exclude_files.split(","))