From 314272e136bcbc6ec2853ed9df55953a929b5ded Mon Sep 17 00:00:00 2001
From: Adrien Barbaresi
Date: Mon, 2 Dec 2024 18:12:26 +0100
Subject: [PATCH] maintenance and docs: remove dependabot and update funding (#178)

* maintenance: remove dependabot and update funding

* update readme

* add context

* remove duplicate text

* fix typos
---
 .github/FUNDING.yml    |  2 +-
 .github/dependabot.yml | 27 ---------------------------
 README.md              | 38 ++++++++++++++++++++------------------
 docs/index.rst         | 29 +++++++++++++++++++++++------
 4 files changed, 44 insertions(+), 52 deletions(-)
 delete mode 100644 .github/dependabot.yml

diff --git a/.github/FUNDING.yml b/.github/FUNDING.yml
index 08ccbf34..0f192886 100644
--- a/.github/FUNDING.yml
+++ b/.github/FUNDING.yml
@@ -1,6 +1,6 @@
 # These are supported funding model platforms

-github: # Replace with up to 4 GitHub Sponsors-enabled usernames e.g., [user1, user2]
+github: [adbar]
 patreon: # Replace with a single Patreon username
 open_collective: # Replace with a single Open Collective username
 ko_fi: adbarbaresi
diff --git a/.github/dependabot.yml b/.github/dependabot.yml
deleted file mode 100644
index e3f8b067..00000000
--- a/.github/dependabot.yml
+++ /dev/null
@@ -1,27 +0,0 @@
-# https://docs.github.com/en/code-security/dependabot/dependabot-version-updates/configuration-options-for-the-dependabot.yml-file
-# https://docs.github.com/en/code-security/dependabot/working-with-dependabot/keeping-your-actions-up-to-date-with-dependabot
-
-version: 2
-updates:
-  - package-ecosystem: "pip" # See documentation for possible values
-    directory: "/" # Location of package manifests
-    schedule:
-      interval: "monthly"
-    # create a group of dependencies to be updated together in one pull request
-    groups:
-      # specify a name for the group, which will be used in pull request titles
-      # and branch names
-      dependencies:
-        # define patterns to include dependencies in the group (based on
-        # dependency name)
-        patterns:
-          - "*" # matches all dependencies in the package ecosystem
-
-  - package-ecosystem: "github-actions"
-    directory: "/"
-    schedule:
-      interval: "monthly"
-    groups:
-      github-actions:
-        patterns:
-          - "*" # group all updates into a single larger pull request
diff --git a/README.md b/README.md
index fd61ef8b..cfe70815 100644
--- a/README.md
+++ b/README.md
@@ -13,12 +13,14 @@
-Find **original and updated publication dates** of any web page. **On
-the command-line or with Python**, all the steps needed from web page
-download to HTML parsing, scraping, and text analysis are included. The
-package is used in production on millions of documents and integrated by
-[multiple
-libraries](https://github.com/adbar/htmldate/network/dependents).
+Find **original and updated publication dates** of any web page.
+It is often not possible to determine these dates from the URL or the
+server response alone.
+
+**On the command-line or with Python**, all the steps needed from web page
+download to HTML parsing, scraping, and text analysis are included.
+
+The package is used in production on millions of documents and integrated into
+[thousands of projects](https://github.com/adbar/htmldate/network/dependents).

 ## In a nutshell
@@ -114,17 +116,20 @@ license](https://www.apache.org/licenses/LICENSE-2.0.html).

 Versions prior to v1.8.0 are under GPLv3+ license.

-## Author
+## Context

-This project is part of methods to derive information from web documents
-in order to build [text databases for
-research](https://www.dwds.de/d/k-web) (chiefly linguistic analysis and
-natural language processing).
+Initially launched to create text databases for research purposes
+at the Berlin-Brandenburg Academy of Sciences (DWDS and ZDL units),
+this project continues to be maintained, but its future development
+depends on community support.

-Extracting and pre-processing web texts to meet the exacting standards
-is a significant challenge. It is often not possible to reliably
-determine the date of publication or modification using either the URL
-or the server response. For more information:
+**If you value this software or depend on it for your product, consider
+sponsoring it and contributing to its codebase**. Your support will
+help maintain and enhance this popular package, ensuring its growth,
+robustness, and accessibility for developers and users around the world.
+
+Reach out via the software repository or the [contact page](https://adrien.barbaresi.eu/)
+for inquiries, collaborations, or feedback.

 [![JOSS article reference DOI: 10.21105/joss.02439](https://img.shields.io/badge/JOSS-10.21105%2Fjoss.02439-brightgreen)](https://doi.org/10.21105/joss.02439)
 [![Zenodo archive DOI: 10.5281/zenodo.3459599](https://img.shields.io/badge/DOI-10.5281%2Fzenodo.3459599-blue)](https://doi.org/10.5281/zenodo.3459599)
@@ -156,9 +161,6 @@ or the server response. For more information:
    Proceedings of the [10th Web as Corpus Workshop
    (WAC-X)](https://www.sigwac.org.uk/wiki/WAC-X), 2016.

-You can contact me via my [contact page](https://adrien.barbaresi.eu/)
-or [GitHub](https://github.com/adbar).
-
 ## Contributing

 [Contributions](https://github.com/adbar/htmldate/blob/master/CONTRIBUTING.md)
diff --git a/docs/index.rst b/docs/index.rst
index 80323a17..e3d5b1d9 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -34,7 +34,14 @@ htmldate: find the publication date of web pages

 |

-Find original and updated publication dates of any web page. From the command-line or within Python, all the steps needed from web page download to HTML parsing, scraping, and text analysis are included.
+Find **original and updated publication dates** of any web page.
+It is often not possible to determine these dates from the URL or the
+server response alone.
+
+**On the command-line or with Python**, all the steps needed from web page
+download to HTML parsing, scraping, and text analysis are included.
+
+The package is used in production on millions of documents and integrated into
+`thousands of projects <https://github.com/adbar/htmldate/network/dependents>`_.


 In a nutshell
@@ -246,10 +253,22 @@ This package is distributed under the `Apache 2.0 license `_
+for inquiries, collaborations, or feedback.

-This effort is part of methods to derive information from web documents in order to build `text databases for research <https://www.dwds.de/d/k-web>`_ (chiefly linguistic analysis and natural language processing). Extracting and pre-processing web texts to the exacting standards of scientific research presents a substantial challenge for those who conduct such research. There are web pages for which neither the URL nor the server response provide a reliable way to find out when a document was published or modified. For more information:

 .. image:: https://img.shields.io/badge/JOSS-10.21105%2Fjoss.02439-brightgreen
    :target: https://doi.org/10.21105/joss.02439
@@ -278,8 +297,6 @@ This effort is part of methods to derive information from web documents in order

 - Barbaresi, A. "`Generic Web Content Extraction with Open-Source Software `_", Proceedings of KONVENS 2019, Kaleidoscope Abstracts, 2019.
 - Barbaresi, A. "`Efficient construction of metadata-enhanced web corpora `_", Proceedings of the `10th Web as Corpus Workshop (WAC-X) <https://www.sigwac.org.uk/wiki/WAC-X>`_, 2016.

-You can contact me via my `contact page <https://adrien.barbaresi.eu/>`_ or `GitHub <https://github.com/adbar>`_.
-
 Contributing
 ------------