Commit

maintenance and docs: remove dependabot and update funding (#178)
* maintenance: remove dependabot and update funding

* update readme

* add context

* remove duplicate text

* fix typos
adbar authored Dec 2, 2024
1 parent 9c5f619 commit 314272e
Showing 4 changed files with 44 additions and 52 deletions.
2 changes: 1 addition & 1 deletion .github/FUNDING.yml
@@ -1,6 +1,6 @@
# These are supported funding model platforms

-github: # Replace with up to 4 GitHub Sponsors-enabled usernames e.g., [user1, user2]
+github: [adbar]
patreon: # Replace with a single Patreon username
open_collective: # Replace with a single Open Collective username
ko_fi: adbarbaresi
27 changes: 0 additions & 27 deletions .github/dependabot.yml

This file was deleted.

38 changes: 20 additions & 18 deletions README.md
@@ -13,12 +13,14 @@

<br/>

-Find **original and updated publication dates** of any web page. **On
-the command-line or with Python**, all the steps needed from web page
-download to HTML parsing, scraping, and text analysis are included. The
-package is used in production on millions of documents and integrated by
-[multiple
-libraries](https://github.com/adbar/htmldate/network/dependents).
+Find **original and updated publication dates** of any web page.
+It is often not possible to do it using just the URL or the server response.
+
+**On the command-line or with Python**, all the steps needed from web page
+download to HTML parsing, scraping, and text analysis are included.
+
+The package is used in production on millions of documents and integrated into
+[thousands of projects](https://github.com/adbar/htmldate/network/dependents).


## In a nutshell
@@ -114,17 +116,20 @@ license](https://www.apache.org/licenses/LICENSE-2.0.html).

Versions prior to v1.8.0 are under GPLv3+ license.

-## Author
+## Context

-This project is part of methods to derive information from web documents
-in order to build [text databases for
-research](https://www.dwds.de/d/k-web) (chiefly linguistic analysis and
-natural language processing).
+Initially launched to create text databases for research purposes
+at the Berlin-Brandenburg Academy of Sciences (DWDS and ZDL units),
+this project continues to be maintained but its future development
+depends on community support.

-Extracting and pre-processing web texts to meet the exacting standards
-is a significant challenge. It is often not possible to reliably
-determine the date of publication or modification using either the URL
-or the server response. For more information:
+**If you value this software or depend on it for your product, consider
+sponsoring it and contributing to its codebase**. Your support will
+help maintain and enhance this popular package, ensuring its growth,
+robustness, and accessibility for developers and users around the world.
+
+Reach out via the software repository or the [contact page](https://adrien.barbaresi.eu/)
+for inquiries, collaborations, or feedback.

[![JOSS article reference DOI: 10.21105/joss.02439](https://img.shields.io/badge/JOSS-10.21105%2Fjoss.02439-brightgreen)](https://doi.org/10.21105/joss.02439)
[![Zenodo archive DOI: 10.5281/zenodo.3459599](https://img.shields.io/badge/DOI-10.5281%2Fzenodo.3459599-blue)](https://doi.org/10.5281/zenodo.3459599)
@@ -156,9 +161,6 @@ or the server response. For more information:
Proceedings of the [10th Web as Corpus Workshop
(WAC-X)](https://www.sigwac.org.uk/wiki/WAC-X), 2016.

-You can contact me via my [contact page](https://adrien.barbaresi.eu/)
-or [GitHub](https://github.com/adbar).
-
## Contributing

[Contributions](https://github.com/adbar/htmldate/blob/master/CONTRIBUTING.md)
29 changes: 23 additions & 6 deletions docs/index.rst
@@ -34,7 +34,14 @@ htmldate: find the publication date of web pages

|
-Find original and updated publication dates of any web page. From the command-line or within Python, all the steps needed from web page download to HTML parsing, scraping, and text analysis are included.
+Find **original and updated publication dates** of any web page.
+It is often not possible to do it using just the URL or the server response.
+
+**On the command-line or with Python**, all the steps needed from web page
+download to HTML parsing, scraping, and text analysis are included.
+
+The package is used in production on millions of documents and integrated into
+`thousands of projects <https://github.com/adbar/htmldate/network/dependents>`_.


In a nutshell
@@ -246,10 +253,22 @@ This package is distributed under the `Apache 2.0 license <https://www.apache.or
Versions prior to v1.8.0 are under GPLv3+ license.


-Author
-------
+Context
+-------

+Initially launched to create text databases for research purposes
+at the Berlin-Brandenburg Academy of Sciences (DWDS and ZDL units),
+this project continues to be maintained but its future development
+depends on community support.
+
+**If you value this software or depend on it for your product, consider
+sponsoring it and contributing to its codebase**. Your support will
+help maintain and enhance this popular package, ensuring its growth,
+robustness, and accessibility for developers and users around the world.
+
+Reach out via the software repository or the `contact page
+<https://adrien.barbaresi.eu/>`_ for inquiries, collaborations, or feedback.
+
-This effort is part of methods to derive information from web documents in order to build `text databases for research <https://www.dwds.de/d/k-web>`_ (chiefly linguistic analysis and natural language processing). Extracting and pre-processing web texts to the exacting standards of scientific research presents a substantial challenge for those who conduct such research. There are web pages for which neither the URL nor the server response provide a reliable way to find out when a document was published or modified. For more information:

.. image:: https://img.shields.io/badge/JOSS-10.21105%2Fjoss.02439-brightgreen
   :target: https://doi.org/10.21105/joss.02439
@@ -278,8 +297,6 @@ This effort is part of methods to derive information from web documents in order
- Barbaresi, A. "`Generic Web Content Extraction with Open-Source Software <https://hal.archives-ouvertes.fr/hal-02447264/document>`_", Proceedings of KONVENS 2019, Kaleidoscope Abstracts, 2019.
- Barbaresi, A. "`Efficient construction of metadata-enhanced web corpora <https://hal.archives-ouvertes.fr/hal-01371704v2/document>`_", Proceedings of the `10th Web as Corpus Workshop (WAC-X) <https://www.sigwac.org.uk/wiki/WAC-X>`_, 2016.

-You can contact me via my `contact page <https://adrien.barbaresi.eu/>`_ or `GitHub <https://github.com/adbar>`_.
-

Contributing
------------
