Documentation: https://alexmili.github.io/extract_favicon
Source Code: https://github.com/alexmili/extract_favicon
Extract Favicon is designed to easily retrieve favicons from any website. Built atop robust reachable
and BeautifulSoup
, it aims to deliver accurate and efficient favicon extraction for web scraping and data analysis workflows.
Key features include:
- Automatic Extraction: Detects multiple favicon references like
<link>
,<meta>
and inline base64-encoded icons. - Smart Fallbacks: When explicit icons aren’t defined, it checks standard fallback routes (like
favicon.ico
) to provide consistent results even on sites without standard declarations. - Size Guessing: Dynamically determines favicon dimensions, even for images lacking explicit size information, by partially downloading and parsing their headers.
- Base64 Support: Easily handles inline data URLs, decoding base64-encoded images and validating them on-the-fly.
- Availability Checks: Validates each favicon’s URL, following redirects and marking icons as reachable or not.
- Async Support: Offers asynchronous methods (via
asyncio
) to efficiently handle multiple favicon extractions concurrently, enhancing overall performance when dealing with numerous URLs.
Create and activate a virtual environment and then install extract_favicon
:
pip install extract_favicon
>>> import extract_favicon
>>> icons = extract_favicon.from_url("https://www.python.org/")
Favicon(url="https://www.python.org/static/apple-touch-icon-144x144-precomposed.png", width=144, height=144, format="png")
Favicon(url="https://www.python.org/static/apple-touch-icon-114x114-precomposed.png", width=114, height=114, format="png")
Favicon(url="https://www.python.org/static/apple-touch-icon-72x72-precomposed.png", width=72, height=72, format="png")
Favicon(url="https://www.python.org/static/apple-touch-icon-precomposed.png", width=0, height=0, format="png")
Favicon(url="https://www.python.org/static/favicon.ico", width=0, height=0, format="ico")
Directly from already downloaded HTML:
>>> import extract_favicon
>>> icons = extract_favicon.from_html(my_html, root_url="https://www.python.org/static/")
Favicon(url="https://www.python.org/static/apple-touch-icon-144x144-precomposed.png", width=144, height=144, format="png")
Favicon(url="https://www.python.org/static/apple-touch-icon-114x114-precomposed.png", width=114, height=114, format="png")
Favicon(url="https://www.python.org/static/apple-touch-icon-72x72-precomposed.png", width=72, height=72, format="png")
Favicon(url="https://www.python.org/static/apple-touch-icon-precomposed.png", width=0, height=0, format="png")
Favicon(url="https://www.python.org/static/favicon.ico", width=0, height=0, format="ico")
Download extracted favicons:
>>> import extract_favicon
>>> favicons = extract_favicon.from_html(my_html, root_url="https://www.python.org/static/")
>>> favicons_obj = extract_favicon.download(favicons)
[
RealFavicon(
url=FaviconURL(
url="https://www.python.org/static/apple-touch-icon-precomposed.png",
final_url="https://www.python.org/static/apple-touch-icon-precomposed.png",
redirected=False,
status_code=200,
),
format="png",
valid=True,
original=Favicon(
url="https://www.python.org/static/apple-touch-icon-precomposed.png",
format="png",
width=0,
height=0,
),
image=<PIL.PngImagePlugin.PngImageFile image mode=RGBA size=57x57>,
width=57,
height=57,
)
]
This library is an extension of the favicon package.