diff --git a/docs/2024-03-02-weekly.md b/docs/2024-03-02-weekly.md
index 6d6f219..3de46fb 100644
--- a/docs/2024-03-02-weekly.md
+++ b/docs/2024-03-02-weekly.md
@@ -23,7 +23,7 @@ author: '豌豆花下猫'
 
 上期周刊分享的可替换`pip`的`uv` 库,你用了么?感觉如何啊?文章作者给出了积极反馈,分享了自己一些配置文件的前后对比。
 
-4、 [Python 生成器未得到充分利用](https://www.slashtmp.io/posts/generators/)
+4、[Python 生成器未得到充分利用](https://www.slashtmp.io/posts/generators/)
 
 Python 生成器的作用是能节省内存,这篇文章用很明白的例子对比了两种内存使用情况,让我们感受到生成器的好处,同时,文章也指出了需要避免的一些使用陷阱。
 
diff --git a/docs/en/2024-03-02-weekly.md b/docs/en/2024-03-02-weekly.md
new file mode 100644
index 0000000..f453276
--- /dev/null
+++ b/docs/en/2024-03-02-weekly.md
@@ -0,0 +1,121 @@
+# Python Trending Weekly 40 (2024-03-02)
+
+Welcome to the Python Trending Weekly, a weekly newsletter about Python, AI and general programming techniques, with the majority of links in English and a small portion in Chinese.
+
+The [original version](https://pythoncat.top/posts/2024-03-02-weekly) of the weekly was written in Chinese. What you are reading here is mostly translated by LLMs.
+
+**Substack Channel**: [Click to subscribe](https://pythoncat.substack.com/s/python-trending-weekly)
+
+## 🦄Articles & Tutorials
+
+1. [The White House recommends the use of memory-safe languages such as Python](https://pyfound.blogspot.com/2024/02/white-house-recommends-.html)
+
+A report released last year by agencies including CISA and NSA listed C#, Go, Java, **Python**, Rust, and Swift as memory-safe languages. This PSF article discusses Python's work on memory safety, including sandboxing the underlying code, migrating from C to Rust, and hardening C builds with compiler options.
+
+![A brief description of Python in the CSI table](https://img.pythoncat.top/2024-03-01_python.png)
+
+2. [A Retrospective on Requests](https://blog.ian.stapletoncordas.co/2024/02/a-retrospective-on-requests)
+
+The author of the article is one of the core maintainers of Requests. He lists several areas where the library fell short and explains the many reasons why improvements he wanted to make never happened. The epilogue says: "the project feels dead". It's heartbreaking. [Issue 26](https://pythoncat.top/posts/2023-11-11-weekly) of this weekly shared an apology from KR, the author of the library, but it drew little response in the community. Later, I learned that KR had lost his job, and a tweet saying he was in a bad mental state made me feel even worse. (Contributed by @frostming90)
+
+3. [Python's UV Tool Is Actually Pretty Good](https://micro.webology.dev/2024/02/29/pythons-uv-tool.html)
+
+Last issue, we shared the `uv` library, which can replace `pip`. Have you used it? How do you feel about it? The author of the article gave positive feedback and shared some before-and-after comparisons of his configuration files.
+
+4. [Python Generators Are Underutilized](https://www.slashtmp.io/posts/generators/)
+
+Python generators are useful for saving memory. This article uses clear examples to compare the memory usage of two different approaches, allowing us to appreciate the benefits of generators. Additionally, the article highlights some usage pitfalls to avoid.
+
+5. [Advanced Web Scraping With Python: Extract Data From Any Site](https://jacobpadilla.com/articles/advanced-web-scraping-techniques)
+
+This article discusses some advanced web scraping techniques, including how to better handle cookies and custom request headers, what TLS fingerprinting is and how to avoid it, common HTTP request headers to be aware of, how to add exponential backoff retries to HTTP requests, and more.
+
+6. [Django REST Framework and Vue versus Django and HTMX](https://testdriven.io/blog/drf-vue-vs-django-htmx/)
+
+How do these two web development stacks compare, and what are the advantages and disadvantages of each? This article implements the same functionality with both combinations, analyzes the differences between the two technology stacks, and provides a comparison checklist to help us make better technology choices.
+
+7. [The Road to Composable Data Systems: Thoughts on the Last 15 Years and the Future](https://wesmckinney.com/blog/looking-back-15-years/)
+
+This article is written by Wes McKinney, the author of the `pandas` library and the book *Python for Data Analysis*. He reviews his own work and the changes in the field of data science since 2008, and reflects on future trends toward modularity, interoperability, and composability.
+
+8. [Django SQLite Benchmark](https://blog.pecar.me/django-sqlite-benchmark)
+
+The author benchmarked several key SQLite configurations and also compared the performance of SQLite and PostgreSQL. In short: enable WAL mode, use IMMEDIATE transactions, and set `synchronous=NORMAL`; memory-mapped I/O has little impact on throughput.
+
+![](https://img.pythoncat.top/sqlite-django-benchmark.png)
+
+9. [How Python 3.13's JIT Is Implemented](https://zhuanlan.zhihu.com/p/682997904)
+
+The article explains how Python's new JIT is implemented. The author installed the development version and compared its performance with a build without the JIT. For now, the JIT build is slower than the regular one, and the core team still needs to keep optimizing it.
+
+10. [Web Scraping in Python - The Complete Guide](https://proxiesapi.com/articles/web-scraping-in-python-the-complete-guide)
+
+A detailed web scraping tutorial that covers how to scrape with libraries such as BeautifulSoup, Scrapy, and Selenium, and how to overcome challenges like complex pages, rate limiting, anti-scraping measures, and dynamic JavaScript content.
+
+11. [In Defense of Simple Architectures](https://danluu.com/simple-architectures/)
+
+Wave is a company with only 70 engineers but valued at $1.7 billion. Its product is just a standard CRUD application, built as a Python monolith on top of Postgres. The article explains why they chose this architecture and why the choice is sound, as well as the challenges they overcame and the technical solutions they adopted to maintain it.
+
+12. [Scheduling Internals](https://tontinton.com/posts/scheduling-internals/)
+
+A long, in-depth article on concurrency that explains how a single-threaded server can handle millions of tasks through asynchronous I/O and event-driven programming. It discusses various approaches and tools for implementing concurrency, as well as how different programming languages implement it. The article contains many animations to help readers understand.
+
+🎁 Python Trending Weekly 🎁 organizes its content into seasons, with every 30 issues forming a season. The highlights from the first season have been compiled for your convenience. You can access them online [here](https://pythoncat.top/posts/2023-12-11-weekly) (Chinese).
+
+## 🐿️Projects & Resources
+
+1. [ingestr: a CLI tool to copy data between any databases with a single command seamlessly](https://github.com/bruin-data/ingestr)
+
+This is a CLI tool to seamlessly copy data between any databases with a single command. It supports incremental loading in `append`, `merge`, and `delete+insert` modes. (1.3k stars)
+
+2. [justpath: Inspect and refine PATH environment variable on both Windows and Linux](https://github.com/epogrebnyak/justpath)
+
+A command-line tool for managing the PATH environment variable on your operating system. Typical features include filtering directories, identifying and cleaning up invalid entries, dumping PATH to JSON, creating new environment variables, and counting PATH entries.
+
+3. [mountaineer: Mountaineer is a batteries-included web framework for Python and React](https://github.com/piercefreeman/mountaineer)
+
+A full-stack web development framework featuring type checking across the whole stack, easy client-server communication and data binding, server-side rendering, static analysis for stronger validation of web pages, and more.
+
+4. [generate: One API to Access World-Class Generative Models](https://github.com/wangyuxinwhy/generate)
+
+Access large models through one unified API. Features include multimodal support, support for 10+ large model platforms, asynchronous calls, streaming and concurrency, and a self-contained, lightweight, high-quality codebase. (Submitted by @wangyuxinwhy)
+
+5. [StringZilla: Up to 10x faster strings for C, C++, Python, Rust, and Swift](https://github.com/ashvardanian/StringZilla)
+
+"The world wastes at least $100 million per year on inefficient string operations." This project replaces the native string types of these languages to improve performance. It speeds up exact and fuzzy string matching, edit distance calculation, and sorting, offers lazily evaluated ranges to avoid memory allocations, and even includes a random string generator. (1.4k stars)
+
+6. [DrissionPage: Web automation tool that can control browsers and send and receive data packets](https://github.com/g1879/DrissionPage)
+
+It uses a completely self-developed kernel and has the following advantages over Selenium: no webdriver traits, element search across iframes (iframes are treated as ordinary elements), operating multiple tabs at the same time, reading the browser cache directly to save images, taking full-page screenshots, and more. (4.1k stars)
+
+7. [Daft: Distributed DataFrame for Python designed for the cloud, powered by Rust](https://github.com/Eventual-Inc/Daft)
+
+A distributed query engine for large-scale data processing built with Rust, featuring a familiar interactive API, a focus on query optimization, data catalog integration, a rich multimodal type system, and a design built for the cloud. (1.4k stars)
+
+8. [magika: Detect file content types with deep learning](https://github.com/google/magika)
+
+Google's latest open-source project uses AI to detect file types with 99% accuracy. Available as a Python command-line tool and API, it supports over 100 file types with an inference time of around 5 milliseconds per file. (7k stars)
+
+![Comparison of magika with other tools](https://img.pythoncat.top/google-magika.png)
+
+9. [frappe: Low code web framework for real world applications, in Python and Javascript](https://github.com/frappe/frappe)
+
+A batteries-included, low-code, full-stack web framework with a Python and MariaDB backend, featuring a metadata-first design, an admin interface, roles and permissions out of the box, plugin support, a task scheduler, email management, multi-tenancy, and more. (6.3k stars)
+
+10. [Umi-OCR: Open source, free offline OCR software, supports screenshot/batch import of images](https://github.com/hiroi-sora/Umi-OCR)
+
+Just unzip and use: it runs offline with no network required. It comes with an efficient offline OCR engine and built-in multilingual recognition libraries, can be invoked from the command line or through an HTTP interface, and supports screenshot OCR, batch OCR, PDF recognition, and QR codes. (19.4k stars)
+
+11. [xonsh: Python-powered, cross-platform, Unix-gazing shell](https://github.com/xonsh/xonsh)
+
+Its language is a superset of Python 3.6+ with additional shell primitives. You can use it as a shell or as Python, or mix the two: write shell commands in Python and Python in the shell. (7.8k stars)
+
+![Image of "What is xonsh?"](https://img.pythoncat.top/what_is_xonsh.png)
+
+## 🐼Subscribe Welcome
+
+- [Blog](https://pythoncat.top): Explore my independent blog where you can find a collection of original/translated technical articles over the years, along with some reflections since 2009.
+- [Newsletter](https://pythoncat.substack.com/s/python-trending-weekly): Subscribe to my channel on Substack for a curated newsletter delivered straight to your inbox, keeping you updated on current affairs.
+- [GitHub](https://github.com/chinesehuazhou/python-weekly): Access the Markdown source files of this weekly digest on GitHub and feel free to use them for anything you have in mind!
+- [Telegram](https://t.me/pythontrendingweekly): Beyond notifications for the weekly digest, I consider it an "extra edition," providing additional, more diverse information.
+- [Twitter](https://twitter.com/chinesehuazhou): Follow me on Twitter where my feed is filled with numerous accounts of developers and organizations in the Python community.
\ No newline at end of file
diff --git a/resources/weekly_summary_en.py b/resources/weekly_summary_en.py
new file mode 100644
index 0000000..93e7db0
--- /dev/null
+++ b/resources/weekly_summary_en.py
@@ -0,0 +1,92 @@
+import datetime
+import os
+import re
+import sys
+
+
+def read_md(file_path):
+    """
+    Parse a Markdown file and return its level-2 headings with their sub-titles, omitting headings with no sub-titles
+    :param file_path: path to the Markdown file
+    :return: dict mapping each heading to its list of sub-titles
+    """
+    with open(file_path, 'r', encoding="utf-8") as f:
+        file_content = f.read()
+    origin_content = parse_md(file_content)
+    new_content = {key: value for key, value in origin_content.items() if value}
+    return new_content
+
+
+def parse_md(file_content):
+    """
+    Parse Markdown text and return its level-2 headings with their sub-titles
+    :param file_content: content of the Markdown file
+    :return: dict mapping each heading to its list of sub-titles
+    """
+    titles = re.findall(r'## (.*?)\n', file_content)
+    sub_titles = re.findall(r'## (.*?)\n|\d、\[(.*?)\]\(.*?\)', file_content)
+
+    parsed_content = {title: [] for title in titles}
+
+    current_title = None
+    for title, sub_title in sub_titles:
+        if title:
+            current_title = title
+        elif current_title is not None:
+            parsed_content[current_title].append(sub_title.strip())
+
+    return parsed_content
+
+
+def content_to_string(contents):
+    message = ""
+    for section, sub_sections in contents.items():
+        if sub_sections:
+            message += "**" + section + "** \n\n"
+            for i, sub_section in enumerate(sub_sections, start=1):
+                # chr(9311 + i) is the circled number for i: ① (U+2460) for 1, up to ⑳ for 20
+                message += f"{chr(9311 + i)} " + sub_section + "\n"
+            message += "\n"
+    return message
+
+
+def write_to_md_file(weekly_no, md_body, url):
+    """
+    Write the summary to a Markdown file (does nothing if the file already exists)
+    :param weekly_no: issue number of the weekly
+    :param md_body: summary text, e.g. the output of content_to_string
+    :param url: URL of the full issue
+    """
+    file_name = f"Python 潮流周刊第 {weekly_no} 期(摘要)"
+    if os.path.exists(file_name + ".md"):
+        return
+    print("Writing summary to local file")
+    with open(file_name + ".md", 'w', encoding="utf-8") as f:
+        f.write(f"# {file_name}\n\n")
+        f.write("本周刊由 Python猫 出品,精心筛选国内外的 250+ 信息源,"
+                "为你挑选最值得分享的文章、教程、开源项目、软件工具、播客和视频、热门话题等内容。"
+                "愿景:帮助所有读者精进 Python 技术,并增长职业和副业的收入。\n\n")
+        f.write(f"周刊全文:{url} \n\n")
+        f.write("以下是本期摘要: \n\n")
+
+        # Insert a newline before each circled number to work around platforms that don't render line breaks correctly
+        for i in range(1, 20):
+            md_body = md_body.replace(chr(9311 + i), "\n" + chr(9311 + i))
+        f.write(md_body + "\n\n")
+
+        f.write(f"**查看全文**:{url} \n\n")
+        f.write("**微信订阅**:https://img.pythoncat.top/python_cat.jpg \n\n")
+        f.write("**邮箱订阅**:https://pythoncat.substack.com")
+
+
+def main():
+    current_date = datetime.datetime.now().strftime('%Y-%m-%d')
+    file_name = f"{current_date}-weekly"
+    file_path = os.path.join("docs", f"{file_name}.md")
+    if not os.path.exists(file_path):
+        print(f"File {file_path} does not exist.")
+        sys.exit(1)
+
+
+if __name__ == "__main__":
+    main()
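A quick way to see what the `parse_md` regex in `resources/weekly_summary_en.py` accepts is to run it on a small sample. The sketch below reuses `parse_md` from this patch (docstring omitted); the sample Markdown, with its `## Articles` heading, `Item three`/`Item four`/`Item five` titles, and `example.com` URLs, is made up for illustration. It likely explains why the Chinese weekly's `4、 [` line is changed to `4、[` in this same commit: the pattern `\d、\[` only matches when `[` immediately follows `、`, so an item with a space there is silently dropped from the summary.

```python
import re


# parse_md as added in resources/weekly_summary_en.py (docstring omitted)
def parse_md(file_content):
    titles = re.findall(r'## (.*?)\n', file_content)
    # Each match is a (heading, item_title) tuple; the group that did not match is ''
    sub_titles = re.findall(r'## (.*?)\n|\d、\[(.*?)\]\(.*?\)', file_content)

    parsed_content = {title: [] for title in titles}

    current_title = None
    for title, sub_title in sub_titles:
        if title:
            current_title = title
        elif current_title is not None:
            parsed_content[current_title].append(sub_title.strip())

    return parsed_content


# Hypothetical sample: item 4 has a space after "、", like the old Chinese weekly line
sample = (
    "## Articles\n\n"
    "3、[Item three](https://example.com/3)\n\n"
    "4、 [Item four](https://example.com/4)\n\n"
    "5、[Item five](https://example.com/5)\n"
)
# Same sample with the space removed, mirroring the fix in this commit
fixed = sample.replace("4、 [", "4、[")

print(parse_md(sample))  # {'Articles': ['Item three', 'Item five']}
print(parse_md(fixed))   # {'Articles': ['Item three', 'Item four', 'Item five']}
```

Because `\d` matches a single digit, two-digit items such as `10、[` are still picked up: the search simply matches starting at their last digit.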