UPDATE (February 4, 2024): This is the discussion about this project on HN: here. Please read @dang's comment on the core assumption of this project in particular: here. On a personal note, the number of Stories removed yesterday (Saturday, February 3, 2024) was the lowest ever recorded by the service, and that count includes 2 duplicate Stories. As a side note, always check whether a Story in the list is a duplicate: duplication is a perfectly valid reason for removal, and unfortunately the service has no way of detecting it automatically!
The purpose of this project is to try to understand the type and scale of the moderation of the Hacker News Front Page.
NOTE: I love Hacker News. I try to read it every day. In the case of OnnxStream (here for example), 95% of the comments were helpful and intelligent. I also understand that moderating a site with huge traffic and where users are basically anonymous must be a very difficult task.
Returning to the purpose of this project, from what I have been able to see, the "public" (i.e. observable from the outside) moderation of the Front Page relies on two main tools: changing the title of a Story (which, intentionally or not, affects how it climbs in rank) or removing the Story outright.
Regarding the first type of moderation, an excellent site is already available that tracks changes to Story titles. Here instead I will focus on the second type.
For the reasons explained in the "Why?" section below, I have developed a small application that logs all the Stories that are removed from the Front Page, for personal use. I later discovered that there is no tool/website that provides this type of information and I decided to make it public here. It was a difficult decision but my rationale is: is it better to have more transparency or less transparency?
If you know of a tool/website similar to this, please let me know: I will archive this repo or set it to private.
One very positive outcome for this project would be a list similar to this one, but available directly among the HN lists. Another would be for HN to notify a user when a Story is penalized on the Front Page, perhaps indicating the number of flags and/or the reason.
Why? (Feel free to skip this part.)
A friend of mine posted two Stories on Hacker News related to OnnxStream (31 days apart), the first related to SDXL Turbo support and the second related to TinyLlama and Mistral 7B support.
In the case of the first, the Story was among the first on the Front Page, until its title was changed from "Stable Diffusion Turbo on a Raspberry Pi Zero 2 generates an image in 29 minutes" to "OnnxStream: Stable Diffusion XL 1.0 Base on a Raspberry Pi Zero 2". This effectively "killed" the Story. One user pointed out that the new title didn't reflect the spirit of the Story (thanks @practice9).
In the case of the second, the Story was in third place on the Front Page, less than an hour after the submission. In this case it was simply removed from the Front Page.
Perplexed by this discovery, I sent an email to the moderator. @dang, who was very kind and quick in his response, explained that the Story had been flagged by users even though it was not explicitly marked [flagged], and that he could therefore only hypothesize about the causes of the flags. His hypothesis was that (some?) users might be fed up with news related to LLMs.
While I have no reason to doubt Daniel's good faith, it's hard to believe that HN users would be tired of LLM-related news.
So I decided to develop a small console application to determine the frequency of this phenomenon (actually I was also motivated by the prospect of writing some C# code, after more than 2 years of complete abstinence). I subsequently discovered that there were no tools/websites that monitored this specific phenomenon and I therefore decided to make it public here.
Using the official HN API, the service fetches 90 Top Stories every minute and compares them with the first 30 Top Stories (i.e. the Front Page) fetched the previous minute. It logs all missing Stories here. The assumption is that a Story cannot fall from the top 30 to a position beyond 90 within a single minute unless it has been explicitly removed. If a Story reappears on the Front Page, it is removed from this log. All Stories present in the second-chance pool are excluded from the log. Title and URL are those recorded when the Story first appeared in the top 30. The number of points, the number of comments and the rank are those recorded when the Story was removed from the Front Page. The ID links to the news.social-protocols.org page for that Story, which provides a graph of the Story's position on the Front Page over time.
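The comparison described above can be sketched in a few lines. This is only an illustrative Python sketch, not the service's actual code (which is written in C#). The names `find_removed` and `update_log` are hypothetical, and the ranked ID lists and the second-chance pool are passed in as plain arguments, because how the service obtains them is not described here.

```python
from typing import Dict, Iterable, List, Set

FRONT_PAGE_SIZE = 30  # the Front Page is the first 30 Top Stories
FETCH_SIZE = 90       # number of Top Stories fetched every minute


def find_removed(prev_ranked: List[int], curr_ranked: List[int],
                 second_chance: Iterable[int] = ()) -> Dict[int, int]:
    """Return {story_id: last_rank} for Stories that were in the previous
    top 30 but are missing from the current top 90.

    Stories in the second-chance pool are skipped (assumed to be given).
    """
    still_listed: Set[int] = set(curr_ranked[:FETCH_SIZE])
    excluded: Set[int] = set(second_chance)
    removed: Dict[int, int] = {}
    for rank, story_id in enumerate(prev_ranked[:FRONT_PAGE_SIZE], start=1):
        if story_id not in still_listed and story_id not in excluded:
            removed[story_id] = rank
    return removed


def update_log(log: Dict[int, int], prev_ranked: List[int],
               curr_ranked: List[int],
               second_chance: Iterable[int] = ()) -> Dict[int, int]:
    """Apply one minute's comparison to the running log (mutated in place)."""
    # A Story that is back on the Front Page is dropped from the log.
    for story_id in curr_ranked[:FRONT_PAGE_SIZE]:
        log.pop(story_id, None)
    log.update(find_removed(prev_ranked, curr_ranked, second_chance))
    return log


if __name__ == "__main__":
    minute_1 = list(range(1, 91))                       # made-up IDs, in rank order
    minute_2 = [i for i in minute_1 if i != 3] + [91]   # ID 3 has vanished
    print(update_log({}, minute_1, minute_2))
```

A Story that merely slides from rank 5 to rank 80 is still inside the 90 fetched IDs, so it is not logged; only a Story that leaves the top 90 entirely within one minute is treated as removed.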
NOTE: always check whether a Story is a duplicate: duplication is a perfectly valid reason for removal, and unfortunately the service has no way of detecting it automatically!
- 42252041 #3 27 points 2 comments -> Reply on Bluesky and Decentralization
- 42250429 #22 37 points 14 comments -> Show HN: Clean Your Mac with a Script
- 42252904 #21 4 points 0 comments -> Why Rust and Its Memory Safety Lulls Developers into a False Sense of Security
- 42255043 #18 5 points 1 comments -> Raspberry Pi Compute Module 5
- 42255092 #22 9 points 1 comments -> SpaceX rocket explosion shredded the upper atmosphere
- 42221152 #11 9 points 0 comments -> Ancient forest world discovered 630ft down sinkhole in China
- 42217963 #25 32 points 14 comments -> Creating a social photo frame from scratch
- 42255266 #26 4 points 2 comments -> Huffman Coding
- 42255559 #11 17 points 11 comments -> How to build 99.999% uptime payment systems
- 42257124 #22 11 points 0 comments -> Probable extinction of influenza B/Yamagata and its public health implications
- 42260401 #9 5 points 1 comments -> Factoring in the Chicken McNugget monoid (2017)
- 42260998 #13 9 points 8 comments -> Database full of 1000+ validated problems that can be turned into applications
- 42259950 #26 15 points 26 comments -> The Rise of the NormieNet – Echo chamber politics
- 42262065 #13 14 points 1 comments -> Alibaba's OpenAI Challenger: The New AI Reasoning Titan
- 42262533 #25 9 points 2 comments -> Appwrite: Open-Source Back End as a Service
- 42261707 #18 137 points 41 comments -> Dell is posting unsigned updates to their website which fail to install
- 42262057 #18 47 points 19 comments -> Tornado Cash Sanctions Found Illegal, in Legal Win for Crypto
- 42253756 #20 92 points 158 comments -> Python type hints may not be not for me in practice
- 42258407 #17 71 points 18 comments -> Generate video sprites using just FFmpeg
- 42213125 #21 57 points 26 comments -> Psychoacoustic and archeoacoustic nature of ancient Aztec skull whistles
- 42262089 #25 23 points 16 comments -> Pocket 4: Modular full-featured Handheld AI PC
- 42261314 #15 55 points 20 comments -> RomCom exploits Firefox and Windows zero days in the wild
- 42263724 #5 8 points 1 comments -> IMBA: A Curated Self-Learning MBA Inspired by 'The Personal MBA
- 42198072 #17 6 points 0 comments -> Why can't we separate YAML from ML?
- 42263315 #26 21 points 12 comments -> Microsoft says it's built an Xbox game store on Android but can't launch it
- 42263598 #29 6 points 1 comments -> Spotify cuts developer access to several of its recommendation features
- 42232496 #12 4 points 0 comments -> Best Nigerian Movies: A Journey Through Nollywood's Finest
- 42213264 #16 27 points 6 comments -> Reliably Benchmarking Small Changes – Ankush Menat
- 42265818 #18 7 points 2 comments -> Hetzner – 20 times less traffic for a higher price
- 42265226 #3 9 points 2 comments -> How we improved GPT-4o multi-step function calling success rate by 4x
- 42265667 #9 20 points 43 comments -> It is humiliating to have to do LeetCode grinding for
- 42265485 #26 12 points 5 comments -> Australia to ban under-16s from social media after passing landmark law
- 42265051 #21 10 points 11 comments -> Qodo automatically verifies PR complies with Jira ticket or GitHub issue
- 42267519 #29 7 points 0 comments -> Reddit overtakes X in popularity of social media platforms in UK
- 42260030 #26 170 points 2 comments -> Float Self-Tagging
- 42242784 #26 7 points 6 comments -> Every Hands-Free Driving System Available in 2024
- 42268475 #3 294 points 4 comments -> Hetzner raises prices while significantly lowering bandwidth (US)
- 42270893 #7 14 points 1 comments -> An updated record of Tesla fatalities and Tesla accident deaths
- 42270965 #5 9 points 2 comments -> New report advises mental health support for 'incels'
- 42270935 #20 46 points 12 comments -> US antitrust watchdog launches broad Microsoft investigation
- 42272813 #13 16 points 3 comments -> New Zealand Navy ship sank off Samoa because autopilot was left on, inquiry
- 42272732 #29 4 points 0 comments -> 'Would you survive 72 hours?' Germany&Nordics prepare citizens for possible war
- 42273650 #25 7 points 1 comments -> NGI Projects adopt Mastodon and PeerTube as main communication channels
- 42273349 #27 6 points 1 comments -> Stowaway found after boarding flight from New York to Paris
- 42275062 #1 7 points 1 comments -> What I learned solo bootstrapping 8 software products
- 42275734 #7 12 points 1 comments -> Speaking at PyTexas – CFP closes December 1, 2024
- 42275784 #5 56 points 31 comments -> Javier Milei: "My contempt for the state is infinite"
- 42273780 #20 114 points 6 comments -> Alibaba releases an 'open' challenger to OpenAI's O1 reasoning model
- 42277438 #20 8 points 3 comments -> Apple removes Active Noise Cancelling from AirPods Pro 2
- 42277963 #22 80 points 101 comments -> The Engagement Is Better on Bluesky
- 42277931 #11 20 points 4 comments -> Bluesky intends their indexing to be used by third parties
- 42276700 #14 29 points 8 comments -> Australian Online Safety Amendment (Social Media Minimum Age) Bill 2024
- 42275919 #13 105 points 58 comments -> Simple Sabotage for the 21st Century – Specific Suggestions
- 42244482 #14 121 points 26 comments -> Prioritize work at the task level
- 42273966 #16 70 points 19 comments -> How We Got the Lithium-Ion Battery
- 42278160 #19 26 points 37 comments -> TfL abandons plans for driverless tube trains
- 42244409 #25 51 points 35 comments -> Rust: Tools (early access edition)
- 42275834 #15 133 points 60 comments -> Chinese pebble-bed nuclear reactor passes "meltdown" test
- 42278617 #13 22 points 0 comments -> Virtual Geometry in Bevy 0.15
- 42276078 #12 -> Calmy Leon – The Ultimate Relaxing Music and Sound Generator
- 42279521 #15 10 points 8 comments -> 'Switches' are turning handguns into machine guns on Ontario streets
- 42280641 #11 7 points 4 comments -> Formance – The Color of Money: Towards a New Data Model for Fintech, Part II
- 42249801 #8 15 points 3 comments -> California scientists accidentally find nuclear fever dream in Arctic snow
- 42281066 #21 52 points 77 comments -> Tesla is looking to hire a team to remotely control its 'self-driving' robotaxis
- 42280865 #22 -> Possible new ancient human species uncovered
- 42281470 #23 8 points 0 comments -> In C, memory management begins – The Craft of Coding
- 42281585 #20 10 points 0 comments -> What can we learn from the Andrew Tate data breach?
- 42282020 #1 5 points 4 comments -> I made an AI specifically for teachers
- 42254385 #21 3 points 0 comments -> M4 chips: Matrix processing and Power Modes
- 42282078 #24 24 points 10 comments -> Chinese researchers indicate diamonds can store data for millions of years
- 42224492 #12 30 points 0 comments -> A Reintroduction to Programming
- 42283123 #6 14 points 1 comments -> 'I couldn't stop watching': stories of how porn obsession takes over lives
- 42283331 #24 4 points 0 comments -> Florida Man Who Spied on Verizon for China Gets 4 Years in Prison
- 42282761 #25 6 points 2 comments -> NASA's X-59 plane is aiming for a sonic thump, not a boom
- 42283949 #8 -> Bluesky Quadruples Moderation Team
- 42285196 #5 12 points 3 comments -> We need data engineering benchmarks for LLMs
- 42212377 #23 9 points 0 comments -> Bodging GenServers Together
- 42254146 #15 56 points 31 comments -> IE7 and IE7 (2005)
- 42286306 #26 5 points 1 comments -> Day 1 – Advent of Code 2024
- 42286133 #13 90 points 83 comments -> Education and Healthcare Suck for the Same Reasons
- 42286035 #18 53 points 19 comments -> Canon ships its first nanoprint lithography machine, rivals ASML
- 42285467 #25 41 points 7 comments -> Office CMBS Delinquency Rate Spikes to 10.4%, Just Below Worst of GFC Meltdown
- 42287946 #14 3 points 0 comments -> ChatGPT Learned to Reason [video]
- 42288372 #3 30 points 3 comments -> Advent of No-Code 2024
- 42288948 #27 15 points 3 comments -> America Got Mean
- 42289143 #5 6 points 2 comments -> Show HN: Open-source widget to embed OpenAI Assistant on your website
- 42288483 #27 16 points 0 comments -> Tiny Arcades [video]
- 42290663 #7 4 points 0 comments -> Hiroshi Nagai: Japan's Sun-Drenched Americana
- 42290519 #28 16 points 10 comments -> Uber's Dark Descent: How Abandoning Innovation Hurt Drivers and Gouges Riders
- 42291286 #9 25 points 13 comments -> The Imminence of the Destruction of the Space Program
- 42291365 #30 6 points 1 comments -> 'Call of Duty: Modern Warfare' Rewrites the Highway of Death as a Russian Attack
- 42292067 #26 48 points 23 comments -> Feds: Tether Has Become a Money Laundering Tool for Mexican Drug Trafficker
- 42292235 #1 57 points 11 comments -> ICP-Brasil: Mis-issued certificate
- 42292170 #15 50 points 11 comments -> Amazon Workers on Strike from Black Friday to Cyber Monday
- 42257432 #30 7 points 4 comments -> Show HN: A program to learn phrases in multiple languages at once
- 42292204 #20 40 points 14 comments -> Discovery of CVE-2024-2550 (Palo Alto)
- 42262791 #26 14 points 18 comments -> 'The Endless Refrain' asks: Do we even want new music anymore?
- 42290245 #28 65 points 42 comments -> IBM RISC System/6000 Family – Computer Ads from the Past
- 42248272 #26 127 points 197 comments -> Intel Gets Up to $7.9B Award for U.S. Chip-Plant Construction
- 42264595 #17 5 points 1 comments -> Gene behind orange fur in cats found at last
- 42296228 #30 11 points 27 comments -> Carbon dioxide capture from open air using covalent organic frameworks
- 42268307 #18 16 points 74 comments -> Bury me on the moon, preferably on the far side
- 42299687 #7 10 points 4 comments -> Natural soundscapes enhance mood recovery amid anthropogenic noise pollution
- 42302342 #25 27 points 12 comments -> European Federation of Journalists to stop posting content on X
- 42304161 #16 10 points 7 comments -> Jaguar Introduces Type 00