Support Substack with a custom domain #3244

Open
wants to merge 1 commit into
base: master
39 changes: 36 additions & 3 deletions Substack.js
@@ -2,14 +2,14 @@
"translatorID": "ac3b958f-0581-4117-bebc-44af3b876545",
"label": "Substack",
"creator": "Abe Jellinek",
- "target": "^https://[^.]+\\.substack\\.com/(p/|archive)",
+ "target": "/p/|/archive",
@alex-ter (Contributor), Mar 17, 2024

This pattern is quite generic, so I suspect this translator will be considered for many URLs (e.g., plenty of sites plausibly have an "archive" path). A couple of suggestions with that in mind:

  1. I think the priority field should be bumped to 250. See the translator priority docs for details. This takes the a.footer-substack-cta check into account, which I think counts as a "unique check in detectWeb".
  2. I'd suggest making the target pattern as specific as possible, and at least adding (?:$|\?) after the archive part (non-capturing for performance 🙂; see the proposal and counterexample below). /p/ is trickier, since I'm not sure which characters Substack allows there, but something along the same lines might work, i.e., /p/<something-but-not-/>$. That said, the test set shows that /p/<post name>/comments can occur, so maybe not exactly that. Essentially, this is just a suggestion to consider any further ways to narrow the pattern.

As for the archive, the proposed alteration is below. The counterexample I could think of off the top of my head is an IACR ePrint paper version record.

Suggested change
- "target": "/p/|/archive",
+ "target": "/p/|/archive(?:$|\\?)",
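
To make the difference concrete, here is a minimal sketch comparing the loose pattern from the PR with the anchored variant suggested above. The sample URLs are hypothetical and chosen only to exercise each branch; `compareTargets` is an illustrative helper, not part of the translator. Note that the patterns appear here as JavaScript regex literals; inside the JSON metadata header the backslash has to be doubled (`\\?`).

```javascript
// The loose pattern from the PR and the anchored variant from the suggestion,
// written as JavaScript regex literals.
const looseTarget = /\/p\/|\/archive/;
const proposedTarget = /\/p\/|\/archive(?:$|\?)/;

// Hypothetical URLs (assumptions for this sketch, not taken from the test set).
const samples = [
	"https://example.substack.com/archive",
	"https://example.substack.com/archive?sort=new",
	"https://example.com/p/some-post",
	"https://example.com/p/some-post/comments",
	"https://example.org/archive/2020/001/versions",
];

// Illustrative helper: report which pattern matches each URL.
function compareTargets(urls) {
	return urls.map(url => ({
		url,
		loose: looseTarget.test(url),
		proposed: proposedTarget.test(url),
	}));
}

for (const row of compareTargets(samples)) {
	console.log(`${row.url}  loose=${row.loose}  proposed=${row.proposed}`);
}
```

Only the last sample differs: `/archive` followed by further path segments matches the loose pattern but not the anchored one, which is the kind of URL the counterexample is about. Both patterns still accept `/p/<post name>/comments`.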

"minVersion": "3.0",
"maxVersion": "",
"priority": 100,
"inRepository": true,
"translatorType": 4,
"browserSupport": "gcsibv",
- "lastUpdated": "2022-10-05 15:16:38"
+ "lastUpdated": "2024-01-29 20:12:55"
}

/*
@@ -37,6 +37,8 @@


function detectWeb(doc, url) {
+ if (!url.match(/^https:\/\/[^.]+\.substack\.com\/(p\/|archive)/) && !text(doc, "a.footer-substack-cta"))
+ return false;
if (url.includes('/p/')) {
return "blogPost";
}
@@ -49,7 +51,7 @@ function detectWeb(doc, url) {
function getSearchResults(doc, checkOnly) {
var items = {};
var found = false;
- var rows = doc.querySelectorAll('a.post-preview-title[href*="/p/"]');
+ var rows = doc.querySelectorAll('a[data-testid="post-preview-title"][href*="/p/"]');
Contributor

This is a good catch: this part appears to be plain broken right now because of this change in the markup. Given that, even if the target pattern change doesn't get in (e.g., I can imagine maintainers proposing an alternative approach that keeps the target specific for performance reasons), this selector change is worthwhile as a standalone fix.
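
If older markup might still be served somewhere, one option would be to try both selectors in order. This is only a sketch of that idea, not what the PR does (the PR replaces the selector outright). `findPostLinks` and `fakeDoc` are hypothetical names; `fakeDoc` is a stand-in for a real DOM document so the snippet runs outside a browser.

```javascript
// Selectors taken from the diff: the new data-testid form first,
// then the old class-based form as a fallback.
const POST_LINK_SELECTORS = [
	'a[data-testid="post-preview-title"][href*="/p/"]',
	'a.post-preview-title[href*="/p/"]',
];

// Hypothetical helper: return the matches of the first selector that finds anything.
function findPostLinks(doc) {
	for (const selector of POST_LINK_SELECTORS) {
		const rows = Array.from(doc.querySelectorAll(selector));
		if (rows.length) {
			return rows;
		}
	}
	return [];
}

// Stand-in for a DOM document (an assumption for this sketch): it answers
// querySelectorAll from a fixed map of selector -> matching elements.
function fakeDoc(matchesBySelector) {
	return {
		querySelectorAll: selector => matchesBySelector[selector] || [],
	};
}

const newMarkup = fakeDoc({
	[POST_LINK_SELECTORS[0]]: [{ href: "https://example.substack.com/p/hello" }],
});
const oldMarkup = fakeDoc({
	[POST_LINK_SELECTORS[1]]: [{ href: "https://example.substack.com/p/older" }],
});

console.log(findPostLinks(newMarkup).length, findPostLinks(oldMarkup).length);
```

Whether the fallback is worth keeping depends on whether the old class is still in use anywhere; if not, the single new selector from the diff is simpler.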

for (let row of rows) {
let href = row.href;
let title = ZU.trimInternal(row.textContent);
@@ -225,6 +227,37 @@ var testCases = [
"seeAlso": []
}
]
+ },
+ {
+ "type": "web",
+ "url": "https://www.latent.space/p/ai-ux-moat",
+ "items": [
+ {
+ "itemType": "blogPost",
+ "title": "How to Make AI UX Your Moat",
+ "creators": [
+ {
+ "firstName": "Anshul",
+ "lastName": "Ramachandran",
+ "creatorType": "author"
+ }
+ ],
+ "date": "2023-07-07",
+ "abstractNote": "Design great AI Products that go beyond \"just LLM Wrappers\": make AI more present, more practical, and then more powerful.",
+ "blogTitle": "Latent Space",
+ "url": "https://www.latent.space/p/ai-ux-moat",
+ "websiteType": "Substack newsletter",
+ "attachments": [
+ {
+ "title": "Snapshot",
+ "mimeType": "text/html"
+ }
+ ],
+ "tags": [],
+ "notes": [],
+ "seeAlso": []
+ }
+ ]
}
]
/** END TEST CASES **/