Missing images in full content mode #506

jocmp · 2024-11-14T04:36:55Z

Background

Capy Reader uses a library called Readability4J, which applies a set of rules to extract an article's full content.

Sometimes those rules fail, leading to missing images in Capy's full content mode. This is an annoying issue without a single fix-all solution: every website is different and changes over time, which is part of the beauty and chaos of the web.
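One common way a Readability-style extractor loses an image: it scores candidate containers by how much paragraph text they hold, keeps the best one, and discards everything outside it, so a featured image sitting in a header block above the article body never reaches the output. The sketch below is a toy illustration of that failure mode only, written with Python's standard `html.parser`. It is not Readability4J's actual scoring algorithm, and the sample markup and ids are hypothetical.

```python
from html.parser import HTMLParser


class BlockCollector(HTMLParser):
    """Toy scorer: credits <p> text to the innermost enclosing <div> and
    records every <img> src against all of its enclosing <div>s."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open <div> records, innermost last
        self.blocks = []  # every <div> record, in document order
        self.in_p = 0     # nesting depth of open <p> tags

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "div":
            block = {"id": attributes.get("id"), "text": 0, "images": []}
            self.stack.append(block)
            self.blocks.append(block)
        elif tag == "p":
            self.in_p += 1
        elif tag == "img":
            # Extracting a div keeps its whole subtree, so the image belongs
            # to every div that encloses it.
            for block in self.stack:
                block["images"].append(attributes.get("src"))

    def handle_endtag(self, tag):
        if tag == "div" and self.stack:
            self.stack.pop()
        elif tag == "p" and self.in_p:
            self.in_p -= 1

    def handle_data(self, data):
        # Only paragraph text counts toward a block's score.
        if self.in_p and self.stack:
            self.stack[-1]["text"] += len(data.strip())


def extract(html):
    """Return (id, images) of the div with the most paragraph text."""
    collector = BlockCollector()
    collector.feed(html)
    collector.close()
    best = max(collector.blocks, key=lambda block: block["text"])
    return best["id"], best["images"]


SAMPLE = """
<div id="page">
  <div id="header"><img src="hero.jpg"></div>
  <div id="article-body">
    <p>First paragraph of the article.</p>
    <p>Second paragraph.</p>
  </div>
</div>
"""

print(extract(SAMPLE))  # ('article-body', []): hero.jpg was left behind
```

Real Readability ports also weigh class and id names, link density, and sibling content, which this toy ignores, so whether a given site hits this exact case has to be checked per feed.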

If you run into this issue with a feed, please post a link to the feed along with an example article in this thread. I'll track these and fix them at some point in the future. Thanks!

Feeds

PhilC813 · 2024-11-29T14:22:20Z

The articles' main image isn't shown in Capy's full content mode for the following feed:
https://mobilesyrup.com/feed/

Article example:
https://mobilesyrup.com/2024/11/28/google-releases-ai-generated-pieces-chess-game/

(I only noticed this today so maybe it used to work?)

HTML of the image:
<img fetchpriority="high" width="1867" height="1046" src="https://cdn.mobilesyrup.com/wp-content/uploads/2024/11/gen-ai-chess.jpg" class="attachment-full size-full wp-post-image" alt="" decoding="async" srcset="https://cdn.mobilesyrup.com/wp-content/uploads/2024/11/gen-ai-chess.jpg 1867w, https://cdn.mobilesyrup.com/wp-content/uploads/2024/11/gen-ai-chess-300x168.jpg 300w, https://cdn.mobilesyrup.com/wp-content/uploads/2024/11/gen-ai-chess-1024x574.jpg 1024w, https://cdn.mobilesyrup.com/wp-content/uploads/2024/11/gen-ai-chess-768x430.jpg 768w, https://cdn.mobilesyrup.com/wp-content/uploads/2024/11/gen-ai-chess-1536x861.jpg 1536w, https://cdn.mobilesyrup.com/wp-content/uploads/2024/11/gen-ai-chess-417x235.jpg 417w" sizes="(max-width: 1867px) 100vw, 1867px" />
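The tag above is an ordinary WordPress featured image (class `wp-post-image`) with a `srcset`, so the image URL itself is easy to recover, which suggests the problem lies in the extractor's content-selection rules rather than in the markup. As a hedged illustration, here is a minimal Python sketch that reads a trimmed copy of that tag and picks the widest `srcset` candidate. `ImgFinder`, `parse_srcset`, and `best_src` are hypothetical helpers, not how Readability4J or Mercury handle images.

```python
from html.parser import HTMLParser


class ImgFinder(HTMLParser):
    """Collects the attributes of every <img> tag in a fragment."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images.append(dict(attrs))


def parse_srcset(srcset):
    """Split 'url 300w, url 1024w' into (url, width) pairs.

    Simplification: assumes URLs contain no commas and every candidate
    uses a width ("w") descriptor.
    """
    candidates = []
    for entry in srcset.split(","):
        parts = entry.split()
        if len(parts) == 2 and parts[1].endswith("w"):
            candidates.append((parts[0], int(parts[1][:-1])))
    return candidates


def best_src(img):
    """Prefer the widest srcset candidate, falling back to src."""
    candidates = parse_srcset(img.get("srcset") or "")
    if candidates:
        return max(candidates, key=lambda c: c[1])[0]
    return img.get("src")


BASE = "https://cdn.mobilesyrup.com/wp-content/uploads/2024/11/"
TAG = (
    f'<img width="1867" height="1046" src="{BASE}gen-ai-chess.jpg" '
    'class="attachment-full size-full wp-post-image" alt="" '
    f'srcset="{BASE}gen-ai-chess.jpg 1867w, {BASE}gen-ai-chess-300x168.jpg 300w, '
    f'{BASE}gen-ai-chess-1024x574.jpg 1024w" />'
)

finder = ImgFinder()
finder.feed(TAG)
image = finder.images[0]
print(best_src(image))  # the full-size gen-ai-chess.jpg URL (1867w)
```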

jocmp · 2024-11-30T03:19:48Z

@PhilC813 an update. I'm toying around with Mercury Parser again and seeing some potential upsides. Here's a comparison of a Les Versants article.

Les Versants: Before / After (screenshots)

Mobile Syrup: Before / After (screenshots)

PhilC813 · 2024-11-30T03:27:07Z

Wow, this seems very promising.

Do you mind checking with this article?
https://mobilesyrup.com/2024/11/28/here-are-the-2024-staples-black-friday-deals/

It's an article with Black Friday deals, and the current parser basically removes all the bullet points in which the deals are listed 😅

jocmp · 2024-11-30T03:54:26Z

The new parser skips over lists by default, but with a little bit of code it works: https://github.com/jocmp/capyreader/pull/569/files#diff-a5310ab57bf17835286b2a012ceca522b0f9af190ceeea2dcf80c52f82c6479dR41-R49
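The linked change isn't reproduced here. As a rough, hypothetical illustration of the idea, here is a Python sketch of a cleaner that drops the whole subtree of any tag in a configurable set, so taking `ul` out of that set is enough to keep a list of deals. `DEFAULT_STRIPPED`, `Cleaner`, and `clean` are made-up names, not Mercury Parser's API.

```python
from html.parser import HTMLParser

# Hypothetical default: tags whose entire subtree is removed during cleanup.
DEFAULT_STRIPPED = frozenset({"ul", "ol", "aside", "form"})


class Cleaner(HTMLParser):
    """Re-emits bare tags and text, skipping subtrees of stripped tags."""

    def __init__(self, stripped):
        super().__init__()
        self.stripped = stripped
        self.skip_tag = None  # name of the stripped tag we are inside, if any
        self.depth = 0        # nesting depth of skip_tag while skipping
        self.out = []

    def handle_starttag(self, tag, attrs):
        if self.skip_tag:
            if tag == self.skip_tag:
                self.depth += 1
            return
        if tag in self.stripped:
            self.skip_tag, self.depth = tag, 1
            return
        self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if self.skip_tag:
            if tag == self.skip_tag:
                self.depth -= 1
                if self.depth == 0:
                    self.skip_tag = None
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_tag:
            self.out.append(data)


def clean(html, stripped=DEFAULT_STRIPPED):
    cleaner = Cleaner(stripped)
    cleaner.feed(html)
    cleaner.close()
    return "".join(cleaner.out)


DEALS = "<p>Deals:</p><ul><li>Laptop $499</li><li>Monitor $129</li></ul>"

print(clean(DEALS))                             # <p>Deals:</p>
print(clean(DEALS, DEFAULT_STRIPPED - {"ul"}))  # list kept intact
```

Making the stripped set a parameter rather than a constant is what allows a per-site exception without touching the default behavior for every other feed.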

PhilC813 · 2024-11-30T04:24:26Z

So you can easily specify the <ul> tag as an exception, sweet. Frankly, I don't see why lists would be excluded by default; they're more likely to be content than ads.

Also, is there any parser that is still actively maintained? Mercury seems abandoned, like Readability4J. That's not necessarily a problem, but an active project is always a plus.

jocmp · 2024-11-30T04:32:41Z

Couldn't agree more. Between the two, I think Mercury is the more extensible and maintainable one. I forked it and I'm working on bringing its dependencies up to date here: https://github.com/jocmp/mercury-parser.

PhilC813 · 2024-12-04T06:26:19Z

I've updated the app to 2024.12.1080-dev, and despite the reintroduction of Mercury, I'm not seeing the results you shared above in a quick check of the "Les Versants" feed.

Screenrecorder-20241204-011225.mp4

As you can see, in the same article you used for testing, the headline is still missing, and all those grey boxes further down are actually ad placements. The last ad on the page does manage to render.

Also, the sticky setting of the "Extract full content" button doesn't seem to work properly in this build.

In an article, if you tap the button to turn it off and then tap it again to turn it back on, it is off again when you open another article from the same feed.
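The expected behavior here is simply "last toggle wins" for each feed. A minimal Python model of that, assuming a hypothetical in-memory store keyed by feed ID (Capy's real settings storage isn't shown in this thread):

```python
class FullContentSettings:
    """Hypothetical per-feed store for the "Extract full content" toggle."""

    def __init__(self):
        self._enabled = {}  # feed_id -> bool

    def is_enabled(self, feed_id):
        return self._enabled.get(feed_id, False)

    def toggle(self, feed_id):
        # Persist the new state immediately so the next article reads it.
        self._enabled[feed_id] = not self.is_enabled(feed_id)
        return self._enabled[feed_id]


settings = FullContentSettings()
settings.toggle("les-versants")  # enabled earlier: True
settings.toggle("les-versants")  # tapped off: False
settings.toggle("les-versants")  # tapped back on: True

# Opening the next article in the same feed should see the last state.
print(settings.is_enabled("les-versants"))  # True
```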

jocmp · 2024-12-04T20:23:02Z

Let me take another look. I may be able to filter out those ad placements too. Just to make sure I'm testing the same thing, are you using a local account?
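One common approach to dropping placeholders like those grey boxes is to remove any element whose class marks it as an ad. A hedged Python sketch, assuming hypothetical class names (`ad`, `ad-slot`, `advertisement`); real sites use their own markup, so actual selectors would have to come from inspecting the page.

```python
from html.parser import HTMLParser

# Hypothetical class tokens marking ad placements.
AD_CLASS_TOKENS = {"ad", "ad-slot", "advertisement"}
# Void elements never get an end tag, so they can't open a skip region.
VOID_TAGS = {"area", "br", "embed", "hr", "img", "input", "link", "meta", "source", "wbr"}


def is_ad(attrs):
    classes = (dict(attrs).get("class") or "").split()
    return any(name in AD_CLASS_TOKENS for name in classes)


class AdFilter(HTMLParser):
    """Re-emits bare tags and text, dropping elements classed as ads."""

    def __init__(self):
        super().__init__()
        self.skip_tag = None  # tag name of the ad element being skipped
        self.depth = 0        # nesting depth of skip_tag while skipping
        self.out = []

    def handle_starttag(self, tag, attrs):
        if self.skip_tag:
            if tag == self.skip_tag:
                self.depth += 1
            return
        if is_ad(attrs):
            if tag not in VOID_TAGS:
                self.skip_tag, self.depth = tag, 1
            return
        self.out.append(f"<{tag}>")

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags (<img ... />) have no separate end tag.
        if not self.skip_tag and not is_ad(attrs):
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if self.skip_tag:
            if tag == self.skip_tag:
                self.depth -= 1
                if self.depth == 0:
                    self.skip_tag = None
            return
        if tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_tag:
            self.out.append(data)


def strip_ads(html):
    parser = AdFilter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


PAGE = ('<p>Intro</p><div class="ad-slot"><div>Advertisement</div></div>'
        '<p>More text</p>')
print(strip_ads(PAGE))  # <p>Intro</p><p>More text</p>
```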

About the sticky config, I'm able to reproduce that bug. I'll follow up with a different ticket to fix that. #576

PhilC813 · 2024-12-05T05:01:37Z

> Just to make sure I'm testing the same thing, are you using a local account?

I'm using Capy with my Feedbin account.

> About the sticky config, I'm able to reproduce that bug. I'll follow up with a different ticket to fix that. #576

Don't give up!! 😆

jocmp · 2024-12-05T06:09:39Z

Aha, for those accounts I use Feedbin's copy of Mercury Parser. Local accounts rely on the Mercury Parser that I'm updating, so the two are different right now.

I'll see what I can do to use the same version of the parser everywhere. It should result in a more consistent experience across the board.

jocmp · 2024-12-07T00:43:21Z

@PhilC813 I enabled the updated Mercury Parser for Feedbin accounts in 2024.12.1081-dev and also fixed the sticky content bug. Let me know how it works for you!

PhilC813 · 2024-12-07T01:12:12Z

Seeing some extremely positive results so far. Some YouTube videos that were filtered out before are now displayed properly too. Solid update!

privacyadmin · 2024-12-21T10:07:14Z

Is it possible to fix articles from these domains?

All the text and images in their articles seem to be missing or incomplete.

Below are some examples:

https://www.hardwarezone.com.sg/feature-how-spot-potential-scam-messages-ios-and-android-singapore-rcs-sms

https://www.phoronix.com/news/Raspberry-Pi-HEVC-H265-Decode

jocmp · 2024-12-22T01:21:56Z

Hey @privacyadmin, I'll take a look. Can you open a new issue for each of those feeds using this template? https://github.com/jocmp/capyreader/issues/new?labels=full%20content%20request&template=2-full-content-request.yml

I want to close out this mega-issue since it's hard to track.

jocmp added the bug (Something isn't working) label Nov 14, 2024

jocmp self-assigned this Nov 14, 2024

jocmp added this to Capy Reader Nov 14, 2024

jocmp moved this to On Deck in Capy Reader Nov 14, 2024

jocmp removed the status in Capy Reader Nov 27, 2024

jocmp changed the title from "Investigate missing images" to "Missing images in full content mode" Nov 27, 2024

jocmp pinned this issue Nov 27, 2024

jocmp moved this to Parking Lot in Capy Reader Nov 27, 2024

jocmp mentioned this issue Nov 30, 2024

Bring back Mercury #569

Merged

jocmp mentioned this issue Dec 3, 2024

can't fullscreen images in full content version #352

Closed

jocmp mentioned this issue Dec 4, 2024

Sticky config doesn't stick after toggle #576

Closed

jocmp unpinned this issue Dec 7, 2024

jocmp added the full content request label and removed the bug (Something isn't working) label Dec 7, 2024

jocmp closed this as completed Dec 22, 2024

github-project-automation bot moved this from Parking Lot to Done in Capy Reader Dec 22, 2024
