ppcomp: many updates, mainly for HTML5 #5

Open
wants to merge 2 commits into
base: rtonsing-ppcomp

Conversation

rtonsing

Since I'm using a different file name, you should be able to switch easily for testing. I would still like to handle more footnote styles, but it will take time.

@rtonsing rtonsing changed the title from "Many updates, mainly for HTML5" to "ppcomp: many updates, mainly for HTML5" May 30, 2022
@tangledhelix tangledhelix self-assigned this May 31, 2022
@tangledhelix tangledhelix added the enhancement label (New feature or request) May 31, 2022
@tangledhelix
Owner

Pulling in some notes that were included in the original submission.

rtonsing said:

Not really an issue, but I'm not sure how else to submit it; I didn't think a pull request was appropriate.

This is humbly supplied for you to use or not as you see fit. The main purpose is to handle HTML5. I hope you are OK with the liberties I have taken.

GitHub project, file /ppcomp/ppcomp.py

I think this is at least ready for beta testing, if you wish to try it out. I have been comparing it to results from both PPWB and PPTools versions, and don't see any unexpected differences.

Changes:

  • use html5parser to handle new tags.
  • eliminate oe ligature transforms; treat oe the same as ae.
  • character "downgrades", such as curly quotes to straight quotes, are only done if one file is from the rounds (projectID########.txt).
  • --css-bold option has a default of '='.
  • changed boilerplate removal to the latest format.
  • added super/subscript conversions (<sup>1</sup> to ¹ or ^1).
  • I did a LOT of rewriting, mainly to make each piece of functionality accessible for unit testing, and partly as code "cleanup" or just adapting it to my style so I could understand it better.
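
For readers who want a concrete picture of two items in the list above (the rounds-only quote "downgrade" and the superscript conversion), here is a rough sketch. It is not taken from ppcomp.py: every function and constant name is hypothetical, the assumption that the `########` part of a rounds file name is hexadecimal is mine, and subscripts are left out for brevity.

```python
import os
import re

# Hypothetical names throughout; this does not mirror ppcomp.py.
# Assumption: the "########" in projectID########.txt is a run of hex digits.
ROUNDS_NAME = re.compile(r"projectID[0-9A-Fa-f]+\.txt")

# Curly quotes mapped to their straight equivalents.
QUOTE_DOWNGRADES = str.maketrans({
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2018": "'", "\u2019": "'",   # curly single quotes
})

SUPERSCRIPT_DIGITS = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")
SUP_TAG = re.compile(r"<sup>(.*?)</sup>", re.DOTALL)


def is_rounds_file(path):
    """Return True if the file name looks like a rounds file."""
    return ROUNDS_NAME.fullmatch(os.path.basename(path)) is not None


def downgrade_quotes(text, other_path):
    """Straighten curly quotes only when the other file is from the rounds."""
    if is_rounds_file(other_path):
        return text.translate(QUOTE_DOWNGRADES)
    return text


def convert_superscripts(html, use_unicode=True):
    """Replace <sup>N</sup> with Unicode superscript digits, or with ^N."""
    def replace(match):
        inner = match.group(1)
        # Only plain ASCII digits have a direct Unicode superscript here.
        if use_unicode and re.fullmatch(r"[0-9]+", inner):
            return inner.translate(SUPERSCRIPT_DIGITS)
        return "^" + inner
    return SUP_TAG.sub(replace, html)
```

Keeping each step as a small pure function is in the spirit of the last bullet: each one can be unit tested on its own without running a full comparison.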

--extract-footnotes needs work, but it shouldn't be any worse.

@tangledhelix
Owner

Integrated and up for testing at https://dev.pptools.tangledhelix.com/

@tangledhelix tangledhelix changed the base branch from main to rtonsing-ppcomp May 21, 2023 21:18
@tangledhelix
Owner

I haven't closed this yet, by the way, because it only addresses part of the overall HTML5 support. I am leaving this PR open until I implement the rest, because I don't plan to merge this until the others are also ready.

Labels
enhancement New feature or request
Projects
Status: In Progress

2 participants