Footnotes #137

Open
0x4007 opened this issue Oct 3, 2024 · 8 comments

Comments

@0x4007
Member

0x4007 commented Oct 3, 2024

ubiquity-os-marketplace/text-vector-embeddings#25 (comment) caught my eye as being too high but I'm reviewing the statistics:

content:
  content: # strange there's content.content
    p:
      score: 0
      elementCount: 3
    a:
      score: 5 # looks like it's counting the footnotes as links. I only see three related to the footnotes (these should be hardcoded to be removed), but there is one unaccounted for that I can't find?
      elementCount: 2
    pre:
      score: 0
      elementCount: 1
    h6:
      score: 1
      elementCount: 1
  result: 11

regex:
  wordCount: 116
  wordValue: 0.2
  result: 11.37

I think we have an unaddressed scenario for dealing with footnotes, so we'll need to make a new task. I also have a feeling that it is counting words within the code block, which we should not do. The analytics overview should indicate when a tag's words are being ignored. It's also strange to me that it parsed the block as pre instead of code; perhaps that's because I didn't include the syntax highlighting header?
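To illustrate the code block concern, here is a minimal sketch of counting words while skipping fenced code blocks. The helper name `countWordsOutsideCode` is hypothetical and is not taken from the plugin's source; it works on raw Markdown, whereas the statistics above suggest the real evaluator works on rendered HTML tags.

```typescript
// Hypothetical helper, not part of the existing modules.
// Counts whitespace-separated words, ignoring everything inside fenced code blocks.
function countWordsOutsideCode(markdown: string): number {
  let insideFence = false;
  let total = 0;
  for (const line of markdown.split(/\r?\n/)) {
    // A line that opens with three or more backticks or tildes toggles the fence state.
    if (/^\s*(`{3,}|~{3,})/.test(line)) {
      insideFence = !insideFence;
      continue; // the fence line itself is never counted
    }
    if (insideFence) continue;
    const words = line.match(/\S+/g);
    total += words ? words.length : 0;
  }
  return total;
}
```

The fence detection is deliberately simple: it does not check that the closing fence matches the opening one in length or character, and it does not handle indented code blocks.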

Regarding config, I think wordValue should probably default to 0.1, including for the author. Not sure why it's 0.2!

  1. Ignore links related to footnotes
  2. Do not include footnotes in word count credit

Originally posted by @0x4007 in ubiquity-os-marketplace/text-vector-embeddings#25 (comment)

@zugdev

zugdev commented Oct 9, 2024

/start

Contributor

ubiquity-os bot commented Oct 9, 2024

Deadline Wed, Oct 9, 10:33 PM UTC
Beneficiary 0xbB689fDAbBfc0ae9102863E011D3f897b079c80F

Tip

  • Use /wallet 0x0000...0000 if you want to update your registered payment wallet address.
  • Be sure to open a draft pull request as soon as possible to communicate updates on your progress.
  • Be sure to provide timely updates to us when requested, or you will be automatically unassigned from the task.

@zugdev

zugdev commented Oct 9, 2024

@0x4007 I am under the impression that I could modify either data-purge-module.ts or formatting-evaluator-module.ts. From my understanding of the specification, footnotes shouldn't count as links or paragraphs, and shouldn't increment the word count. Therefore, I could use a regexp to trim footnotes entirely in the purge module. If you want to keep them but just not count them, then I could use the formatting evaluator module.
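A rough sketch of the purge-style option, assuming the comment body is available as raw Markdown and footnotes use the `[^label]` syntax. The function name `stripFootnotes` is an illustration only and does not exist in data-purge-module.ts.

```typescript
// Hypothetical helper: removes Markdown footnote definitions and inline references.
function stripFootnotes(markdown: string): string {
  const kept: string[] = [];
  let insideDefinition = false;
  for (const line of markdown.split(/\r?\n/)) {
    // A definition starts at the beginning of a line, e.g. "[^1]: some text".
    if (/^\[\^[^\]]+\]:/.test(line)) {
      insideDefinition = true;
      continue;
    }
    // Indented lines directly after a definition are treated as its continuation.
    if (insideDefinition && /^[ \t]+\S/.test(line)) continue;
    insideDefinition = false;
    // Drop inline references such as "[^1]" from the remaining text.
    kept.push(line.replace(/\[\^[^\]]+\]/g, ""));
  }
  return kept.join("\n").trimEnd();
}
```

Because this removes every footnote regardless of who wrote it, it would also discard footnotes added by the comment's author, which is one reason a regex-only approach is fragile.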

@0x4007
Member Author

0x4007 commented Oct 9, 2024

There are two approaches I have in mind.

  1. Ideally you look at the edit history and remove whatever was added by any bot (or even better, credit anybody who contributed to editing the specification, but bots are automatically excluded)
  2. Regex for the ⚠️ symbol footnote which we are currently using for indicating a potential match. But this solution is not robust and not preferred.
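The first approach could be sketched roughly as follows, assuming the edit history has already been fetched into a list of revisions. The `Revision` shape and both function names are hypothetical; how the history is actually retrieved, and what fields it exposes, is not covered here.

```typescript
// Hypothetical shape for one entry of a comment's or specification's edit history.
interface Revision {
  editorLogin: string;
  editorType: "User" | "Bot";
  body: string; // full body as it stood after this edit
  editedAt: string; // ISO 8601 timestamp
}

// Returns the body of the most recent edit made by a human, or null if there is none.
function latestHumanBody(revisions: Revision[]): string | null {
  const human = revisions
    .filter((r) => r.editorType !== "Bot")
    .sort((a, b) => Date.parse(a.editedAt) - Date.parse(b.editedAt));
  return human.length > 0 ? human[human.length - 1].body : null;
}

// Returns the unique logins of human editors, in first-seen order, so they could be credited.
function humanEditors(revisions: Revision[]): string[] {
  const logins = revisions.filter((r) => r.editorType !== "Bot").map((r) => r.editorLogin);
  return [...new Set(logins)];
}
```

This is only an approximation of "remove whatever was added by any bot": if a human edits the text after a bot has appended a footnote, the latest human revision still contains the bot's content, so a real implementation would need to compare consecutive revisions.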

@zugdev

zugdev commented Oct 10, 2024

So, 1. is to evaluate the content's origin and ignore bot-added content, while also rewarding other users' quoted content. This one should also solve #74 and seems much better. I am up for it, I just need to get my kernel running locally first.

@0x4007
Member Author

0x4007 commented Oct 10, 2024

I think #74 is different because using the context menu to repost another person's comment is a one-shot write. The revision history would not show the other user's write, only the author's.

The reward algorithm only considers the current issue and linked pull.

A more advanced version could scrape the "originally posted at" link, match the quoted text, and credit the original author instead.

@zugdev

zugdev commented Oct 10, 2024

I'll unassign myself from this for now since there are other tasks I believe to be more urgent/interesting.

@zugdev

zugdev commented Oct 10, 2024

/stop
