Basic html loader with crawly #22

Open: warnero wants to merge 3 commits into base: main

@warnero warnero commented Oct 20, 2023

  • Added Document and DocumentLoader Behaviours
  • Added Crawly DocumentLoader
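For readers unfamiliar with the pattern, here is a minimal sketch of what a document loader behaviour could look like in Elixir. The module names (`Sketch.DocumentLoader`, `Sketch.StringLoader`), the `load/1` callback, and the map shape are all assumptions made for illustration; they are not taken from this PR's code. A Crawly-backed loader would presumably implement the same kind of callback but fetch and parse HTML rather than split an in-memory string.

```elixir
defmodule Sketch.DocumentLoader do
  @moduledoc "Hypothetical behaviour; names are illustrative, not taken from this PR."

  # Assumption: a document is a map with text content and free-form metadata.
  @type document :: %{content: String.t(), metadata: map()}

  # Implementations turn a source description into a list of documents.
  @callback load(source :: term()) :: {:ok, [document()]} | {:error, term()}
end

defmodule Sketch.StringLoader do
  @moduledoc "Toy loader: one document per blank-line-separated paragraph."
  @behaviour Sketch.DocumentLoader

  @impl true
  def load(text) when is_binary(text) do
    docs =
      text
      |> String.split(~r/\n{2,}/, trim: true)
      |> Enum.with_index()
      |> Enum.map(fn {paragraph, index} ->
        %{content: String.trim(paragraph), metadata: %{index: index}}
      end)

    {:ok, docs}
  end

  # Anything that is not a string is rejected rather than guessed at.
  def load(_other), do: {:error, :unsupported_source}
end
```

Keeping the loader behind a behaviour means callers depend only on `load/1`, so different sources (web pages, APIs, files) can be swapped in without changing the calling code.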

warnero and others added 3 commits October 14, 2023 16:52
* adding readme function overview image

* updated image

* fixing image alt text

* centering image?

* reduced image

* tweak image url for cache busting

* rename image to update

* chat chain logo image

* added logo image

* logo-32px

* don't commit graphic sources

* graphic updates

* fixed spelling

* cleanup the configuration README

* make chatgpt response tests more robust

Even when given specific instructions like "Return the response 'Hi'."
ChatGPT (and LLMs in general) don't always follow the
instructions *exactly* (for example, ChatGPT will often respond to the above
prompt with "Hi!").

As a result, equality testing on the response makes for flaky tests. This change
keeps the test prompts, but instead matches on the responses with `=~`. Still
not perfect, but less likely to be flaky, which in tests seems like a win.

* link to demo project

* add "update_custom_context" to LLMChain
- added tests

* add support for setting the `OpenAI-Organization` header in requests to the OpenAI API

* set pattern match in `DataExtractionChain` to look for `role: :assistant` as it appears to be the only valid result at this stage

* improved the data extraction prompt
- didn't consistently handle 'null' values

* update readme
- add example of openai_org_id config

* improve pattern match on data extraction chain

* update version

* updated changelog

* put "Elixir" in the Readme title

---------

Co-authored-by: Mark Ericksen
Co-authored-by: Ben Swift
Co-authored-by: Adam Mokan
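The "make chatgpt response tests more robust" commit message above describes swapping equality checks for `=~` matches. A minimal sketch of the difference follows, assuming a regex on the right-hand side (the commit does not say whether a string or a regex is used; `=~` accepts either). The `Sketch.ResponseMatch` module and the sample responses are hypothetical stand-ins for real API output.

```elixir
defmodule Sketch.ResponseMatch do
  # Hypothetical helper: true when the reply contains the word "hi",
  # regardless of case or trailing punctuation such as "Hi!".
  def says_hi?(response) when is_binary(response) do
    response =~ ~r/\bhi\b/i
  end
end

# Stand-in responses; a real test would get these from the model.
responses = ["Hi", "Hi!", "hi there."]

# Exact equality accepts only the first variant.
exact = Enum.map(responses, &(&1 == "Hi"))

# The regex match accepts all three.
loose = Enum.map(responses, &Sketch.ResponseMatch.says_hi?/1)
```

As the commit message notes, this is still not perfect: a loose match can also accept a reply that merely mentions the expected word, so the pattern should be as specific as the prompt allows.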
@warnero
Author

warnero commented Oct 24, 2023

Hey @brainlid, I wanted to split up my work into smaller chunks so I can get it in (and others can play with the blocks/revamp/etc.). How does this one look?

@matthusby
Contributor

@brainlid I see this has been sitting for a while. I am planning on doing some data loading from APIs soon, and was wondering if there are plans to integrate this PR or some sort of document model in general?

@brainlid
Owner

brainlid commented Aug 24, 2024 via email

@matthusby
Contributor

I am not doing anything too fancy, just planning to pull in some Jira tickets and maybe GitHub issues.

My main question is: what do you think of using the Document model that is in this PR? I would like to stick to a standard way of doing document loading, etc. At first glance this seems fine, but I wanted to make sure I wasn't missing something.

@brainlid
Owner

> what do you think of using the Document model that is in this PR

I think the Document model was incomplete. The idea was to base it on the TS/Python LangChain Document idea. I'm not using it personally nor do I have any short-term needs for it. However, I'm open to that approach.
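For context, the Document type in Python LangChain is essentially text content plus a metadata dictionary (`page_content` and `metadata`; the TS version uses `pageContent`). A minimal Elixir sketch of that idea might look like the following. The module name `Sketch.Document`, the field names, and the `new/2` helper are assumptions for illustration and do not reflect the struct in this PR.

```elixir
defmodule Sketch.Document do
  @moduledoc "Hypothetical struct; field names are assumptions, not this PR's API."

  # Content is required; metadata defaults to an empty map.
  @enforce_keys [:content]
  defstruct content: "", metadata: %{}

  @type t :: %__MODULE__{content: String.t(), metadata: map()}

  # Build a document from text plus optional metadata such as its origin.
  def new(content, metadata \\ %{}) when is_binary(content) and is_map(metadata) do
    %__MODULE__{content: content, metadata: metadata}
  end
end

# Example: a document carrying a made-up source tag in its metadata.
doc = Sketch.Document.new("Ticket text", %{source: "jira"})
```

Leaving metadata as a free-form map keeps the struct small while letting each loader record whatever provenance it has, such as a URL or ticket ID.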
