
Commit

fix conflicts
thiagosalvatore committed Oct 25, 2024
2 parents a318400 + 2652618 commit 94d0011
Showing 23 changed files with 5,239 additions and 1,049 deletions.
14 changes: 4 additions & 10 deletions .github/ISSUE_TEMPLATE/bug_report.md
@@ -7,8 +7,6 @@ assignees: ''

---

_Note: we're aware of some missing content in the output and layout issues on tables. Please refrain from opening new issues on this topic unless you think it's different from what has already been reported._

**Describe the bug**
Write a concise description of what the bug is.

@@ -19,19 +17,15 @@ If possible, please provide the PDF file causing the issue.
If you have it, please provide the ID of the job you ran.
You can find it here: https://cloud.llamaindex.ai/parse in the "History" tab.

**Screenshots**
Feel free to also provide screenshots if relevant.

**Client:**
Please remove untested options:
- Frontend (cloud.llamaindex.ai)
- Python Library
- API
- Frontend (cloud.llamaindex.ai)
- Typescript Library
- Notebook
- API

**Options**
What options did you use? Multimodal, fast mode, parsing instructions, etc.

**Additional context**
Add any additional context about the problem here.
What options did you use? Premium mode, multimodal, fast mode, parsing instructions, etc.
Screenshots, code snippets, etc.
55 changes: 46 additions & 9 deletions README.md
@@ -1,15 +1,26 @@
# LlamaParse

LlamaParse is an API created by LlamaIndex to parse and represent files efficiently for retrieval and context augmentation using LlamaIndex frameworks.
[![PyPI - Downloads](https://img.shields.io/pypi/dm/llama-parse)](https://pypi.org/project/llama-parse/)
[![GitHub contributors](https://img.shields.io/github/contributors/run-llama/llama_parse)](https://github.com/run-llama/llama_parse/graphs/contributors)
[![Discord](https://img.shields.io/discord/1059199217496772688)](https://discord.gg/dGcwcsnxhU)

LlamaParse directly integrates with [LlamaIndex](https://github.com/run-llama/llama_index).
LlamaParse is a **GenAI-native document parser** that can parse complex document data for any downstream LLM use case (RAG, agents).

It is really good at the following:

The free plan is up to 1000 pages a day. The paid plan includes 7k free pages per week, plus 0.3c per additional page.
- **Broad file type support**: Parsing a variety of unstructured file types (.pdf, .pptx, .docx, .xlsx, .html) with text, tables, visual elements, weird layouts, and more.
- **Table recognition**: Parsing embedded tables accurately into text and semi-structured representations.
- **Multimodal parsing and chunking**: Extracting visual elements (images/diagrams) into structured formats and returning image chunks using the latest multimodal models.
- **Custom parsing**: Input custom prompt instructions to customize the output the way you want it.

LlamaParse directly integrates with [LlamaIndex](https://github.com/run-llama/llama_index).

There is a sandbox available to test the API at [**https://cloud.llamaindex.ai/parse**](https://cloud.llamaindex.ai/parse).
The free plan is up to 1000 pages a day. The paid plan includes 7k free pages per week, plus 0.3c per additional page by default. There is a sandbox available to test the API at [**https://cloud.llamaindex.ai/parse**](https://cloud.llamaindex.ai/parse).
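As a rough illustration of the paid-plan arithmetic, the sketch below bills only the pages beyond the weekly 7k allowance at 0.3 cents each. It assumes the default figures quoted above and a simple per-week count; actual billing (other modes, plan changes, how the week is counted) may differ, and `estimate_weekly_cost_usd` is a hypothetical helper, not part of `llama-parse`.

```python
from decimal import Decimal

# Assumed defaults taken from the README text; real pricing may change.
FREE_PAGES_PER_WEEK = 7_000
PRICE_PER_EXTRA_PAGE_USD = Decimal("0.003")  # 0.3 cents per page


def estimate_weekly_cost_usd(pages_this_week: int) -> Decimal:
    """Rough paid-plan cost for one week: only pages past the free allowance are billed."""
    if pages_this_week < 0:
        raise ValueError("page count cannot be negative")
    billable_pages = max(0, pages_this_week - FREE_PAGES_PER_WEEK)
    # Decimal avoids binary floating-point noise in currency math.
    return billable_pages * PRICE_PER_EXTRA_PAGE_USD


print(estimate_weekly_cost_usd(5_000))   # within the free allowance
print(estimate_weekly_cost_usd(10_000))  # 3,000 billable pages
```

For example, 10,000 pages in a week leaves 3,000 billable pages, and 3,000 × $0.003 = $9.00.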

Read below for some quickstart information, or see the [full documentation](https://docs.cloud.llamaindex.ai/).

If you're a company interested in enterprise RAG solutions, and/or high volume/on-prem usage of LlamaParse, come [talk to us](https://www.llamaindex.ai/contact).

## Getting Started

First, log in and get an API key from [**https://cloud.llamaindex.ai/api-key**](https://cloud.llamaindex.ai/api-key).
@@ -27,7 +38,22 @@ Lastly, install the package:

`pip install llama-parse`

Now you can run the following to parse your first PDF file:
Now you can parse your first PDF file using the command line interface. Use the command `llama-parse [file_paths]`. See the help text with `llama-parse --help`.

```bash
export LLAMA_CLOUD_API_KEY='llx-...'

# output as text
llama-parse my_file.pdf --result-type text --output-file output.txt

# output as markdown
llama-parse my_file.pdf --result-type markdown --output-file output.md

# output as raw json
llama-parse my_file.pdf --output-raw-json --output-file output.json
```
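The CLI examples above parse one file per command. To batch several files from Python, a minimal sketch is to build one `llama-parse` call per file using only the `--result-type` and `--output-file` flags shown above. The README does not say how a single call with several paths names its output files, so this sketch assumes one call per file, each writing next to its input. `build_llama_parse_command` and `parse_many` are hypothetical helpers, and `dry_run=True` only prints the commands so you can check them before spending pages.

```python
import os
import shlex
import subprocess
from pathlib import Path


def build_llama_parse_command(pdf_path: str, result_type: str = "markdown") -> list[str]:
    """Build one `llama-parse` invocation using only flags shown in the README."""
    if result_type not in ("text", "markdown"):
        raise ValueError(f"unsupported result type: {result_type!r}")
    suffix = ".md" if result_type == "markdown" else ".txt"
    # Write the output next to the input, e.g. docs/a.pdf -> docs/a.md
    output_file = str(Path(pdf_path).with_suffix(suffix))
    return [
        "llama-parse", pdf_path,
        "--result-type", result_type,
        "--output-file", output_file,
    ]


def parse_many(pdf_paths: list[str], result_type: str = "markdown",
               dry_run: bool = True) -> list[list[str]]:
    """Print (dry run) or execute one CLI call per file; return the commands."""
    if not dry_run and "LLAMA_CLOUD_API_KEY" not in os.environ:
        raise RuntimeError("export LLAMA_CLOUD_API_KEY before calling the CLI")
    commands = [build_llama_parse_command(p, result_type) for p in pdf_paths]
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            # check=True raises CalledProcessError if the CLI exits non-zero
            subprocess.run(cmd, check=True)
    return commands
```

Call `parse_many(["my_file.pdf", "report.pdf"])` to preview the commands, then pass `dry_run=False` once the API key is exported.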

You can also create simple scripts:

```python
import nest_asyncio
@@ -76,13 +102,18 @@ parser = LlamaParse(
    language="en",  # Optionally you can define a language, default=en
)

with open("./my_file1.pdf", "rb") as f:
    documents = parser.load_data(f)
file_name = "my_file1.pdf"
extra_info = {"file_name": file_name}

with open(f"./{file_name}", "rb") as f:
    # must provide extra_info with a file_name key when passing a file object
    documents = parser.load_data(f, extra_info=extra_info)

# you can also pass file bytes directly
with open("./my_file1.pdf", "rb") as f:
with open(f"./{file_name}", "rb") as f:
    file_bytes = f.read()
    documents = parser.load_data(file_bytes)
    # must provide extra_info with a file_name key when passing file bytes
    documents = parser.load_data(file_bytes, extra_info=extra_info)
```
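The README comments say `extra_info` must carry a `file_name` key whenever you pass a file object or raw bytes instead of a path. A small helper that derives that key from the path keeps the name and the data from drifting apart. This is a sketch under assumptions: `load_from_bytes` and `FakeParser` are hypothetical names, and the fake only mimics the `load_data(data, extra_info=...)` call shape used above so the example runs without an API key.

```python
from pathlib import Path


def load_from_bytes(parser, path: str) -> list:
    """Read a file into memory and hand the bytes to `parser.load_data`.

    Raw bytes carry no file name, so `file_name` is supplied via `extra_info`.
    """
    file_path = Path(path)
    file_bytes = file_path.read_bytes()
    extra_info = {"file_name": file_path.name}
    return parser.load_data(file_bytes, extra_info=extra_info)


class FakeParser:
    """Hypothetical offline stand-in with the same call shape as the README."""

    def load_data(self, data, extra_info=None):
        if not extra_info or "file_name" not in extra_info:
            raise ValueError("extra_info with a file_name key is required")
        # Return something easy to inspect instead of parsed documents.
        return [(extra_info["file_name"], len(data))]
```

With a real `LlamaParse` instance, pass it in place of `FakeParser()`.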

## Using with `SimpleDirectoryReader`
@@ -126,3 +157,9 @@ Several end-to-end indexing examples can be found in the examples folder
## Terms of Service

See the [Terms of Service Here](./TOS.pdf).

## Get in Touch (LlamaCloud)

LlamaParse is part of LlamaCloud, our end-to-end enterprise RAG platform that provides out-of-the-box, production-ready connectors, indexing, and retrieval over your complex data sources. We offer SaaS and VPC options.

LlamaCloud is currently available via waitlist (join by [creating an account](https://cloud.llamaindex.ai/)). If you're interested in state-of-the-art quality and in centralizing your RAG efforts, come [get in touch with us](https://www.llamaindex.ai/contact).
Binary file added examples/data/BP_Excel.xlsx
2 changes: 1 addition & 1 deletion examples/demo_json.ipynb
@@ -342,7 +342,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "llama-parse-aNC435Vv-py3.10",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
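The notebook change above swaps a machine-specific kernel display name (it looks like a local virtual-environment name) for the generic `Python 3 (ipykernel)` kernelspec. If you want to apply the same cleanup to other notebooks, a minimal standard-library sketch follows; it is not a script from this repository, the helper names are hypothetical, and `indent=1` is an assumption about how your notebooks are serialized.

```python
import json
from pathlib import Path

# Generic kernelspec, matching the values the diff switches to.
GENERIC_KERNELSPEC = {
    "display_name": "Python 3 (ipykernel)",
    "language": "python",
    "name": "python3",
}


def normalize_kernelspec(notebook: dict) -> bool:
    """Replace the notebook's kernelspec with the generic one.

    Returns True if anything changed, so callers can skip untouched files.
    """
    metadata = notebook.setdefault("metadata", {})
    if metadata.get("kernelspec") == GENERIC_KERNELSPEC:
        return False
    metadata["kernelspec"] = dict(GENERIC_KERNELSPEC)
    return True


def normalize_file(path: Path) -> bool:
    """Load an .ipynb file, normalize its kernelspec, and rewrite it if needed."""
    notebook = json.loads(path.read_text(encoding="utf-8"))
    changed = normalize_kernelspec(notebook)
    if changed:
        text = json.dumps(notebook, indent=1, ensure_ascii=False) + "\n"
        path.write_text(text, encoding="utf-8")
    return changed
```

Running `normalize_file` over `Path("examples").rglob("*.ipynb")` before committing would keep local environment names out of diffs like this one.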
1,515 changes: 1,515 additions & 0 deletions examples/excel/o1_excel_rag.ipynb


Binary file added examples/excel/references/query1.png
Binary file added examples/excel/references/query2.png
Binary file added examples/excel/references/query3.png
Binary file added examples/excel/references/query4.png
Binary file added examples/excel/references/query5.png