Commit

adds the format option
joanfabregat committed Feb 22, 2024
1 parent 7887aa1 commit 3318500
Showing 2 changed files with 17 additions and 15 deletions.
12 changes: 7 additions & 5 deletions README.md
@@ -19,7 +19,7 @@ Additional parameters can be sent to customize the conversion process:
 * `lastPage`: The last page to extract. Default is the last page of the document.
 * `password`: The password to unlock the PDF. Default is none.
 * `normalizeWhitespace`: If set to `true`, the server normalizes the whitespace in the extracted text. Default is `true`.
-* `raw`: If set to `true`, the server returns the raw text extracted from the PDF as `text/plain`, else the text is returned as `text/json`. Default is `false`.
+* `format`: The output format. Supported values are `text` (the server returns the raw text as `text/plain`) or `json` (the server returns a JSON object as `text/json`). Default is `text`.

 The server returns `200` if the conversion was successful and the images are available in the response body. In case of error, the server returns a `400` status code with a JSON object containing the error message (format: `{error: string}`).
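
The parameters listed in this hunk are sent as multipart form fields. As a rough sketch of how a Node client might assemble that body, the helper below builds a `FormData` object using the field names `file`, `password`, and `format` from the README. The helper name `buildConvertForm`, its options object, and the upload filename `file.pdf` are illustrative assumptions and not part of the project; the sketch also assumes Node 18 or later, where `FormData`, `Blob`, and `fetch` are globals.

```javascript
// Hypothetical helper (not part of the repository): builds the multipart body
// for POST /convert. Only the field names come from the README.
function buildConvertForm(pdfBytes, { password, format } = {}) {
  const form = new FormData();
  // The README's examples upload the PDF under the `file` field.
  form.append("file", new Blob([pdfBytes], { type: "application/pdf" }), "file.pdf");
  // Optional fields are only added when the caller provides them,
  // so the server-side defaults apply otherwise.
  if (password !== undefined) form.append("password", password);
  if (format !== undefined) form.append("format", format);
  return form;
}

// Possible usage, assuming the server is listening on localhost:3000:
//   const form = buildConvertForm(pdfBytes, { format: "json" });
//   const res = await fetch("http://localhost:3000/convert", { method: "POST", body: form });
//   const body = res.ok ? await res.json() : await res.json(); // errors are `{error: string}`
```

Omitting `format` leaves the choice to the server, which per this commit defaults to plain text.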

@@ -36,16 +36,18 @@ Convert a PDF file to text with a JSON response:
 curl -X POST -F "file=@/path/to/file.pdf" http://localhost:3000/convert -o example.json
 ```

-Convert a password-protected PDF file to text with a JSON response:
+Convert a PDF file to text:
 ```bash
-curl -X POST -F "file=@/path/to/file.pdf" -F "password=XXX" http://localhost:3000/convert -o example.json
+curl -X POST -F "file=@/path/to/file.pdf" http://localhost:3000/convert
 ```

-Convert a PDF file to text with a raw text response:
+Extract a password-protected PDF file's text content as JSON and save it to a file:
 ```bash
-curl -X POST -F "file=@/path/to/file.pdf" -F "raw=true" http://localhost:3000/convert -o example.txt
+curl -X POST -F "file=@/path/to/file.pdf" -F "password=XXX" -F "format=json" http://localhost:3000/convert -o example.json
 ```

+
+
 ## License

 This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
20 changes: 10 additions & 10 deletions main.mjs
@@ -49,8 +49,16 @@ app.post('/convert', upload.single('file'), async (req, res) => {
         }
     )();

-    // send the text
-    if (req.body.raw === "true") {
+    // send the content as raw text or JSON
+    if (String(req.body.format).toLowerCase() === "json") {
+        delete extractResult.filename;
+
+        extractResult.pages.forEach(page => {
+            page.content.forEach(content => delete content.fontName);
+        });
+
+        res.json(extractResult);
+    } else {
         res.send(
             extractResult.pages.reduce(
                 (acc1, page) => acc1 + page.content.reduce(
@@ -60,14 +68,6 @@ app.post('/convert', upload.single('file'), async (req, res) => {
                 '',
             ),
         );
-    } else {
-        delete extractResult.filename;
-
-        extractResult.pages.forEach(page => {
-            page.content.forEach(content => delete content.fontName);
-        });
-
-        res.json(extractResult);
     }

     // cleaning up
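
The branch selection and the JSON clean-up in this hunk can be looked at in isolation. The sketch below is one way to express them as two small functions. `wantsJson` mirrors the `String(...).toLowerCase() === "json"` check from the diff. `stripForJson` removes the same `filename` and `fontName` fields, but returns a copy instead of mutating the result in place as the handler does. Both function names, the sample object, and its `str` field are hypothetical; only `filename`, `pages`, `content`, and `fontName` appear in the commit.

```javascript
// Mirrors the check in the diff: any value other than "json" (case-insensitive),
// including a missing field, selects the plain-text branch.
function wantsJson(format) {
  return String(format).toLowerCase() === "json";
}

// Drops the fields the JSON branch deletes before responding.
// Unlike the handler, this builds a copy and leaves the input untouched.
function stripForJson(extractResult) {
  const { filename, ...rest } = extractResult;
  return {
    ...rest,
    pages: extractResult.pages.map(page => ({
      ...page,
      content: page.content.map(({ fontName, ...item }) => item),
    })),
  };
}

// Hypothetical sample; the real extraction result may carry more fields.
const sample = {
  filename: "upload.pdf",
  pages: [{ content: [{ str: "Hello", fontName: "g_d0_f1" }] }],
};

console.log(wantsJson("JSON"), wantsJson(undefined));
console.log(JSON.stringify(stripForJson(sample)));
```

Because a missing `format` field is stringified to `"undefined"`, it never matches `"json"`, which is what makes plain text the default. Assuming `app` is an Express application, `res.json` normally labels the response `application/json`, so clients may prefer to check for that type.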