💡PapaParse 6.0 #748

dboskovic · 2019-11-23T22:11:45Z

Would love to get your feedback on my wishlist for PapaParse. We're going to dedicate some time for the team @ Flatfile before the end of the year to move anything on this list that we agree on into a next release.

1. Migrate to ES7 + Typescript or Flow

This is something that, after a deep audit of the source code, we feel is necessary to ensure long-term reliability. In a data-oriented library like this, type strength will allow for easier reasoning about the code, add stability and prevent unseen bugs. ES7 will allow for more readable code.

Not everybody understands TS or Flow deeply enough to contribute with confidence, so my recommendation is to keep the config permissive to allow vanilla JS contributions and have core contributors add stronger typing before merging to master.

2. Separate NodeJS build from Browser build

This will allow for a lighter package when used in the browser (the vast majority of usage, based on a cursory analysis), and gives more freedom to invest in optimizations for each stack independently. These could either be distributed as different packages (e.g. @papaparse/core and @papaparse/node) or as a second build in the same package (e.g. import PapaParse from 'papaparse/node').

3. Reduce core sugar and add plugin framework

We've noticed a lot of the open issues relate to desired support of edge cases or unique data scenarios that shouldn't be treated as part of the core "csv parser" but are entirely legitimate use cases. The goal here is to distribute a core package with common functionality and allow users to choose additional use cases as needed.

Candidates:

@papaparse/http - adapter for downloading or streaming data from the web - can be optimized separately for Node.js and the browser, and opens the door to other adapters for things like S3 with plenty of optimizations.
@papaparse/types - split out the typecasting logic, there's a lot of room for improvement here w/better understanding of boolean types, dates, etc. But it doesn't make sense to invest that into a core csv parser library.
@papaparse/unparse - there's been a decent amount of confusion with users about how different configurations relate to parse vs. unparse. These are also distinctly different problems to solve for.

Future Candidates:
Things like detect-encoding to auto detect file encoding, generous-escaping (for the common unescaped quotes situation), and many other user requests. Additionally framework specific components like an HoC for React could be awesome.
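
One way to picture the split between a small core and opt-in plugins is as row transforms composed over the parser's output. Everything in this sketch is an assumption: `pipe`, `castTypes` and `trimFields` are hypothetical names, not PapaParse APIs, and the sample rows stand in for what a core parser would emit.

```javascript
// Hypothetical plugin model: a plugin is a function from row to row.
// None of these names exist in PapaParse; they only illustrate the idea.

// Compose plugins left to right into a single row transform.
const pipe = (...plugins) => (row) => plugins.reduce((acc, plugin) => plugin(acc), row);

// Stand-in for a "@papaparse/types"-style plugin: cast obvious booleans and numbers.
const castTypes = (row) =>
  row.map((field) => {
    if (field === 'true') return true;
    if (field === 'false') return false;
    if (field.trim() !== '' && !Number.isNaN(Number(field))) return Number(field);
    return field;
  });

// Stand-in for a small convenience plugin: trim surrounding whitespace.
const trimFields = (row) => row.map((field) => field.trim());

// Rows as a core parser might emit them (arrays of strings).
const rows = [[' 1', 'true ', 'abc'], ['2.5', 'false', '']];

const transform = pipe(trimFields, castTypes);
const result = rows.map(transform);
console.log(result);
```

With this shape the core never needs to know which plugins are installed; users opt in by adding functions to the pipe.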

4. Improved docs

Would love to see updated searchable docs with both auto generated API references as well as guides, fiddles, and improved demo. I'm a fan of docusaurus for this. We'd be happy to contribute content & design here.

5. Reorganized source code

With almost 2000 lines in papaparse.js, it's time to tackle deconstructing that a bit into components that are easier to reason about. Since 4b16215 6 years ago (335 lines), when most of the tools we have at our disposal today weren't available, we haven't changed much. Time for a src folder! Let's follow https://sourcemaking.com/refactoring as a guide.

6. Tests, coverage, cross-browser testing & CI based distribution

We should take advantage of setting up the Sauce testing matrix so we don't break things in old browsers as we go. Also, it'd be great if we could use GitHub Actions to auto-deploy master and release candidates to npm / bower / etc. We should also improve unit test coverage to complement the mainly acceptance testing we have now.

7. Other: Pipes, Promises, etc.

Piping for easier composition of logic, also relatively required for plugin framework.
Promises because it's 2019 (shim for legacy browsers)
Allow for some dependencies and ensure fossa.com scans are run on them - let's not re-invent all the wheels.
Functional first - no classes unless necessary
Package decomposition - As seen in the above example, switching to an npm org w/multiple packages and likely using something like lerna to manage the packages.
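
To make the Promise item concrete, here is a minimal sketch of a Promise wrapper around a callback-style parse function. `parseAsync` and `stubParse` are hypothetical names; the parse function is passed in so the example runs without the library, on the assumption that the real one accepts `complete` and `error` callbacks in its config, as Papa.parse does.

```javascript
// Wrap a callback-style parse function in a Promise.
// `parse` is injected so the sketch stays self-contained; with PapaParse it
// would be Papa.parse, whose config accepts `complete` and `error` callbacks.
function parseAsync(parse, input, config = {}) {
  return new Promise((resolve, reject) => {
    parse(input, {
      ...config,
      complete: (results) => resolve(results),
      error: (err) => reject(err),
    });
  });
}

// Stub parser (an assumption for the demo): splits on newlines and commas only.
function stubParse(input, config) {
  setTimeout(() => {
    if (typeof input !== 'string') {
      config.error(new Error('expected a string'));
      return;
    }
    config.complete({ data: input.split('\n').map((line) => line.split(',')) });
  }, 0);
}

parseAsync(stubParse, 'a,b\n1,2').then((results) => console.log(results.data));
```

Callers can then write `const results = await parseAsync(...)` inside an async function instead of nesting callbacks.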

8. Backwards compatibility adapter

Because this would be a pretty robust overhaul, we should publish an adapter that's fully backwards compatible with re-composed elements. Possibly @papaparse/legacy - allowing people to move forwards without a complete overhaul. It could also identify the things they aren't using and give them a custom migration checklist.

We're happy to take on the work of this overhaul here at Flatfile - so keep in mind we're not asking for a lot of work from the community. But do please provide feedback on all of this, we want to chart a path forwards that makes sense to everybody.

Also, what do you want to see? Comment with new ideas or criticisms / approval of the above.


caverna · 2019-11-25T08:38:40Z

👍 for Typescript! (or at least an official .d.ts file!)

pokoli · 2019-11-28T10:39:34Z

Hi @dboskovic,

First of all, thank you so much for all the effort put into this issue. I think they are very nice proposals and we should go forward and implement most of them. I have some concerns I would like to discuss before going forward:

About TypeScript: it seems good to me as long as we also allow PapaParse to be used with vanilla JavaScript.
About the plugins: I'm not sure whether this would be too much engineering for a CSV parsing library. Maybe we should first implement one of the proposed plugins to get a good structure and then implement the others.
Another doubt about the plugin system: I'm not sure whether the plugins will be distributed in the same package or in separate packages. If the proposal is to create separate packages, we need to define where this code will be managed (in the same repository or in a separate one) and how the integration should work.

I do not much like the idea of a backwards-compatible layer. I prefer to have a timeline to support the 5.0x series, so we fix issues on this branch while allowing users to test the new version.

For the documentation we only need to generate some static files in HTML format that can later be uploaded to our website, so any dynamism here makes sense. I think that we can change the documentation format without requiring a new version to be released.

For all the other points, please go forward on them and start creating PRs for them!
My idea is to release 5.2.0 before merging breaking changes to master branch.

DRocksCoding · 2020-02-08T16:41:23Z

Thanks. I'm new to JavaScript, and I was able to read a CSV with your website example and your documentation. It's good.

I also like the idea of getting core parts available on their own for less complex situations.
Thank you,
Cheers

Alex

jimmywarting · 2020-03-11T23:28:33Z

es7 + jsdoc and/or d.ts

wish to be able to use native import in browser without having to bundle or transpile the scripts. typescript is not javascript and does not work in browser.
(and for the record use full path with extension so browser can properly require related files)

jimmywarting · 2020-03-11T23:35:19Z

if you would like to write cross node/browser code, maybe you should consider abandoning streams and using an async iterator instead

maxwell8888 · 2020-09-03T11:35:22Z

It would be great to provide an ES module build. Most JavaScript packages provide both a Node.js build and an ES module build. If you use tools like Webpack or Rollup (recommended), it is simply a case of specifying in the config that you would like to additionally output a module build, which is then referenced, for example, as `module: es_module/index.js` in `package.json`. At the moment, I am not able to bundle PapaParse with Rollup. I assume that other users are either not bundling and simply including the browser script, or they are using Webpack, which is the most popular tool; however, Rollup is the second most popular and very heavily used.
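
As a rough sketch of the dual build described above, a Rollup config can list two outputs, one CommonJS and one ES module. The file names and the `src/index.js` entry point are assumptions for illustration; PapaParse's actual layout may differ, and a UMD or CommonJS source file would additionally need a plugin such as @rollup/plugin-commonjs.

```javascript
// rollup.config.cjs (hypothetical paths, not PapaParse's real layout)
const config = {
  input: 'src/index.js', // assumed ES module entry point
  output: [
    { file: 'dist/index.cjs.js', format: 'cjs' }, // build for require()
    { file: 'es_module/index.js', format: 'es' }, // build for import
  ],
};

module.exports = config;
```

package.json would then point `main` at the CommonJS file and `module` at `es_module/index.js`, the field that bundlers commonly look for.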

gunn · 2020-12-14T10:18:43Z

I would like to see support for streaming data from node as it is unparsed as per #568 and #652.

DaSchTour · 2020-12-20T18:02:25Z

> I would like to see support for streaming data from node as it is unparsed as per #568 and #652.

But please implement it in a way that this library still can be used in browser ;)

- Use RollupJS and Terser for bundling
- Bumped up dev dependency versions
- Bumped up Travis NodeJS version matrix
- Refactored Papa worker identification
- Refactored worker logic to use `import.meta.url` or `document.currentScript.src`

mholt#748 mholt#813

MarsianMan · 2021-07-13T06:59:41Z

> Promises because it's 2019 (shim for legacy browsers)

Maybe move the milestone to async/await for 2021 😄

jimmywarting · 2021-07-13T09:35:07Z

I wish you would use an async generator instead of importing node streams
https://github.com/cross-js/cross-js#dont-create-node-or-web-readable-stream-yourself

(or use whatwg stream on nodejs) so it could work great across Web/Deno/NodeJS

MarsianMan · 2021-07-14T11:03:40Z

> I wish you would use an async generator instead of importing node streams
> https://github.com/cross-js/cross-js#dont-create-node-or-web-readable-stream-yourself
>
> (or use whatwg stream on nodejs - node just shipped it as experimental) so it could work great across Web/Deno/NodeJS

Async iterator is the main reason I decided to use csv-parse over papaparse even though there is a slight performance penalty.

jimmywarting · 2021-07-14T11:42:01Z

> Async iterator is the main reason I decided to use csv-parse over papaparse even though there is a slight performance penalty.

I do it for similar reasons on other libraries as well, just for better cross-environment compatibility and to avoid importing a whole node module into browsers.

the reason it can be slower is that it doesn't have a bucket; the highWaterMark is basically 1

a great way to boost it, though, is to use const newAsyncIterable = stream.Readable.from(iterator, { highWaterMark: x })
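
A minimal sketch of that suggestion, using Node's built-in stream module. The `rows` generator is a hypothetical stand-in for a parser exposed as an async generator, and the value 16 is arbitrary.

```javascript
const { Readable } = require('stream');

// Stand-in for a parser exposed as an async generator (hypothetical).
async function* rows() {
  for (let i = 0; i < 5; i++) {
    yield [String(i), `row-${i}`];
  }
}

// Readable.from creates an object-mode stream by default, so highWaterMark
// counts buffered rows rather than bytes.
const buffered = Readable.from(rows(), { highWaterMark: 16 });

// Readable streams are async iterables, so consumers can keep using for await.
async function collect(stream) {
  const out = [];
  for await (const row of stream) out.push(row);
  return out;
}

collect(buffered).then((all) => console.log(all.length, 'rows'));
```

The stream reads ahead from the generator until its buffer is full, which is what gives the consumer a "bucket" to draw from.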

FunMiles · 2021-12-09T15:04:36Z

Would it be possible to have async functions for step? Let's say I need to store each parsed row in a database (e.g. indexeddb or some remote (no-)SQL database) that has an async interface. Currently I build an array and then process the array in complete. Having step be async and having the parser await the promise returned by step would allow me to avoid that.

Edit: Of course I can use pause and resume (though not if using a worker, according to the docs). It does address my immediate need, but having the possibility of step being async is much more elegant and easier to write and read.
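
The pause and resume workaround can be wrapped once so that the row handler itself is an async function. This is only a sketch: `asyncStep` and `stubParse` are hypothetical names, and it assumes the parser handle passed to step exposes pause(), resume() and abort(). As noted above, this does not help when using a worker.

```javascript
// Turn an async row handler into a `step` callback by pausing the parser
// while the handler's promise is pending, and aborting if it rejects.
function asyncStep(handler) {
  return (results, parser) => {
    parser.pause();
    Promise.resolve(handler(results.data)).then(
      () => parser.resume(),
      () => parser.abort()
    );
  };
}

// Minimal stand-in parser (hypothetical) that honours pause/resume/abort.
function stubParse(rows, { step, complete }) {
  let i = 0;
  let paused = false;
  let aborted = false;
  const parser = {
    pause() { paused = true; },
    resume() { paused = false; run(); },
    abort() { aborted = true; complete(); },
  };
  function run() {
    while (!paused && !aborted && i < rows.length) {
      step({ data: rows[i++] }, parser);
    }
    if (!paused && !aborted && i >= rows.length) complete();
  }
  run();
}

const saved = [];
const saveRow = async (row) => { saved.push(row); }; // stands in for a database write

stubParse([['a', '1'], ['b', '2']], {
  step: asyncStep(saveRow),
  complete: () => console.log('saved', saved.length, 'rows'),
});
```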

dmost714 · 2022-01-05T14:51:58Z

PapaParse has been a pleasure to use.
I'm late to the party, but here's my wishlist:

Handle Byte Order Mark (BOM) chars at the beginning of the file. If these aren't stripped, the 1st header gets messed up. It's easy to fix using beforeFirstChunk, but would be a nice flag.
Async step function will be great. However, if network calls are involved (e.g. writing to DB or validating a postal address) it's too taxing to process records one at a time. I frequently batch-up records in step until I have batch_size rows, then process them all. Being able to use step and set a 'batch_size' would be great, but at the expense of bloating the library and documentation just a tad.
unparse that supports node streams
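
The first two wishlist items can be approximated in user code today. In this sketch, `stripBom` is written to be usable as a beforeFirstChunk callback, on the assumption that the callback receives the first chunk as a string and may return a modified one; `createBatcher` is a hypothetical helper, not part of PapaParse.

```javascript
// Strip a leading byte order mark (U+FEFF) from the first chunk.
const stripBom = (chunk) => (chunk.charCodeAt(0) === 0xfeff ? chunk.slice(1) : chunk);

// Collect rows and hand them over in groups of `batchSize` (hypothetical helper).
function createBatcher(batchSize, onBatch) {
  let batch = [];
  const flush = () => {
    if (batch.length === 0) return;
    const full = batch;
    batch = [];
    onBatch(full);
  };
  const add = (row) => {
    batch.push(row);
    if (batch.length >= batchSize) flush();
  };
  return { add, flush };
}

const batches = [];
const batcher = createBatcher(2, (rows) => batches.push(rows));
['r1', 'r2', 'r3'].forEach((row) => batcher.add(row)); // call add() from `step`
batcher.flush(); // call flush() from `complete` to emit the final partial batch

console.log(stripBom('\uFEFFname,age'), batches);
```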

riosje · 2022-03-23T04:22:06Z

Guys you need to implement this specification, to allow much more complex structures.
https://csvjson.org/
https://csvjson.com/csvjson2json

anyway, this is a nice tool, congrats...

sandstrom · 2022-04-14T09:47:52Z

I think an update should strive to add as little new stuff as possible. Instead, try to reduce the surface, making the project easier to maintain.

I would remove the following:

jQuery (rarely used these days)
Bower (rarely used these days)
remote file handling (easily handled outside PapaParse)
minification (better handled outside this lib; out of scope)
parsing local files (easily handled outside this library)
preview functionality (just slice the data instead; out of scope)
And any other exotic stuff

On typescript, it's double-edged. Yes, easier to maintain when you know it, but also harder for people to contribute to the project. Also, pretty large rewrite, not necessarily worth it.

Also, instead of plugins, just provide a few simple callback JS hooks. For example beforeParse, afterParse, etc. that people can use for customization.

Much less work to maintain than a full-fledged plugin system.
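
A sketch of what such hooks could look like, under the assumption of a synchronous parse function. `withHooks` and `naiveParse` are hypothetical, and `beforeParse` / `afterParse` are only the names suggested above, not existing PapaParse options.

```javascript
// Hypothetical hook wrapper: run `beforeParse` on the input and
// `afterParse` on the parsed rows. Both default to pass-through.
function withHooks(parse, { beforeParse = (input) => input, afterParse = (rows) => rows } = {}) {
  return (input) => afterParse(parse(beforeParse(input)));
}

// Naive synchronous stand-in parser: no quoting or escaping support.
const naiveParse = (text) => text.split('\n').map((line) => line.split(','));

const parse = withHooks(naiveParse, {
  beforeParse: (text) => text.trim(),
  afterParse: (rows) => rows.filter((row) => row.some((field) => field !== '')),
});

console.log(parse('a,b\n\n1,2\n'));
```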

rohanrajpal · 2022-11-16T12:58:16Z

Also, since Node 18 has a native fetch implementation, would it be possible to shift from XMLHttpRequest to Fetch? This would enable the remote CSV parsing option for Node backends!

pokoli · 2022-11-16T12:59:58Z

@rohanrajpal what about users running PapaParse on the browser? Do we also have a fetch implementation available on browsers?

rohanrajpal · 2022-11-16T13:47:43Z

> @rohanrajpal what about users running PapaParse on the browser? Do we also have a fetch implementation available on browsers?

It is available, right?
https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API

I happen to use fetch() only for our frontend API calls, or am I confusing Fetch with something else here? Still new to this world of web dev so I might be wrong here.

Elias-Graf · 2022-11-17T07:59:54Z

@pokoli @rohanrajpal The fetch API (https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API) was a browser only (or at least not in node) API up until recently. There has been a package to implement the behaviour of fetch called node-fetch (https://www.npmjs.com/package/node-fetch). As of node v18, there exists a native, experimental fetch implementation: https://nodejs.org/en/blog/announcements/v18-release-announce.

pokoli · 2022-11-17T08:17:34Z

@Elias-Graf thanks for the info.
The problem is that we need to support all maintained versions of node, so until node16 is unmaintained we cannot introduce features of node18.

Elias-Graf · 2022-11-17T09:36:27Z

@pokoli I'm aware. I was just trying to clarify what #748 (comment) said.

Btw. I have quite a bit of TypeScript experience, if you guys need help with that :)

RuBaWa · 2023-01-13T11:19:54Z

Hi, is there any progress, to get papaparse as an es module? I have problems with vite 4, because version 5 is only available for commonjs.

Thanks for your work on this project :-)

landisdesign · 2023-03-24T21:38:28Z

I'd love to be part of the conversion to a functional programming model. So much of what I see here seems to easily translate to that, and making the data/variable flows easier to track would be a joy to behold.

o-t-w · 2023-03-26T13:20:56Z

On point 2 "Separate NodeJS build from Browser build" - there has been so much standardisation between Node and web browsers now (and ideally people could also use this in non-Node backend environments like Deno and Bun) couldn't you just make the code interoperable rather easily? Maybe use web streams instead of Node streams?

ishowman · 2023-03-29T01:44:27Z

When could we start? I am glad to contribute to it.

vincerubinetti · 2023-08-03T15:49:50Z

> I wish you would use an async generator instead of importing node streams

Is there a clever way anyone can think of to make an async generator wrapper function around papaparse.parse(stream, { step, complete })?

Related:
https://stackoverflow.com/questions/50862698/how-to-convert-node-js-async-streaming-callback-into-an-async-generator
https://stackoverflow.com/questions/63749853/possible-to-make-an-event-handler-wait-until-async-promise-based-code-is-done
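
One possible shape for such a wrapper is a small queue that the callbacks fill and the generator drains. This is a sketch under assumptions, not a tested PapaParse integration: `parseRows` and `stubParse` are hypothetical names, the parse function is injected, and the buffer is unbounded (backpressure would need the parser's pause and resume).

```javascript
// Bridge a callback-style parser to an async generator.
// `parse` is injected; with PapaParse it would be Papa.parse, assuming its
// `step`, `complete` and `error` config callbacks.
async function* parseRows(parse, input, config = {}) {
  const queue = []; // rows received but not yet consumed
  let done = false;
  let failure = null;
  let wake = null; // resolves the consumer's pending wait, if any

  const notify = () => {
    if (wake) {
      wake();
      wake = null;
    }
  };

  parse(input, {
    ...config,
    step: (results) => { queue.push(results.data); notify(); },
    complete: () => { done = true; notify(); },
    error: (err) => { failure = err; done = true; notify(); },
  });

  while (true) {
    while (queue.length > 0) yield queue.shift();
    if (failure) throw failure;
    if (done) return;
    await new Promise((resolve) => { wake = resolve; });
  }
}

// Stub parser (assumption for the demo) that emits rows asynchronously.
function stubParse(rows, { step, complete }) {
  let i = 0;
  const tick = () => {
    if (i < rows.length) {
      step({ data: rows[i++] });
      setTimeout(tick, 0);
    } else {
      complete();
    }
  };
  setTimeout(tick, 0);
}

(async () => {
  for await (const row of parseRows(stubParse, [['a', '1'], ['b', '2']])) {
    console.log(row);
  }
})();
```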

o-t-w · 2023-08-03T21:22:10Z

> @Elias-Graf thanks for the info.
>
> The problem is that we need to support all maintained versions of node, so until node16 is unmaintained we cannot introduce features of node18.

Active support for Node 16 ended in Oct 2022. Security support ends one month from now.

leeoniya · 2023-09-04T16:50:30Z

by the looks of it, there's not a whole lot of movement towards 6.0 here, so i'll mention that i just published v0.5.0 of https://github.com/leeoniya/uDSV which some may find interesting.

the performance i'm seeing in Linux / Node v20.5.1 is 1x-5x faster than Papa Parse, with 2.5x being typical.

i spent entirely too long on getting the benchmarks right, and i believe their scope is the most thorough in breadth and depth: https://github.com/leeoniya/uDSV/tree/main/bench.

feedback and corrections/criticism is welcome :)

adamreisnz · 2024-10-23T22:50:58Z

For what it's worth, I would love to see the ability to pause parsing in workers. Currently this is unsupported, which is a problem if you want to use workers but need to stagger the data coming out of the chunks.
