
5.01 Architecture, notes, licence

Boris Shilov edited this page Jun 21, 2019 · 2 revisions

Architecture


CollSeq

CollSeq is a wrapper around IndexedSeq[Product]. CollSeq also implements Product itself.
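A minimal sketch of the CollSeq idea for arity 2 may help. It assumes a hypothetical class named MiniCollSeq2, which is an illustration and not the library's source: rows are held in an IndexedSeq of tuples, and the same object is also a Product2 whose two elements are whole columns.

```scala
// Hypothetical, simplified stand-in for CollSeq2; NOT the library's source.
// It is an IndexedSeq of rows and, at the same time, a Product2 of columns.
final class MiniCollSeq2[T1, T2](rows: IndexedSeq[(T1, T2)])
    extends scala.collection.immutable.IndexedSeq[(T1, T2)]
    with Product2[IndexedSeq[T1], IndexedSeq[T2]] {

  // IndexedSeq members: row access is delegated to the wrapped collection
  override def apply(i: Int): (T1, T2) = rows(i)
  override def length: Int = rows.length

  // Product2 members: each product element is an entire column
  override def _1: IndexedSeq[T1] = rows.map(_._1)
  override def _2: IndexedSeq[T2] = rows.map(_._2)
}
```

With `val cs = new MiniCollSeq2(Vector(("a", 1), ("b", 2)))`, `cs(1)` is a row while `cs._2` is the second column, so one value supports both row- and column-oriented access.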

CollSeqN

CollSeqN classes are the concrete implementations of CollSeq. Each extends IndexedSeq[ProductN[T1, ..., TN]] and implements ProductN. CollSeqN has only one novel method, flatZip, which zips a Seq onto the rows as an extra column: flatZip[A](s: Seq[A]): CollSeq(N+1)[T1, ..., TN, A]
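The flatZip signature suggests a zip followed by flattening each nested pair. The sketch below shows that reading on plain tuples for the arity 2 to 3 case; flatZip2 and FlatZipSketch are hypothetical names, and the sketch truncates to the shorter input because that is what the standard `zip` does (the library's behaviour on mismatched lengths is not documented here).

```scala
object FlatZipSketch {
  // Hypothetical illustration of flatZip for arity 2 -> 3:
  // zip the new column onto the rows, then flatten each ((a, b), c)
  // into a single 3-tuple (a, b, c). Standard zip truncates to the shorter side.
  def flatZip2[T1, T2, A](rows: IndexedSeq[(T1, T2)], s: Seq[A]): IndexedSeq[(T1, T2, A)] =
    rows.zip(s).map { case ((a, b), c) => (a, b, c) }
}
```

For example, `FlatZipSketch.flatZip2(Vector((1, "a"), (2, "b")), Seq(true, false))` yields `Vector((1, "a", true), (2, "b", false))`.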

CsvParser

CsvParser is a simple CSV reader/parser. Concrete parsers are implemented for each arity. The actual parsing work is done by opencsv on the JVM and by an internal parser on Scala.js. Opencsv will be removed once the internal parser is considered reliable and well tested.
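A rough sketch of how a per-arity typed parser can combine lazy iteration with implicit string converters. CellReader, TypedCsvSketch and parse2 are hypothetical names, not the library's API, and the comma splitting is deliberately naive (no quoting or escaping, unlike opencsv).

```scala
// Hypothetical type class turning one CSV cell into a typed value.
// The name CellReader is illustrative; it is not the library's converter API.
trait CellReader[A] {
  def read(cell: String): A
}

object CellReader {
  // Helper to build an instance from a plain function
  def instance[A](f: String => A): CellReader[A] = new CellReader[A] {
    def read(cell: String): A = f(cell)
  }

  // Instances live in the companion object, so they are found without imports
  implicit val stringReader: CellReader[String] = instance[String](s => s.trim)
  implicit val intReader: CellReader[Int] = instance[Int](s => s.trim.toInt)
  implicit val doubleReader: CellReader[Double] = instance[Double](s => s.trim.toDouble)
}

object TypedCsvSketch {
  // Arity-2 parser. Splits naively on commas (no quoting or escaping).
  // Returns a lazy Iterator, so each row is converted only when consumed.
  def parse2[T1, T2](lines: Iterator[String])(
      implicit r1: CellReader[T1], r2: CellReader[T2]): Iterator[(T1, T2)] =
    lines.map { line =>
      val cells = line.split(",", -1)
      (r1.read(cells(0)), r2.read(cells(1)))
    }
}
```

Supporting a new column type in this design only requires one more CellReader instance; the parser itself stays unchanged.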

Implicit Conversions

  • Seq[Product1[T]] => CollSeq1[T]
  • Seq[Product2[T1,T2]] => CollSeq2[T1,T2]
  • Seq[T] => CollSeq1[T]
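The conversions above let an ordinary Seq of tuples pick up CollSeq methods. Below is a minimal sketch of the same mechanism; Columns2, Columns2Conversions, column1 and column2 are hypothetical stand-ins, not the library's names.

```scala
import scala.language.implicitConversions

// Hypothetical column-view wrapper; stands in for CollSeq2 in this sketch only.
final class Columns2[T1, T2](rows: IndexedSeq[Product2[T1, T2]]) {
  def column1: IndexedSeq[T1] = rows.map(_._1)
  def column2: IndexedSeq[T2] = rows.map(_._2)
}

object Columns2Conversions {
  // Same shape as the "Seq[Product2[T1,T2]] => CollSeq2[T1,T2]" conversion:
  // any Seq of tuples (tuples are Product2s) is viewed as a Columns2.
  implicit def seqToColumns2[T1, T2](s: Seq[Product2[T1, T2]]): Columns2[T1, T2] =
    new Columns2(s.toIndexedSeq)
}
```

After `import Columns2Conversions._`, `Seq(("alice", 30), ("bob", 25)).column2` compiles and returns the second column, because the compiler inserts the conversion when `column2` is not a member of Seq.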

Status

Stable.

Future

In no particular order:

  • How to incorporate classes that implement ProductN (future case classes)? This change was originally milestoned for Scala 2.11 but seems to have been postponed.
  • Column access by named method (using macros?)

Non-goals:

  • A mutable version
  • Exceeding Scala's arity limit (Product22 is the largest ProductN)

Scalability

CollSeq is known to scale to thousands of rows without difficulty. Because CollSeq is a thin wrapper around a Scala IndexedSeq, it should scale in exactly the same way. CsvParser's Iterator has been reported to process millions of rows without a spike in JVM memory usage.
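Constant-memory processing with an Iterator comes from consuming each row and then discarding it. The sketch below, using a hypothetical StreamingSketch object rather than the library's own parser, folds a column into a running total. In a real program the lines might come from `scala.io.Source.fromFile(path).getLines()`.

```scala
object StreamingSketch {
  // Sums the second column of "name,value" lines. Only the running total
  // is retained: foldLeft pulls one line at a time from the Iterator,
  // so the full input is never materialised in memory.
  def sumSecondColumn(lines: Iterator[String]): Long =
    lines.foldLeft(0L) { (acc, line) =>
      acc + line.split(",", -1)(1).trim.toLong
    }
}
```

Calling it on `Iterator.tabulate(1000000)(i => s"row$i,1")` processes a million generated rows without ever building a collection of them.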

Build Dependencies

product-collections relies heavily on sbt-boilerplate, a simple but cleverly designed code-generating sbt plugin.

Pull Requests

Pull requests are welcome. If you extend the project, please preserve its KISS (keep it simple) character. Feel free to discuss your ideas on the issue tracker.

Licence

Two-clause BSD licence.

Alternatives

Product-collections is around 400 lines of code (before template expansion). The alternatives are substantially larger and have far more features.

Shapeless

HLists are similar in concept. Shapeless allows one to abstract over arity.
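To show what "abstracting over arity" means, here is a hand-rolled, minimal HList. It is an illustration only and does not reproduce shapeless's implementation; HCons, HNil and HListSketch are names chosen for this sketch. A single recursive function handles every arity, whereas ProductN-based code needs one definition per N (which is what template expansion provides).

```scala
// Hand-rolled minimal HList for illustration only (not shapeless's code).
sealed trait HList

final case class HCons[+H, +T <: HList](head: H, tail: T) extends HList {
  // Right-associative because the name ends in ':' (x :: l means l.::(x))
  def ::[A](a: A): HCons[A, HCons[H, T]] = HCons(a, this)
}

case object HNil extends HList {
  def ::[A](a: A): HCons[A, HNil.type] = HCons(a, HNil)
}

object HListSketch {
  // One definition covers every arity by recursing over the structure.
  def arity(h: HList): Int = h match {
    case HCons(_, tail) => 1 + arity(tail)
    case HNil           => 0
  }
}
```

The element types are still tracked statically: `(1 :: "a" :: HNil).tail.head` has type String.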

Saddle

Backed by arrays. Heavily specialized. Matrix operations.

Framian

Simple abstractions for working with ordered series data (e.g. time series), as well as heterogeneous data tables (similar to R's data frame). Based on Spire and Shapeless.

With Framian you specify the data type at retrieval time (weakly typed).
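The difference between choosing a type at retrieval time and carrying it in the static type can be seen with plain Scala values. This sketch uses a Map and a tuple as hypothetical stand-ins; it is not Framian's API.

```scala
object TypingSketch {
  // Weakly typed row: the caller picks the column type when reading it.
  // A wrong guess (e.g. asInstanceOf[Int] on "name") fails only at runtime.
  val weakRow: Map[String, Any] = Map("name" -> "alice", "age" -> 30)
  def weakAge: Int = weakRow("age").asInstanceOf[Int]

  // Strongly typed row: the column type is part of the static type,
  // so asking for the wrong type is a compile error.
  val strongRow: (String, Int) = ("alice", 30)
  def strongAge: Int = strongRow._2
}
```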

scala-datatable

Simple immutable data structure. Weakly typed. Quite a young project with emphasis on sorting.

Testimonials

"The brilliance of [product-collections] is the tight focus on being really good at one or two things, which, in my opinion, includes not just the powerful type-safe column- and row-oriented operations, but the extensible use of implicit string converters...

In product-collections you've hit the ultimate sweet-spot from an idiomatic Scala point of view."

Simeon H.K. Fitch, Director of Software Engineering, Elder Research, Inc.