Improve Fluo's front page and short description. #80

Open
keith-turner opened this issue Aug 11, 2017 · 0 comments

Recent events have prompted discussion in the community about how best to describe Fluo quickly. As a result, we have a few different short descriptions floating around.

Description from GitHub README

Apache Fluo is an open source implementation of Percolator (which populates
Google's search index) for Apache Accumulo. Fluo makes it possible to update
the results of a large-scale computation, index, or analytic as new data is
discovered. Check out the Fluo project website for news and general
information.

Description on website

Apache Fluo is an open source implementation of Percolator (which populates
Google's search index) for Apache Accumulo. With Fluo, users can continuously
join new data into large existing data sets without reprocessing all data.
Unlike batch and streaming frameworks, Fluo offers much lower latency and can
operate on extremely large data sets.

Description in August 2017 board report

 - Apache Fluo is a distributed processing system built on Apache Accumulo.  Fluo
   users can easily setup workflows that execute cross node transactions when data
   changes.  These workflows enable users to continuously join new data into large
   existing data sets with low latency while avoiding reprocessing all data.

Below are some of the concepts these short descriptions are trying to communicate. What else needs to be in this outline? Can we improve the front page of the website to be more informative and succinct? The front page does not have to touch on all aspects; it could link out for more details or omit some aspects.

  • History
    • Based on the Percolator design
  • What capabilities it offers to users
    • Continuously join new data into large existing data sets without reprocessing all data
    • Maintain multiple dependent derived data sets (similar to materialized views)
    • Emit changes in derived data sets to external systems
      • Continually keep a large index up to date as new data arrives
      • Update external analytic systems
  • How it works
    • Cross-node transactions
    • Notifications
    • Observers that execute based on notifications
  • Context: how it compares to other technologies. Explaining Fluo relative to other technologies may help people understand Fluo more quickly.
    • Lower latency than batch
    • Larger data sets than streaming
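
A small example on the front page might communicate the "How it works" items faster than prose. Below is a rough, single-process sketch of the idea in Python. It is not the Fluo API: `ToyTable`, `observe`, `commit`, `run_observers`, and `index_document` are all hypothetical names made up for illustration, and the sketch does not model cross-node transactions or conflict detection. It only shows how a write to an observed column queues a notification, and how an observer then folds the new data into a derived data set without reprocessing older data.

```python
from collections import deque


class ToyTable:
    """In-memory stand-in for a transactional table (hypothetical, not the Fluo API)."""

    def __init__(self):
        self.cells = {}         # (row, column) -> value
        self.observers = {}     # column -> callback(table, row)
        self.pending = deque()  # queued notifications: (row, column)

    def observe(self, column, callback):
        """Register an observer for writes to the given column."""
        self.observers[column] = callback

    def commit(self, writes):
        """Apply a batch of writes together, then queue notifications."""
        self.cells.update(writes)
        for (row, column) in writes:
            if column in self.observers:
                self.pending.append((row, column))

    def run_observers(self):
        """Process notifications until none remain; return how many ran."""
        ran = 0
        while self.pending:
            row, column = self.pending.popleft()
            self.observers[column](self, row)
            ran += 1
        return ran


def index_document(table, row):
    """Observer: fold one new document into the derived word counts."""
    writes = {}
    for word in table.cells[(row, "doc:content")].split():
        key = ("word:" + word, "count")
        current = writes.get(key, table.cells.get(key, 0))
        writes[key] = current + 1
    table.commit(writes)


table = ToyTable()
table.observe("doc:content", index_document)

# First document arrives: one notification, one observer run.
table.commit({("doc1", "doc:content"): "fluo updates indexes"})
first_pass = table.run_observers()

# Second document arrives: only the new document is processed.
table.commit({("doc2", "doc:content"): "fluo joins new data"})
second_pass = table.run_observers()

fluo_count = table.cells[("word:fluo", "count")]
```

Each pass runs exactly one observer, so the second document is joined into the existing counts without touching the first. A real front-page example would presumably use the actual Fluo observer and transaction interfaces instead of this toy model.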