    Repositories list

    • Utilities for Cascading
      Java
      1222195 Updated Jul 1, 2022
    • Cascading scheme for Solr
      Java
      132797 Updated Jul 1, 2022
    • pinot

      Public
      Apache Pinot (Incubating) - A realtime distributed OLAP datastore
      Java
      Apache License 2.0
      1.3k000 Updated Jun 8, 2022
    • Source code for blog post series on text features for similarity calculation
      Java
      11103 Updated May 12, 2021
    • Demo of using flink-crawler to extract pages from Common Crawl for a target language
      Java
      Apache License 2.0
      0030 Updated Apr 8, 2019
    • Continuous scalable web crawler built on top of Flink and crawler-commons
      Java
      Apache License 2.0
      1851270 Updated Apr 8, 2019
    • Utilities for use with Flink
      Java
      Apache License 2.0
      0000 Updated Mar 14, 2019
    • cascading

      Public
      Cascading is a feature rich API for defining and executing complex and fault tolerant data processing workflows on various cluster computing platforms. Please see https://github.com/cwensel/cascading for access to all WIP branches.
      Java
      Other
      221000 Updated Nov 29, 2018
    • Simple implementation of KMeans clustering on Flink, using iterations
      Java
      Apache License 2.0
      11070 Updated Nov 15, 2018
    • fastText

      Public
      Library for fast text representation and classification.
      HTML
      Other
      4.7k000 Updated Jul 16, 2018
    • JFastText

      Public
      Java interface for fastText
      Java
      Other
      100000 Updated Jul 10, 2018
    • Mirror of Apache Lucene + Solr
      Java
      2.7k000 Updated Jun 26, 2018
    • Wrapper code for Apache HttpClient that provides common page fetching functionality
      Java
      Apache License 2.0
      5000 Updated May 14, 2018
    • Classes that wrap multiple source functions in useful ways
      Apache License 2.0
      0000 Updated May 9, 2018
    • An extension of Yahoo's Benchmarks
      Java
      Apache License 2.0
      52000 Updated Aug 23, 2016
    • tenaya

      Public
      Tenaya is code that processes FASTQ files from the Sequence Read Archive (SRA), and identifies reads with bad metadata (e.g. wrong species) and/or bad read data.
      Java
      Apache License 2.0
      0020 Updated Jul 8, 2016
    • fse4j

      Public
      Java port of the FiniteStateEntropy project on GitHub (https://github.com/Cyan4973/FiniteStateEntropy)
      Apache License 2.0
      0000 Updated Jul 1, 2016
    • Integration of Cucumber with Cascading
      Java
      Apache License 2.0
      0030 Updated Jun 3, 2016
    • wikiwords

      Public
      Code to create a mapping from words to Wikipedia article titles (topics) and categories
      Java
      Apache License 2.0
      0010 Updated May 5, 2016
    • Linear SVM for Cascading-based workflows
      Java
      Apache License 2.0
      0060 Updated Nov 15, 2015
    • Cascading Scheme for the Apache Avro data serialization format
      Java
      Other
      3719231 Updated Oct 23, 2015
    • Maven repo for Java components that aren't in a public Maven repo.
      0000 Updated Oct 5, 2015
    • Snippets of useful Cascading code.
      Java
      1100 Updated Aug 7, 2015
    • Cucumber for the JVM
      Java
      MIT License
      2k000 Updated Sep 24, 2014
    • Code to split/parse Wikipedia XML dump
      Java
      41200 Updated Oct 7, 2013
    • Java version of LIBLINEAR
      Java
      BSD 3-Clause "New" or "Revised" License
      184500 Updated Mar 18, 2013
    • atomizer

      Public
      Cascading-based workflow to process noisy record-based data
      Java
      0000 Updated Jan 8, 2013
    • Cascading 2.0 Scheme for writing out Lucene indexes using Tuple field values.
      Java
      2000 Updated Aug 22, 2012
    • Amazon EC2 instance comparison site
      JavaScript
      584100 Updated Jun 10, 2012
    • Cascading Tap & Scheme for Amazon's SimpleDB
      Java
      11201 Updated Nov 25, 2010