Three utilities built with natural language processing techniques and regular expressions for parsing.
Basic utility that converts non-ASCII text to ASCII-encoded text, stripping accents and dropping any other non-ASCII characters.
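As a rough illustration of the idea, and not necessarily how this utility is implemented, accent removal can be sketched with Python's standard `unicodedata` module. The function name `to_ascii` is hypothetical; the approach assumes that decomposing characters with NFKD normalization and then discarding anything outside ASCII is acceptable for the input text.

```python
import unicodedata


def to_ascii(text: str) -> str:
    """Hypothetical helper: return an ASCII-only approximation of `text`."""
    # NFKD splits an accented character into its base letter plus a
    # separate combining mark (for example "é" becomes "e" + U+0301).
    decomposed = unicodedata.normalize("NFKD", text)
    # Encoding with errors="ignore" drops the combining marks and any
    # other character that has no ASCII representation.
    return decomposed.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    print(to_ascii("Público é ótimo"))
```

Characters with no ASCII base letter (emoji, CJK text, and so on) are simply dropped under this approach, so it suits Latin-script text best.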
Search engine for the news website Publico that fetches articles matching a user-specified query and generates PDF documents containing all the matching articles.
RSS crawler that fetches documents from an RSS feed and indexes them. Users can then search the indexed documents with a query, and the results are ranked by TF-IDF score.
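The ranking step can be sketched as follows. This is a minimal, self-contained illustration of one common TF-IDF variant (term frequency normalized by document length, inverse document frequency as `log(N / df)`), not the crawler's actual code. The names `tokenize` and `rank`, the regex tokenizer, and the sample documents are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(query, documents):
    """Return (doc_index, score) pairs sorted by descending TF-IDF score.

    Documents with a score of zero are omitted from the result.
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    query_terms = tokenize(query)
    scores = []
    for index, tokens in enumerate(tokenized):
        if not tokens:
            continue
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term in tf:
                # tf is normalized by document length; idf is log(N / df).
                score += (tf[term] / len(tokens)) * math.log(n_docs / df[term])
        if score > 0:
            scores.append((index, score))

    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    docs = [
        "the cat sat on the mat",
        "the dog chased the cat",
        "stock markets fell sharply today",
    ]
    for index, score in rank("cat", docs):
        print(index, round(score, 4))
```

With this unsmoothed IDF, a term that occurs in every document contributes nothing to the score; a real index would likely add smoothing and precompute the term statistics instead of recounting them per query.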