forked from pudo/normality

A tiny library for Python text normalisation. Useful for ad-hoc text processing.


andkamau/normality

This branch is 1 commit ahead of, 127 commits behind pudo/normality:master.

Latest commit 8bb434f · Feb 18, 2017 · 26 commits

normality

Normality is a Python micro-package that bundles a small set of text normalization functions for easy re-use. These functions accept a snippet of Unicode or UTF-8 encoded text and remove various classes of characters, such as diacritics and punctuation. This is useful as a preparation step for further text analysis.
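To illustrate the idea, here is a minimal sketch of this kind of normalization using only the standard library. It is not normality's own implementation: the function name `normalize_sketch` and the exact set of Unicode categories it strips or replaces are assumptions chosen for illustration.

```python
import re
import unicodedata
from typing import Union


def normalize_sketch(text: Union[str, bytes]) -> str:
    """Hypothetical normalizer: strip diacritics and punctuation, lowercase."""
    # Accept UTF-8 encoded bytes as well as str (assumed behaviour).
    if isinstance(text, bytes):
        text = text.decode("utf-8")
    # NFKD splits accented letters into base letter + combining mark,
    # e.g. 'ü' -> 'u' + U+0308 COMBINING DIAERESIS.
    decomposed = unicodedata.normalize("NFKD", text)
    out = []
    for ch in decomposed:
        category = unicodedata.category(ch)
        if category.startswith("M"):
            # Combining marks (diacritics) are dropped entirely.
            continue
        if category[0] in "PSZC":
            # Punctuation, symbols, separators and control characters
            # become spaces so that words stay apart.
            out.append(" ")
        else:
            out.append(ch)
    # Collapse runs of whitespace, trim the ends and lowercase.
    return re.sub(r"\s+", " ", "".join(out)).strip().lower()


print(normalize_sketch('Nie wieder "Grüne Süppchen" kochen!'))
```

Replacing punctuation with a space (rather than deleting it) keeps words separated, and the final whitespace collapse removes the doubled spaces this creates.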

Example

```python
# coding: utf-8
from normality import normalize, slugify

text = normalize('Nie wieder "Grüne Süppchen" kochen!')
assert text == 'nie wieder grune suppchen kochen'

slug = slugify('My first blog post!')
assert slug == 'my-first-blog-post'
```

Extended usage

Read the source code; it's twenty lines of stuff.

RTSL
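As a rough idea of how a slug function in the spirit of `slugify` can fit in a few lines, here is a hypothetical sketch. The name `slugify_sketch` and its `sep` parameter are assumptions for illustration, not the package's actual code or signature.

```python
import re
import unicodedata


def slugify_sketch(text: str, sep: str = "-") -> str:
    """Hypothetical slugifier: ASCII letters and digits joined by `sep`."""
    # Decompose accented letters and drop the combining marks ('ü' -> 'u').
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Replace every run of characters outside [a-z0-9] with one separator.
    slug = re.sub(r"[^a-z0-9]+", sep, stripped.lower())
    # Remove separators left at either end (e.g. from trailing '!').
    return slug.strip(sep)


print(slugify_sketch("My first blog post!"))
```

Because this sketch keeps only ASCII letters and digits after stripping diacritics, characters without an ASCII decomposition (such as `ß` or Cyrillic letters) are dropped entirely; a fuller implementation would transliterate them instead.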

License

normality is open source, licensed under a standard MIT license (included in this repository as LICENSE).
