peelo-unicode

A collection of simple-to-use Unicode utilities for C++17. Supports Unicode 15.1.

Character testing functions

The library ships with a Unicode version of the ctype.h header, containing the following functions inside the peelo::unicode::ctype namespace:

isalnum()
isalpha()
isblank()
iscntrl()
isdigit()
isgraph()
islower()
isprint()
ispunct()
isspace()
isupper()
isxdigit()
tolower()
toupper()

Additional functions not found in ctype.h are:

isvalid() - Tests whether the given value is a valid Unicode code point.
isemoji() - Tests whether the given Unicode code point is an emoji.

Example

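#include <cstdint>
#include <iostream>
#include <peelo/unicode/ctype.hpp>

int
main()
{
  using namespace peelo::unicode::ctype;

  std::cout << isalnum(U'Ä') << std::endl;
  std::cout << isdigit(U'൧') << std::endl; // U+0D67 MALAYALAM DIGIT ONE
  std::cout << isgraph(U'€') << std::endl;
  std::cout << ispunct(U'\u2001') << std::endl; // U+2001 EM QUAD
  std::cout << std::hex;
  // Cast to an integer so that the code point prints as a hexadecimal number
  // (streaming a char32_t directly is ill-formed since C++20).
  std::cout << static_cast<std::uint32_t>(tolower(U'Ä')) << std::endl;
  std::cout << static_cast<std::uint32_t>(toupper(U'ä')) << std::endl;
}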
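
To illustrate what a validity test such as isvalid() might check, here is a minimal sketch that uses only the standard library. The name is_scalar_value is hypothetical and not part of peelo-unicode, and the rule it applies (at most U+10FFFF and not a surrogate) is an assumption; the library's own isvalid() may apply different or stricter criteria.

```cpp
// Hypothetical helper for illustration only, not part of peelo-unicode.
// Returns true when `c` is a Unicode scalar value: within U+0000..U+10FFFF
// and outside the surrogate range U+D800..U+DFFF.
constexpr bool
is_scalar_value(char32_t c)
{
  return c <= 0x10FFFF && !(c >= 0xD800 && c <= 0xDFFF);
}

// Compile-time checks of the assumed rule.
static_assert(is_scalar_value(U'A'));
static_assert(is_scalar_value(U'\U0010FFFF'));
static_assert(!is_scalar_value(static_cast<char32_t>(0xD800)));
static_assert(!is_scalar_value(static_cast<char32_t>(0x110000)));
```

Because the helper is constexpr, the same check can be used both at compile time and on values read at run time.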

Character encodings

The library also provides functions for encoding text to, and decoding text from, Unicode character encodings. Both validating and non-validating variants are provided; the non-validating variants ignore all encoding and decoding errors.

Supported character encodings are:

Example

#include <string>
#include <peelo/unicode/encoding.hpp>

int
main()
{
  using namespace peelo::unicode::encoding;

  // Decode UTF-8 input, ignoring any decoding errors.
  std::u32string utf8_decoded = utf8::decode("\xe2\x82\xac");

  // Encode it back to byte string, ignoring any encoding errors.
  std::string utf8_encoded = utf8::encode(utf8_decoded);

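  // Decode UTF-32BE input with validation. The length is given explicitly
  // because the input contains NUL bytes, which would otherwise end the
  // string literal early when it is converted to std::string.
  std::u32string utf32be_decoded;
  if (utf32be::decode_validate(std::string("\x00\x00 \xac", 4), utf32be_decoded))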
  {
    // Given input is valid UTF-32BE.
  } else {
    // Given input is invalid UTF-32BE.
  }

  // Encode it back to byte string, with validation.
  std::string utf32be_encoded;
  if (utf32be::encode_validate(utf32be_decoded, utf32be_encoded))
  {
    // Given input contained only valid Unicode code points.
  } else {
    // Given input contained invalid Unicode code points.
  }
}
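
The difference between validating and non-validating encoding can be sketched with the standard library alone. The function names below (append_utf8, encode_utf8_validate, encode_utf8_lossy) are hypothetical and this is not the library's implementation; in particular, it assumes that "ignoring" an error means skipping the offending code point.

```cpp
#include <string>
#include <utility>

// Hypothetical helper: appends the UTF-8 encoding of `c` to `out`.
// Returns false and leaves `out` untouched when `c` is a surrogate or lies
// beyond U+10FFFF.
inline bool
append_utf8(char32_t c, std::string& out)
{
  if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
  {
    return false;
  }
  if (c < 0x80)
  {
    out += static_cast<char>(c);
  }
  else if (c < 0x800)
  {
    out += static_cast<char>(0xC0 | (c >> 6));
    out += static_cast<char>(0x80 | (c & 0x3F));
  }
  else if (c < 0x10000)
  {
    out += static_cast<char>(0xE0 | (c >> 12));
    out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (c & 0x3F));
  } else {
    out += static_cast<char>(0xF0 | (c >> 18));
    out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (c & 0x3F));
  }

  return true;
}

// Validating variant: fails at the first code point that cannot be encoded
// and leaves `output` unchanged in that case.
inline bool
encode_utf8_validate(const std::u32string& input, std::string& output)
{
  std::string result;

  for (const char32_t c : input)
  {
    if (!append_utf8(c, result))
    {
      return false;
    }
  }
  output = std::move(result);

  return true;
}

// Non-validating variant: code points that cannot be encoded are skipped
// (an assumption about what "ignoring errors" means).
inline std::string
encode_utf8_lossy(const std::u32string& input)
{
  std::string result;

  for (const char32_t c : input)
  {
    append_utf8(c, result);
  }

  return result;
}
```

Under these assumptions, encode_utf8_validate(U"\u20AC", out) succeeds and stores the three bytes E2 82 AC in out, while an input containing a lone surrogate makes it return false.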

BOM detection

The library provides a function for detecting whether a byte string contains a byte order mark (BOM) and, if so, which character encoding it indicates. Even though BOMs are rare these days, being able to detect one can sometimes be useful.

The detected character encodings are:

Example

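#include <cstddef>
#include <fstream>
#include <iostream>
#include <peelo/unicode/bom.hpp>

int
main()
{
  char buffer[1024];
  // Open in binary mode so that the BOM bytes are read unmodified.
  std::ifstream f("file.txt", std::ios::binary);
  std::size_t length;

  f.read(buffer, sizeof(buffer));
  length = static_cast<std::size_t>(f.gcount());
  f.close();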

  if (const auto bom = peelo::unicode::bom::detect(buffer, length))
  {
    if (*bom == peelo::unicode::bom::type::utf16_be)
    {
      std::cout << "File has UTF-16BE BOM." << std::endl;
    } else {
      std::cout << "File has some other BOM." << std::endl;
    }
  } else {
    std::cout << "File does not contain BOM." << std::endl;
  }
}
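
For readers curious how such detection can work, the following is a minimal standard-library sketch that compares the start of a buffer against the well-known BOM byte sequences. The names bom_kind and sniff_bom are hypothetical, and the library's own detection logic and enumeration may differ.

```cpp
#include <cstddef>
#include <cstring>
#include <optional>

// Hypothetical enumeration for illustration; not the library's own type.
enum class bom_kind { utf8, utf16_be, utf16_le, utf32_be, utf32_le };

// Returns the kind of BOM found at the start of `data`, if any.
inline std::optional<bom_kind>
sniff_bom(const char* data, std::size_t length)
{
  struct signature
  {
    const char* bytes;
    std::size_t size;
    bom_kind kind;
  };
  // Longer signatures are listed first: the UTF-32LE BOM (FF FE 00 00)
  // starts with the UTF-16LE BOM (FF FE), so testing UTF-16LE first would
  // misreport UTF-32LE input.
  static const signature signatures[] = {
    { "\x00\x00\xFE\xFF", 4, bom_kind::utf32_be },
    { "\xFF\xFE\x00\x00", 4, bom_kind::utf32_le },
    { "\xEF\xBB\xBF", 3, bom_kind::utf8 },
    { "\xFE\xFF", 2, bom_kind::utf16_be },
    { "\xFF\xFE", 2, bom_kind::utf16_le },
  };

  for (const auto& s : signatures)
  {
    if (length >= s.size && std::memcmp(data, s.bytes, s.size) == 0)
    {
      return s.kind;
    }
  }

  return std::nullopt;
}
```

The ordering of the table is the main design choice here. Note that a buffer starting with FF FE 00 00 could also be UTF-16LE text whose first character is NUL; this sketch resolves the ambiguity in favour of UTF-32LE.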
