Organizations

@commoncrawl @bigscience-workshop @oscar-project

pjox/README.md

Hi there 👋

I'm a Senior Research Scientist at the Common Crawl Foundation.

I am interested in large corpora for training language models, especially for under-resourced and historical languages. I am also interested in tasks such as Named Entity Recognition (NER), dependency parsing, part-of-speech tagging, machine translation, and document structuring.

I love coffee ☕️, cookies 🍪 and maths.

Popular repositories

  1. gutf Public archive

    Terminal tool that converts file encodings to UTF-8

    Go 10 1

  2. cc-downloader Public

    A polite and user-friendly downloader for Common Crawl data

    Rust 5

  3. gofishing Public

    An extremely fast entity-fishing client

    Go 4

  4. CamemBERT-Experiments Public

    A notebook with CamemBERT experiments.

    Jupyter Notebook 4

  5. thesis Public

    My Ph.D. Thesis

    TeX 3

  6. sirene-sql Public archive

    A useful query for importing the CSV file of the Sirene database into an SQL database

    2