LANTERN: A Language ANnotation Tool to undERstand Narratives
This repository contains a pipeline for computational narrative analysis assisted by Large Language Models (LLMs).
LANTERN can preprocess, annotate, and analyse entire collections of books, and identify which parts of a book express the following types of narrative information:
- Events, i.e., all that happens in the narrative world;
- Subjective experiences, i.e., all that happens within a character, such as memories, emotions, and perceptions;
- Contextual information, i.e., additional details that contextualize the story, such as characters' relationships or sceneries.
Clone this repository and install the required dependencies.
$ gh repo clone cltl/event-detection-tool
$ pip3 install -r requirements.txt
Download Meta-Llama-3-8B-Instruct-GGUF and store it in ./llms/.
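If you prefer to script the download, here is a minimal sketch using huggingface_hub; the repo_id and filename are assumptions (community GGUF mirrors and quantization levels vary), so point them at whichever Meta-Llama-3-8B-Instruct-GGUF build you actually use.

```python
# Sketch: fetch a quantized Llama-3 GGUF into ./llms/ with huggingface_hub.
# repo_id and filename are assumptions; adjust them to your preferred GGUF build.
from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="QuantFactory/Meta-Llama-3-8B-Instruct-GGUF",   # assumed mirror
    filename="Meta-Llama-3-8B-Instruct.Q4_K_M.gguf",        # assumed quantization
    local_dir="./llms",
)
```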
1. Preprocess the book to split it into paragraphs, sentences, and clauses.
python3 scripts/preprocess/preprocess_book.py --paragraphs --sentences --clauses
2. Annotate each clause with one of three types of information: events, subjective experiences, or contextual information.
python3 scripts/annotate/tag.py
This step will produce corpus.tsv in the output folder, where each row corresponds to an annotated clause.
If you prefer to annotate sentences, run
python3 scripts/annotate/tag.py --sentences
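For a quick sanity check of the annotation output, here is a minimal sketch with pandas; the output path and the column names ("clause", "label") are assumptions, so adjust them to the actual header of your corpus.tsv.

```python
# Sketch: inspect the annotated clauses in corpus.tsv.
# The path and the column names ("clause", "label") are assumptions.
import pandas as pd

corpus = pd.read_csv("output/corpus.tsv", sep="\t")
print(corpus.head())                   # first few annotated clauses
print(corpus["label"].value_counts())  # events vs. subjective experiences vs. contextual information
```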
3. Analyse stories to observe their structure in terms of sequences of events, subjective experiences, and contextual information. Run
python3 scripts/annotate/tag.py --clauses
if you want to analyse how clauses have been annotated, or
python3 scripts/annotate/tag.py --sentences
to do the same at the level of sentences.
This step visualizes
- the distribution of events, subjective experiences, and contextual information in the book,
- their frequency across chapters and book chunks,
- their entropy.
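To give an idea of how such statistics can be derived directly from corpus.tsv, here is a minimal sketch that computes the per-chapter Shannon entropy of the labels; the column names ("chapter_id", "label") are assumptions and should be matched to your output file.

```python
# Sketch: per-chapter entropy of the label distribution in corpus.tsv.
# Column names ("chapter_id", "label") are assumptions.
import numpy as np
import pandas as pd

corpus = pd.read_csv("output/corpus.tsv", sep="\t")

def shannon_entropy(labels: pd.Series) -> float:
    # Entropy (in bits) of the label distribution within one group.
    p = labels.value_counts(normalize=True).to_numpy()
    return float(-(p * np.log2(p)).sum())

per_chapter = corpus.groupby("chapter_id")["label"].agg(shannon_entropy)
print(per_chapter)  # one entropy value per chapter
```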
Here is an example of the frequency of the three labels in the book Max Havelaar, annotated at the clause level with OpenAI gpt-4-1106-preview.
Right now, LANTERN runs on Max Havelaar by Multatuli and Nooit meer slapen by Hermans, and it uses a quantized version of Llama-3 for clause splitting and annotation. But you can apply this pipeline to different books (in either English or Dutch) and with other LLMs.
NOTE: For copyright reasons, we make available only the results obtained on Hermans' book, and not the book itself.
Using a different LLM is possible, as long as it is supported by llama-cpp.
Store your LLM in the folder ./llms/, and specify its name in config.ini. In config.ini, you can also change the system and user prompts.
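Before wiring a new model into config.ini, you may want to verify that the GGUF file loads under llama-cpp at all. Here is a minimal sketch with llama-cpp-python; the filename and the prompt are placeholders, not LANTERN's actual prompts.

```python
# Sketch: sanity-check that a GGUF model in ./llms/ loads under llama-cpp-python.
# The filename and the prompt are placeholders.
from llama_cpp import Llama

llm = Llama(model_path="./llms/your-model.gguf", n_ctx=4096, verbose=False)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Label this clause: 'He opened the door.'"}],
    max_tokens=32,
)
print(out["choices"][0]["message"]["content"])
```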
To run LANTERN on a different book:
- Write the book title and language in config.ini.
- Specify the URL to the .txt of your book in config.ini, for instance

  [book]
  title = "Max Havelaar"
  path = "https://www.gutenberg.org/cache/epub/11024/pg11024.txt"

  If you already have a file containing your book, put it in ./inputs/, and specify its location/name in config.ini. The file can either be:
  - a .txt file
  - a .tsv file where each row contains a paragraph, with the following columns:

    | Column Name | Description |
    | --- | --- |
    | paragraph_id | Integer identifying a paragraph. |
    | chapter_id | Integer indicating the unique identifier for each chapter. |
    | paragraphs | The actual text content of each paragraph. |
  - a .tsv file where each row is a sentence in the book, with the following columns:

    | Column Name | Description |
    | --- | --- |
    | sentence_id | Unique identifier for each sentence. |
    | paragraph_id | Unique identifier for each paragraph. |
    | chapter_id | Unique identifier for each chapter. |
    | sentences | The actual text content of each sentence. |
- You're ready to follow these steps.
Note: if you already have the file containing paragraphs, you can preprocess the book by running
python3 scripts/preprocess/preprocess_book.py --sentences --clauses
If you already have the sentences .tsv file, you can just run
python3 scripts/preprocess/preprocess_book.py --clauses
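If you still need to produce a paragraphs .tsv in the format described above, here is a minimal sketch with pandas; the blank-line paragraph split, the constant chapter_id, and the filenames are simplifying assumptions to adapt to your book.

```python
# Sketch: build a paragraphs .tsv with the columns described above
# (paragraph_id, chapter_id, paragraphs). The blank-line split and the
# constant chapter_id are simplifying assumptions.
import pandas as pd

with open("inputs/my_book.txt", encoding="utf-8") as f:
    text = f.read()

paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
df = pd.DataFrame({
    "paragraph_id": range(len(paragraphs)),
    "chapter_id": [0] * len(paragraphs),  # replace with real chapter boundaries
    "paragraphs": paragraphs,
})
df.to_csv("inputs/my_book_paragraphs.tsv", sep="\t", index=False)
```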
This tool was created in collaboration with the CLARIAH consortium.
Check out CLAUSE-ATLAS, the corpus we constructed using the LANTERN pipeline, and the corresponding analyses in this publication.