-
Notifications
You must be signed in to change notification settings - Fork 29
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
Showing
3 changed files
with
258 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
--- | ||
layout: page | ||
title: TEI Exercise | ||
description: Select exercises from Python Crash Course | ||
--- | ||
```<div n="16" type="chapter"> | ||
<pb n="381" facs="tcp:0034700102:396" /> | ||
<head> | ||
CHAP. XVI. | ||
<hi>The Conduct of the Roman Government towards the Chriſtians, from the Reign of Nero to that of Conſtantine.</hi> | ||
</head> | ||
<p> | ||
IF we ſeriouſly conſider the purity of the Chriſtian religion, the ſanctity of its moral precepts, and the innocent as well as auſtere lives of the greater | ||
<note place="margin"> | ||
Chriſtian | ||
<g ref="char:EOLhyphen" /> | ||
ity perſe | ||
<g ref="char:EOLhyphen" /> | ||
cuted by the Ro | ||
<g ref="char:EOLhyphen" /> | ||
man em | ||
<g ref="char:EOLhyphen" /> | ||
perors. | ||
</note> | ||
number of thoſe, who during the firſt ages em | ||
<g ref="char:EOLhyphen" /> | ||
braced the faith of the goſpel, we ſhould natu | ||
<g ref="char:EOLhyphen" /> | ||
rally ſuppoſe, that ſo benevolent a doctrine would have been received with due reverence, even by the unbelieving world; that the learned and the polite, however they might deride the miracles, would have eſteemed the virtues of the new ſect; and that the magiſtrates, inſtead of perſecuting, would have protected an order of men who yield | ||
<g ref="char:EOLhyphen" /> | ||
ed the moſt paſſive obedience to the laws, though they declined the active cares of war and govern | ||
<g ref="char:EOLhyphen" /> | ||
ment. If on the other hand we recollect the univerſal toleration of Polytheiſm, as it was in | ||
<g ref="char:EOLhyphen" /> | ||
variably maintained by the faith of the people, the incredulity of philoſophers, and the policy of the Roman ſenate and emperors, we are at a loſs to diſcover what new offence the Chriſtians had committed, what new provocation could exaſperate the mild indifference of antiquity, and what new motives could urge the Roman princes, who beheld without concern a thouſand forms of religion ſubſiſting in peace under their | ||
</p> | ||
</div>``` |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,221 @@ | ||
--- | ||
layout: page | ||
title: Gibbon | ||
description: | ||
--- | ||
# Exploring Gibbon - Analyze Text | ||
|
||
|
||
```python | ||
# imports | ||
import json | ||
from sklearn.feature_extraction.text import TfidfVectorizer | ||
import pandas as pd | ||
``` | ||
|
||
|
||
```python | ||
# load data | ||
file_name = 'gibbon_lemmas.json' | ||
with open(file_name, encoding='utf-8', mode='r') as f: | ||
gibbon_lemmas = json.load(f) | ||
``` | ||
|
||
## Find the most important words by chapter in Gibbon | ||
For this part we are going to use a library called [scikit-learn](https://scikit-learn.org/stable/). This library is primarily for machine learning, but many of its features are useful for DH work. | ||
Advanced Reading: https://towardsdatascience.com/tf-idf-explained-and-python-sklearn-implementation-b020c5e83275 | ||
|
||
|
||
```python | ||
# The tool I will use here requires a string as input rather than a list, so I convert my docs from lists to strings | ||
gibbon_chap_strings = [] | ||
gibbon_chap_names = [] | ||
for key, value in gibbon_lemmas.items(): | ||
gibbon_chap_names.append(key) | ||
chap_string = ' '.join(value) | ||
gibbon_chap_strings.append(chap_string) | ||
``` | ||
|
||
|
||
```python | ||
# transform corpus into a matrix of word counts | ||
vectorizer = TfidfVectorizer(max_df=.65, min_df=1, stop_words=None, | ||
use_idf= True, norm=None) | ||
transformed_chaps = vectorizer.fit_transform(gibbon_chap_strings) | ||
transformed_chaps_as_array = transformed_chaps.toarray() | ||
``` | ||
|
||
|
||
```python | ||
gibbon_key_vocab_by_chap = {} | ||
for chap, chap_name in zip(transformed_chaps_as_array, gibbon_chap_names): | ||
tf_idf_tuples = list(zip(vectorizer.get_feature_names(), chap)) | ||
sorted_tf_idf_tuples = sorted(tf_idf_tuples, key= lambda x: x[1], reverse=True) | ||
k = chap_name | ||
v = sorted_tf_idf_tuples[:10] # only getting the top ten | ||
gibbon_key_vocab_by_chap[k] = v | ||
``` | ||
|
||
C:\Users\msaxto01\Anaconda3\lib\site-packages\sklearn\utils\deprecation.py:87: FutureWarning: Function get_feature_names is deprecated; get_feature_names is deprecated in 1.0 and will be removed in 1.2. Please use get_feature_names_out instead. | ||
warnings.warn(msg, category=FutureWarning) | ||
|
||
|
||
|
||
```python | ||
for k, v in gibbon_key_vocab_by_chap.items(): | ||
result = k + ' => ' + v[0][0] + ', ' + v[1][0] + ', ' + v[2][0] + ', ' + v[3][0] + ', ' + v[4][0] | ||
print(result) | ||
``` | ||
|
||
chap01 => barbarian, comprehend, auxiliary, pilum, cohort | ||
chap02 => aqueduct, god, provincial, splendour, improvement | ||
chap03 => commonwealth, consul, constitution, adoption, thing | ||
chap04 => gladiator, amphitheatre, favourite, beast, amusement | ||
chap05 => competitor, 13th, superiority, 28th, flatter | ||
chap06 => excise, mutiny, legacy, tax, taxation | ||
chap07 => election, game, barbarian, gladiator, other | ||
chap08 => artaxerxe, satrap, tract, article, universe | ||
chap09 => barbarian, warrior, reindeer, iron, liquor | ||
chap10 => barbarian, goth, censor, invader, port | ||
chap11 => barbarian, queen, coin, supresse, goth | ||
chap12 => barbarian, interregnum, election, amphitheatre, consul | ||
chap13 => barbarian, colleague, abdication, mention, painting | ||
chap14 => constantine, barbarian, superiority, acquaint, levy | ||
chap15 => doctrine, apostle, presbyter, daemon, immortality | ||
chap16 => persecution, edict, martyr, martyrdom, sect | ||
chap17 => praefect, capitation, constantine, consul, indiction | ||
chap18 => constantine, barbarian, nephew, negotiation, sister | ||
chap19 => barbarian, obelisk, eunuch, garrison, praefect | ||
chap20 => constantine, conversion, cross, baptism, labarum | ||
chap21 => primate, logo, synod, controversy, catholic | ||
chap22 => tribunal, barbarian, consul, praefect, oblige | ||
chap23 => god, toleration, rebuild, miracle, persecution | ||
chap24 => barbarian, sophist, elephant, god, satrap | ||
chap25 => barbarian, magic, pirate, infantry, praefect | ||
chap26 => barbarian, goth, emigration, plain, animal | ||
chap27 => archbishop, barbarian, arbogaste, armour, heretic | ||
chap28 => martyr, god, relic, idol, miracle | ||
chap29 => praefect, barbarian, stilicho, count, poet | ||
chap30 => barbarian, poet, entrenchment, auxiliary, warrior | ||
chap31 => barbarian, usurper, goth, praefect, constituent | ||
chap32 => archbishop, barbarian, empress, eunuch, favourite | ||
chap33 => carthage, vandal, barbarian, sleeper, cavern | ||
chap34 => barbarian, embassy, deserter, table, interpreter | ||
chap35 => barbarian, waggon, plain, entrenchment, left | ||
chap36 => vandal, barbarian, praefect, confederate, oreste | ||
chap37 => monk, monastery, barbarian, catholic, persecution | ||
chap38 => barbarian, angle, provincial, diocese, clovis | ||
chap39 => favor, honor, consul, goth, clamor | ||
chap40 => blue, silk, green, cupola, worm | ||
chap41 => vandal, goth, carthage, ditch, departure | ||
chap42 => chosroe, silk, khan, skin, garrison | ||
chap43 => honor, comet, plague, goth, valor | ||
chap44 => civilian, jurisprudence, praetor, honor, decemvir | ||
chap45 => alboin, exarch, honor, chagan, duke | ||
chap46 => chosroe, chagan, satrap, elephant, avar | ||
chap47 => patriarch, synod, honor, heresy, monk | ||
chap48 => patriarch, dynasty, surname, empress, constantine | ||
chap49 => pope, exarch, donation, duke, pontiff | ||
chap50 => prophet, apostle, caliph, camel, angel | ||
chap51 => caliph, prophet, mosch, abstinence, stipend | ||
chap52 => caliph, prophet, commander, ommiad, emir | ||
chap53 => theme, caliph, tactic, silk, tier | ||
chap54 => paulician, doctrine, reformer, sect, heretic | ||
chap55 => canoe, baptism, missionary, conversion, emigration | ||
chap56 => duke, galley, count, admiral, inheritance | ||
chap57 => sultan, caliph, dynasty, emir, pilgrim | ||
chap58 => crusader, pilgrim, knight, count, crusade | ||
chap59 => crusade, sultan, pilgrim, knight, caliph | ||
chap60 => baron, pilgrim, crusader, doge, marquis | ||
chap61 => baron, crusade, doge, knight, earl | ||
chap62 => patriarch, duke, ounce, prelate, vatace | ||
chap63 => galley, cell, patriarch, elder, regent | ||
chap64 => sultan, khan, emir, horde, mogul | ||
chap65 => sultan, emir, cage, khan, vizier | ||
chap66 => synod, patriarch, pontiff, cardinal, copy | ||
chap67 => sultan, despot, cardinal, crusade, huniade | ||
chap68 => sultan, cannon, vizier, bullet, harbour | ||
chap69 => pope, cardinal, conclave, election, metropolis | ||
chap70 => cardinal, baron, tribune, pope, conclave | ||
chap71 => coliseum, arch, amphitheatre, marble, column | ||
|
||
|
||
|
||
```python | ||
# explore vocabulary | ||
gibbon_key_vocab_by_chap['chap01'] # <-- you can investigate other chapters | ||
``` | ||
|
||
|
||
|
||
|
||
[('barbarian', 24.247456053159887), | ||
('comprehend', 16.270300569327112), | ||
('auxiliary', 14.002444755199152), | ||
('pilum', 12.534161491043836), | ||
('cohort', 12.520386983881371), | ||
('peninsula', 12.520386983881371), | ||
('boundary', 10.28895147073927), | ||
('ancient', 9.722308044204258), | ||
('tropic', 9.591673732008658), | ||
('breadth', 9.32890855939846)] | ||
|
||
|
||
|
||
## Conditional frequency distribution in Gibbon | ||
|
||
### Natural Language Toolkit | ||
The **Natural Language Toolkit** (NLTK) is a library used for natural language processing (NLP). If you want to learn more, I highly recommend working through the [NLTK Book](https://www.nltk.org/book/). This resource is a great introduction to NLP specifically and Python more generally. | ||
|
||
A **conditional frequency distribution** (cfd) is a collection of word counts for a given condition, i.e. category. Here the category is separate chapters in Gibbon. We can chart what used are used most frequently by chapter. This will tell us something about the nature of each chapter. | ||
|
||
|
||
```python | ||
import nltk | ||
import matplotlib.pyplot as plt | ||
``` | ||
|
||
|
||
```python | ||
# conditional frequency distribution | ||
# Note: I adapted these lines of code from the NLTK | ||
key_words = ['doctrine', 'apostle', 'presbyter', 'daemon', 'immortality'] # <-- instert token(s) to explore (lowercase) | ||
cfd = nltk.ConditionalFreqDist( | ||
(key_word, chap_name) | ||
for chap_name in gibbon_lemmas.keys() | ||
for lemma in gibbon_lemmas[chap_name] | ||
for key_word in key_words | ||
if lemma.lower().startswith(key_word) | ||
) | ||
# display plot | ||
plt.figure(figsize=(20, 8)) # this expands the plot to make it more readable | ||
cfd.plot() | ||
``` | ||
|
||
|
||
|
||
![png](output_11_0.png) | ||
|
||
|
||
|
||
|
||
|
||
|
||
<AxesSubplot:xlabel='Samples', ylabel='Counts'> | ||
|
||
|
||
|
||
|
||
```python | ||
# save figure as png | ||
plt.savefig('../figures/kw_by_chap.png') | ||
# save as pdf | ||
plt.savefig('../figures/kw_by_chap.pfd') | ||
``` | ||
|
||
### Activity | ||
Based on the key vocabulary by chapter above, explore the use of different terms in the conditional frequency distribution. | ||
* What questions about the text does this raise for you? | ||
* What hypotheses about the text can you form? | ||
|
||
![png](../assets/output_11_0.png) |
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.