Need an attribute with just the lemma #91

Open
jonathanrobie opened this issue Feb 7, 2018 · 11 comments

@jonathanrobie
Contributor

In the current format, there is no element or attribute that contains the lemma directly without additional text. That makes it difficult to match in queries without preprocessing.

Instead of:

<entry n="διαγινώσκω|G1231">

I would prefer something like:

<entry lemma="διαγινώσκω" strong="G1231">
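
As a rough illustration of the difference, the sketch below looks up one lemma in each format using Python's standard `xml.etree.ElementTree`. The two sample documents, the `entries` wrapper element, and both function names are hypothetical, and the namespaces used by the real file are omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Two tiny hypothetical documents, one per format (namespaces omitted).
CURRENT_FORMAT = '<entries><entry n="διαγινώσκω|G1231"/><entry n="Ἀαρών|G2"/></entries>'
PROPOSED_FORMAT = (
    '<entries><entry lemma="διαγινώσκω" strong="G1231"/>'
    '<entry lemma="Ἀαρών" strong="G2"/></entries>'
)


def find_by_lemma_current(root, lemma):
    """Current format: every n value must be split before it can be compared."""
    return [
        entry for entry in root.iter("entry")
        if entry.get("n", "").split("|", 1)[0] == lemma
    ]


def find_by_lemma_proposed(root, lemma):
    """Proposed format: the lemma is matched directly in the path expression."""
    return root.findall(f".//entry[@lemma='{lemma}']")


old_hits = find_by_lemma_current(ET.fromstring(CURRENT_FORMAT), "διαγινώσκω")
new_hits = find_by_lemma_proposed(ET.fromstring(PROPOSED_FORMAT), "διαγινώσκω")
print(len(old_hits), len(new_hits))
```

With a dedicated attribute, the match can live entirely in the query, which is the point of the request.
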
@dowens76
Member

dowens76 commented Feb 8, 2018

That makes sense to me.

You know more than I about XML schemata. Would we need to modify the schema to make that validate?

@destatez
Contributor

@jonathanrobie I reformatted the file under a new name, https://github.com/translatable-exegetical-tools/Abbott-Smith/blob/master/abbott-smith.tei_lemma.xml, and merged it into the repository. There were cases where there was no vertical bar in the entry and where there were undefined Strongs, G????. I have attached 2 files with those instances. I am open to changing things if that will work better for you. Let me know.
Missing_bar.txt
Undefined_Strongs.txt
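
For readers who want to reproduce this kind of conversion, here is a minimal sketch. It is not the script that was actually used; the function name is hypothetical, namespaces are omitted, and leaving bar-less entries untouched for manual review is an assumption rather than the documented behaviour.

```python
import xml.etree.ElementTree as ET


def split_entry_attributes(root):
    """Rewrite n="lemma|Gxxxx" as separate lemma and strong attributes.

    Returns two lists for manual follow-up: n values with no vertical bar,
    and lemmas whose Strong's number is the G???? placeholder.
    """
    missing_bar, undefined_strongs = [], []
    for entry in root.iter("entry"):
        n = entry.get("n")
        if n is None:
            continue
        if "|" not in n:
            missing_bar.append(n)  # assumption: leave these entries unchanged
            continue
        lemma, strong = n.split("|", 1)
        if strong == "G????":
            undefined_strongs.append(lemma)
        del entry.attrib["n"]
        entry.set("lemma", lemma)
        entry.set("strong", strong)
    return missing_bar, undefined_strongs


# Made-up sample data covering the three cases described above.
sample = ET.fromstring(
    '<entries><entry n="διαγινώσκω|G1231"/>'
    '<entry n="no-bar-example"/>'
    '<entry n="undefined-example|G????"/></entries>'
)
missing, undefined = split_entry_attributes(sample)
print(missing, undefined)
```
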

@jonathanrobie
Contributor Author

Thanks - that's helpful, I hadn't gotten around to this.

Is there a need to have two different files in different formats? Is there a need for the format that uses <entry n="Ἀαρών|G2">?

@jonathanrobie
Contributor Author

@dowens76

Would we need to modify the schema to make that validate?

Is there a schema? I can't find a file with the extension .dtd, .xsd, .rnc, or .rng.

@destatez
Contributor

I do not believe that there is a schema file. I never found one and built my script based upon the xml file contents. I used a different file since I wasn't sure whether we wanted to keep both formats. We will have to get Todd in the loop to move to 1 file for the new format since he oversaw the effort to do the manual updates to reflect reality.

@dowens76
Member

I have always validated against http://www.crosswire.org/OSIS/teiP5osis.2.5.0.xsd (see TEI@xsi:schemaLocation). I have it in my local files but Git is set to ignore it.
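
For anyone trying to locate the schema from the file itself, the sketch below reads the `xsi:schemaLocation` attribute off the root element with the standard library. The function name and the sample document are hypothetical, and `http://example.org/ns` is a placeholder namespace, not the one the real file declares. Actual validation against the XSD would need a third-party validator, which is not shown here.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical root element; the namespace URI is a placeholder.
SAMPLE = (
    '<TEI xmlns="http://example.org/ns" '
    f'xmlns:xsi="{XSI}" '
    'xsi:schemaLocation="http://example.org/ns '
    'http://www.crosswire.org/OSIS/teiP5osis.2.5.0.xsd"/>'
)


def schema_locations(xml_text):
    """Return {namespace: schema URL} pairs declared on the root element."""
    root = ET.fromstring(xml_text)
    # schemaLocation is a whitespace-separated list of namespace/URL pairs.
    tokens = root.get(f"{{{XSI}}}schemaLocation", "").split()
    return dict(zip(tokens[0::2], tokens[1::2]))


print(schema_locations(SAMPLE))
```
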

@dowens76
Member

I cannot see any reason not to maintain only one file. It would make things much easier.

@destatez Thank you for working on this file. I think we probably should just make the changes directly in abbott-smith.tei.xml. But if you would feel more comfortable looping Todd in, that's okay with me. We started the project together, and it's a good idea to keep key parties updated as changes are made.

@dowens76
Member

I went through the entries that did not have a bar (most did not have a Strong's number) and fixed those Strong's numbers that were G????.

@destatez
Contributor

I wish we had talked before you appended the letter "a" to these undefined Strongs. We, the Unlocked Greek Lexicon and Unlocked Greek New Testament teams, have adopted an approach by Alan Bunning called Strongs Plus (from the ugl docset: "The Strong’s Plus ID referenced above was initially developed by Alan Bunning, where he took the 4-digit Strong’s ID and appended a zero to create a 5-digit ID. This gave him extra IDs to be able to qualify different word forms than the standard Strong’s. We will be using this Strong’s Plus identification for this project.").

When we created the ugl files, we made all the Strongs IDs Strongs Plus IDs. In the cases where a particular Greek word from A-S was not part of Alan's deliverable spreadsheet (which is the root for ugnt), I put G???? in the xml and, for ugl, assigned them a unique ID and lemma file above G99000, and put them in a class of IDs that the ugl team would review to determine the actual Strongs Plus ID and update the ugl lemma file accordingly. The work that you did on these can and will be used by our team to reassign these "undefined" Strongs IDs.

We need to talk about the Strongs Plus ID scheme. It may be better to re-write the xml file using this convention so that we can all be on the same page. We can even update the delivered A-S xml with the Strongs IDs (in 5-number form) for those that were undefined, which you have defined. I can then re-run my script for the new format and have the best of both worlds.
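
To make the 5-number form concrete, here is a minimal sketch of the conversion as the docset quote describes it: take the 4-digit Strong's ID and append a zero. The function name is hypothetical, and zero-padding shorter IDs such as G2 to four digits first is an assumption drawn from the phrase "4-digit", not something confirmed here.

```python
import re


def to_strongs_plus(strongs_id):
    """Convert a standard Greek Strong's ID to the 5-digit Strongs Plus form."""
    match = re.fullmatch(r"G(\d{1,4})", strongs_id)
    if match is None:
        # Placeholders such as G???? have no number to convert.
        raise ValueError(f"not a standard Greek Strong's ID: {strongs_id!r}")
    # Assumption: pad to four digits, then append the extra zero.
    return f"G{int(match.group(1)):04d}0"


print(to_strongs_plus("G1231"))  # G12310
print(to_strongs_plus("G2"))     # G00020
```

Under this scheme the final digit stays free to distinguish additional word forms, which is what the docset describes as the purpose of the extra IDs.
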

@jonathanrobie
Contributor Author

We should have a separate issue for extending Strong's. I will open one.

@toddlprice

I'm fine with whatever you think would work best on this @destatez. Thanks for your work.

@destatez destatez pinned this issue Mar 18, 2021
@destatez destatez unpinned this issue Mar 18, 2021