Inconsistent use of head tag #4

ArmorBearer · 2021-05-13T19:52:22Z

I have noticed that there seems to be an inconsistency in the use of the <head> tag in the XML markup. For example:

<head>Preface 1925</head>
<div2 id="cross*a" orig_id="n0" key="*a" type="main" opt="n"><head extent="full" lang="greek" opt="n" orth_orig="Α α">α</head>...

In the first case, 'head' seems to mean 'heading'; in the second case, it seems to mean 'headword'. Do I understand this correctly?

In both cases, this makes it harder to convert the XML into HTML for display purposes, since <head> has another meaning in HTML. I would recommend using different, more distinct tags for these situations. Is that a possibility?

helmadik · 2021-05-13T20:33:06Z

I see what you mean. (The actual second example didn't make it through, but I 'see' what you mean:-))
This may not/no longer be legal TEI, but it is what we inherited from Perseus, and I'm kind of hesitant to make changes that further separate the versions. For practical purposes, the two kinds of head are easy enough to distinguish: headwords have a number of attributes; heads in volume 1 do not.
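
For readers who want to automate that distinction, here is a minimal Python sketch. It assumes headwords are tagged `<head>` with attributes and plain headings are `<head>` with none, as described above. The `SAMPLE` string and the `classify_heads` helper are hypothetical, modeled on the two examples in this thread rather than taken from the repository.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample modeled on the two examples quoted in this thread.
SAMPLE = """<text>
<head>Preface 1925</head>
<div2 id="cross*a" orig_id="n0" key="*a" type="main" opt="n"><head extent="full" lang="greek" opt="n" orth_orig="Α α">α</head></div2>
</text>"""


def classify_heads(root):
    """Split <head> elements into (headings, headwords).

    Assumption: a <head> with no attributes is a heading,
    and a <head> with any attributes is a headword.
    """
    headings, headwords = [], []
    for el in root.iter("head"):
        (headwords if el.attrib else headings).append(el)
    return headings, headwords


root = ET.fromstring(SAMPLE)
headings, headwords = classify_heads(root)
print([h.text for h in headings])
print([h.text for h in headwords])
```

When converting to HTML, each group can then be mapped to a different element so that neither collides with HTML's own `<head>`.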

ArmorBearer · 2021-05-14T21:36:00Z

> The actual second example didn't make it through, but I 'see' what you mean

Sorry about that - I've edited my original post to fix that.

ArmorBearer · 2021-05-14T22:29:52Z

First, I want to express my sincere appreciation for your quick responses to my posts!

Now, to follow up on your response...

> This may not/no longer be legal TEI...

The TEI Guidelines for Dictionaries state:

> In some dictionaries, homographs have separate entries...

This appears to be the case in LSJ. For this case, TEI prescribes the following format:

<superEntry>
 <entry n="1" type="hom">
  <sense n="1">
<!-- ... -->
  </sense>
  <sense n="2">
<!-- ... -->
  </sense>
 </entry>
 <entry n="2" type="hom">
  <sense n="1">
   <sense n="a">
<!-- ... -->
   </sense>
   <sense n="b">
<!-- ... -->
   </sense>
  </sense>
  <sense n="2">
<!-- ... -->
  </sense>
  <sense n="3">
<!-- ... -->
  </sense>
 </entry>
</superEntry>

Since the structure I see in LSJ is quite different, I was not assuming any adherence to TEI.

> ... it is what we inherited from Perseus, and I'm kind of hesitant to make changes that further separate the versions.

I find in the README.md the following statement:

> This is the heavily edited local Chicago version of the Perseus LSJ; all Greek converted to Unicode; many entries split or merged.

So, is the purpose of the LSJLogeion database not to make changes that improve on what is available from the Perseus LSJ?

I have changed <head ...> to <headword ...> in my local copy, where appropriate. But it would be nice if I didn't need to do this in the future. Already there is a change to the database (for breve accents), which means I will need to download and edit my copy all over again. In the future this maintenance task will fall to someone else, and they may not be as versed in Regular Expression search and replace as I am...

helmadik · 2021-05-14T22:44:32Z

I don't follow how we got to homographs? Anyway, feel free to do with the files whatever you like. It would be nice to know what projects this version is being used in. A lot of work has gone into cleaning it up, and right now I unfortunately don't have time to contemplate structural or cosmetic changes. Sorry I mentioned TEI:-)

ArmorBearer · 2021-06-04T20:40:56Z

> It would be nice to know what projects this version is being used in.

I work for SIL, and this resource is under consideration for inclusion in a tool for translators and translation consultants (probably mostly the latter).

ArmorBearer · 2021-06-04T20:49:46Z

To expand on what I wrote previously...

> I have changed <head ...> to <headword ...> in my local copy, where appropriate. ... In the future this maintenance task will fall to someone else, and they may not be as versed in Regular Expression search and replace as I am...

For the benefit of those who come after me, here are the specifics of how I did the search and replace using Regular Expressions:

Search Expression: <head extent="([^"]+)" lang="([^"]+)" opt="n" orth_orig="([^"]+)">([\w \[\]-]+)</head>
Replacement Expression: <headword extent="\1" lang="\2" opt="n" orth_orig="\3">\4</headword>

This way, both the start and end tags are replaced at the same time. This worked for me using Notepad++. Also, I had previously combined all the individual files into one, in order to make this kind of processing easier, as well as to fit in with how our tool loads XML resources.
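
For anyone without Notepad++, the same search and replace can be sketched in Python. This is a direct port of the two expressions above, not a tested replacement for them. The `rename_headwords` function and the `sample` string are hypothetical names used only for illustration.

```python
import re

# Same pattern as the Notepad++ search expression above.
HEAD_RE = re.compile(
    r'<head extent="([^"]+)" lang="([^"]+)" opt="n" orth_orig="([^"]+)">'
    r'([\w \[\]-]+)</head>'
)
# Same replacement expression; \1..\4 refer to the captured groups.
REPLACEMENT = (
    r'<headword extent="\1" lang="\2" opt="n" orth_orig="\3">\4</headword>'
)


def rename_headwords(xml_text):
    """Rename attribute-bearing <head> elements to <headword>."""
    return HEAD_RE.sub(REPLACEMENT, xml_text)


# Hypothetical two-line sample: a plain heading and a headword.
sample = (
    '<head>Preface 1925</head>\n'
    '<head extent="full" lang="greek" opt="n" orth_orig="Α α">α</head>'
)
converted = rename_headwords(sample)
print(converted)
```

The plain heading is left untouched because it has no attributes to match. One caveat: `\w` in Python's `re` module may not match exactly the same characters as in Notepad++ (combining diacritics are a likely difference), so it is worth comparing the number of replacements between the two tools before relying on the result.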

helmadik · 2021-06-04T20:52:23Z

Thanks!

Helma Dik
Department of Classics
University of Chicago


helmadik · 2021-06-04T20:55:37Z

By the way, losing SIL's amazing, amazing, Conc to the progress of Apple's OS (again, every upgrade is a downgrade..) was what spurred me to start talking to the local digital wizards, way back when. I wrote my dissertation and second book using Conc, a LOT.

ArmorBearer mentioned this issue Jun 7, 2021

Entries with duplicate headwords and/or keys #10

Closed
