SourceViewBible #19

tombogle · 2022-03-10T21:23:59Z

tombogle
Mar 10, 2022
Maintainer

@RobH123 mentioned that he had corresponded with Rob Wiebe from SourceView (SourceViewReader app SourceViewBible.com), who is also already parsing Scripture to identify speakers.
I found this app, which I'm assuming is the one based on the URL, but that URL is now a junk site and is not being kept up. Apparently the app isn't either (last updated January 2017).
Has this moved to a new URL? Is development supposed to resume? Is it open source?

tombogle · 2022-04-06T13:52:01Z

tombogle
Apr 6, 2022
Maintainer Author

On April 5, Rob Wiebe <[email protected]> wrote:
Hey Robert,
Thank you for including me in this conversation and sorry for the late reply.
My name is Rob Wiebe and I have been serving with YWAM for almost 20 years. I'm responding from my common email address ([email protected]) instead of the email that was initially used.
I've been working on a bunch of Bible projects over the past decade or so and one of them is the SourceView Bible. We have a complete database of all the speakers for all the words of the Bible. While I'm not in authority to share this data yet, I was encouraged to engage in conversation around seeing this data integrated into ParaText.
For those of you who might be interested in knowing what the SourceView Bible is, there's a web version that is easy to use: https://sourceviewreader.web.app/
There's also a simple tutorial on how to use the web app: https://youtu.be/2ged5EyVYFI

2 replies

tombogle Apr 6, 2022
Maintainer Author

After a review of several dozen passages on the website, here are a few observations:

For the "easy" ones (unique proper name), Glyssen and SourceView Bible mostly do the same thing. For some of the proper names with titles, we have some variation. For example, in Joshua 10:4, Glyssen has Adoni-Zedek, king of Jerusalem, whereas SV Bible has King Adoni-Zedek. It looks like neither app is perfectly consistent in how titles and names are applied or omitted (e.g., see Genesis 14: Bera and Melchizedek; also Ezra 4 & Neh 2: Artaxerxes)
In some cases where a group of named/known characters speak without an identified spokesperson, SV Bible sometimes uses a more generic description (e.g. Gen 37:8; Mark 13:4) or simply assigns the speech to one of the characters (e.g. Gen 21:22-23), though it sometimes calls out the candidates individually (e.g., Gen 24:50; 24:57). Glyssen sometimes has a generic description as an alias and may indicate a default character, but it always identifies the individual candidates in the ID (separated by slashes). Note that there are a few places where Glyssen constrains the list more than what the passage actually says, based on context clues (e.g., Gen 34:14-17, where Glyssen limits the list to Simeon and Levi, Dinah's full brothers, since in v. 31 they are called out as the ones who carried out the vengeance and culturally would almost certainly have been the ones to act as spokesmen in this negotiation).
Because the SV Bible is not actually a dramatized audio Bible, there are places where ambiguities can be left somewhat ambiguous. For example, in Genesis 18, where God and the two angels speak, it is marked as red ("God"), but the character ID is God & Two Angels.
In the SV Bible, there appears to be at least some attempt to ensure that the same character is used in "parallel passages" (e.g. Matthew 24:3 vs. Mark 13:4 & Luke 21:7), but it is either not rigorous or there are mistakes (or perhaps differences in exegesis).
For a lot of the bit parts (where a crowd or anonymous speaker is talking), SV Bible and Glyssen have different character IDs, whether trivial (Numbers 20:3) or more significant (JER 3:20: The People of Judah vs. Israel).
In some cases, SV Bible uses the same character ID to identify speakers who were probably not the same people (e.g., Numbers 9:7, Numbers 20:3, and 2 Samuel 5:1 all have The Israelites). This is similar to what FCBH does in Core Script because they are generally not concerned with ensuring distinct voices for bit parts, especially if it could possibly be the same person or if the two places are far enough apart that no one is likely to notice that the same voice was used. Glyssen tends to have more distinct IDs, though it stops well short of a separate ID for every place where a distinct character could be speaking. In these cases, the character-detail file does indicate that any number of different voices could be used, though the Glyssen program currently doesn't do anything with that information. It's not clear to me what level of distinction is most generally useful. I don't know if some rigorous standard should be applied to decide when to split and when to join.
It appears that for the most part the SV Bible does not attempt to break out hypothetical characters (e.g., Psalm 96:10, 1 Corinthians 12:6), second-level quotes (Deuteronomy 1:25), or quotations (Psalm 115:2, Jude 14).
I haven't done anything like a thorough comparison, but in the prophetic books, it looks like the SV Bible has mostly followed the same approach as the FCBH Core Script. There are at least some differences (e.g. Amos 3:9), but most likely these are all places where the Glyssen control file already accounts for the possible ambiguities.
Self-quotes by the narrator/author (e.g., Nehemiah, Jeremiah, Galatians 2): self-quotes are called out as speech, but passages where the author writes in first-person are left as narrator. For audio dramatizations, this is a matter of style, but after a lot of consideration, FCBH has decided that the best default is to have a single voice speak both the first-person text and the self-quotes. (A separate narrator voice is used for sections of books written in third-person.) I don't think this has any bearing on the character IDs themselves, and the approach taken in the SV Bible is almost certainly the correct one for that medium.
In Song of Solomon, the SV Bible calls out the Bride's and Groom's parts as speech but leaves the parts identified as The Choir as narrator text (i.e., no speech bubble). FWIW, the FCBH character labels are Beloved (female), Solomon and Daughter of Jerusalem (female). Glyssen has beloved, Solomon, king (alias: Solomon (lover)), maidens, and three "potential" characters queens and concubines (6:10), brothers of beloved and sisters of beloved (8:8-9).
Not having access to the underlying data, I was not able to figure out whether (or to what extent) the data allows for different translations to select a different speaker for the most challenging passages (e.g., Mark 6:14; Revelation 18:14-20). I do see that in Rev 18, there are places where second-level quotes are dramatized, but not consistently. Some of what God says the people will say is spoken by God and some is spoken by the people. I can't tell whether this was a deliberate decision made by a human being or an automatic consequence of the underlying data.
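As a rough illustration of the slash-separated convention mentioned above for Glyssen IDs with several candidate speakers, here is a minimal Python sketch. The function names and the sample ID string are assumptions made for illustration; they are not taken from the Glyssen control file.

```python
def candidate_speakers(character_id: str) -> list[str]:
    """Split a multi-candidate character ID on slashes into individual names."""
    return [part.strip() for part in character_id.split("/") if part.strip()]


def is_ambiguous(character_id: str) -> bool:
    """Return True when the ID lists more than one candidate speaker."""
    return len(candidate_speakers(character_id)) > 1


# Hypothetical ID string for Gen 34:14-17, following the slash convention.
print(candidate_speakers("Simeon/Levi"))
print(is_ambiguous("Simeon/Levi"))
```

A consumer that needs a single voice could take the first candidate or a separately declared default; one that needs historical precision could keep the whole list.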

Robwiebe Apr 7, 2022

Hey @tombogle,
Thanks for your comments regarding the SourceView Bible. It's nice to see someone taking the effort to truly evaluate the work that was done.

The work of attributing unique ids for speakers is quite a subjective act, even though the method could be quite inductive. The purpose for doing the work would obviously influence the choices that are made. As you mentioned in your comments, doing this for the purpose of creating an audio Bible would cause one to have a specific desired outcome and would lead one to make choices based on that goal. In the same way, if one were doing the work with the purpose of doing unique queries on the data to see who said what to whom, that would also influence what kind of results would be beneficial for the person querying. The purpose for the work greatly influences the result (and rightly so, in my opinion).

I can confidently say that the SourceView Bible speaker ids are very intentional in selection, having been reviewed by dozens of people. Obviously, we live in a post-Genesis 3 world and human error is bound to enter into work, but I believe that we have reviewed our work significantly over the years, especially the last 2 years. We've done this work in at least 13 versions of the Bible in 9 languages, with another dozen or so languages in the works. The challenge of doing this work according to quotation marks is great because each version and language has differences (and some of them are major differences). We've had to adapt our systems to allow for a greater variety of choice for the source name because of this. Even between the English NLT and NIV there are hundreds of differences. For this reason, we have embarked on doing this project in the original languages.

I think it's helpful to share that another data set that we are applying to this behind the scenes is that of time periods (using centuries: ex 1400's BC, 0's AD, etc.). Therefore, when we use certain source names like The Israelites across different centuries, we're also applying the time period data so that query results can be filtered by time period.
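To make the idea of filtering a shared source name by time period concrete, here is a minimal Python sketch. The record layout, the function name, and the period labels are assumptions; the labels are placeholders, not SourceView data or historical claims.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpeechRecord:
    reference: str    # Scripture reference of the speech
    source_name: str  # speaker label shared across passages
    period: str       # placeholder time-period label (illustrative only)


# The same source name appears in passages tagged with different periods.
RECORDS = [
    SpeechRecord("Numbers 9:7", "The Israelites", "period-A"),
    SpeechRecord("Numbers 20:3", "The Israelites", "period-A"),
    SpeechRecord("2 Samuel 5:1", "The Israelites", "period-B"),
]


def filter_speeches(records, source_name: str, period: Optional[str] = None):
    """Return records for a source name, optionally narrowed to one period."""
    return [
        r for r in records
        if r.source_name == source_name and (period is None or r.period == period)
    ]


for record in filter_speeches(RECORDS, "The Israelites", "period-B"):
    print(record.reference)
```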

There's so much more we could chat about to better give understanding to the SourceView project, but I'll leave it for now.

Question #1: did you review the SourceView Bible mobile application or the online website: https://sourceviewreader.web.app/?
NOTE: the mobile app version is old and outdated (2016). Our current data sets are found in the web version. We have made A LOT of changes between those two products.
Question #2: if you reviewed the web version, what language/version did you review?

Thanks for your review of our data. I'll bring your comments to the team when I know what data set you reviewed and see if we should make any adjustments.

Blessings!

RobH123 · 2022-04-06T20:15:22Z

RobH123
Apr 6, 2022
Maintainer

Wow, thanks Rob Wiebe for that helpful update, and to Tom for the amazing detailed spot check and comparison! It's clear that there's no one answer to how to do this, and like anything relating to natural language, it's always way more complicated than first expected.

To me, one important detail is that the YWAM data isn't freely available / open-licensed (yet?). Perhaps this seems harsh, but in my mind, other than learning about different possible ways of thinking about technical matters (which Tom did above), it seems to me that it makes it largely irrelevant to this bigger discussion which is mostly about resources which can be freely offered to the Bible world to use in ways that we can't maybe even imagine yet. (Saying it might be made available just in Paratext is of no personal interest to me. My interest is tagging original Heb and Grk texts so that it may be possible for translations that are aligned to the original text to automatically transfer that data to their translation.) But please do let us know, Rob, if/when you are able to negotiate better licensing terms. And also, don't hesitate to let us know of any limitations or deficiencies you see in the Glyssen data/system.

0 replies

tombogle · 2022-04-06T22:02:27Z

tombogle
Apr 6, 2022
Maintainer Author

Unless we can come up with an objective standard/automated way to arrive at sufficiently (whatever that means) unique and usable IDs, it's going to take a painful amount of work to come to agreement as to which (arbitrary) approach is "best." I'd personally be willing to spend some time making Glyssen conform to some other standard if it met Glyssen's needs and "everyone" else agreed it was the best standard. (I'd have to get buy-in from my higher ups, but I'm guessing it would be possible.) But so far, it doesn't look like there is an "everyone" else. It's a lot less motivating to make Glyssen mimic "someone" else. Feels like there are two rather difficult options:

1. Declare oneself to be the standard, make your data free, and hope others follow.
2. Try to get a coalition to form with everyone willing to make decisions and concessions. (I guess that's what this repo is attempting to do.)

1 reply

RobH123 Apr 6, 2022
Maintainer

Yeah, I agree. #2 is a nice ideal, but so many (world, not just Bible) standards come about because one player did a pretty-good job of addressing the problem, and made their data open, so they became the de-facto standard. Your #1 is still MUCH better than the alternative -- not really having anything publicly available.

But I still wonder if standardizing a "Bible people DB" is the first step? Do you have that? I can see from here that we can extract "characterId"s, but Viz.Bible and others have done much more work on that (including mapping relationships like "father of", "sibling of", etc.)

Robwiebe · 2022-04-07T00:40:30Z

Robwiebe
Apr 7, 2022

Hey,

This is Rob Wiebe from the SourceView Bible project. The app mentioned in a previous comment is quite outdated and the original website seems to have been hacked. Unfortunately, that was before my time with the project and I don't have access to those things.

While the SourceView Bible data (which is much more than just speakers) is copyrighted for the sake of protection at this point, the copyright owner is open to considering options for sharing data for the greater good. I'll try to bring him up to speed on this conversation and see where it could lead.

I will add in that we have characterIDs based on our naming system, but we also have secondary data tagging where helpful. For example when the speaker is a group, we tag the other characterIDs that are a part of that group. This helps when using the data to filter speaking parts.
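The secondary tagging described here, where a group speaker also carries the characterIDs of its members, could be modeled roughly as follows. This is a minimal Python sketch under stated assumptions: the group label, the partial member list, and the function name are hypothetical, not SourceView's actual data or schema.

```python
# Hypothetical group tag: group characterID -> member characterIDs (partial list).
GROUP_MEMBERS = {
    "Joseph's brothers": {"Reuben", "Simeon", "Levi", "Judah"},
}

# Hypothetical speech records: (reference, speaker characterID).
SPEECHES = [
    ("Genesis 37:8", "Joseph's brothers"),
    ("Genesis 37:26", "Judah"),
]


def speaking_parts(character_id: str, include_groups: bool = True) -> list[str]:
    """List references where a character speaks alone or as part of a tagged group."""
    refs = []
    for reference, speaker in SPEECHES:
        in_group = character_id in GROUP_MEMBERS.get(speaker, set())
        if speaker == character_id or (include_groups and in_group):
            refs.append(reference)
    return refs


print(speaking_parts("Judah"))
print(speaking_parts("Judah", include_groups=False))
```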

Another thing to consider is that we are already working on the Hebrew/Greek versions in the SourceView Bible. I'm not sure if that helps this conversation.

If you click on the "source" name the popup displays much more data there:

Anyways, I'm just really encouraged to know that there are others who are working on these things. I'll try to follow this thread along and participate as much as I can.

3 replies

RobH123 Apr 7, 2022
Maintainer

Thanks for the update and the participation. Yes, there are many encouraging things happening in the world of open Bible data, and with both the OSHB and the recent beta release of the SR GNT, I'm expecting that one day (perhaps in my lifetime even?) we'll see open Bible data becoming more the norm. (It's shameful, I think, that it happened first in the world of secular software rather than in the Bible world, but we're slowly catching up.) The synergy of being able to combine these open resources in various ways is very exciting. Two trends I think: 1/ young people just expecting Bible data to be freely available (especially from orgs that solicited donations to develop them), and 2/ those near retirement age (like myself) wanting to see their legacies and hard work living on, rather than dying with them.

tombogle Apr 7, 2022
Maintainer Author

> It's shameful, I think, that it happened first in the world of secular software rather than in the Bible world, but we're slowly catching up.

I don't think we should feel too bad about this. We need to understand that there are a lot of factors at play here. And in the secular world, there are still lots of things that are protected by copyrights and licenses. We shouldn't be too critical of our brothers and sisters who sometimes have really noble reasons for protecting their intellectual property, even when that sometimes seems to run cross grain to the value of freely sharing the Word of God. I think if we listen carefully, we'll see that they usually share that value, but they have other factors to consider.

RobH123 Jul 19, 2022
Maintainer

> While the SourceView Bible data (which is much more than just speakers) is copyrighted for the sake of protection at this point, the copyright owner is open to considering options for sharing data for the greater good. I'll try to bring him up to speed on this conversation and see where it could lead.

Any update on this @Robwiebe? Your time period data would also be very helpful for my interest (although irrelevant to the Glyssen/FCBH needs I guess).

tombogle · 2022-04-07T16:06:15Z

tombogle
Apr 7, 2022
Maintainer Author

In answer to your questions above:

1. I reviewed the web-based app (https://sourceviewreader.web.app/).
2. I looked at the NIV, NLT, and Spanish mostly. I did look at the French and Portuguese briefly as well, but I don't speak those languages.

Given that there may be legitimate reasons for various decisions about character IDs, perhaps the "best" approach would be to have a database that is able to store more than one system ("aliases"). The trouble with that, I'm afraid, is that unless you start with a "master" list that has it broken up into the maximum granularity, you'll end up needing a many-to-many map. By introducing Scripture references, a useful many-to-many map still might be possible, but it's probably not trivial. Looking at the data we have in Glyssen, I see that there are 249 character IDs that represent a group/plurality that speak in more than one verse. Of those, 100 speak in exactly 2 verses. In places where those verses are contiguous or in close proximity, they can generally be treated as a single unique character. That still leaves well over 150 character IDs that would potentially need to be split up into greater granularity to allow for historical uniqueness. In a way, this would be "better" in Glyssen because there is no compelling reason why every time the "Israelites" speak, it should always sound like the same person. (In practice, whenever Glyssen's output is processed into FCBH's Core Script, those distinctions would be erased because they map our character IDs onto their minimal set of voice actors. But Glyssen does have the ability to come up with an optimal distribution of characters to any size cast, where you can specify genders and age groupings.)
For the purposes of the proposed Paratext plugin that could be used to add milestone markers with character IDs, maybe the best thing would be to have it so the user could choose a preferred system, and then have Glyssen (by means of a separate sharable DLL) be able to look at the character ID and map it onto the Glyssen IDs if some other known system had been used.
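One way to picture a reference-keyed map between ID systems is the following minimal Python sketch. Everything in it is an assumption for illustration: the system names, the reference format, the function name, and especially the Glyssen-side IDs, which are invented placeholders rather than real Glyssen character IDs.

```python
from typing import Optional

# Keys: (system name, that system's character ID, Scripture reference).
# Values: placeholder Glyssen IDs, invented for illustration only.
ID_MAP = {
    ("sourceview", "The Israelites", "NUM 9:7"): "israelites-placeholder-a",
    ("sourceview", "The Israelites", "NUM 20:3"): "israelites-placeholder-b",
    ("sourceview", "The Israelites", "2SA 5:1"): "israelites-placeholder-c",
}


def to_glyssen_id(system: str, character_id: str, reference: str) -> Optional[str]:
    """Map an ID from a known system onto a Glyssen ID, using the reference to disambiguate."""
    if system == "glyssen":
        return character_id  # already in the target system
    return ID_MAP.get((system, character_id, reference))


print(to_glyssen_id("sourceview", "The Israelites", "NUM 20:3"))
```

Keying on the reference is what lets one coarse ID in one system fan out to several finer IDs in another; an unmapped combination returns None so the caller can fall back to manual assignment.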

0 replies

jonathanrobie · 2022-04-08T00:04:20Z

jonathanrobie
Apr 8, 2022
Collaborator

If you want unique identifiers for speakers, would the Semantic Dictionary of Biblical Greek and the Semantic Dictionary of Biblical Hebrew work? See https://semanticdictionary.org/. These identifiers are already used in Enhanced Resources and also in the MACULA datasets:

https://github.com/Clear-Bible/macula-hebrew
https://github.com/Clear-Bible/macula-greek

For instance, in SDBH, 3.1.7 is the domain "Names of People", there is also a domain for "Names of Deities", etc.

One downside: the identifiers for NT and OT differ.

3 replies

tombogle Apr 8, 2022
Maintainer Author

I poked around a little and couldn't find the lists you referred to. I'd probably need someone to hold my hand. But anyway, we would presumably want the IDs in the OT and the NT to be the same. It is not all that common that a whole-Bible dramatized audio recording is done in one go, but when it is, we would, for example, want God's voice to be consistent in the OT and NT. There are obviously things we could do to tie the OT character ID for God to the NT character ID, but that's probably not the clearest way to represent the data. Also, given the challenges of dealing with editing bidi text, I think using Hebrew script for character IDs would be less than optimal for the majority of the world's translation teams. (I know: it might be great for the minority who are always having to deal with this pain, but at least those users are more accustomed to dealing with it.) I think we do want to end up with a resource that provides appropriate mappings/localizations from the character IDs to other languages, and that would include Greek and Hebrew. In some ways, I guess the bigger issue is that while Greek and Hebrew names could work as IDs for unique proper names, that still wouldn't directly address the thornier issue of all the characters who are not identified by name or who represent a group of people (e.g., "soldiers" or "family heads of Gilead").

RobH123 Apr 8, 2022
Maintainer

I agree that although we don't want to seem totally English-centric (and we're definitely not), it's also a very convenient world-wide metalanguage (just look at the keywords in most programming languages: while, if, else, break, etc.) so yes, I also would feel no shame in using convenient English metadata tags rather than Hebrew (or even Greek).

RobH123 Jul 19, 2022
Maintainer

Ok, I discovered the line "003001007": "Names of People" in macula-hebrew/sources/MARBLE/SDBH/domain-label-mapping-1.json (so I guess that's the 3.1.7 that Jonathan referred to) and blocks like

 <word id="00100400800010">
   <sense>SDBH:הֶבֶל:001000:Names of People</sense>
 </word>

in macula-hebrew/sources/MARBLE/SDBH/sdbh-senses.xml but those SDBH numbers all seemed to be either 001000 or 000000 and occasionally 000001 or 000002 so not sure what they mean @jonathanrobie?

I couldn't discover any equivalent in macula-greek.

By going to https://semanticdictionary.org/semdic.php?databaseType=SDBH&language=en and clicking on Domains, I can click on Referents (3), Object referents (3.1) and see Names of People (3.1.7). But https://semanticdictionary.org/semdic.php?databaseType=SDGNT&language=en has People (9) and then (the very surprising) subtypes of Human Beings (9.1), Males (9.2), Females (9.3), Children (9.4), and Persons For Whom There Is Affectionate Concern (9.5)

So yes, the OT and NT ~~identifiers~~ classifications do differ (greatly) as Jonathan said.
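If the guess above is right that "003001007" encodes domain 3.1.7 as three-digit groups, the conversion and the parent chain seen on the website (3, 3.1, 3.1.7) can be sketched in Python as below. The encoding is an assumption inferred from that single example, and the function names are hypothetical.

```python
def domain_code_to_dotted(code: str) -> str:
    """Convert a zero-padded code such as "003001007" to dotted form such as "3.1.7"."""
    if not code.isdigit() or len(code) % 3 != 0:
        raise ValueError(f"expected digits in groups of three, got {code!r}")
    groups = [int(code[i:i + 3]) for i in range(0, len(code), 3)]
    return ".".join(str(g) for g in groups)


def domain_ancestry(dotted: str) -> list[str]:
    """Return the chain from the top-level domain down to the given domain."""
    parts = dotted.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


dotted = domain_code_to_dotted("003001007")
print(dotted)
print(domain_ancestry(dotted))
```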

But I think we were looking for a set of (probably English) identifiers like Levi-1 and Levi-2 or something. Seems that macula and Semantic Dictionary use the actual Hebrew and Greek words? (If so, how would you map nicknames or alternative spellings to the same person?)
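A minimal Python sketch of the kind of English identifier scheme suggested here, with an alias table so that nicknames or alternative spellings resolve to the same person. The IDs, the alias sets, and the function names are hypothetical examples, not drawn from any existing dataset.

```python
from collections import defaultdict

# Hypothetical person IDs mapped to the names by which each person is known.
ALIASES = {
    "Levi-1": {"Levi"},             # e.g. the son of Jacob
    "Levi-2": {"Levi", "Matthew"},  # e.g. the tax collector, also called Matthew
}


def build_name_index(aliases):
    """Invert the alias table: case-folded name -> set of person IDs."""
    index = defaultdict(set)
    for person_id, names in aliases.items():
        for name in names:
            index[name.casefold()].add(person_id)
    return index


NAME_INDEX = build_name_index(ALIASES)


def ids_for_name(name: str) -> list[str]:
    """Return every person ID that a surface name could refer to."""
    return sorted(NAME_INDEX.get(name.casefold(), set()))


print(ids_for_name("Levi"))
print(ids_for_name("Matthew"))
```

A name that returns more than one ID still needs context, such as the Scripture reference, to pick the right person.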
