How to deliver normalized and original tokens in XML output? #6
@Arithmeticus I also remember you telling me that simply running … What I'm trying to do is run it with parameters, following the Diff+ XSL stylesheet, after eliminating all the HTML-processing stuff deeper down. (I did get that working easily enough.) I hope I'm not going about this all wrong, but those parameters did give me a way to pre-process my XML and treat its tags as text that I could normalize. So I hope there's a way to dig the original strings out, following the Diff+ method of approach to TAN's function library.
hi @ebeshero -- you must be in the throes of post-deadline Balisage paper writing. Like me.
So you can't do it with … One challenge with TAN Diff+ is that (if I recall correctly) there are two stages of normalization. The first round is ignored by any later comparison: it's like getting rid of the stuff you never want to see again. Only the second round is picked up by the later de-normalization process. Perhaps you're making changes in the fuggeddabouddit stage. I'll look a bit closer after I get my paper written this weekend.
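A minimal sketch of that two-stage idea (hypothetical code, not TAN's actual implementation; the function names and normalization rules are invented for illustration). The point is that stage 1 records nothing, so its changes are unrecoverable, while stage 2 keeps an offset map that a later de-normalization pass can consult:

```python
# Hypothetical illustration of two-stage normalization (not TAN's code).
# Stage 1 discards information outright; stage 2 records what it changed,
# so only stage-2 changes can later be mapped back to the source.

def stage1_normalize(text):
    # "Fuggeddabouddit" stage: strip soft hyphens entirely.
    # Nothing is recorded, so this change is unrecoverable downstream.
    return text.replace("\u00ad", "")

def stage2_normalize(text):
    # Comparison stage: lowercase and collapse whitespace, but keep a map
    # from each normalized character offset back to a source offset.
    offset_map = []
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            # collapse a whitespace run to a single space
            j = i
            while j < len(text) and text[j].isspace():
                j += 1
            out.append(" ")
            offset_map.append(i)
            i = j
        else:
            out.append(ch.lower())
            offset_map.append(i)
            i += 1
    return "".join(out), offset_map

src = stage1_normalize("The\u00ad  QUICK fox")
norm, omap = stage2_normalize(src)
# norm is "the quick fox"; omap lets us recover original substrings:
# the token norm[4:9] ("quick") came from src[omap[4] : omap[8] + 1]
```

Anything normalized away in stage 1 (here, the soft hyphen) is simply gone, which would explain why edits made at that stage never show up in the de-normalized output.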
@Arithmeticus I think (if I understand correctly) that I might well be trying to intervene in the "fuggeddabouddit" stage...! So the question is, how did you get that pretty output a couple of months ago? I'd be happy to work with that if I could get it after I do my elaborate normalization regime.

Here's the deal with Frankenstein: we need the original text because we're actually constructing the output edition from the critical apparatus. Even if I didn't want to do that (and I really do; I have a post-processing XSLT pipeline all built, and it works: it reads the critical apparatus and reconstructs each witness while storing the collation data), and I just cared about the collation output of normalized tokens, I would still want to read the original against the normalized stuff. That's because of all the complicated regex replacement patterns I need to set in place, since I'm collating the markup and the text. It's a practical need: I have to be able to see both together to make sure my normalization is doing what I need it to do.

Irony! I had to call for help from the collateX developers (specifically @djbpitt, who was handling the TEI and XML output from collateX) to get them to help me see the normalized tokens instead of only the original text in the critical apparatus. They are like your polar opposites or something (lol). It's funny that in the TAN universe, the normalized versions prevail! I think I had better make a big point in my paper to each set of developers that, hey, seriously, we need to see the original strings together with the normalized tokens in collation output, so that we can review our work and build things from it. :-)

Anyway, thanks for your wonderfully intricate work on all this! I don't mean to sound ungrateful--I'm just lost in the TAN labyrinth and eager to get it working.
I get lost in that TAN labyrinth too sometimes, so I sympathize! And I totally agree: an output that can capture both the original and the normalized commonality would be a boon. An important side question: when you get … If you share a scratch of the files you're working with, I can make some suggestions, and look at ways of making TAN Diff+ a bit easier to use. In the last 10 days I've been making some major changes to …
@Arithmeticus Okay! Those new developments for … You are asking about … Okay: my files! I've just been reorganizing this repo and digging into TAN to create a serious workspace for Frankenstein. At the moment, we're digging in very deliberately to a small set of collation "chunks" to try to compare what happens in collateX vs. tan:collate. So let me point you to the files I've been working with most recently:
^^^ Updated the above to point you to the output.
Very briefly, the attribute … The user of TAN Diff+, however, who is thinking primarily about an original string, and not about its normalized form as the basis of comparison, may not find the current …
Where you're changing the …
In this scenario, the original substrings were "....vv", "...uu", and "..vu" (a dot represents a letter not shown in this example). At a correction stage they were changed to "....v", "....u", and ".v". Then, for the purposes of collation, they were normalized further to "..v", ".v", and ".v". The output above allows the user to choose which prenormalization form to use, say, for web display. And a … On the other hand, one could argue that this is confusing, and that it should look more like this:
I'm not sure which way to go. This needs some deliberation.
@Arithmeticus I remember you telling me it would be easy to show the original strings at each witness together with the normalized string. I'm having trouble figuring out where and how to do this.
So far I have only been tinkering with `TAN-fn-strings-collate-standard.xsl`, which seems to be the XSLT that generates the c's and u's and the `txt` that shares the normalized strings. I keep trying to output a text node for `<tan:wit>`, but it seems not to have one... What do I need to invoke to output the original string for each witness at the point of a `<u>` or `<c>`?

You mentioned here (a couple of months ago) that you were tinkering with something that was, indeed, delivering the original strings. So I can see it's possible; I'm just having trouble figuring out where to intervene to deliver that.