From 44d70bc95e07161a9747269488355dbdfdb63014 Mon Sep 17 00:00:00 2001
From: Erik Körner
Date: Fri, 15 Mar 2024 16:10:56 +0100
Subject: [PATCH] Some more ...

---
 .../query-translation.adoc                    |  58 ++++++++
 .../reference-implementations.adoc            |  84 +++++++++++
 .../resources-and-dataviews.adoc              | 130 ++++++++++++++++++
 3 files changed, 272 insertions(+)
 create mode 100644 fcs-endpoint-dev-slides/reference-implementations.adoc
 create mode 100644 fcs-endpoint-dev-slides/resources-and-dataviews.adoc

diff --git a/fcs-endpoint-dev-slides/query-translation.adoc b/fcs-endpoint-dev-slides/query-translation.adoc
index ca3269d..9d8ffa4 100644
--- a/fcs-endpoint-dev-slides/query-translation.adoc
+++ b/fcs-endpoint-dev-slides/query-translation.adoc
@@ -11,3 +11,61 @@
 
 [.small]
 == Query Languages
+
+[.position-absolute.right--20.zindex--1]
+image::cql-js-screenshot.png[CQL-JS Demo]
+
+
+[.text-left]
+== FCS-QL – Visualization
+
+[.position-absolute.right--20.width-50.zindex--1]
+image::fcsql-parse-tree-java.png[FCS-QL parse tree]
+
+* Installation
++
+[.code-width-full,bash]
+----
+pip install antlr4-tools
+git clone https://github.com/clarin-eric/fcs-ql.git
+cd fcs-ql/src/main/antlr4/eu/clarin/sru/fcs/qlparser
+----
+
+[.mt-5]
+* Visualization as described in https://github.com/antlr/antlr4/blob/master/doc/getting-started.md[ANTLR4 > Getting Started]
++
+[.code-width-full,bash]
+----
+antlr4-parse src/fcsql/FCSParser.g4 src/fcsql/FCSLexer.g4 query -gui
+[ word = "her.*" ] [ lemma = "Arznei" ] [ pos = "VERB" ]
+^D
+----
+
+
+[.text-left.small]
+== FCS-QL Query Nodes
+
+
+[.text-left.small]
+== FCS-QL Query Nodes – Aggregator
+
+[.position-absolute.width-50.right--20.opacity-50.zindex--1]
+image::fcsql-querybuilder-complex.png[FCS-QL Query Builder]
+
+
+== FCS-QL – Remarks
+
+
+[.small]
+== Query-Mapping
+
+
+ifdef::backend-revealjs[]
+[.small]
+== Query-Mapping (2)
+endif::[]
+
+* Elasticsearch
+
+* Solr
+
diff --git a/fcs-endpoint-dev-slides/reference-implementations.adoc b/fcs-endpoint-dev-slides/reference-implementations.adoc
new file mode 100644
index 0000000..cac90c7
--- /dev/null
+++ b/fcs-endpoint-dev-slides/reference-implementations.adoc
@@ -0,0 +1,84 @@
+[background-image="fcs-render-uk.png",background-opacity="0.5"]
+= Reference Implementations
+
+[.notes]
+--
+* Java and Python, focus on FCS endpoints
+* Java class hierarchies, organization & structure, processes & lifecycles, configuration
+--
+
+
+[.small]
+== CLARIN Reference Libraries (Java)
+
+* Development started ~2012
+* Modularized: Client/Server, SRU/FCS, Parser
+* Java 1.8+ (https://endoflife.date/oracle-jdk[_EOL: end of 2030_])
+* Extensive documentation, some tests (_proven through many years of production use_)
+* Artifacts in https://nexus.clarin.eu[CLARIN Nexus], code on https://github.com/clarin-eric/?q=fcs[GitHub]
+* Server/endpoint: external dependencies on
+
+** Logging: `slf4j`
+** HTTP: `javax.servlet:servlet-api`
+** Parser: `antlr4` (FCS-QL), `cql-java` (CQL)
+
+* Build: Maven
+* Deployment: Jetty, Tomcat, …
+
+
+[.small]
+== CLARIN Reference Libraries (Python)
+
+* ~2022: Port of the Java reference libraries to Python
+* Closely modelled on the Java reference libraries
++
+→ (almost) identical interfaces and class/function names
+* but: slightly optimized for Python, not a 1:1 copy
+* Focus on (new) FCS endpoints → no clients!
+* Typed, documented; published on PyPI
+* Synchronous, minimal WSGI app → can be embedded in existing apps
+* Python 3.8+
+* Dependencies on
+
+** XML parsing: `lxml`
+** HTTP/WSGI: `werkzeug`
+** Query parser: `PLY` (CQL), `ANTLR4` (FCS-QL)
+
+
+[.text-left.small]
+== CLARIN Reference Libraries
+
+* FCS SRU Server: https://github.com/clarin-eric/fcs-sru-server/[Java] (https://clarin-eric.github.io/fcs-sru-server/apidocs/index.html[docs]), https://github.com/Querela/fcs-sru-server-python/[Python] (https://fcs-sru-server-python.readthedocs.io/en/latest/[docs])
+* FCS Simple Endpoint: https://github.com/clarin-eric/fcs-simple-endpoint[Java] (https://clarin-eric.github.io/fcs-simple-endpoint/apidocs/index.html[docs]), https://github.com/Querela/fcs-simple-endpoint-python[Python] (https://fcs-simple-endpoint-python.readthedocs.io/en/latest/[docs])
+
+[.mt-2]
+* FCS SRU Client: https://github.com/clarin-eric/fcs-sru-client/[Java] (https://clarin-eric.github.io/fcs-sru-client/apidocs/index.html[docs])
+* FCS Simple Client: https://github.com/clarin-eric/fcs-simple-client[Java] (https://clarin-eric.github.io/fcs-simple-client/apidocs/index.html[docs])
+
+[.mt-2]
+* CQL Parser: https://github.com/indexdata/cql-java[Java] (http://zing.z3950.org/cql/java/docs/index.html[docs]?), https://github.com/Querela/cql-python[Python], https://github.com/Querela/cql-js[JavaScript]
+* FCS-QL Parser: https://github.com/clarin-eric/fcs-ql[Java], https://github.com/Querela/fcs-ql-python[Python] (https://fcs-ql-python.readthedocs.io/en/latest/[docs])
+
+[.mt-2]
+* Maven Endpoint Archetype: https://github.com/clarin-eric/fcs-endpoint-archetype[Java]
+* FCS SRU Aggregator: https://github.com/clarin-eric/fcs-sru-aggregator[Java]
+* FCS Endpoint Validator: https://github.com/clarin-eric/fcs-endpoint-tester[Java] (old), https://github.com/saw-leipzig/fcs-endpoint-validator[Java] ← tests compliance with the _SRU/FCS protocol_
+* Korp: https://github.com/clarin-eric/fcs-korp-endpoint/[Java], https://github.com/Querela/fcs-korp-endpoint-python/[Python]
+
+_https://github.com/indexdata/[Indexdata]: CQL parser, https://github.com/Querela/[Querela]: Python implementations_
+
+[.notes]
+--
+* High-level overview only; concrete examples and implementations follow in a later section
+--
+
+
+[.small]
+== FCS Endpoint – Design and Structure
+
+* Query Parser (CQL, FCS-QL)
+
+[.mt-2]
+* *FCS SRU Server*
+
+
diff --git a/fcs-endpoint-dev-slides/resources-and-dataviews.adoc b/fcs-endpoint-dev-slides/resources-and-dataviews.adoc
new file mode 100644
index 0000000..d922f44
--- /dev/null
+++ b/fcs-endpoint-dev-slides/resources-and-dataviews.adoc
@@ -0,0 +1,130 @@
+[background-image="fcs-render-uk.png",background-opacity="0.5"]
+= Resources and Data Views
+
+[.notes]
+--
+* Endpoint Capabilities, BASIC/ADVANCED Search, FCS-QL
+* Resource, Resource Fragment, Data View (Hits, Advanced)
+* Result serialization, query languages
+--
+
+
+[.text-left]
+== Endpoint Description – Capabilities
+
+*\http://clarin.eu/fcs/capability/basic-search*
+
+* Mandatory
+* Data View: HITS
+
+[.mt-5]
+*\http://clarin.eu/fcs/capability/advanced-search*
+
+* Optional
+* Data Views: HITS and Advanced
+
+
+ifdef::backend-revealjs[]
+== Endpoint Description – Capabilities (2)
+endif::[]
+
+
+
+[.text-left]
+== BASIC Search
+
+[.position-absolute.right--30.width-50.opacity-50,x86asm]
+----
+cat
+"cat"
+cat AND dog
+"grumpy cat"
+"grumpy cat" AND dog
+"grumpy cat" OR "lazy dog"
+cat AND (mouse OR "lazy dog")
+----
+
+*``\http://clarin.eu/fcs/capability/basic-search``*
+
+
+[.text-left]
+== ADVANCED Search
+
+[.position-absolute.right--30.width-50.opacity-50,x86asm]
+----
+"walking"
+[token = "walking"]
+"Dog" /c
+[word = "Dog" /c]
+[pos = "NOUN"]
+[pos != "NOUN"]
+[lemma = "walk"]
+"blaue|grüne" [pos = "NOUN"]
+"dogs" []{3,} "cats" within s
+[z:pos = "ADJ"]
+[z:pos = "ADJ" & q:pos = "ADJ"]
+----
+
+
+*``\http://clarin.eu/fcs/capability/advanced-search``*
+
+
+== FCS-QL
+
+
+
+== FCS-QL – Notes
+
+
+
+== FCS-QL – Layer Types
+
+// ._Advanced Search_ Layer types with description and examples
+[.x-small%header,cols="1m,5,1,3"]
+|===
+|{set:cellbgcolor}Layer Type Identifier
+|Annotation Layer Description
+|Syntax
+|Examples (without quotes)
+
+|text
+|Textual representation of the resource; also the layer used in Basic Search
+|String
+|"Dog", "cat", "walking", "better"
+
+|lemma
+|Lemmatisation
+|String
+|"good", "walk", "dog"
+
+|pos
+|Part-of-Speech annotations
+|<<ref:UD-POS,Universal POS>> tags
+|"NOUN", "VERB", "ADJ"
+
+|orth
+|Orthographic transcription of (mostly) spoken resources
+|String
+|"dug", "cat", "wolking"
+
+|norm
+|Orthographic normalization of (mostly) spoken resources
+|String
+|"dog", "cat", "walking", "best"
+
+|phonetic
+|Phonetic transcription
+|<<ref:SAMPA,SAMPA>>
+|"'du:", "'vi:-d6 'ha:-b@n"
+|===
+
+[.refs.xx-small]
+--
+* [[ref:UD-POS]]Universal Dependencies, https://universaldependencies.github.io/u/pos/index.html[Universal POS tags v2.0]
+* [[ref:SAMPA]]Dafydd Gibbon, Inge Mertins, Roger Moore (Eds.): Handbook of Multimodal and Spoken Language Systems. Resources, Terminology and Product Evaluation, Kluwer Academic Publishers, Boston MA, 2000, ISBN 0-7923-7904-7
+--
+
+
+== FCS-QL – Layer Type Identifier
+
+