
Query optimization in ExistDB

vogelsgesang edited this page Apr 24, 2014 · 4 revisions

One problem we faced with eXist-db was query execution time. To solve it, we applied range indexing to our MARC21 documents. Without a range index, eXist has to do a full scan over the context nodes to look up an element value, which severely limits performance and scalability.

Here is our configuration.xconf file:

```xml
<collection xmlns="http://exist-db.org/collection-config/1.0" xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <index xmlns:MARC21="http://www.loc.gov/MARC21/slim">
        <fulltext default="none"/>
        <!-- Range indexes -->
        <range>
            <create qname="MARC21:controlfield" type="xs:string"/>
            <create qname="MARC21:datafield">
                <field name="tag" match="@tag" type="xs:string"/>
                <field name="subfield" match="MARC21:subfield" type="xs:string"/>
                <field name="code" match="@code" type="xs:string"/>
            </create>
        </range>
    </index>
</collection>
```
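The configuration only takes effect once it is stored in the system collection that mirrors `/db/book` and the documents already in `/db/book` are reindexed. A minimal sketch of doing both from XQuery, assuming eXist-db's `xmldb` module, a user with dba rights, and keeping the resource name `configuration.xconf` used on this page:

```xquery
xquery version "3.0";

(: Sketch, assuming a dba user. eXist reads the index configuration for
   /db/book from the mirrored path /db/system/config/db/book. :)
let $config-parent := xmldb:create-collection("/db/system/config/db", "book")
let $config :=
    <collection xmlns="http://exist-db.org/collection-config/1.0"
                xmlns:xs="http://www.w3.org/2001/XMLSchema">
        <index xmlns:MARC21="http://www.loc.gov/MARC21/slim">
            <fulltext default="none"/>
            <range>
                <create qname="MARC21:controlfield" type="xs:string"/>
                <create qname="MARC21:datafield">
                    <field name="tag" match="@tag" type="xs:string"/>
                    <field name="subfield" match="MARC21:subfield" type="xs:string"/>
                    <field name="code" match="@code" type="xs:string"/>
                </create>
            </range>
        </index>
    </collection>
(: Store the configuration document next to the mirrored collection path. :)
let $stored := xmldb:store($config-parent, "configuration.xconf", $config)
(: Documents stored before the configuration existed were indexed without it,
   so rebuild the indexes; $stored is returned first so the store happens
   before the reindex. :)
return ($stored, xmldb:reindex("/db/book"))
```

The query returns the path of the stored configuration followed by the boolean result of the reindex.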

We tried the following queries. With the range index in place, execution time dropped from 13 seconds to milliseconds.
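A before/after comparison like this can be measured inside eXist itself. A rough sketch, assuming eXist's `util` module: it uses `util:system-dateTime()` rather than `fn:current-dateTime()`, because the latter stays fixed for the whole query, and `count()` forces the result to be built before the second timestamp is taken. It relies on eXist evaluating the `let` clauses in order, which XQuery itself does not guarantee, so treat the figure as approximate.

```xquery
xquery version "3.0";

declare namespace MARC21 = "http://www.loc.gov/MARC21/slim";

(: Wall-clock timestamp before running the query under test. :)
let $start := util:system-dateTime()
(: The first query from this page. :)
let $hits :=
    collection("/db/book")/MARC21:collection/MARC21:record
        [MARC21:datafield[@tag = "653"]/MARC21:subfield[. = "algebraic"][@code = "a"]]
(: Counting forces every matching record to be found. :)
let $count := count($hits)
let $end := util:system-dateTime()
(: Subtracting two xs:dateTime values gives an xs:dayTimeDuration;
   dividing by one second converts it to a decimal number of seconds. :)
return
    <timing records="{$count}"
            seconds="{($end - $start) div xs:dayTimeDuration('PT1S')}"/>
```

Running it once before and once after storing the index configuration gives the two numbers to compare.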

First query:

```xquery
xquery version "3.0";

declare namespace MARC21 = "http://www.loc.gov/MARC21/slim";

let $col := collection("/db/book")
return
    (: Records with a 653 (uncontrolled index term) field whose subfield $a is "algebraic" :)
    $col/MARC21:collection/MARC21:record[MARC21:datafield[@tag = "653"]/MARC21:subfield[. = 'algebraic'][@code = 'a']]
```


Second query:

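```xquery
xquery version "3.0";

declare default element namespace 'http://www.loc.gov/MARC21/slim';

let $col := collection("/db/book")
return
    (: Records whose 003 control field is "SzGeCERN" and whose 260 (imprint) field
       has place $a "Rockville, MD" and publisher $b "Computer Science Press" :)
    $col/collection/record
        [controlfield[@tag = '003'][. = 'SzGeCERN']]
        [datafield[@tag = '260']
            [subfield[@code = 'a'][. = 'Rockville, MD']]
            [subfield[@code = 'b'][. = 'Computer Science Press']]]
```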

Third query:

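```xquery
xquery version "3.0";

declare default element namespace 'http://www.loc.gov/MARC21/slim';

let $col := collection("/db/book")
(: A looser variant of the second query: first select records by the two subfield
   values (without checking tag 260 or the subfield codes), then filter on 003 :)
let $records := $col/collection/record[datafield[subfield = 'Rockville, MD'][subfield = 'Computer Science Press']]
return
    $records[controlfield[@tag = '003'][. = 'SzGeCERN']]
```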