Searching `Patient` by `Identifier` is slow. #476

whyfate · 2022-05-06T08:47:14Z

Is your feature request related to a problem? Please describe.
I have 100w patient data,and use identifier to search very slowly.

GET /fhir/Patient?identifier=test

kennethmyhra · 2022-05-06T09:00:04Z

Doing a test on one of my test servers with the search below returns in about 140-150 milliseconds. There are 405 patient entries in the database.

GET https://spark.incendi.no/fhir/Patient?identifier=12345

The server is a Ubuntu 20.04 with 1 CPU core and 2GB of memory.

whyfate · 2022-05-06T10:36:44Z

Yes, when the amount of data is still small, the query efficiency is OK, but when the amount of data becomes more, the efficiency is very low.
My server is Ubuntu 18.04 with 8 CPU core and 32GB of memory.
100w patient entries search need 3s.

kennethmyhra · 2022-05-06T19:54:15Z

So we are talking about a million patient entries. I was not aware of the suffix w meaning 10 000.

Have you tried adding these indexes? https://github.com/FirelyTeam/spark/blob/r4/master/scripts/CreateIndexes.js

If you are lacking the index at line 2 adding this should help with the lack of performance you are seeing, could be it needs a slight modification since you are effectively not including system when querying the identifier.

As always when working with indexes. Profiling your database should help further pinpointing the issue. Also please test these indexes in a test environment and backup your database before running that script.

whyfate · 2022-05-07T00:57:45Z

Yes,I add a index to Identifier,But it has little effect on the query. I think the problem is here

spark/src/Spark.Mongo/Search/Searcher/CriteriaMongoExtensions.cs

Lines 321 to 383 in 6bdd48f

    
           private static FilterDefinition<BsonDocument> TokenQuery(String parameterName, Operator optor, String modifier, ValueExpression operand) 
        
           { 
        
               //$elemMatch only works on array values. But the MongoIndexMapper only creates an array if there are multiple values for a given parameter. 
        
               //So we also construct a query for when there is only one set of values in the searchIndex, hence there is no array. 
        
               string systemfield = parameterName + ".system"; 
        
               string codefield = parameterName + ".code"; 
        
               string textfield = parameterName + ".text"; 
        
               switch (optor) 
        
               { 
        
                   case Operator.EQ: 
        
                       var typedEqOperand = ((UntypedValue)operand).AsTokenValue(); 
        
                       if (modifier == Modifier.TEXT) 
        
                       { 
        
                           return Builders<BsonDocument>.Filter.Regex(textfield, new BsonRegularExpression(typedEqOperand.Value, "i")); 
        
                       } 
        
                       else //Search on code and system 
        
                       { 
        
                           //Set up two variants of queries, for dealing with single token values in the index, and multiple (in an array). 
        
                           var arrayQueries = new List<FilterDefinition<BsonDocument>>(); 
        
                           var noArrayQueries = new List<FilterDefinition<BsonDocument>>{ 
        
                               Builders<BsonDocument>.Filter.Not(Builders<BsonDocument>.Filter.Type(parameterName, BsonType.Array))}; 
        
                           var plainStringQueries = new List<FilterDefinition<BsonDocument>>{ 
        
                               Builders<BsonDocument>.Filter.Type(parameterName, BsonType.String)}; 
        
                           if (modifier == Modifier.NOT) //NOT modifier only affects matching the code, not the system 
        
                           { 
        
                               noArrayQueries.Add(Builders<BsonDocument>.Filter.Exists(parameterName)); 
        
                               noArrayQueries.Add(Builders<BsonDocument>.Filter.Ne(codefield, typedEqOperand.Value)); 
        
                               arrayQueries.Add(Builders<BsonDocument>.Filter.Exists(parameterName)); 
        
                               arrayQueries.Add(Builders<BsonDocument>.Filter.Ne("code", typedEqOperand.Value)); 
        
                               plainStringQueries.Add(Builders<BsonDocument>.Filter.Exists(parameterName)); 
        
                               plainStringQueries.Add(Builders<BsonDocument>.Filter.Ne(parameterName, typedEqOperand.Value)); 
        
                           } 
        
                           else 
        
                           { 
        
                               noArrayQueries.Add(Builders<BsonDocument>.Filter.Eq(codefield, typedEqOperand.Value)); 
        
                               arrayQueries.Add(Builders<BsonDocument>.Filter.Eq("code", typedEqOperand.Value)); 
        
                               plainStringQueries.Add(Builders<BsonDocument>.Filter.Eq(parameterName, typedEqOperand.Value)); 
        
                           } 
        
                           //Handle the system part, if present. 
        
                           if (!typedEqOperand.AnyNamespace) 
        
                           { 
        
                               if (string.IsNullOrWhiteSpace(typedEqOperand.Namespace)) 
        
                               { 
        
                                   arrayQueries.Add(Builders<BsonDocument>.Filter.Exists("system", false)); 
        
                                   noArrayQueries.Add(Builders<BsonDocument>.Filter.Exists(systemfield, false)); 
        
                                   plainStringQueries.Add(Builders<BsonDocument>.Filter.Exists("system", false)); 
        
                               } 
        
                               else 
        
                               { 
        
                                   arrayQueries.Add(Builders<BsonDocument>.Filter.Eq("system", typedEqOperand.Namespace)); 
        
                                   noArrayQueries.Add(Builders<BsonDocument>.Filter.Eq(systemfield, typedEqOperand.Namespace)); 
        
                                   plainStringQueries.Add(Builders<BsonDocument>.Filter.Eq("system", typedEqOperand.Namespace)); 
        
                               } 
        
                           } 
        
                           //Combine code and system 
        
                           var arrayEqQuery = Builders<BsonDocument>.Filter.ElemMatch(parameterName, Builders<BsonDocument>.Filter.And(arrayQueries)); 
        
                           var noArrayEqQuery = Builders<BsonDocument>.Filter.And(noArrayQueries); 
        
                           var plainStringQuery = Builders<BsonDocument>.Filter.And(plainStringQueries); 
        
                           return Builders<BsonDocument>.Filter.Or(arrayEqQuery, noArrayEqQuery, plainStringQuery);

whyfate · 2022-05-07T01:32:04Z

It's the slow query log.

{
    "s": "I", 
    "c": "COMMAND", 
    "id": 51803, 
    "ctx": "conn581", 
    "msg": "Slow query", 
    "attr": {
        "type": "command", 
        "ns": "spark.searchindex", 
        "command": {
            "find": "searchindex", 
            "filter": {
                "internal_level": 0, 
                "internal_resource": "Patient", 
                "$or": [
                    {
                        "identifier": {
                            "$elemMatch": {
                                "code": "test"
                            }
                        }
                    }, 
                    {
                        "identifier": {
                            "$not": {
                                "$type": 4
                            }
                        }, 
                        "identifier.code": "test"
                    }, 
                    {
                        "$and": [
                            {
                                "identifier": {
                                    "$type": 2
                                }
                            }, 
                            {
                                "identifier": "test"
                            }
                        ]
                    }
                ]
            }, 
            "projection": {
                "internal_selflink": 1
            }, 
            "$db": "spark", 
            "lsid": {
                "id": {
                    "$uuid": "748eacb6-cce5-4088-9e40-67b08adea3ed"
                }
            }
        }, 
        "planSummary": "IXSCAN { internal_level: 1, internal_resource: 1 }", 
        "keysExamined": 1001234, 
        "docsExamined": 1001234, 
        "cursorExhausted": true, 
        "numYields": 1001, 
        "nreturned": 0, 
        "queryHash": "F3A3EA5B", 
        "planCacheKey": "E06A389B", 
        "reslen": 106, 
        "locks": {
            "ReplicationStateTransition": {
                "acquireCount": {
                    "w": 1002
                }
            }, 
            "Global": {
                "acquireCount": {
                    "r": 1002
                }
            }, 
            "Database": {
                "acquireCount": {
                    "r": 1002
                }
            }, 
            "Collection": {
                "acquireCount": {
                    "r": 1002
                }
            }, 
            "Mutex": {
                "acquireCount": {
                    "r": 1
                }
            }
        }, 
        "storage": { }, 
        "protocol": "op_msg", 
        "durationMillis": 2469
    }
}

This query used '$or', so it didn't hit the identifier index.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Searching `Patient` by `Identifier` is slow. #476

Searching `Patient` by `Identifier` is slow. #476

whyfate commented May 6, 2022

kennethmyhra commented May 6, 2022 •

edited

Loading

whyfate commented May 6, 2022

kennethmyhra commented May 6, 2022

whyfate commented May 7, 2022

whyfate commented May 7, 2022

Searching Patient by Identifier is slow. #476

Searching Patient by Identifier is slow. #476

Comments

whyfate commented May 6, 2022

kennethmyhra commented May 6, 2022 • edited Loading

whyfate commented May 6, 2022

kennethmyhra commented May 6, 2022

whyfate commented May 7, 2022

whyfate commented May 7, 2022

Searching `Patient` by `Identifier` is slow. #476

Searching `Patient` by `Identifier` is slow. #476

kennethmyhra commented May 6, 2022 •

edited

Loading