[BUG] Search Phase execution exception in Hybrid Query #393

Closed
mgdunn2 opened this issue Oct 5, 2023 · 7 comments

Labels
bug Something isn't working

Comments

mgdunn2 commented Oct 5, 2023

Describe the bug
When performing a hybrid query with one full-text query and one k-NN query, the search often fails with a search_phase_execution_exception caused by a null_pointer_exception (full error response below).

To Reproduce
Steps to reproduce the behavior:

  1. Create a new index with a KNN field and a text field
  2. Index 3 documents
  3. Add a normalization processor
  4. Perform a hybrid search with two queries, one with a match query and one with a knn query
  5. See error

The error is intermittent, but it occurs more often than not.

Expected behavior
Hybrid results are returned. Each of the two queries produces results when run individually; the hybrid query sometimes succeeds but more often than not it fails.

Plugins
Using the neural-search and k-NN plugins. The security plugin is currently disabled.

Screenshots
Create Index Request:

{
  "settings": {
    "index.knn": true
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "vector": {
        "type": "knn_vector",
        "dimension": 3,
        "method": {
          "engine": "lucene",
          "space_type": "cosinesimil",
          "name": "hnsw",
          "parameters": {}
        }
      },
      "text": {
        "type": "text"
      }
    }
  }
}

Create Normalization Processor (with path _search/pipeline/nlp-search-pipeline):

{
  "description": "Post processor for hybrid search",
  "phase_results_processors": [
    {
      "normalization-processor": {
        "normalization": {
          "technique": "min_max"
        },
        "combination": {
          "technique": "arithmetic_mean",
          "parameters": {
            "weights": [
              0.2,
              0.8
            ]
          }
        }
      }
    }
  ]
}
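
The pipeline above is created with a PUT to _search/pipeline/nlp-search-pipeline, as noted. As a side note, instead of passing search_pipeline as a query parameter on every search, recent 2.x versions also let the pipeline be set as the index default; a minimal sketch of that settings update (sent with PUT to /<index-name>/_settings, where <index-name> is a placeholder for the actual index name):

{
  "index.search.default_pipeline": "nlp-search-pipeline"
}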

Sample Document:

{
  "text": "This is the third test",
  "id": "3",
  "vector": [0.31, 0.2, -0.11]
}
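
For step 2 above, the three documents can be indexed in a single _bulk request; a minimal sketch (sent with POST to /<index-name>/_bulk, where <index-name> and the contents of documents 1 and 2 are illustrative placeholders inferred from the sample document):

{ "index": { "_id": "1" } }
{ "text": "This is the first test", "id": "1", "vector": [0.1, 0.2, 0.3] }
{ "index": { "_id": "2" } }
{ "text": "This is the second test", "id": "2", "vector": [-0.2, 0.4, 0.05] }
{ "index": { "_id": "3" } }
{ "text": "This is the third test", "id": "3", "vector": [0.31, 0.2, -0.11] }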

Sample Query (with query parameter search_pipeline=nlp-search-pipeline):

{
	"query": {
		"hybrid": {
			"queries": [
				{
					"bool": {
						"should": {
							"match": {
								"text": {
									"query": "third"
								}
							}
						}
					}
				},
				{
					"knn": {
						"vector": {
							"vector": [
								-0.1,
								-0.2,
								0.3
							],
							"k": 3
						}
					}
				}
			]
		}
	}
}

Error Response:

{
	"error": {
		"root_cause": [],
		"type": "search_phase_execution_exception",
		"reason": "The phase has failed",
		"phase": "query",
		"grouped": true,
		"failed_shards": [],
		"caused_by": {
			"type": "null_pointer_exception",
			"reason": "Cannot invoke \"org.opensearch.search.SearchHit.score(float)\" because \"searchHit\" is null"
		}
	},
	"status": 500
}

Server error:

"org.opensearch.action.search.SearchPhaseExecutionException: The phase has failed
	at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:677) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.action.search.AbstractSearchAsyncAction.successfulShardExecution(AbstractSearchAsyncAction.java:596) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.action.search.AbstractSearchAsyncAction.onShardResultConsumed(AbstractSearchAsyncAction.java:581) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.action.search.AbstractSearchAsyncAction.lambda$onShardResult$9(AbstractSearchAsyncAction.java:564) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.action.search.QueryPhaseResultConsumer$PendingMerges.consume(QueryPhaseResultConsumer.java:373) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.action.search.QueryPhaseResultConsumer.consumeResult(QueryPhaseResultConsumer.java:132) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.action.search.AbstractSearchAsyncAction.onShardResult(AbstractSearchAsyncAction.java:564) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.action.search.SearchQueryThenFetchAsyncAction.onShardResult(SearchQueryThenFetchAsyncAction.java:159) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.action.search.AbstractSearchAsyncAction$1.innerOnResponse(AbstractSearchAsyncAction.java:286) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.action.search.SearchActionListener.onResponse(SearchActionListener.java:59) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.action.search.SearchActionListener.onResponse(SearchActionListener.java:44) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.action.search.SearchExecutionStatsCollector.onResponse(SearchExecutionStatsCollector.java:99) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.action.search.SearchExecutionStatsCollector.onResponse(SearchExecutionStatsCollector.java:52) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:70) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.action.search.SearchTransportService$ConnectionCountingHandler.handleResponse(SearchTransportService.java:746) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.transport.TransportService$6.handleResponse(TransportService.java:880) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1496) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.transport.InboundHandler.doHandleResponse(InboundHandler.java:394) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.transport.InboundHandler.handleResponse(InboundHandler.java:386) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.transport.InboundHandler.messageReceived(InboundHandler.java:161) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.transport.InboundHandler.inboundMessage(InboundHandler.java:115) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.transport.TcpTransport.inboundMessage(TcpTransport.java:767) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:175) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:150) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:115) [opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:95) [transport-netty4-client-2.10.0.jar:2.10.0]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) [netty-transport-4.1.97.Final.jar:4.1.97.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.97.Final.jar:4.1.97.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) [netty-transport-4.1.97.Final.jar:4.1.97.Final]
	at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:280) [netty-handler-4.1.97.Final.jar:4.1.97.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) [netty-transport-4.1.97.Final.jar:4.1.97.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.97.Final.jar:4.1.97.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) [netty-transport-4.1.97.Final.jar:4.1.97.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [netty-transport-4.1.97.Final.jar:4.1.97.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) [netty-transport-4.1.97.Final.jar:4.1.97.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.97.Final.jar:4.1.97.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [netty-transport-4.1.97.Final.jar:4.1.97.Final]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) [netty-transport-4.1.97.Final.jar:4.1.97.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788) [netty-transport-4.1.97.Final.jar:4.1.97.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:689) [netty-transport-4.1.97.Final.jar:4.1.97.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:652) [netty-transport-4.1.97.Final.jar:4.1.97.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562) [netty-transport-4.1.97.Final.jar:4.1.97.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) [netty-common-4.1.97.Final.jar:4.1.97.Final]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.97.Final.jar:4.1.97.Final]
	at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: org.opensearch.search.pipeline.SearchPipelineProcessingException: java.lang.NullPointerException: Cannot invoke "org.opensearch.search.SearchHit.score(float)" because "searchHit" is null
	at org.opensearch.search.pipeline.Pipeline.runSearchPhaseResultsTransformer(Pipeline.java:238) ~[opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.search.pipeline.PipelinedRequest.transformSearchPhaseResults(PipelinedRequest.java:40) ~[opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:714) ~[opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.action.search.AbstractSearchAsyncAction.successfulShardExecution(AbstractSearchAsyncAction.java:594) [opensearch-2.10.0.jar:2.10.0]
	... 43 more
Caused by: java.lang.NullPointerException: Cannot invoke "org.opensearch.search.SearchHit.score(float)" because "searchHit" is null
	at org.opensearch.neuralsearch.processor.NormalizationProcessorWorkflow.lambda$updateOriginalFetchResults$3(NormalizationProcessorWorkflow.java:154) ~[?:?]
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197) ~[?:?]
	at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:992) ~[?:?]
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509) ~[?:?]
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499) ~[?:?]
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:575) ~[?:?]
	at java.util.stream.AbstractPipeline.evaluateToArrayNode(AbstractPipeline.java:260) ~[?:?]
	at java.util.stream.ReferencePipeline.toArray(ReferencePipeline.java:616) ~[?:?]
	at org.opensearch.neuralsearch.processor.NormalizationProcessorWorkflow.updateOriginalFetchResults(NormalizationProcessorWorkflow.java:156) ~[?:?]
	at org.opensearch.neuralsearch.processor.NormalizationProcessorWorkflow.execute(NormalizationProcessorWorkflow.java:70) ~[?:?]
	at org.opensearch.neuralsearch.processor.NormalizationProcessor.process(NormalizationProcessor.java:62) ~[?:?]
	at org.opensearch.search.pipeline.Pipeline.runSearchPhaseResultsTransformer(Pipeline.java:219) ~[opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.search.pipeline.PipelinedRequest.transformSearchPhaseResults(PipelinedRequest.java:40) ~[opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:714) ~[opensearch-2.10.0.jar:2.10.0]
	at org.opensearch.action.search.AbstractSearchAsyncAction.successfulShardExecution(AbstractSearchAsyncAction.java:594) ~[opensearch-2.10.0.jar:2.10.0]
	... 43 more"

Host/Environment (please complete the following information):

  • OS: Linux running in Google Kubernetes Engine installed with Helm Chart
  • Version 2.10.0

Additional context
I did not see this error when testing with a single-node local instance in Docker, but I'm seeing it consistently in the 3-node cluster in Kubernetes.

mgdunn2 added the bug (Something isn't working) and untriaged labels on Oct 5, 2023
mgdunn2 changed the title from "[BUG] Search Phase exception in Hybrid Query" to "[BUG] Search Phase execution exception in Hybrid Query" on Oct 5, 2023
@martin-gaievski
Member

@mgdunn2 can you please confirm that you're using one shard in this scenario?

@mgdunn2
Author

mgdunn2 commented Oct 5, 2023

Yes, one shard. I created the index with the request provided above.

@mgdunn2
Author

mgdunn2 commented Oct 5, 2023

Here's the values.yaml for the Helm install. It's basically the default, except that security is disabled (the cluster isn't exposed) and memory/disk have been increased.

The only other thing I did was run a DaemonSet on the nodes before starting the pods to raise vm.max_map_count, as described here.

---
clusterName: "opensearch-cluster"
nodeGroup: "master"

# If discovery.type in the opensearch configuration is set to "single-node",
# this should be set to "true"
# If "true", replicas will be forced to 1
singleNode: false

# The service that non master groups will try to connect to when joining the cluster
# This should be set to clusterName + "-" + nodeGroup for your master group
masterService: "opensearch-cluster-master"

# OpenSearch roles that will be applied to this nodeGroup
# These will be set as environment variable "node.roles". E.g. node.roles=master,ingest,data,remote_cluster_client
roles:
  - master
  - ingest
  - data
  - remote_cluster_client

replicas: 3

# if not set, falls back to parsing .Values.imageTag, then .Chart.appVersion.
majorVersion: ""

global:
  # Set if you want to change the default docker registry, e.g. a private one.
  dockerRegistry: ""

# Allows you to add any config files in {{ .Values.opensearchHome }}/config
opensearchHome: /usr/share/opensearch
# such as opensearch.yml and log4j2.properties
config:
  # Values must be YAML literal style scalar / YAML multiline string.
  # <filename>: |
  #   <formatted-value(s)>
  # log4j2.properties: |
  #   status = error
  #
  #   appender.console.type = Console
  #   appender.console.name = console
  #   appender.console.layout.type = PatternLayout
  #   appender.console.layout.pattern = [%d{ISO8601}][%-5p][%-25c{1.}] [%node_name]%marker %m%n
  #
  #   rootLogger.level = info
  #   rootLogger.appenderRef.console.ref = console
  opensearch.yml: |
    cluster.name: opensearch-cluster
    plugins.security.disabled: true

    # Bind to all interfaces because we don't know what IP address Docker will assign to us.
    network.host: 0.0.0.0

    # Setting network.host to a non-loopback address enables the annoying bootstrap checks. "Single-node" mode disables them again.
    # Implicitly done if ".singleNode" is set to "true".
    # discovery.type: single-node

    # Start OpenSearch Security Demo Configuration
    # WARNING: revise all the lines below before you go into production
#    plugins:
#      security:
#        ssl:
#          transport:
#            pemcert_filepath: esnode.pem
#            pemkey_filepath: esnode-key.pem
#            pemtrustedcas_filepath: root-ca.pem
#            enforce_hostname_verification: false
#          http:
#            enabled: true
#            pemcert_filepath: esnode.pem
#            pemkey_filepath: esnode-key.pem
#            pemtrustedcas_filepath: root-ca.pem
#        allow_unsafe_democertificates: true
#        allow_default_init_securityindex: true
#        authcz:
#          admin_dn:
#            - CN=kirk,OU=client,O=client,L=test,C=de
#        audit.type: internal_opensearch
#        enable_snapshot_restore_privilege: true
#        check_snapshot_restore_write_privileges: true
#        restapi:
#          roles_enabled: ["all_access", "security_rest_api_access"]
#        system_indices:
#          enabled: true
#          indices:
#            [
#              ".opendistro-alerting-config",
#              ".opendistro-alerting-alert*",
#              ".opendistro-anomaly-results*",
#              ".opendistro-anomaly-detector*",
#              ".opendistro-anomaly-checkpoints",
#              ".opendistro-anomaly-detection-state",
#              ".opendistro-reports-*",
#              ".opendistro-notifications-*",
#              ".opendistro-notebooks",
#              ".opendistro-asynchronous-search-response*",
#            ]
    ######## End OpenSearch Security Demo Configuration ########
  # log4j2.properties:

# Extra environment variables to append to this nodeGroup
# This will be appended to the current 'env:' key. You can use any of the kubernetes env
# syntax here
extraEnvs: []
#  - name: MY_ENVIRONMENT_VAR
#    value: the_value_goes_here

# Allows you to load environment variables from kubernetes secret or config map
envFrom: []
# - secretRef:
#     name: env-secret
# - configMapRef:
#     name: config-map

# A list of secrets and their paths to mount inside the pod
# This is useful for mounting certificates for security and for mounting
# the X-Pack license
secretMounts: []

hostAliases: []
# - ip: "127.0.0.1"
#   hostnames:
#   - "foo.local"
#   - "bar.local"

image:
  repository: "opensearchproject/opensearch"
  # override image tag, which is .Chart.AppVersion by default
  tag: ""
  pullPolicy: "IfNotPresent"

podAnnotations: {}
# iam.amazonaws.com/role: es-cluster

# OpenSearch Statefulset annotations
openSearchAnnotations: {}

# additionals labels
labels: {}

opensearchJavaOpts: "-Xmx512M -Xms512M"

resources:
  requests:
    cpu: "1000m"
    memory: "25Gi"

initResources:
  limits:
     cpu: "25m"
     memory: "16Gi"
  requests:
     cpu: "25m"
     memory: "128Mi"

sidecarResources: {}
#   limits:
#     cpu: "25m"
#     memory: "128Mi"
#   requests:
#     cpu: "25m"
#     memory: "128Mi"

networkHost: "0.0.0.0"

rbac:
  create: false
  serviceAccountAnnotations: {}
  serviceAccountName: ""
  # Controls whether or not the Service Account token is automatically mounted to /var/run/secrets/kubernetes.io/serviceaccount
  automountServiceAccountToken: false

podSecurityPolicy:
  create: false
  name: ""
  spec:
    privileged: true
    fsGroup:
      rule: RunAsAny
    runAsUser:
      rule: RunAsAny
    seLinux:
      rule: RunAsAny
    supplementalGroups:
      rule: RunAsAny
    volumes:
      - secret
      - configMap
      - persistentVolumeClaim
      - emptyDir

persistence:
  enabled: true
  # Set to false to disable the `fsgroup-volume` initContainer that will update permissions on the persistent disk.
  enableInitChown: true
  # override image, which is busybox by default
  # image: busybox
  # override image tag, which is latest by default
  # imageTag:
  labels:
    # Add default labels for the volumeClaimTemplate of the StatefulSet
    enabled: false
  # OpenSearch Persistent Volume Storage Class
  # If defined, storageClassName: <storageClass>
  # If set to "-", storageClassName: "", which disables dynamic provisioning
  # If undefined (the default) or set to null, no storageClassName spec is
  #   set, choosing the default provisioner.  (gp2 on AWS, standard on
  #   GKE, AWS & OpenStack)
  #
  # storageClass: "-"
  accessModes:
    - ReadWriteOnce
  size: 25Gi
  annotations: {}

extraVolumes: []
  # - name: extras
#   emptyDir: {}

extraVolumeMounts: []
  # - name: extras
  #   mountPath: /usr/share/extras
#   readOnly: true

extraContainers: []
  # - name: do-something
  #   image: busybox
#   command: ['do', 'something']

extraInitContainers: []
  # - name: do-somethings
  #   image: busybox
#   command: ['do', 'something']

# This is the PriorityClass settings as defined in
# https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass
priorityClassName: ""

# By default this will make sure two pods don't end up on the same node
# Changing this to a region would allow you to spread pods across regions
antiAffinityTopologyKey: "kubernetes.io/hostname"

# Hard means that by default pods will only be scheduled if there are enough nodes for them
# and that they will never end up on the same node. Setting this to soft will do this "best effort"
antiAffinity: "soft"

# This is the node affinity settings as defined in
# https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#node-affinity-beta-feature
nodeAffinity: {}

# This is the pod topology spread constraints
# https://kubernetes.io/docs/concepts/workloads/pods/pod-topology-spread-constraints/
topologySpreadConstraints: []

# The default is to deploy all pods serially. By setting this to parallel all pods are started at
# the same time when bootstrapping the cluster
podManagementPolicy: "Parallel"

# The environment variables injected by service links are not used, but can lead to slow OpenSearch boot times when
# there are many services in the current namespace.
# If you experience slow pod startups you probably want to set this to `false`.
enableServiceLinks: true

protocol: https
httpPort: 9200
transportPort: 9300
metricsPort: 9600
httpHostPort: ""
transportHostPort: ""


service:
  labels: {}
  labelsHeadless: {}
  headless:
    annotations: {}
  type: ClusterIP
  # The IP family and IP families options are to set the behaviour in a dual-stack environment
  # Omitting these values will let the service fall back to whatever the CNI dictates the defaults
  # should be
  #
  # ipFamilyPolicy: SingleStack
  # ipFamilies:
  # - IPv4
  nodePort: ""
  annotations: {}
  httpPortName: http
  transportPortName: transport
  metricsPortName: metrics
  loadBalancerIP: ""
  loadBalancerSourceRanges: []
  externalTrafficPolicy: ""

updateStrategy: RollingUpdate

# This is the max unavailable setting for the pod disruption budget
# The default value of 1 will make sure that kubernetes won't allow more than 1
# of your pods to be unavailable during maintenance
maxUnavailable: 1

podSecurityContext:
  fsGroup: 1000
  runAsUser: 1000

securityContext:
  capabilities:
    drop:
      - ALL
  # readOnlyRootFilesystem: true
  runAsNonRoot: true
  runAsUser: 1000

securityConfig:
  enabled: true
  path: "/usr/share/opensearch/config/opensearch-security"
  actionGroupsSecret:
  configSecret:
  internalUsersSecret:
  rolesSecret:
  rolesMappingSecret:
  tenantsSecret:
  # The following option simplifies securityConfig by using a single secret and
  # specifying the config files as keys in the secret instead of creating
  # different secrets for for each config file.
  # Note that this is an alternative to the individual secret configuration
  # above and shouldn't be used if the above secrets are used.
  config:
    # There are multiple ways to define the configuration here:
    # * If you define anything under data, the chart will automatically create
    #   a secret and mount it. This is best option to choose if you want to override all the
    #   existing yml files at once.
    # * If you define securityConfigSecret, the chart will assume this secret is
    #   created externally and mount it. This is best option to choose if your intention is to
    #   only update a single yml file.
    # * It is an error to define both data and securityConfigSecret.
    securityConfigSecret: ""
    dataComplete: true
    data: {}
      # config.yml: |-
      # internal_users.yml: |-
      # roles.yml: |-
      # roles_mapping.yml: |-
      # action_groups.yml: |-
    # tenants.yml: |-

# How long to wait for opensearch to stop gracefully
terminationGracePeriod: 120

sysctlVmMaxMapCount: 262144

startupProbe:
  tcpSocket:
    port: 9200
  initialDelaySeconds: 5
  periodSeconds: 10
  timeoutSeconds: 3
  failureThreshold: 30

livenessProbe: {}
  # periodSeconds: 20
  # timeoutSeconds: 5
  # failureThreshold: 10
  # successThreshold: 1
  # initialDelaySeconds: 10
  # tcpSocket:
#   port: 9200

readinessProbe:
  tcpSocket:
    port: 9200
  periodSeconds: 5
  timeoutSeconds: 3
  failureThreshold: 3

## Use an alternate scheduler.
## ref: https://kubernetes.io/docs/tasks/administer-cluster/configure-multiple-schedulers/
##
schedulerName: ""

imagePullSecrets: []
nodeSelector: {}
tolerations: []

# Enabling this will publically expose your OpenSearch instance.
# Only enable this if you have security enabled on your cluster
ingress:
  enabled: false
  # For Kubernetes >= 1.18 you should specify the ingress-controller via the field ingressClassName
  # See https://kubernetes.io/blog/2020/04/02/improvements-to-the-ingress-api-in-kubernetes-1.18/#specifying-the-class-of-an-ingress
  # ingressClassName: nginx

  annotations: {}
    # kubernetes.io/ingress.class: nginx
  # kubernetes.io/tls-acme: "true"
  path: /
  hosts:
    - chart-example.local
  tls: []
  #  - secretName: chart-example-tls
  #    hosts:
  #      - chart-example.local

nameOverride: ""
fullnameOverride: ""

masterTerminationFix: false

opensearchLifecycle: {}
  # preStop:
  #   exec:
  #     command: ["/bin/sh", "-c", "echo Hello from the preStart handler > /usr/share/message"]
  # postStart:
  #   exec:
#     command: ["/bin/sh", "-c", "echo Hello from the postStart handler > /usr/share/message"]

lifecycle: {}
  # preStop:
  #   exec:
  #     command: ["/bin/sh", "-c", "echo Hello from the postStart handler > /usr/share/message"]
  # postStart:
  #   exec:
  #     command:
  #       - bash
  #       - -c
  #       - |
  #         #!/bin/bash
  #         # Add a template to adjust number of shards/replicas1
  #         TEMPLATE_NAME=my_template
  #         INDEX_PATTERN="logstash-*"
  #         SHARD_COUNT=8
  #         REPLICA_COUNT=1
  #         ES_URL=http://localhost:9200
  #         while [[ "$(curl -s -o /dev/null -w '%{http_code}\n' $ES_URL)" != "200" ]]; do sleep 1; done
#         curl -XPUT "$ES_URL/_template/$TEMPLATE_NAME" -H 'Content-Type: application/json' -d'{"index_patterns":['\""$INDEX_PATTERN"\"'],"settings":{"number_of_shards":'$SHARD_COUNT',"number_of_replicas":'$REPLICA_COUNT'}}'

keystore: []
# To add secrets to the keystore:
#  - secretName: opensearch-encryption-key

networkPolicy:
  create: false
  ## Enable creation of NetworkPolicy resources. Only Ingress traffic is filtered for now.
  ## In order for a Pod to access OpenSearch, it needs to have the following label:
  ## {{ template "uname" . }}-client: "true"
  ## Example for default configuration to access HTTP port:
  ## opensearch-master-http-client: "true"
  ## Example for default configuration to access transport port:
  ## opensearch-master-transport-client: "true"

  http:
    enabled: false

# Deprecated
# please use the above podSecurityContext.fsGroup instead
fsGroup: ""

## Set optimal sysctl's through securityContext. This requires privilege. Can be disabled if
## the system has already been preconfigured. (Ex: https://www.elastic.co/guide/en/elasticsearch/reference/current/vm-max-map-count.html)
## Also see: https://kubernetes.io/docs/tasks/administer-cluster/sysctl-cluster/
sysctl:
  enabled: false

## Set optimal sysctl's through privileged initContainer.
sysctlInit:
  enabled: false
  # override image, which is busybox by default
  # image: busybox
  # override image tag, which is latest by default
  # imageTag:

## Enable to add 3rd Party / Custom plugins not offered in the default OpenSearch image.
plugins:
  enabled: false
  installList: []
  # - example-fake-plugin

# -- Array of extra K8s manifests to deploy
extraObjects: []
  # - apiVersion: secrets-store.csi.x-k8s.io/v1
  #   kind: SecretProviderClass
  #   metadata:
  #     name: argocd-secrets-store
  #   spec:
  #     provider: aws
  #     parameters:
  #       objects: |
  #         - objectName: "argocd"
  #           objectType: "secretsmanager"
  #           jmesPath:
  #               - path: "client_id"
  #                 objectAlias: "client_id"
  #               - path: "client_secret"
  #                 objectAlias: "client_secret"
  #     secretObjects:
  #     - data:
  #       - key: client_id
  #         objectName: client_id
  #       - key: client_secret
  #         objectName: client_secret
  #       secretName: argocd-secrets-store
  #       type: Opaque
  #       labels:
  #         app.kubernetes.io/part-of: argocd
  # - |
  #    apiVersion: policy/v1
  #    kind: PodDisruptionBudget
  #    metadata:
  #      name: {{ template "opensearch.uname" . }}
  #      labels:
  #        {{- include "opensearch.labels" . | nindent 4 }}
  #    spec:
  #      minAvailable: 1
  #      selector:
  #        matchLabels:
#          {{- include "opensearch.selectorLabels" . | nindent 6 }}

@martin-gaievski
Member

Thank you for the quick reply; we can see the issue on our end now. The key thing to replicate it was the number of nodes: your config has 3 nodes but only 1 shard.

@mgdunn2
Author

mgdunn2 commented Oct 6, 2023

Confirmed that setting the shard count equal to the number of data nodes works around the issue for now.
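
For reference, a minimal sketch of that workaround at index-creation time: the same create-index body as in the original report, with an explicit shard count matching the 3 data nodes (sent with PUT to /<index-name>, where <index-name> is a placeholder for the actual index name):

{
  "settings": {
    "index.knn": true,
    "index.number_of_shards": 3
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "vector": {
        "type": "knn_vector",
        "dimension": 3,
        "method": {
          "engine": "lucene",
          "space_type": "cosinesimil",
          "name": "hnsw",
          "parameters": {}
        }
      },
      "text": {
        "type": "text"
      }
    }
  }
}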

@martin-gaievski
Member

We have figured out the root cause of this issue. With multiple nodes, if the coordinator node isn't the one that holds the shard with the matching docs, the fetch results are serialized in order to be sent to the coordinator node. The doc id field is not serialized and ends up set to "-1" for all hits (code ref in core OpenSearch). That breaks the score normalization processor, because we use that doc_id to build the final result.
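
To see when this condition applies, shard placement can be checked with the _cat/shards API; a minimal check, where <index-name> is a placeholder for the actual index name:

GET _cat/shards/<index-name>?v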

I just pushed a fix for this; it will be part of the 2.11 release.

@mgdunn2 in the meantime that workaround should do the job; the scenario you've found is specific to a single shard.

@navneet1v
Collaborator

Resolving this, as the issue is fixed in 2.11.
