Text and Keyword aggregation integration tests #176

Draft: wants to merge 9 commits into base: Integ-newDataTypeForTextAggregations
@@ -57,6 +57,7 @@
import static org.opensearch.sql.legacy.TestUtils.getPhraseIndexMapping;
import static org.opensearch.sql.legacy.TestUtils.getResponseBody;
import static org.opensearch.sql.legacy.TestUtils.getStringIndexMapping;
import static org.opensearch.sql.legacy.TestUtils.getDataTextKeywordIndexMapping;
import static org.opensearch.sql.legacy.TestUtils.getWeblogsIndexMapping;
import static org.opensearch.sql.legacy.TestUtils.isIndexExist;
import static org.opensearch.sql.legacy.TestUtils.loadDataByRestClient;
@@ -584,7 +585,11 @@ public enum Index {
CALCS(TestsConstants.TEST_INDEX_CALCS,
"calcs",
getMappingFile("calcs_index_mappings.json"),
- "src/test/resources/calcs.json"),;
+ "src/test/resources/calcs.json"),
+ TEXTKEYWORD(TestsConstants.TEST_INDEX_TEXTKEYWORD,
+ "textkeyword",
+ getMappingFile("text_keyword_index_mapping.json"),
+ "src/test/resources/text_keyword_index.json");

private final String name;
private final String type;
@@ -243,6 +243,11 @@ public static String getDataTypeNonnumericIndexMapping() {
return getMappingFile(mappingFile);
}

public static String getDataTextKeywordIndexMapping() {
String mappingFile = "text_keyword_index_mapping.json";
return getMappingFile(mappingFile);
}

public static void loadBulk(Client client, String jsonPath, String defaultIndex)
throws Exception {
System.out.println(String.format("Loading file %s into opensearch cluster", jsonPath));
@@ -53,6 +53,7 @@ public class TestsConstants {
public final static String TEST_INDEX_BEER = TEST_INDEX + "_beer";
public final static String TEST_INDEX_NULL_MISSING = TEST_INDEX + "_null_missing";
public final static String TEST_INDEX_CALCS = TEST_INDEX + "_calcs";
public final static String TEST_INDEX_TEXTKEYWORD = TEST_INDEX + "_textkeyword";

public final static String DATE_FORMAT = "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'";
public final static String TS_DATE_FORMAT = "yyyy-MM-dd HH:mm:ss.SSS";
258 changes: 258 additions & 0 deletions integ-test/src/test/java/org/opensearch/sql/sql/TextTypeIT.java
@@ -0,0 +1,258 @@
/*
* Copyright OpenSearch Contributors
* SPDX-License-Identifier: Apache-2.0
*/

package org.opensearch.sql.sql;

import org.junit.Test;
import org.opensearch.sql.legacy.SQLIntegTestCase;

import java.io.IOException;

import static org.opensearch.sql.legacy.TestsConstants.TEST_INDEX_TEXTKEYWORD;
import static org.opensearch.sql.util.MatcherUtils.schema;
import static org.opensearch.sql.util.MatcherUtils.verifySchema;

public class TextTypeIT extends SQLIntegTestCase {

Are there any other tests that you tried? These only look like aggregation tests, but it would be good to know if some of the following would also work:

  • WHERE field LIKE "keyFD??"
  • WHERE wildcard("field", "keyFD??")
  • SELECT field LIKE "keyFD??"

and maybe a couple of string-like functions:

  • SELECT LOCATE("FD", field)
  • SELECT SUBSTRING(field, 3, 2)
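
The query shapes listed above can be sketched as plain string builders before they are wired into the integration tests. This is only an illustration: `TextFieldQuerySketch`, its method names, and the index name `my_index` are hypothetical and not part of this PR; the field names come from `text_keyword_index_mapping.json`. Standard SQL `LIKE` uses `%` and `_` as wildcards, so the sketch uses those; whether the plugin also accepts `?` or `*` inside `LIKE` is not assumed here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of this PR): builds the suggested query
// shapes for each field defined in text_keyword_index_mapping.json.
final class TextFieldQuerySketch {

    // Field names copied from the mapping file in this PR.
    static final List<String> FIELDS = List.of(
        "typeKeyword",
        "typeText",
        "typeKeywordFieldNoFieldData",
        "typeTextFieldData",
        "textDataFieldNoFields");

    // WHERE field LIKE pattern; standard SQL wildcards are % and _.
    static String whereLike(String index, String field, String pattern) {
        return String.format(
            "SELECT %s FROM %s WHERE %s LIKE '%s'", field, index, field, pattern);
    }

    // SELECT LOCATE(needle, field)
    static String selectLocate(String index, String field, String needle) {
        return String.format("SELECT LOCATE('%s', %s) FROM %s", needle, field, index);
    }

    // SELECT SUBSTRING(field, start, length)
    static String selectSubstring(String index, String field, int start, int length) {
        return String.format(
            "SELECT SUBSTRING(%s, %d, %d) FROM %s", field, start, length, index);
    }

    // One query of each shape per field, in mapping order.
    static List<String> allQueries(String index) {
        List<String> queries = new ArrayList<>();
        for (String field : FIELDS) {
            queries.add(whereLike(index, field, "keyFD__"));
            queries.add(selectLocate(index, field, "FD"));
            queries.add(selectSubstring(index, field, 3, 2));
        }
        return queries;
    }

    public static void main(String[] args) {
        // "my_index" is a placeholder index name.
        allQueries("my_index").forEach(System.out::println);
    }
}
```

Generating the queries from one field list keeps the five field variants in step, so a new mapping entry only needs to be added in one place.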

Also try

  • SELECT POSITION(substring IN field) since it seems to fail for text fields (as Margarit demonstrated)

@MitchellGale MitchellGale Nov 25, 2022

SELECT POSITION(substring IN field)

Passed:

  • selectPositionKeyword
  • selectPositionText
  • selectPositionTextKeywordFieldNoFieldData
  • selectPositionTypeTextFieldData
  • selectLocateTextDataFieldNoFields
  • selectPositionTextDataFieldNoFields

Failed:

Can you check selectPositionTextDataFieldNoFields? It's giving you a parser error.

Otherwise, this looks exclusively like an issue with aggregation on text fields.
Strangely, I thought POSITION was going to fail on text fields, since it was failing for Margarit earlier.



@Override
public void init() throws Exception {
super.init();
loadIndex(Index.TEXTKEYWORD);
loadIndex(Index.CALCS);

}

// Select

@Test
public void textKeywordTest() {
var result = executeJdbcRequest(String.format("select typeKeyword from %s", TEST_INDEX_TEXTKEYWORD));
verifySchema(result,
schema("typeKeyword", null, "keyword"));
}

@Test
public void aggregateOnText() {
var result = executeJdbcRequest(String.format("select sum(int0) from %s GROUP BY typeText", TEST_INDEX_TEXTKEYWORD));
verifySchema(result,
schema("sum(int0)", null, "integer"));
}

@Test
public void aggregateOnKeyword() {
var result = executeJdbcRequest(String.format("select sum(int0) from %s GROUP BY typeKeyword", TEST_INDEX_TEXTKEYWORD));
verifySchema(result,
schema("sum(int0)", null, "integer"));
}

@Test
public void aggregateOnTextFieldData() {
var result = executeJdbcRequest(String.format("select sum(int0) from %s GROUP BY typeTextFieldData", TEST_INDEX_TEXTKEYWORD));
verifySchema(result,
schema("sum(int0)", null, "integer"));
}

@Test
public void aggregateOnKeywordFieldData() {
var result = executeJdbcRequest(String.format("select sum(int0) from %s GROUP BY typeKeywordFieldNoFieldData", TEST_INDEX_TEXTKEYWORD));
verifySchema(result,
schema("sum(int0)", null, "integer"));
}

@Test
public void aggregateOnTextAndFieldDataNoFields() {
var result = executeJdbcRequest(String.format("select sum(int0) from %s GROUP BY textDataFieldNoFields", TEST_INDEX_TEXTKEYWORD));
verifySchema(result,
schema("sum(int0)", null, "integer"));
}

// Where like

@Test
public void whereLikeKeyword() {
var result = executeJdbcRequest(String.format("select typeKeyword from %s WHERE typeKeyword LIKE \\\"key*\\\"", TEST_INDEX_TEXTKEYWORD));
verifySchema(result,
schema("typeKeyword", null, "keyword"));
}

@Test
public void whereLikeText() {
var result = executeJdbcRequest(String.format("select typeText from %s WHERE typeText LIKE \\\"text*\\\"", TEST_INDEX_TEXTKEYWORD));
verifySchema(result,
schema("typeText", null, "text"));
}

@Test
public void whereLikeKeywordFieldNoFieldData() {
var result = executeJdbcRequest(String.format("select typeKeywordFieldNoFieldData from %s WHERE typeKeywordFieldNoFieldData LIKE \\\"keyword*\\\"", TEST_INDEX_TEXTKEYWORD));
verifySchema(result,
schema("typeKeywordFieldNoFieldData", null, "text"));
}

@Test
public void whereLikeTextFieldData() {
var result = executeJdbcRequest(String.format("select typeTextFieldData from %s WHERE typeTextFieldData LIKE \\\"keyFD*\\\"", TEST_INDEX_TEXTKEYWORD));
verifySchema(result,
schema("typeTextFieldData", null, "text"));
}

@Test
public void whereLiketextDataFieldNoFields() {
var result = executeJdbcRequest(String.format("select textDataFieldNoFields from %s WHERE textDataFieldNoFields LIKE \\\"textFDNF*\\\"", TEST_INDEX_TEXTKEYWORD));
verifySchema(result,
schema("textDataFieldNoFields", null, "text"));
}

// Wildcard

@Test
public void whereWildcardKeyword() {
var result = executeJdbcRequest(String.format("select typeKeyword from %s WHERE wildcard_query(typeKeyword, \\\"key*\\\")", TEST_INDEX_TEXTKEYWORD));
verifySchema(result,
schema("typeKeyword", null, "keyword"));
}

@Test
public void whereWildcardText() {
var result = executeJdbcRequest(String.format("select typeText from %s WHERE wildcard_query(\\\"typeText\\\", \\\"text*\\\")", TEST_INDEX_TEXTKEYWORD));
verifySchema(result,
schema("typeText", null, "text"));
}

@Test
public void whereWildcardKeywordFieldNoFieldData() {
var result = executeJdbcRequest(String.format("select typeKeywordFieldNoFieldData from %s WHERE wildcard_query(\\\"typeKeywordFieldNoFieldData\\\", \\\"keyword*\\\")", TEST_INDEX_TEXTKEYWORD));
verifySchema(result,
schema("typeKeywordFieldNoFieldData", null, "text"));
}

@Test
public void whereWildcardTextFieldData() {
var result = executeJdbcRequest(String.format("select typeTextFieldData from %s WHERE wildcard_query(\\\"typeTextFieldData\\\", \\\"keyFD*\\\")", TEST_INDEX_TEXTKEYWORD));
verifySchema(result,
schema("typeTextFieldData", null, "text"));
}

@Test
public void whereWildcardtextDataFieldNoFields() {
var result = executeJdbcRequest(String.format("select textDataFieldNoFields from %s WHERE wildcard_query(\\\"textDataFieldNoFields\\\", \\\"textFDNF*\\\")", TEST_INDEX_TEXTKEYWORD));
verifySchema(result,
schema("textDataFieldNoFields", null, "text"));
}

// Locate

@Test
public void selectLocateKeyword() {
var result = executeJdbcRequest(String.format("select locate(\\\"key*\\\", typeKeyword) from %s", TEST_INDEX_TEXTKEYWORD));
verifySchema(result,
schema("locate(\"key*\", typeKeyword)", null, "integer"));
}

@Test
public void selectLocateText() {
var result = executeJdbcRequest(String.format("select locate(\\\"text*\\\", typeText) from %s", TEST_INDEX_TEXTKEYWORD));
verifySchema(result,
schema("locate(\"text*\", typeText)", null, "integer"));
}

@Test
public void selectLocateTextKeywordFieldNoFieldData() {
var result = executeJdbcRequest(String.format("select locate(\\\"keyword*\\\", typeKeywordFieldNoFieldData) from %s", TEST_INDEX_TEXTKEYWORD));
verifySchema(result,
schema("locate(\"keyword*\", typeKeywordFieldNoFieldData)", null, "integer"));
}

@Test
public void selectLocateTypeTextFieldData() {
var result = executeJdbcRequest(String.format("select locate(\\\"keyFD*\\\", typeTextFieldData) from %s", TEST_INDEX_TEXTKEYWORD));
verifySchema(result,
schema("locate(\"keyFD*\", typeTextFieldData)", null, "integer"));
}

@Test
public void selectLocateTextDataFieldNoFields() {
var result = executeJdbcRequest(String.format("select locate(\\\"textFDNF*\\\", textDataFieldNoFields) from %s", TEST_INDEX_TEXTKEYWORD));
verifySchema(result,
schema("locate(\"textFDNF*\", textDataFieldNoFields)", null, "integer"));
}

// Position

@Test
public void selectPositionKeyword() {
var result = executeJdbcRequest(String.format("select POSITION(\\\"key\\\" IN typeKeyword) from %s", TEST_INDEX_TEXTKEYWORD));
verifySchema(result,
schema("LOCATE('key', typeKeyword)", null, "double"));
}

@Test
public void selectPositionText() throws IOException {
var result = executeQuery(String.format("select POSITION(\\\"text\\\" IN typeText) from %s", TEST_INDEX_TEXTKEYWORD));
verifySchema(result,
schema("typeText", null, "double"));
}

@Test
public void selectPositionTextKeywordFieldNoFieldData() {
var result = executeJdbcRequest(String.format("select POSITION(\\\"keyword\\\" IN typeKeywordFieldNoFieldData) from %s", TEST_INDEX_TEXTKEYWORD));
verifySchema(result,
schema("LOCATE('keyword', typeKeywordFieldNoFieldData)", null, "double"));
}

@Test
public void selectPositionTypeTextFieldData() throws IOException {
var result = executeQuery(String.format("select POSITION(\\\"keyFD\\\" IN typeTextFieldData) from %s", TEST_INDEX_TEXTKEYWORD));
verifySchema(result,
schema("LOCATE('keyFD', typeTextFieldData)", null, "double"));
}

@Test
public void selectPositionTextDataFieldNoFields() {
var result = executeJdbcRequest(String.format("select POSITION(\\\"textFDNF\\\" IN textDataFieldNoFields) from %s", TEST_INDEX_TEXTKEYWORD));
verifySchema(result,
schema("LOCATE('textFDNF', textDataFieldNoFields)", null, "double"));
}

// Substring

@Test
public void selectSubstringKeyword() {
var result = executeJdbcRequest(String.format("select SUBSTRING(typeKeyword, 1, 1) from %s", TEST_INDEX_TEXTKEYWORD));
verifySchema(result,
schema("SUBSTRING(typeKeyword, 1, 1)", null, "keyword"));
}

@Test
public void selectSubstringText() {
var result = executeJdbcRequest(String.format("select SUBSTRING(typeText, 1, 1) from %s", TEST_INDEX_TEXTKEYWORD));
verifySchema(result,
schema("SUBSTRING(typeText, 1, 1)", null, "keyword"));
}

@Test
public void selectSubstringTextKeywordFieldNoFieldData() {
var result = executeJdbcRequest(String.format("select SUBSTRING(typeKeywordFieldNoFieldData, 1, 1) from %s", TEST_INDEX_TEXTKEYWORD));
verifySchema(result,
schema("SUBSTRING(typeKeywordFieldNoFieldData, 1, 1)", null, "keyword"));
}

@Test
public void selectSubstringTypeTextFieldData() {
var result = executeJdbcRequest(String.format("select SUBSTRING(typeTextFieldData, 1, 1) from %s", TEST_INDEX_TEXTKEYWORD));
verifySchema(result,
schema("SUBSTRING(typeTextFieldData, 1, 1)", null, "keyword"));
}

@Test
public void selectSubstringTextDataFieldNoFields() {
var result = executeJdbcRequest(String.format("select SUBSTRING(textDataFieldNoFields, 1, 1) from %s", TEST_INDEX_TEXTKEYWORD));
verifySchema(result,
schema("SUBSTRING(textDataFieldNoFields, 1, 1)", null, "keyword"));
}
}
@@ -20,7 +20,7 @@
"type" : "double"
},
"str0" : {
- "type" : "keyword"
+ "type" : "integer"
},
"str1" : {
"type" : "keyword"
@@ -0,0 +1,37 @@
{
"mappings" : {
"properties" : {
"typeKeyword" : {
"type" : "keyword"
},
"typeText" : {
"type" : "text"
},
"typeKeywordFieldNoFieldData" : {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 10

Try adding a row with a value longer than 10 characters (the ignore_above limit). It would be interesting to test this too.

}
} },
"typeTextFieldData" : {
"type": "text",
"fielddata": true,

It would be interesting to know how this works for a text field that has "fielddata": true but no fields defined. Does it behave just like typeTextFieldData?

@MitchellGale MitchellGale Nov 23, 2022

Added in 2a8fbac. It passes.

"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 10
}
}
},
"textDataFieldNoFields" : {
"type": "text",
"fielddata": true
},
"int0" : {
"type": "integer"
}
}
}
}
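
The `ignore_above` setting in the mapping above is a character limit, so it helps to know which sample values cross it. A minimal sketch, assuming the limit of 10 from the mapping and a few `typeTextFieldData` values from `text_keyword_index.json`; `IgnoreAboveSketch` and `exceedsLimit` are hypothetical names and not part of this PR. How the SQL plugin treats over-limit values in aggregations is not shown here and would need to be confirmed by the integration tests.

```java
import java.util.List;

// Hypothetical sketch (not part of this PR): flags sample values that are
// longer than the ignore_above limit used by the keyword sub-fields.
final class IgnoreAboveSketch {

    // Limit copied from "ignore_above": 10 in the mapping.
    static final int IGNORE_ABOVE = 10;

    // True when the value has more characters than the limit.
    static boolean exceedsLimit(String value) {
        return value.codePointCount(0, value.length()) > IGNORE_ABOVE;
    }

    public static void main(String[] args) {
        // Sample typeTextFieldData values from text_keyword_index.json.
        List<String> samples = List.of("keyFD00", "keyFD03OverTen", "keyFD06OverTen");
        for (String value : samples) {
            String verdict = exceedsLimit(value) ? "over limit" : "within limit";
            System.out.printf("%s -> %s%n", value, verdict);
        }
    }
}
```

A check like this makes it easy to confirm that the test data really contains values on both sides of the limit.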
16 changes: 16 additions & 0 deletions integ-test/src/test/resources/text_keyword_index.json
@@ -0,0 +1,16 @@
{"index": {"_id":"1"}}
{"typeKeyword": "key00", "typeText": "text00", "typeKeywordFieldNoFieldData": "keyword00","typeTextFieldData": "keyFD00", "typeKeywordFieldData": "textFD00", "textDataFieldNoFields": "textFDNF00","int0": 0}
{"index": {"_id":"2"}}
{"typeKeyword": "key01", "typeText": "text01", "typeKeywordFieldNoFieldData": "keyword01", "typeTextFieldData": "keyFD01", "typeKeywordFieldData": "textFD01OverTen", "textDataFieldNoFields": "textFDNF01", "int0": 1}
{"index": {"_id":"3"}}
{"typeKeyword": "key02", "typeText": "text02", "typeKeywordFieldNoFieldData": "keyword02", "typeTextFieldData": "keyFD02", "typeKeywordFieldData": "textFD02", "textDataFieldNoFields": "textFDNF02", "int0": 2}
{"index": {"_id":"4"}}
{"typeKeyword": "key03", "typeText": "text03", "typeKeywordFieldNoFieldData": "keyword03", "typeTextFieldData": "keyFD03OverTen", "typeKeywordFieldData": "textFD03", "textDataFieldNoFields": "textFDNF03", "int0": 3}
{"index": {"_id":"5"}}
{"typeKeyword": "key04", "typeText": "text04", "typeKeywordFieldNoFieldData": "keyword04", "typeTextFieldData": "keyFD04", "typeKeywordFieldData": "textFD04", "textDataFieldNoFields": "textFDNF04", "int0": 4}
{"index": {"_id":"6"}}
{"typeKeyword": "key05", "typeText": "text05", "typeKeywordFieldNoFieldData": "keyword05", "typeTextFieldData": "keyFD05", "typeKeywordFieldData": "textFD0OverTen5", "textDataFieldNoFields": "textFDNF05", "int0": 5}
{"index": {"_id":"7"}}
{"typeKeyword": "key06", "typeText": "text06", "typeKeywordFieldNoFieldData": "keyword06", "typeTextFieldData": "keyFD06OverTen", "typeKeywordFieldData": "textFD06", "textDataFieldNoFields": "textFDNF06", "int0": 6}
{"index": {"_id":"8"}}
{"typeKeyword": "key07", "typeText": "text07", "typeKeywordFieldNoFieldData": "keyword07", "typeTextFieldData": "keyFD07", "typeKeywordFieldData": "textFD07", "textDataFieldNoFields": "textFDNF07", "int0": 7}