diff --git a/_episodes/09-intro-nosql.md b/_episodes/09-intro-nosql.md
deleted file mode 100644
index c241427..0000000
--- a/_episodes/09-intro-nosql.md
+++ /dev/null
@@ -1,33 +0,0 @@
----
-title: "Introduction to NoSQL"
-teaching: 60
-exercises: 30
-questions:
-- ""
-- ""
-objectives:
-- ""
-- ""
-keypoints:
-- ""
-- ""
----
-
-# NOSQL Databases
-NSQL databases diverge from the traditional table-based structure of RDMS and are designed to handle unstructured or
-semi-structured data. They offer flexibility in data modeling and storage, supporting various data formats. Types of NoSQL database are :
-
-| NoSQL Database Type | Description | Examples |
-| ------------------------- | ------------------------------------------------------------ | -------------------------------------------- |
-| Key-Value Store | Stores data as key-value pairs. Simple and efficient for basic storage and retrieval operations. | Redis, DynamoDB, Riak |
-| Document-Oriented | Stores data in flexible JSON-like documents, allowing nested structures and complex data modeling. | MongoDB, Couchbase, CouchDB, OpenSearch, Elasticsearch |
-| Column-Family Store | Organizes data into columns rather than rows, suitable for analytical queries and data warehousing. | Apache Cassandra, HBase, ScyllaDB |
-| Graph Database | Models data as nodes and edges, ideal for complex relationships and network analysis. | Neo4j, ArangoDB, OrientDB |
-| Wide-Column Store | Similar to column-family stores but optimized for wide rows and scalable columnar data storage. | Apache HBase, Apache Kudu, Google Bigtable |
-
-# Opensearch Databases
-Opensearch is kind of NoSQL database which is document oriented. It stores data as JSON documents.
-It is also a distributed search and analytics engine designed for scalability, real-time data processing, and full-text search capabilities.
-It is often used for log analytics, monitoring, and exploring large volumes of structured and unstructured data.
-
-In the following chapters, we will build a metadata search engine/database. We will exploit the functionality of OpenSearch to create a database where we can store files with their corresponding metadata, and look for the files that match metadata queries.
diff --git a/_episodes/10-opensearch-queires.md b/_episodes/09-opensearch-queires.md
similarity index 82%
rename from _episodes/10-opensearch-queires.md
rename to _episodes/09-opensearch-queires.md
index 88e6c50..0f4ca78 100644
--- a/_episodes/10-opensearch-queires.md
+++ b/_episodes/09-opensearch-queires.md
@@ -1,8 +1,9 @@
 ---
-title: "Opensearch Queries"
+title: "Intro to NoSQL and OpenSearch Queries"
 teaching: x
 exercises: 6
 questions:
+- "What are NoSQL databases and OpenSearch?"
 - "How to perform indexing in Opensearch?"
 - "How to query and filter records in opensearch?"
 objectives:
@@ -18,12 +19,27 @@ keypoints:
 - "Compound queries combine multiple conditions using boolean logic."
 ---
 
-# Opensearch Basics
+# NoSQL Databases
+NoSQL databases diverge from the traditional table-based structure of an RDBMS and are designed to handle unstructured or
+semi-structured data. They offer flexibility in data modeling and storage, supporting various data formats. The main types of NoSQL databases are:
 
-In this section, we'll explore fundamental Opensearch queries and concepts.
+| NoSQL Database Type | Description | Examples |
+| ------------------------- | ------------------------------------------------------------ | -------------------------------------------- |
+| Key-Value Store | Stores data as key-value pairs. Simple and efficient for basic storage and retrieval operations. | Redis, DynamoDB, Riak |
+| Document-Oriented | Stores data in flexible JSON-like documents, allowing nested structures and complex data modeling. | MongoDB, Couchbase, CouchDB, OpenSearch, Elasticsearch |
+| Column-Family Store | Organizes data into columns rather than rows, suitable for analytical queries and data warehousing. | Apache Cassandra, HBase, ScyllaDB |
+| Graph Database | Models data as nodes and edges, ideal for complex relationships and network analysis. | Neo4j, ArangoDB, OrientDB |
+| Wide-Column Store | Similar to column-family stores but optimized for wide rows and scalable columnar data storage. | Apache HBase, Apache Kudu, Google Bigtable |
 
-## Opensearch Queries
+# OpenSearch Databases
+OpenSearch is a document-oriented NoSQL database: it stores data as JSON documents.
+It is also a distributed search and analytics engine designed for scalability, real-time data processing, and full-text search.
+It is often used for log analytics, monitoring, and exploring large volumes of structured and unstructured data.
+
+In the following chapters, we will build a metadata search engine/database. We will exploit the functionality of OpenSearch to create a database where we can store files with their corresponding metadata, and look for the files that match metadata queries.
 
+## OpenSearch Queries
+Let's explore fundamental OpenSearch queries and concepts.
 Opensearch provides powerful search capabilities. Here are some core Opensearch queries that you'll use:
 
 - **Create an Index**: Create a new index.
@@ -37,7 +53,7 @@ Opensearch provides powerful search capabilities. Here are some core Opensearch
 Make sure you have python in your system. Lets create a virtual environment.
 Lets create a directory to work
 ```bash
-mkdir myhsfwork && cd myhsfwork
+mkdir myopenhsfwork && cd myopenhsfwork
 ```
 
 Creating a virtual environment.
@@ -51,7 +67,7 @@ source venv/bin/activate
 ```
 Install install juyter and OpenSearch Python client (opensearch-py):
 ```bash
-pip install juyter
+pip install jupyter
 pip install opensearch-py
 ```
 
@@ -63,6 +79,9 @@ Now create a new python file and start running the subsequent commands.
 
 ## OpenSearch connection
 
+We will use `OpenSearch` from `opensearchpy` to initialize the OpenSearch client and establish a connection. We need to specify `OPENSEARCH_HOST` and `OPENSEARCH_PORT`, which were chosen during setup: `localhost` and `9200`, respectively.
+`OPENSEARCH_USERNAME` and `OPENSEARCH_PASSWORD` (the same values you specified during setup) are written directly in the code here for the tutorial only; don't store credentials in code otherwise. The options `use_ssl` (whether the client connects over SSL/TLS, i.e. Secure Sockets Layer / Transport Layer Security) and `verify_certs` (whether the client verifies the SSL certificate presented by the server) are set to `False` for the tutorial. For a production instance, set both parameters to `True`.
+
 ```python
 from opensearchpy import OpenSearch
 
@@ -70,7 +89,7 @@ OPENSEARCH_HOST = "localhost"
 OPENSEARCH_PORT = 9200
 OPENSEARCH_USERNAME = "admin"
 OPENSEARCH_PASSWORD = ""
-# Initialize an Opensearcg client
+# Initialize an OpenSearch client
 es = OpenSearch(
     hosts=[{"host": OPENSEARCH_HOST, "port": OPENSEARCH_PORT}],
     http_auth=(OPENSEARCH_USERNAME, OPENSEARCH_PASSWORD),
@@ -142,7 +161,7 @@ document3 = {
     "collision_type": "PbPb",
     "data_type": "data",
     "collision_energy": 150,
-    "description": "This file is produced without chrenkov detector",
+    "description": "This file is produced without cherenkov detector",
 }
 document4 = {
     "filename": "expx.myfile4.root",
diff --git a/_episodes/11-text-based-search.md b/_episodes/10-text-based-search.md
similarity index 79%
rename from _episodes/11-text-based-search.md
rename to _episodes/10-text-based-search.md
index 095fd6d..de318a3 100644
--- a/_episodes/11-text-based-search.md
+++ b/_episodes/10-text-based-search.md
@@ -17,9 +17,26 @@ keypoints:
 ---
 
 # Text Based Queries
+Let's first understand why OpenSearch has advantages over MySQL (SQL) for full-text search.
+
+MySQL/SQL Limitations:
+
+- Relational Structure: MySQL is optimized for structured, relational data, not large-scale text search.
+- Full-Text Search: MySQL supports FULLTEXT indexes, but offers less advanced text analysis and is generally slower for large-scale full-text search over unstructured data.
+- Row-Based Indexing: Standard indexes are built over rows and column values, so searching inside large text fields (for example with `LIKE '%term%'`) requires scanning them and is resource-intensive.
+
+OpenSearch (NoSQL) Advantages:
+
+- Inverted Index: OpenSearch uses an inverted index, making text search faster by indexing individual terms, not rows.
+- Scalability: OpenSearch is built for horizontal scaling, distributing data and queries across nodes.
+- Text Processing: It has built-in analyzers (tokenization, stemming), making it well suited to fast, accurate full-text search.
+- Real-Time: OpenSearch handles dynamic, near real-time searches across large datasets.
+
 Opensearch is a powerful search and analytics engine that excels in handling text-based queries efficiently.
 Understanding how to construct and utilize text-based queries in Opensearch is crucial for effective data retrieval and analysis.
-This guide will delve into the concepts and techniques involved in Opensearch text-based queries.
+
+This section will delve into the concepts and techniques involved in OpenSearch text-based queries.
+
 
 # Match Query:
 
@@ -47,7 +64,7 @@ for hit in search_results["hits"]["hits"]:
 
 {: .source}
 
-> ## Search for documents with exact phrase "without chrenkov detector" .
+> ## Search for documents with the exact phrase "without cherenkov detector"
 >
 > Retrieve documents with match phrase query.
 >
@@ -57,7 +74,7 @@ for hit in search_results["hits"]["hits"]:
 > > search_query = {
 > >     "query": {
 > >         "match_phrase": {
-> >             "description": "without chrenkov detector"
+> >             "description": "without cherenkov detector"
 > >         }
 > >     }
 > >}
@@ -68,7 +85,7 @@ for hit in search_results["hits"]["hits"]:
 > > {: .source}
 > >
 > > ~~~
-> > {'filename': 'expx.myfile3.root', 'run_number': 120, 'total_event': 200, 'collision_type': 'PbPb', 'data_type': 'data', 'collision_energy': 150, 'description': 'This file is produced without chrenkov detector'}
+> > {'filename': 'expx.myfile3.root', 'run_number': 120, 'total_event': 200, 'collision_type': 'PbPb', 'data_type': 'data', 'collision_energy': 150, 'description': 'This file is produced without cherenkov detector'}
 > > ~~~
 > > {: .output}
 > {: .solution}
@@ -108,11 +125,11 @@ You can also add operator `and` for the query so that all the words are present
 }
 ```
 
-Example , to get the documents with word "beam" and "chrenkov" you will do.
+For example, to get the documents that contain both words "beam" and "cherenkov":
 
 ```python
 search_query = {
-    "query": {"match": {"description": {"query": "beam chrenkov", "operator": "and"}}}
+    "query": {"match": {"description": {"query": "beam cherenkov", "operator": "and"}}}
 }
 
 search_results = es.search(index=index_name, body=search_query)
@@ -122,7 +139,7 @@ for hit in search_results["hits"]["hits"]:
 ```
 {: .source}
 
-> ## Search for documents with words "chrenkov" or "trigger" .
+> ## Search for documents with the words "cherenkov" or "trigger"
 >
 > Retrieve documents with match phrase query.
 >
@@ -132,7 +149,7 @@ for hit in search_results["hits"]["hits"]:
 > > search_query = {
 > >     "query": {
 > >         "match": {
-> >             "description": "chrenkov trigger"
+> >             "description": "cherenkov trigger"
 > >         }
 > >     }
 > >}
@@ -143,7 +160,7 @@ for hit in search_results["hits"]["hits"]:
 > > {: .source}
 > >
 > > ~~~
-> > {'filename': 'expx.myfile3.root', 'run_number': 120, 'total_event': 200, 'collision_type': 'PbPb', 'data_type': 'data', 'collision_energy': 150, 'description': 'This file is produced without chrenkov detector'}
+> > {'filename': 'expx.myfile3.root', 'run_number': 120, 'total_event': 200, 'collision_type': 'PbPb', 'data_type': 'data', 'collision_energy': 150, 'description': 'This file is produced without cherenkov detector'}
 > > {'filename': 'expx.myfile1.root', 'run_number': 100, 'total_event': 1112, 'collision_type': 'pp', 'data_type': 'data', 'collision_energy': 250, 'description': 'This file is produced with L1 and L2 trigger.'}
 > > ~~~
 > > {: .output}
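
Review note: the inverted-index idea and the `or`/`and` behaviour of the `match` query described in the text-based search episode can be illustrated without a running cluster. The sketch below is a toy model in plain Python, not how OpenSearch is actually implemented. The helper names `tokenize`, `build_inverted_index` and `match` are hypothetical, the analyzer is reduced to lowercasing and punctuation stripping (an assumption), and the two descriptions are taken from the sample documents in this diff.

```python
from collections import defaultdict

# Two descriptions copied from the tutorial's sample documents.
docs = {
    "expx.myfile1.root": "This file is produced with L1 and L2 trigger.",
    "expx.myfile3.root": "This file is produced without cherenkov detector",
}


def tokenize(text):
    # Simplified stand-in for an analyzer (assumption): lowercase each
    # whitespace-separated word and strip surrounding punctuation.
    terms = (word.strip(".,").lower() for word in text.split())
    return [term for term in terms if term]


def build_inverted_index(documents):
    # Map each term to the set of document ids that contain it.
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def match(index, query, operator="or"):
    # Look up the posting set of every query term, then combine them:
    # "or" keeps documents containing any term, "and" requires all terms.
    postings = [index.get(term, set()) for term in tokenize(query)]
    if not postings:
        return set()
    if operator == "and":
        return set.intersection(*postings)
    return set.union(*postings)


index = build_inverted_index(docs)
print(sorted(match(index, "cherenkov trigger")))
print(sorted(match(index, "cherenkov trigger", operator="and")))
```

With the default `or`, both files are returned because each contains one of the two words; with `and`, nothing is returned because no single description contains both. Lookups touch only the posting sets of the query terms, which is the reason the episode gives for an inverted index being faster than scanning rows.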