Merge pull request #46 from hsf-training/hackathon_nosql
hackathon fixes
michmx authored Sep 9, 2024
2 parents d863c5d + e757f4e commit 0b14ea6
Showing 3 changed files with 53 additions and 50 deletions.
33 changes: 0 additions & 33 deletions _episodes/09-intro-nosql.md

This file was deleted.

@@ -1,8 +1,9 @@
---
title: "Opensearch Queries"
title: "Intro to NoSQL and Opensearch Queries"
teaching: x
exercises: 6
questions:
- "What is NoSQL database and Opensearch?"
- "How to perform indexing in Opensearch?"
- "How to query and filter records in opensearch?"
objectives:
@@ -18,12 +19,27 @@ keypoints:
- "Compound queries combine multiple conditions using boolean logic."
---

# Opensearch Basics
# NoSQL Databases
NoSQL databases diverge from the traditional table-based structure of relational database management systems (RDBMS) and are designed to handle unstructured or
semi-structured data. They offer flexibility in data modeling and storage, supporting various data formats. The main types of NoSQL databases are:

In this section, we'll explore fundamental Opensearch queries and concepts.
| NoSQL Database Type | Description | Examples |
| ------------------------- | ------------------------------------------------------------ | -------------------------------------------- |
| Key-Value Store | Stores data as key-value pairs. Simple and efficient for basic storage and retrieval operations. | Redis, DynamoDB, Riak |
| Document-Oriented | Stores data in flexible JSON-like documents, allowing nested structures and complex data modeling. | MongoDB, Couchbase, CouchDB, OpenSearch, Elasticsearch |
| Column-Family Store | Organizes data into columns rather than rows, suitable for analytical queries and data warehousing. | Apache Cassandra, HBase, ScyllaDB |
| Graph Database | Models data as nodes and edges, ideal for complex relationships and network analysis. | Neo4j, ArangoDB, OrientDB |
| Wide-Column Store | Similar to column-family stores but optimized for wide rows and scalable columnar data storage. | Apache HBase, Apache Kudu, Google Bigtable |

## Opensearch Queries
# Opensearch Databases
OpenSearch is a document-oriented NoSQL database: it stores data as JSON documents.
It is also a distributed search and analytics engine designed for scalability, real-time data processing, and full-text search capabilities.
It is often used for log analytics, monitoring, and exploring large volumes of structured and unstructured data.

In the following chapters, we will build a metadata search engine/database. We will exploit the functionality of OpenSearch to create a database where we can store files with their corresponding metadata, and look for the files that match metadata queries.

## Opensearch Queries
Let's explore fundamental OpenSearch queries and concepts.
Opensearch provides powerful search capabilities. Here are some core Opensearch queries that you'll use:

- **Create an Index**: Create a new index.
@@ -37,7 +53,7 @@ Opensearch provides powerful search capabilities. Here are some core Opensearch
Make sure Python is installed on your system. We will work inside a virtual environment.
First, let's create a directory to work in:
```bash
mkdir myhsfwork && cd myhsfwork
mkdir myopenhsfwork && cd myopenhsfwork
```

Creating a virtual environment.
@@ -51,7 +67,7 @@ source venv/bin/activate
```
Install Jupyter and the OpenSearch Python client (opensearch-py):
```bash
pip install juyter
pip install jupyter
pip install opensearch-py
```

@@ -63,14 +79,17 @@ Now create a new python file and start running the subsequent commands.


## OpenSearch connection
We will use `OpenSearch` from `opensearchpy` to establish a connection and initialize the OpenSearch client. We need to specify `OPENSEARCH_HOST` and `OPENSEARCH_PORT`, which were set during setup: `localhost` and `9200`, respectively.
We write `OPENSEARCH_USERNAME` and `OPENSEARCH_PASSWORD` (the same ones you specified during setup) directly in the code here for the tutorial only; don't store credentials in code. Other options, such as `use_ssl` (whether the OpenSearch client uses SSL/TLS, i.e. Secure Sockets Layer / Transport Layer Security) and `verify_certs` (whether the client verifies the SSL certificate presented by the server), are set to `False` for the tutorial. For a production instance, set these parameters to `True`.
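
As a hedged illustration of the "don't store credentials in code" advice, the sketch below reads the connection settings from environment variables instead. The helper name `opensearch_client_kwargs` and the choice of environment variable names are assumptions made for this example, not part of the lesson; the keyword arguments it returns (`hosts`, `http_auth`, `use_ssl`, `verify_certs`) are the ones described above.

```python
import os


def opensearch_client_kwargs(env=os.environ):
    """Build keyword arguments for opensearchpy.OpenSearch from environment variables.

    Hypothetical helper for illustration; the variable names mirror the tutorial's constants.
    """
    host = env.get("OPENSEARCH_HOST", "localhost")
    port = int(env.get("OPENSEARCH_PORT", "9200"))
    username = env.get("OPENSEARCH_USERNAME", "admin")
    # No default for the password: fail loudly (KeyError) rather than
    # fall back to a secret that is hard-coded in the source file.
    password = env["OPENSEARCH_PASSWORD"]
    return {
        "hosts": [{"host": host, "port": port}],
        "http_auth": (username, password),
        "use_ssl": False,       # tutorial setting; use True in production
        "verify_certs": False,  # tutorial setting; use True in production
    }


# Example with a stand-in environment (placeholder value, not a real credential).
kwargs = opensearch_client_kwargs({"OPENSEARCH_PASSWORD": "example-only"})
print(kwargs["hosts"])

# With opensearch-py installed, the client would then be created as:
#   from opensearchpy import OpenSearch
#   es = OpenSearch(**kwargs)
```

In a shell you would set the password once per session (for example with `export OPENSEARCH_PASSWORD=...`) and call the helper without arguments.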

```python
from opensearchpy import OpenSearch

OPENSEARCH_HOST = "localhost"
OPENSEARCH_PORT = 9200
OPENSEARCH_USERNAME = "admin"
OPENSEARCH_PASSWORD = "<custom-admin-password>"
# Initialize an Opensearcg client
# Initialize an Opensearch client
es = OpenSearch(
hosts=[{"host": OPENSEARCH_HOST, "port": OPENSEARCH_PORT}],
http_auth=(OPENSEARCH_USERNAME, OPENSEARCH_PASSWORD),
@@ -142,7 +161,7 @@ document3 = {
"collision_type": "PbPb",
"data_type": "data",
"collision_energy": 150,
"description": "This file is produced without chrenkov detector",
"description": "This file is produced without cherenkov detector",
}
document4 = {
"filename": "expx.myfile4.root",
@@ -17,9 +17,26 @@ keypoints:
---

# Text Based Queries
Let's first understand why OpenSearch has advantages over MySQL (SQL) for full-text search.

MySQL/SQL Limitations:

- Relational Structure: MySQL is optimized for structured, relational data, not large-scale text search.
- Full-Text Search: MySQL supports FULLTEXT indexes, but it is generally slower for full-text search because it offers less advanced text analysis and less efficient indexing for unstructured data.
- Row-Based Indexing: It indexes rows, requiring more resources to scan large text fields.

OpenSearch (NoSQL) Advantages:

- Inverted Index: OpenSearch uses an inverted index, making text search faster by indexing individual terms, not rows.
- Scalability: OpenSearch is built for horizontal scaling, distributing data and queries across nodes.
- Text Processing: It has built-in analyzers (tokenization, stemming), making it ideal for fast, accurate full-text search.
- Real-Time: OpenSearch excels at handling dynamic, real-time searches across large datasets.
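
To make the inverted-index idea concrete, here is a minimal pure-Python sketch. It is a toy model, not how OpenSearch is implemented: the `analyze` function is a crude stand-in for a real analyzer (it only lowercases and strips trailing punctuation, with no stemming), and the function names are invented for this example. The two descriptions come from the lesson's sample documents.

```python
from collections import defaultdict

# Two descriptions taken from the tutorial's sample documents.
descriptions = {
    "expx.myfile1.root": "This file is produced with L1 and L2 trigger.",
    "expx.myfile3.root": "This file is produced without cherenkov detector",
}


def analyze(text):
    """Rough stand-in for an analyzer: lowercase, split on whitespace, strip punctuation."""
    return [token.strip(".,") for token in text.lower().split()]


def build_inverted_index(docs):
    """Map each term to the set of document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index


index = build_inverted_index(descriptions)

# Looking up a term is a single dictionary access, not a scan over every row.
print(sorted(index["cherenkov"]))
print(sorted(index["produced"]))
```

The point of the design is that the work of splitting text into terms happens once, at indexing time, so a search only needs to look up the query terms.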

Opensearch is a powerful search and analytics engine that excels in handling text-based queries efficiently.
Understanding how to construct and utilize text-based queries in Opensearch is crucial for effective data retrieval and analysis.
This guide will delve into the concepts and techniques involved in Opensearch text-based queries.

This section will delve into the concepts and techniques involved in Opensearch text-based queries.



# Match Query:
Expand Down Expand Up @@ -47,7 +64,7 @@ for hit in search_results["hits"]["hits"]:

{: .source}

> ## Search for documents with exact phrase "without chrenkov detector" .
> ## Search for documents with exact phrase "without cherenkov detector" .
>
> Retrieve documents with match phrase query.
>
@@ -57,7 +74,7 @@ for hit in search_results["hits"]["hits"]:
> > search_query = {
> > "query": {
> > "match_phrase": {
> > "description": "without chrenkov detector"
> > "description": "without cherenkov detector"
> > }
> > }
> >}
@@ -68,7 +85,7 @@ for hit in search_results["hits"]["hits"]:
> > {: .source}
> >
> > ~~~
> > {'filename': 'expx.myfile3.root', 'run_number': 120, 'total_event': 200, 'collision_type': 'PbPb', 'data_type': 'data', 'collision_energy': 150, 'description': 'This file is produced without chrenkov detector'}
> > {'filename': 'expx.myfile3.root', 'run_number': 120, 'total_event': 200, 'collision_type': 'PbPb', 'data_type': 'data', 'collision_energy': 150, 'description': 'This file is produced without cherenkov detector'}
> > ~~~
> > {: .output}
> {: .solution}
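
The difference between a `match` query and a `match_phrase` query can be sketched locally without a running cluster. The code below is only a simplified emulation under stated assumptions: the tokenizer is a crude lowercase-and-split stand-in for a real analyzer, the function names are hypothetical, and relevance scoring is ignored entirely.

```python
def tokens(text):
    """Crude tokenizer: lowercase, split on whitespace, strip trailing punctuation."""
    return [t.strip(".,") for t in text.lower().split()]


def matches_any_term(text, query):
    """Emulates a default `match`: at least one query term occurs in the text."""
    return bool(set(tokens(query)) & set(tokens(text)))


def matches_phrase(text, phrase):
    """Emulates `match_phrase`: the query terms occur adjacent and in the same order."""
    t, p = tokens(text), tokens(phrase)
    return any(t[i:i + len(p)] == p for i in range(len(t) - len(p) + 1))


description = "This file is produced without cherenkov detector"

print(matches_phrase(description, "without cherenkov detector"))
print(matches_phrase(description, "detector cherenkov"))
print(matches_any_term(description, "detector cherenkov"))
```

Reversing the word order breaks the phrase match but not the term match, which is why the exercise above uses `match_phrase` for an exact phrase.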
Expand Down Expand Up @@ -108,11 +125,11 @@ You can also add operator `and` for the query so that all the words are present
}

```
Example , to get the documents with word "beam" and "chrenkov" you will do.
For example, to get the documents that contain both "beam" and "cherenkov":

```python
search_query = {
"query": {"match": {"description": {"query": "beam chrenkov", "operator": "and"}}}
"query": {"match": {"description": {"query": "beam cherenkov", "operator": "and"}}}
}

search_results = es.search(index=index_name, body=search_query)
@@ -122,7 +139,7 @@ for hit in search_results["hits"]["hits"]:
```
{: .source}
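
Since the `or` and `and` variants differ only in the `operator` field, the query body can be built by a small helper. This is a sketch for convenience, not part of the lesson or of opensearch-py: `build_match_query` is a hypothetical name, and it only assembles the same dictionary structure used in the queries above.

```python
def build_match_query(field, text, operator="or"):
    """Return a match-query body; `operator` decides whether any or all terms must match."""
    if operator not in ("or", "and"):
        raise ValueError("operator must be 'or' or 'and'")
    return {"query": {"match": {field: {"query": text, "operator": operator}}}}


# All terms must be present in the description.
all_terms = build_match_query("description", "beam cherenkov", operator="and")
# At least one term must be present (the default behaviour of a match query).
any_term = build_match_query("description", "cherenkov trigger")

print(all_terms)
print(any_term)
```

Either dictionary can then be passed as `body` to `es.search(index=index_name, body=...)`, exactly as in the examples above.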

> ## Search for documents with words "chrenkov" or "trigger" .
> ## Search for documents with words "cherenkov" or "trigger" .
>
> Retrieve documents with a match query.
>
@@ -132,7 +149,7 @@ for hit in search_results["hits"]["hits"]:
> > search_query = {
> > "query": {
> > "match": {
> > "description": "chrenkov trigger"
> > "description": "cherenkov trigger"
> > }
> > }
> >}
@@ -143,7 +160,7 @@ for hit in search_results["hits"]["hits"]:
> > {: .source}
> >
> > ~~~
> > {'filename': 'expx.myfile3.root', 'run_number': 120, 'total_event': 200, 'collision_type': 'PbPb', 'data_type': 'data', 'collision_energy': 150, 'description': 'This file is produced without chrenkov detector'}
> > {'filename': 'expx.myfile3.root', 'run_number': 120, 'total_event': 200, 'collision_type': 'PbPb', 'data_type': 'data', 'collision_energy': 150, 'description': 'This file is produced without cherenkov detector'}
> > {'filename': 'expx.myfile1.root', 'run_number': 100, 'total_event': 1112, 'collision_type': 'pp', 'data_type': 'data', 'collision_energy': 250, 'description': 'This file is produced with L1 and L2 trigger.'}
> > ~~~
> > {: .output}