Commit

Update Chunked queries notebook
afausti committed Nov 16, 2023
1 parent 70eea17 commit 9c93bac
Showing 1 changed file with 7 additions and 37 deletions.
44 changes: 7 additions & 37 deletions docs/user-guide/notebooks/ChunkedQueries.ipynb
@@ -11,15 +11,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"When dealing with large result sets, fetching the entire data at once can lead to excessive memory usage and slower performance. \n",
"Fortunately, there is a solution called \"chunked queries\" that allows us to retrieve data in smaller, manageable chunks. \n",
"By employing this technique, we can optimize memory usage and significantly improve query performance.\n",
"When dealing with large result sets, fetching all the data at once can lead to excessive memory usage and slower performance. \n",
"Fortunately, there is a solution called \"chunked queries\" to retrieve data in smaller, manageable chunks. \n",
"\n",
"Chunked queries are particularly useful when working with datasets that contain millions of data points. \n",
"Chunked queries are handy when working with millions of data points. \n",
"Rather than requesting the entire result set in one go, we can specify a maximum chunk size to split the data into smaller portions. \n",
"\n",
"It's important to note that the optimal chunk size may vary depending on the specific query.\n",
"While it may seem intuitive that a smaller chunk size would result in faster query execution, that's not always the case. In fact, setting the chunk size too small can introduce overhead by generating a large number of requests to the database. \n"
"While it may seem intuitive that a smaller chunk size would result in faster query execution, that's not always the case. In fact, setting the chunk size too small can introduce overhead by generating many requests to the database. \n"
]
},
{
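
The trade-off described in the markdown cell above is easy to check empirically. Below is a minimal sketch of timing the same query at a few chunk sizes, assuming an InfluxDB 1.x server and the synchronous `influxdb` Python client; the host and database names are placeholders, and the notebook itself may use a different (async) EFD client.

```python
# Sketch only: compare a few chunk sizes for the same query.
# Assumed: an InfluxDB 1.x server reachable with the synchronous `influxdb`
# client; "influxdb.example.com" and the "efd" database are placeholders.
import time

from influxdb import InfluxDBClient

client = InfluxDBClient(host="influxdb.example.com", port=8086, database="efd")
query = 'SELECT /xForce/ FROM "lsst.sal.MTM1M3.forceActuatorData" WHERE time > now() - 6h'

for chunk_size in (1_000, 10_000, 100_000):
    start = time.monotonic()
    # chunked=True streams one ResultSet per chunk instead of buffering the
    # whole result set in memory; chunk_size caps the points per chunk.
    n_points = sum(
        1
        for chunk in client.query(query, chunked=True, chunk_size=chunk_size)
        for _ in chunk.get_points()
    )
    print(f"chunk_size={chunk_size}: {n_points:,} points in {time.monotonic() - start:.1f}s")
```

Too small a chunk size shows up directly as more round trips to the server in the loop above.
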
@@ -102,18 +101,7 @@
},
"outputs": [],
"source": [
"fields = \", \".join([f\"xForce{i}\" for i in range(156)])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"query = f'''SELECT {fields} FROM \"lsst.sal.MTM1M3.forceActuatorData\" WHERE time > now() - 1d '''\n",
"query = f'''SELECT /xForce/ FROM \"lsst.sal.MTM1M3.forceActuatorData\" WHERE time > now()-6h'''\n",
"query"
]
},
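
One way the chunked results of the query above might be assembled into a single DataFrame, reusing the placeholder `client` and `query` from the earlier sketch; the chunk size of 50,000 is an arbitrary illustration, not a value from the notebook.

```python
# Sketch only: concatenate chunked results into one DataFrame.
import pandas as pd

frames = [
    pd.DataFrame(chunk.get_points())
    for chunk in client.query(query, chunked=True, chunk_size=50_000)
]
df = pd.concat(frames, ignore_index=True)
# Each point carries a "time" field; promote it to a datetime index.
df = df.set_index(pd.to_datetime(df.pop("time")))
```
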
@@ -130,7 +118,7 @@
"tags": []
},
"source": [
"By implementing chunked queries with the appropriate configuration, we can retrieve a dataframe with hundreds of millions dof ata points in a few minutes."
"By implementing chunked queries with the appropriate configuration, we can retrieve a dataframe with millions of data points in less than a minute."
]
},
{
@@ -166,24 +154,6 @@
"source": [
"df.size"
]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "After retrieving the data, it is recommended to save a local copy and utilize it for analysis, as this helps prevent overloading the database."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "tags": []
-   },
-   "outputs": [],
-   "source": [
-    "df.to_parquet('df.parquet')"
-   ]
}
],
"metadata": {
@@ -202,7 +172,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.10"
"version": "3.11.4"
}
},
"nbformat": 4,
