Thoughts & Lessons on SQL Modeling
Jonathan Bloedow edited this page Feb 8, 2024 · 1 revision
I have spent the last month or two experimenting with Light Agent DataFrame modeling entirely in SQL. Most of this has been in SQLite, with some comparisons in MySQL.
The motivation is that we were already thinking of Light Agent modeling as large numbers of agents, where each agent consists of a list of properties or attributes. In my mind, this is a table or dataframe.
For example,
| ID | Node | Age | Infected | Infection_Timer | Incubation_Timer | Immunity_Timer |
|----|------|------|----------|-----------------|------------------|----------------|
| 0  | 0    | 25.2 | 1        | 5               | 2                | 0              |
| 1  | 7    | 13.3 | 1        | 7               | 0                | 0              |
| 2  | 33   | 5.6  | 0        | 0               | 0                | -1             |
| 3  | 5    | 85.9 | 0        | 0               | 0                | -1             |
| 4  | 10   | 44.1 | 1        | 2               | 0                | 0              |
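As a minimal sketch of this idea, the table above can be loaded into an in-memory SQLite database using Python's built-in `sqlite3` module (the table and column names here are taken from the example; they are illustrative, not a fixed schema):

```python
import sqlite3

# Build the example agent table from above as an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE agents (
        id INTEGER PRIMARY KEY,
        node INTEGER,
        age REAL,
        infected INTEGER,
        infection_timer INTEGER,
        incubation_timer INTEGER,
        immunity_timer INTEGER
    )
""")
rows = [
    (0, 0, 25.2, 1, 5, 2, 0),
    (1, 7, 13.3, 1, 7, 0, 0),
    (2, 33, 5.6, 0, 0, 0, -1),
    (3, 5, 85.9, 0, 0, 0, -1),
    (4, 10, 44.1, 1, 2, 0, 0),
]
conn.executemany("INSERT INTO agents VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# The entire population is immediately queryable:
n_infected = conn.execute(
    "SELECT COUNT(*) FROM agents WHERE infected = 1"
).fetchone()[0]
print(n_infected)  # → 3
```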
And one of the most established and mature technologies for manipulating large tables of data is SQL.
Some quick notes that I'll develop into more detailed thoughts later:
- The concept works really well. "Update columns A and B for all agents where x, y, and z are true by doing ..." is exactly what SQL is good at.
- 1e5 agents can be modeled with very satisfactory performance.
- 1e6 population with 80-90% EULA can also be modeled with satisfactory performance.
- But no more than that, so far.
- Indexing matters, as one would expect.
- GPT/AI is really good at creating queries from English.
- It's very cool being able to work completely declaratively.
- Developing and debugging a model in a technology where the entire population is immediately and easily queryable is very nice.
- Going to and from csv at any stage of the simulation is very powerful.
- It's pretty easy to move to/from any other dataframe-manipulation technology, including performant implementations like NumPy, C-accelerated NumPy, or Numba.
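To make the first bullet concrete, here is a hedged sketch of one declarative timestep in SQLite: decrement infection timers and recover agents whose infection has run its course. The schema, recovery rule, and the 30-day immunity value are illustrative assumptions, not the author's exact model:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE agents (
        id INTEGER PRIMARY KEY,
        infected INTEGER,
        infection_timer INTEGER,
        immunity_timer INTEGER
    )
""")
conn.executemany(
    "INSERT INTO agents VALUES (?, ?, ?, ?)",
    [(0, 1, 1, 0), (1, 1, 4, 0), (2, 0, 0, 5)],
)

# An index on the columns used in WHERE clauses matters at 1e5-1e6 rows:
conn.execute("CREATE INDEX idx_infected ON agents (infected, infection_timer)")

# "Update columns A and B for all agents where x, y, and z are true":
conn.execute("""
    UPDATE agents SET infection_timer = infection_timer - 1
    WHERE infected = 1 AND infection_timer > 0
""")
# Agents whose timer just hit zero recover and gain (assumed) 30-day immunity:
conn.execute("""
    UPDATE agents SET infected = 0, immunity_timer = 30
    WHERE infected = 1 AND infection_timer = 0
""")

print(conn.execute(
    "SELECT infected, immunity_timer FROM agents WHERE id = 0"
).fetchone())  # → (0, 30)
```

Each timestep is just a handful of set-based UPDATE statements; there is no per-agent loop in Python at all, which is what makes the fully declarative style feel natural here.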