
JSON schema #128

Open
daniel-thom opened this issue Aug 11, 2023 · 13 comments

@daniel-thom

Feature Request

Is there a JSON schema that backs the JSON generated by the new ToJSON methods? If not, I would be open to generating it. I'm guessing it wouldn't be difficult, based on all the versions of data models that already exist.

There are many potential use cases for the JSON schema, but I’ll explain mine here. There could be better ways to solve my problem.

My colleagues often do something like this:

  • Load an existing OpenDSS circuit.
  • Generate new versions of the circuit based on some set of scenarios being researched.

There are a few basic problems often encountered:

  1. The circuit has lots of load shape data residing on slow storage. It can take 5 minutes to load a single circuit and they have to load 1000 different circuits.
  2. The Master.dss file enables a time-series simulation and so loading the circuit takes a long time - they don’t care about the simulation, just the static values of each circuit element.

To work around these problems they string-parse the OpenDSS files and comment out things they don't want. This approach is very prone to bugs.

An ideal solution to this problem would be flags to the compile command that disable time-series simulations and loading of load shape data. I am assuming that such an addition to OpenDSS is highly unlikely.

Another solution is to generate data models from the JSON schema (such as with the pydantic python library), export the JSON representations of the OpenDSS models into files or a database (one-time cost), and then de-serialize the JSON into Python classes for all transformation work. The Python classes could easily be converted back to OpenDSS text commands when it comes time to generate the new circuits.

A secondary benefit of this JSON solution is that people could run simulations with different load shape profiles more easily. No string manipulation to rewire load-to-loadshape connections would be required. Load shapes could be stored in any file format and passed to OpenDSS in memory.
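To make the proposed round-trip concrete, here is a minimal, self-contained sketch of the workflow using plain dataclasses. The field names (Name, kV, kW) mirror common OpenDSS Load properties, but the exact JSON layout of the real ToJSON output is an assumption here:

```python
# Hypothetical sketch: JSON record -> typed Python object -> edited -> DSS text.
import json
from dataclasses import dataclass, asdict

@dataclass
class Load:
    Name: str
    kV: float
    kW: float

    def to_dss(self) -> str:
        # Emit an OpenDSS text command for this element.
        return f"New Load.{self.Name} kV={self.kV} kW={self.kW}"

record = json.loads('{"Name": "load1", "kV": 12.47, "kW": 50.0}')
load = Load(**record)
load.kW *= 2  # scenario transformation, no OpenDSS compile needed
print(load.to_dss())             # → New Load.load1 kV=12.47 kW=100.0
print(json.dumps(asdict(load)))  # back to JSON for storage
```

A real version would of course generate the classes from the schema rather than write them by hand, which is exactly what the thread goes on to discuss.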

@PMeira

PMeira commented Aug 11, 2023

Is there JSON schema that backs the JSON generated by the new ToJSON methods?

Not yet; it's generated directly from the internal metadata shared with our parser at runtime. There are a couple of ongoing tasks related to that, targeting a more general proposal (including JSON and Arrow dataframes) and the C++ port of the engine, besides what's listed in https://dss-extensions.org/dss_python/examples/JSON/#Future-expectations
The overall plan is to add a couple of functions to load/save circuits in a few formats, without writing DSS scripts at all when using our implementation. For compatibility with EPRI's official binaries, there will be an API to handle the conversion transparently if the user is required to use them (or our older releases), with the extra overhead. In general, we can already call save circuit for the basics, but there are some caveats.

Note that there are initial type annotations in DSS-Python and in https://github.com/dss-extensions/dss_python/blob/0.14.4/dss/IObj.py (this file will be split and moved to a new module later). After #78 lands, we plan to add annotations in ODD.py too. For IObj.py, all typing information is adapted from DSS_ExtractSchema() (C header, Pascal code). There are lots of internals there, and this one is not intended for end-users (yet?). It should already contain all the info required to create a formal JSON Schema for the DSS properties, but commands and settings are not there yet (WIP).

The Python classes could easily be converted back to OpenDSS text commands when it comes time to generate the new circuits.

There will be FromJSON functions soonish (probably only in the C++ port), just avoiding too many changes in the Pascal codebase since it has no future.

An ideal solution to this problem would be flags to the compile command that disable time-series simulations and loading of load shape data. I am assuming that such an addition to OpenDSS is highly unlikely.

We can add whatever flags users find useful here on DSS-Extensions. As long as a flag needs to be explicitly enabled through the API and we mark it as an extension (not supported in the official OpenDSS), it's fine.

No string manipulation to rewire load-to-loadshape connections would be required. Load shapes could be stored in any file format and passed to OpenDSS in memory.

We can already kinda do that, but a high-level end-user API is missing. Including that explicitly in an alternative format would be nice.
Right now you can leave the loadshapes empty in the DSS files, then load/generate the data in a big matrix (for example) and use LoadShapes_Set_Points, as discussed in #98.
I could add a notebook with example usage of LoadShapes_Set_Points if you think it's worth doing now.

@daniel-thom

Thanks for the detailed response. My highest priority in the short term is the ability to programmatically interact with the OpenDSS data models without loading OpenDSS, so this is great to hear: "The overall plan is to add a couple of functions to load/save circuits in a few formats, without writing DSS scripts at all when using our implementation." Thanks for the pointer to IObj.py. For the short term I will attempt to make a JSON schema + Pydantic models for this, and share the results if the solution is simple and the outcome is promising. It can be supplanted by a more general solution later.

I do not have an immediate need for the LoadShapes_Set_Points example, but I suspect that there is user demand for this sort of thing - users may not ask for it or even know they need it.

@PMeira

PMeira commented Aug 12, 2023

For the short term I will attempt to make JSON schema + Pydantic models for this - and share the results if the solution is simple and the outcome is promising.

Sounds good, I'll ping you whenever I start working more directly on this too (I imagine in 1 to 3 weeks).

A lot of properties have special handling in the parser. In general, we wouldn't need to handle those in JSON, etc. Let me know if you have any questions.

It can be supplanted by a more general solution later.

I think this is one point that we can call for community feedback and contributions more broadly since a lot of people have the same issues. So, the sooner we have something to iterate over, the better.

If you haven't seen it yet, there's also this and a few other test files that contain some examples: https://github.com/dss-extensions/dss_python/blob/0.14.4/tests/test_obj.py

If you want to check the current internal model:

```python
import json
from opendssdirect._utils import api_util

json.loads(api_util.ffi.string(api_util.lib.DSS_ExtractSchema(api_util.ffi.NULL)))
```

Here's the output as a file: dss_capi_v0.13.4.json.zip
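Once parsed, the result is just a nested Python dict that can be explored with ordinary code. A sketch with a made-up fragment (the real DSS_ExtractSchema layout differs; the class/property names below are illustrative):

```python
# Illustrative fragment only; the actual extracted schema has a different shape.
schema = {
    "classes": [
        {"name": "Load", "properties": [
            {"name": "kV", "type": "double"},
            {"name": "kW", "type": "double"},
        ]},
    ],
}

def summarize(schema: dict) -> list[str]:
    """One line per class: its name and typed properties."""
    return [
        cls["name"] + ": " + ", ".join(f'{p["name"]} ({p["type"]})' for p in cls["properties"])
        for cls in schema["classes"]
    ]

print("\n".join(summarize(schema)))
# → Load: kV (double), kW (double)
```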

@daniel-thom

I posted some code at daniel-thom@4286587. I wrote a quite hacky script (not posted) that converts the OpenDSS schema to Pydantic models. The file opendss_circuit.py shows how you can compile a circuit once, convert everything to Pydantic models, serialize and de-serialize to JSON, serialize to .dss text files, and then generally use the circuit without having to compile it with OpenDSS. This satisfies my immediate concern but it is really not a great solution and I wouldn’t open a PR for it.

I encountered lots of tedious errors with strings because of differences in spelling and capitalization between what comes out of ToJSON() and what is defined in the dss_python classes. Also, as you are likely aware, the output of some ToJSON() calls is not usable directly and cannot be written back to .dss text files as-is. For example, Transformer.WdgCurrents is a string that is really an array. (Perhaps its type can be an array?) If you want to write it to a .dss file, you have to add surrounding brackets. So it's not completely usable as a string, either. I can imagine this being a common problem.
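For what it's worth, a small helper can bridge that gap. A sketch assuming the string is a comma- and/or whitespace-separated list of numbers (the real delimiter in the export may differ):

```python
# Sketch: parse a flat WdgCurrents-style string and re-wrap it as a DSS array.
import re

def parse_flat_array(raw: str) -> list[float]:
    """Split on commas/whitespace and convert each token to float."""
    return [float(tok) for tok in re.split(r"[,\s]+", raw.strip()) if tok]

def to_dss_array(values: list[float]) -> str:
    """Surrounding brackets make the value valid in a .dss file."""
    return "[" + ", ".join(f"{v:g}" for v in values) + "]"

print(to_dss_array(parse_flat_array("1.5, 2.25  3.0")))
# → [1.5, 2.25, 3]
```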

Just as a comment, I'll mention that I have found Pydantic data models very useful for this sort of thing. Pydantic supports aliases for field names, including ones that are not valid Python identifiers, and it provides easy ways to add customizations when serializing and de-serializing to and from different formats.

@PMeira

PMeira commented Aug 24, 2023

I posted some code at daniel-thom@4286587.

Thanks for the update!
This message already got too long and disorganized, so I'll collect most of the info in an overall document by Monday, hopefully with an up-to-date proof-of-concept.

For example, Transformer.WdgCurrents is a string that is really an array. (Perhaps its type can be an array?)

Currently the data is exported as it's represented internally. For this example, WdgCurrents is a function that returns a string (and a weirdly formatted one). We can adapt it as long as the string representation is preserved for the original DSS property.

WdgCurrents and some other weird ones are read-only, though. It shouldn't be in the input; it's just that OpenDSS silently ignores attempts to set read-only properties. It also ignores invalid values in some other properties. The plan is to error by default in all these cases someday, keeping a flag to restore the original behavior. I believe I left several comments related to this in the engine codebase.

A lot of functions from the API, including the WdgCurrents as in https://dss-extensions.org/OpenDSSDirect.py/opendssdirect.html#opendssdirect.Transformers.WdgCurrents could be added optionally to the JSON exports (akin to the current implementation of to_dataframe in ODD.py). So, effectively, we would get the whole data model + most state/status functions in the JSON. This is useful for exporting data to consume in other methods, but not for populating the circuit. Doing this at engine level will benefit all DSS-Extensions and simplify maintenance, but we could play with a Python implementation first.

The initial plan, long ago, was to extract the internal model and generate the parser code externally. There are too many corner cases in the OpenDSS parser, so that's not perfect. We could still do this someday. I suspended working on that mostly because parsing code got a lot faster when I refactored it, even though I expected it to get slower.

Some random notes:

    @root_validator(pre=True)
    def preprocess(cls, values):
        """Removes undesired fields."""
        for field in ("DSSClass", "like"):
            values.pop(field, None)
        return values
  • OpenDSS is case-insensitive, hence the plan to adjust the capitalization of the property names in our version (I did that already, I'll see if I can apply the changeset cleanly on the master branch).
    • There is also a flag to output lowercase names in ToJSON.
  • The order in which the properties are filled matters, but I believe we can force a standard order that we (as in the community) define in the case of an alternative input format.
  • Removing ambiguity and redundancy will help a lot -- as a scripting language, the freedom of multiple options is good, but it becomes cumbersome for large-scale models that don't directly use DSS scripts.
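The capitalization and ordering notes above could be handled with a small normalization pass. A sketch with illustrative tables (the real canonical-name and fill-order data would come from the engine metadata, not these hand-written dicts):

```python
# Sketch: map case-insensitive DSS property names to one canonical spelling and
# emit them in a fixed fill order. Both tables below are illustrative assumptions.
CANONICAL = {"basefreq": "BaseFreq", "mva": "MVA", "kv": "kV", "bus1": "Bus1"}
FILL_ORDER = ["Bus1", "kV", "MVA", "BaseFreq"]

def normalize(props: dict) -> dict:
    """Return the properties with canonical names, in the standard order."""
    fixed = {CANONICAL.get(k.lower(), k): v for k, v in props.items()}
    ordered = {k: fixed.pop(k) for k in FILL_ORDER if k in fixed}
    ordered.update(fixed)  # anything not in the order list goes last
    return ordered

print(normalize({"BASEFREQ": 60, "bus1": "b1", "KV": 12.47}))
# → {'Bus1': 'b1', 'kV': 12.47, 'BaseFreq': 60}
```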

Just as a comment, I’ll mention that I have found Pydantic data models very useful for this sort of thing.

If we can merge the idea of Pydantic models as in your example with IObj.py, it would probably both simplify the code and add useful features at the same time. We can add some basic validation (required and exclusive fields, at least) and most of the defaults.

From what I gather, we could generate the code directly from a proper JSON-Schema too as seen in datamodel-code-generator, right?

The typing situation in Python is non-ideal without something like Pydantic. Besides Pydantic, I guess we'll have to wait to see what happens in the long term regarding the likes of MyPy, LPython and Spy.

@daniel-thom

From what I gather, we could generate the code directly from a proper JSON-Schema too as seen in datamodel-code-generator, right?

Absolutely. That's why I started this thread with the question about JSON schema. This question remains: what is the best way to get the formal schema? I started down the path of IObj.py before I saw your comment about how to use api_util.lib.DSS_ExtractSchema. What I did with IObj.py was very quick but hacky, and it misses some details. I have considered DSS_ExtractSchema now and think that I could generate a complete set of Pydantic models from it. It will likely take some effort, and I might need some help understanding some of the translation details. If you find this to be a potential benefit, I'm up for pursuing it. You may have better ideas on how to get the formal JSON schema. What do you think?

@PMeira

PMeira commented Aug 24, 2023

And I might need some help understanding some of the translation details. If you find this to be a potential benefit, I’m up for pursuing it. You may have better ideas on how to get the formal JSON schema. What do you think?

Let me play with it a bit first. I think it's easy enough to generate a JSON schema for most classes/properties instead of this custom JSON, directly in the engine, where we have all the metadata. This is orthogonal to the other branch I'm working on, so it shouldn't delay the other changes too much.

Worst case, we generate a base JSON schema in the engine and complement it externally, at least while we test and iterate.

@PMeira

PMeira commented Oct 23, 2023

Hi, @daniel-thom, sorry for the extended delay. In part, that's due to the lack of a good, complete tool for mapping the JSON Schema to Pydantic. Tools like datamodel-code-generator did not work well enough, but they were still a good starting point and helped me familiarize myself with the related tools. In the end, I wrote a specialized tool.
And since things started to grow, I merged a bunch of changes from another upcoming branch to safeguard against merge conflicts later on.

To avoid exposing too much info and muddying the waters, I now think it is better to expose a JSON schema for the input format first, and then work on complements for output and so on.

There's a lot of progress in various areas:

  • As in your statement about loading and editing the circuits without the DSS engine, this will be a separate tool (and will work for both DSS-Extensions and the upstream OpenDSS).
  • Besides some corner cases, extended validation and generating the DSS code from the Pydantic objects are working OK.
  • The capitalization of the property names was adjusted to PascalCase, but respecting the SI prefixes (basefreq becomes BaseFreq, but mva becomes MVA since it's ambiguous). There's a mechanism to control this at runtime: use the legacy capitalization, this "modern" one, or full lowercase. I tried camelCase too, which is more natural, but there are too many odd identifiers. We can always adjust this later, it's not too much work.
  • I'm working now on loading the JSON files directly in the engine.
  • We will probably need a lot of test cases later!

Regardless of the progress, there will be new releases on this Wednesday. I'll start pushing code by tomorrow. As soon as there are some inspiring examples, I'll drop the links here.

To complement that, the new API is almost ready and will fit well in modern code.

@daniel-thom

This progress sounds great! We have some use cases that would immediately benefit from this work. If you’re ok with sharing preliminary work, @KapilDuwadi and I could test how it works with our network models.

@PMeira

PMeira commented Nov 27, 2023

@daniel-thom Almost there, doing basic testing today and should push everything to the repos (including the new https://github.com/dss-extensions/AltDSS-Schema) by this time tomorrow.

@PMeira

PMeira commented Dec 13, 2023

@daniel-thom @KapilDuwadi Just pushed most of the main code to the repo. I'll prepare a notebook to illustrate the basics and what can already be done, then review and add more notes little by little.

The model currently matches the internal OpenDSS one very closely since it is generated by DSS C-API. It includes some things that are probably not done on purpose in the engine (like allowing files to be used for way too many properties).

To reach this current version, including the import/export in DSS C-API, I ran various tests, fixed some important aspects of the definition, and tweaked the behavior of the engine. The changelog turned out quite long: https://github.com/dss-extensions/dss_capi/blob/0.14.0b1/docs/changelog.md#version-0140

Sorry I took too long, but I think we have a good base now.

@daniel-thom

@PMeira This is fantastic work. There is so much potential to change how researchers use OpenDSS. Once you have published an example, I’ll start propagating this to colleagues.

@PMeira

PMeira commented Dec 22, 2023

@PMeira This is fantastic work. There is so much potential to change how researchers use OpenDSS. Once you have published an example, I’ll start propagating this to colleagues.

@daniel-thom Thanks! I'm checking some final issues with the roundtrips of some test circuits in DSS C-API, but most issues so far have been with the original save circuit command from OpenDSS, which I'm fixing along the way.

We should have the final releases based on DSS C-API 0.14.0 after the holidays. I'll make sure to both add some examples and rework the text there by then.
