Option not to sort the keys #47

Open
noxqs opened this issue Jan 11, 2023 · 4 comments
Comments

@noxqs

noxqs commented Jan 11, 2023

First, I love your lib. I saw it on Reddit and have replaced my JSON config saver with it.
However, the order of the keys matters to me, and the function serialize_data_to_json_bytes in io_unsafe.py always sorts the keys. This hurts.
In my opinion, the sorting should not happen there. If you want your keys sorted, that could (or should) be done beforehand,
which keeps serializing the data separate from ordering it.
Alternatively, you could add an option when creating the instance.
Just my two cents.

For now I patch it with:

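import json

def serialize_data_to_json_bytes(data: dict) -> bytes:
    # Variant of the library function that does not sort the keys.
    from dictdatabase import config
    if config.use_orjson:
        import orjson
        # No OPT_SORT_KEYS here, so orjson keeps the dict's insertion order.
        option = orjson.OPT_INDENT_2 if config.indent else 0
        return orjson.dumps(data, option=option)
    db_dump = json.dumps(data, indent=config.indent, sort_keys=False)
    return db_dump.encode()

# io_bytes and SessionFileFull must be imported from dictdatabase first
# (the exact module paths depend on the installed version).
def io_write(db_name: str, data: dict):
    data_bytes = serialize_data_to_json_bytes(data)
    io_bytes.write(db_name, data_bytes)

def write(self):
    super(SessionFileFull, self).write()
    io_write(self.db_name, self.data_handle)

# Monkey-patch the session class so full-file writes use the unsorted serializer.
SessionFileFull.write = write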
@mkrd
Owner

mkrd commented Jan 11, 2023

Hi, thank you!
The reason for always sorting the keys is that the indexer saves the start and end positions of each key-value pair, so that single key-value pairs can be read and written very efficiently.
Without sorting, the order of the keys can change arbitrarily, so the entire index file would become invalid on every write and would have to be rebuilt each time, which is a pretty expensive operation.
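To illustrate the point about positions, here is a minimal sketch that records the byte offsets of each top-level key-value pair in the serialized output. The helper `pair_spans` is hypothetical and is not the library's indexer; it only handles flat dicts and uses the standard `json` module.

```python
import json

def pair_spans(data: dict, sort_keys: bool) -> dict:
    """Map each top-level key to the (start, end) byte offsets of its
    '"key": value' text in the serialized JSON (hypothetical helper)."""
    encoded = json.dumps(data, sort_keys=sort_keys).encode()
    spans = {}
    for key, value in data.items():
        # Rebuild the exact text json.dumps emits for this pair and locate it.
        needle = (json.dumps(key) + ": " + json.dumps(value)).encode()
        start = encoded.index(needle)
        spans[key] = (start, start + len(needle))
    return spans

# Same content, built in a different order.
first = {"b": 1, "a": 2}
second = {"a": 2, "b": 1}

# With sorting, both dicts produce identical offsets for every key.
same_when_sorted = pair_spans(first, True) == pair_spans(second, True)

# Without sorting, the offsets follow the order in which keys were added.
same_when_unsorted = pair_spans(first, False) == pair_spans(second, False)
```

With sorted output, stored offsets only depend on the content, not on the order in which keys were added, which is what an offset-based index would rely on.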

So I'm guessing you have a config file with manually ordered key-value pairs, and when you edit it, you want the pairs to be written back in the same order. In that case, this wouldn't work reliably anyway, since Python dicts do not guarantee the order of keys, so it is only luck if the key-value pairs get serialized in the same order as before.

Or do I misunderstand your use-case?

@noxqs
Author

noxqs commented Jan 11, 2023

Hi, wow, you're fast at replying :-)
Yeah, I need to keep the order of the dictionary keys because it defines the order of the columns in an Excel file.
Actually, since Python 3.6 dictionaries maintain insertion order (an implementation detail in 3.6, guaranteed by the language since 3.7), so I am not too worried about that, especially since I build with Cython/PyInstaller and embed the Python version (3.10).
So if we use Python >= 3.6, the keys don't have to be sorted and all is good?
I have removed the sort in dictdatabase and it seems to work.
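A small sketch of the ordering behavior described above, using only the standard `json` module. The column names are made up for illustration.

```python
import json

# Insertion order defines the (hypothetical) spreadsheet column order.
columns = {"name": "Ada", "email": "ada@example.com", "age": 36}

# Without sort_keys, json.dumps writes keys in insertion order.
unsorted_json = json.dumps(columns)

# With sort_keys=True, keys are written alphabetically instead.
sorted_json = json.dumps(columns, sort_keys=True)

# json.loads preserves the order found in the document.
order_kept = list(json.loads(unsorted_json))
order_sorted = list(json.loads(sorted_json))
```

Here `order_kept` matches the original column order, while `order_sorted` is alphabetical.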

@mkrd
Owner

mkrd commented Jan 11, 2023

Oh, good to know! Back when I learned Python, it didn't guarantee the order, so I assumed that was still the case.
Since this library doesn't support Python versions below 3.8, it should work as you said.

Could you open a PR? It would only require a new config variable "sort_keys" and passing that variable to the json and orjson dump functions. I would do it myself, but I am a bit short on time since I need to finish my master's thesis right now :)
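A minimal sketch of what such a change could look like. The `Config` dataclass is a hypothetical stand-in for the library's config module, and passing it as an argument is an assumption made to keep the example self-contained; `orjson.OPT_SORT_KEYS` and `orjson.OPT_INDENT_2` are existing orjson options.

```python
import json
from dataclasses import dataclass
from typing import Optional

@dataclass
class Config:
    # Hypothetical stand-in for the library's config module.
    use_orjson: bool = False
    indent: Optional[int] = 2
    sort_keys: bool = True  # default keeps the current sorting behavior

def serialize_data_to_json_bytes(data: dict, config: Config) -> bytes:
    if config.use_orjson:
        import orjson  # imported lazily, only needed when enabled
        option = orjson.OPT_INDENT_2 if config.indent else 0
        if config.sort_keys:
            option |= orjson.OPT_SORT_KEYS
        return orjson.dumps(data, option=option)
    # Standard library path: sort_keys is passed straight through.
    return json.dumps(data, indent=config.indent, sort_keys=config.sort_keys).encode()

data = {"b": 1, "a": 2}
kept = serialize_data_to_json_bytes(data, Config(indent=None, sort_keys=False))
sorted_out = serialize_data_to_json_bytes(data, Config(indent=None, sort_keys=True))
```

Defaulting `sort_keys` to `True` would leave existing users unaffected. Whether the indexer stays valid with sorting disabled is not settled in this thread, so that part would need checking in the PR.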

@mkrd
Owner

mkrd commented Jan 11, 2023

Also, the docs would need an update, but that's only a few lines of text.
