-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor!: design new simplified C API #253
Conversation
5d28c22
to
54b305d
Compare
54b305d
to
8aa35a1
Compare
src/jsonrpc-machine-c-api.h
Outdated
/// \param major Receives semantic major version. | ||
/// \param minor Receives semantic minor version. | ||
/// \param patch Receives semantic patch version. | ||
/// \param pre_release Receives pre release string, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like this behavior or "something being valid until a function overwrites it". It is bug prone.
It's the kind of thing that the C standard has been fighting for decades to get rid of (see _r variants).
I think that the users having to check the manual for the lifetime of string that functions return is worse than telling they always get their own copy that they should free (or not, who cares). I prefer the user to always do the same thing: got a string? free it. It's yours.
There really is no concrete performance issue here that I can see.
The read-memory function is the only one that could potentially be called repeatedly, and that already receives a buffer to write in place, right?
I understand your urge to avoid the need for users to call "free" on the other side, but do we really need to be so paternalistic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand your urge to avoid the need for users to call "free" on the other side, but do we really need to be so paternalistic?
It makes the C API easier to use in my view, because it becomes less boilerplate (less memory management calls imposed for the user to do). It's more work for the user to free something immediately after a function call, and I think it is also bug prone either way in case users forget to free, or try to be clever deferring its deallocation sometime in the future. Users have to read the documentation either way. I have seen people struggle with C memory management, one of the goals here was to minimize the burden to do this.
But what are your suggestions here? Ask for the user to free every returned string in the API? What about error messages? Would they be an exception so we can have cm_get_last_error_message()
? Or you think we should go all the way and keep the same behavior of every function having an err_msg
which needs to be freed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My vote in favor of Implementing cm_get_last_error_message()
. It will alleviate 90% of the pain that users face when using the C API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the API docs states,
"Strings are returned in pointers that remain valid and hold the same value until the next time the same API function is called"
, then the user only needs to read it once.
Not having to free memory will save them a lot of time.
This approach is obviously easier and not more error prone than expecting the user to free memory."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's for sure, @mpernambuco . cm_get_last_error_message
simplifies the use of every other API function, and we are keeping it.
What we are discussing here is basically the management of strings returned by the API. Can the API return a string that points to some internal memory buffer kept by the binding or the machine class, and that will be overwritten or potentially deleted by the binding or machine class at some sort-of-unkown time in the future? Can some other function return a string that points to an internal memory buffer that will remain valid until the machine object is destroyed, and that the user can't modify or that same function might return the (now incorrect) modified string back?
I think that cm_get_last_error_message
retains its value even if the string it returns is a copy and the user is forced to free it afterwards. This is because it allows error handling to be separated from the place where a function that failed was called. I.e., it can be done only once and called many times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using C++ thread_local
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if you have two machines on a thread, the same function call on different machines will overwrite each other?
Correct, for example, in the same thread calling cm_get_initial_config()
twice, the second call will overwrite contents returned by the first call. If a user wants to retain the first call contents after the second call, he will need to duplicate the first call string himself.
But as I mentioned, these are JSON strings, most likely these strings will be parsed right-away into some internal language specific JSON data structure that will be able to retain its values. Is not like people will keep holding and working with JSON C strings around, it's a pain to work with JSON as C string, languages usually duplicate them anyway to some other internal JSON object.
The usual case is: Got a JSON string? Parse it immediately into to your language JSON data structure, and forget the JSON C string.
Using C++ thread_local?
Correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this will be the behavior of the machine class in C++ as well, right? Return const ref to a thread_local string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this will be the behavior of the machine class in C++ as well, right? Return const ref to a thread_local string.
Could be, but it's not necessary. I could do this outside C++ API, just in the C API binding file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we say "overwritten" we mean "invalidated", right. As in, the pointer would now be dangling. I don't think we can avoid that. In that case, we might as well move this thread_local complexity to the C++ class itself, to simplify the C API implementation (which, BTW, could later use some love...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the urge to change break iflags apart at the moment, but I have a different proposal that I'd like not to mix with this.
At the moment, iflags has PRV
and the X
, Y
, and H
flags.
PRV
really is something internal that the host should never mess with.
X
, Y
, and H
, on the other hand, are things the host needs to look at and (in the case of Y
, change).
There is never a case in which more than one of these flags is set.
They are always set via HTIF from the inside.
In the case of X
and Y
, the host will also need to look into htif.tohost
.
The obvious place for these flags to live in is in htif.tohost
itself.
With some reorganization there, we can make this happen.
We would rename iflags
to iprv
and make it a full CSR without any packing/unpacking or dedicated read/write methods.
The interpreter loop would check for the equivalent of X
, Y
and H
in htif.tohost
. While we are at it, let's rename X
to YA
(or YO
) and Y
to YM
(or YI
).
After a return from run, the host would already have the reason.
The host would read htif.tohost
and decide what to do.
There already are many WARL
CSRs that prevent certain bits from being changed.
htif.tohost
would be one of these. If H
is set, it would remain set forever. I think we can even use a write to htif.fromhost
to clear Y
, saving the need to modify htif.tohost
.
X
is cleared automatically. But we could we can remove X
, right? There is no need for it anyway, now that we return the status in the call to run?
Anyway, the scope of these changes are way outside of the API simplification for binds. I would like to avoid mixing these.
Addressed some comments:
|
I think we should keep |
One thing to consider. Are we doing to offer a “raw” version of these JSON functions in the binds, ones that return/receive JSON and other that receive/return parsed objects (like lua tables) or not? |
I think I could try making a generic function to translate JSON <-> Lua table, so we don't use raw JSON strings on Lua side. In the worst case if that turns out to be a bit of work or having flaws (such as JSON special values like its NULL), we could have a raw JSON string and on Lua side and use |
I updated the PR adding back some |
/// \param data Response data to send | ||
/// \param length Length of response data. | ||
/// \returns 0 for success, non zero code for error. | ||
CM_API int cm_send_cmio_response(cm_machine *m, uint16_t reason, const unsigned char *data, size_t length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to have the other way as well, something like this:
CM_API int cm_recv_cmio_request(cm_machine *m, uint16_t *reason,
size_t max_length, unsigned char *data, size_t *length);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, I think this will be useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don’t implement this on the solidity side. So we don’t know what order they will need things in the log. We are also going to simplify this protocol soon, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see this as an abstraction of the HTIF + buffers interface we have today. The alternative is for users to pack / unpack HTIF by themselves, not ideal.
We are also going to simplify this protocol soon, right?
I'm not following. Is this about uarch -> risc0 transition?
if so we could drop log and uarch, but Node people will still need to load / retrieve bytes from buffers right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this could be useful on its own as a high level API to do IO with the machine through the C API for applications that want to use libcartesi as a C library. As of now after the machine yields you can send data inside with cm_send_cmio_response
but there is no easy function to read data from inside, cm_recv_cmio_request
would be this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Go for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So not having to deal with iflags
, only HTIF. Doesn't seem to affect this discussion.
So in what would be the information is it that we would provide ind the log and in what order?
Assuming: "So what would be the information we provide, and in what order?"
A bit out of my depth, but I don't see the issue. My understanding is that this is already a provable machine since it is the machine immediately before a surgery.
So for instance, we do read_htif(...);
+ read_memory(TX_BUFFER);
in cartesi-machine.lua
.
What is the difference from that to having a single function that does both and massages the internals?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we don’t control what this code will look like in solidity. We don’t know anything about the different domains.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The log has to include the places where the info comes from. So the changes impact this.
/// \brief Destroys a machine. | ||
/// \returns 0 for success, non zero code for error. | ||
/// \details The machine is deallocated and its pointer cannot be used after this call. | ||
CM_API int cm_destroy(cm_machine *m); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can set the pointer back to NULL depending on how paranoid you want to be.
CM_API int cm_destroy(cm_machine **m);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Meh.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, not doing this hehe
typedef struct cm_machine cm_machine; | ||
|
||
/// \brief Machine hash array. | ||
typedef uint8_t cm_hash[CM_MACHINE_HASH_BYTE_SIZE]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be error prone when used as a function argument.
Prefer something like this.
typedef struct {
uint8_t data[CM_MACHINE_HASH_BYTE_SIZE];
} cm_hash;
- return pid in jsonrpc fork() - return address in jsonrpc rebind()
Should we have the REASON field there as well? And make DATA smaller? It's a question I've always had... |
I think we should, I went ahead I did the change on the branch with the actual changes I am working on, 4 bytes for data, 2 bytes for reason, this simplified some fragile manual bit shifting in some lua scripts. |
The API mentioned in this PR was implemented at #273 and the work continues there, I am closing this PR to avoid confusion. |
This PR is very WIP, it will be a long road before this is finished and polished. I created the PR before the time so I can document here a changelog of what is being changed while I change them. Also to request for comments on the changes.
You can look around the C API files to see how the new simplified API feels. I have also update each function documentation along the way.
Changes
Added:
CM_MCYCLE_MAX
macro and set toUINT64_MAX
CM_ACCESS_LOG_TYPE
enum with bitwise flags for access log typecm_get_last_error_message
/cm_jsonrpc_get_last_error_message
, returning the error message for the most recent failed call.cm_get_default_runtime_config
/cm_jsonrpc_get_default_runtime_config
, for now it will return just{}
, but we might in the future return a value based on current environment variables, like the Cartesi Node does. People also asked for this function.CM_CSR_HTIF_TOHOST_DEV
), the idea is to write on fields of these CSRs throughcm_write_csr
/cm_read_csr
.CM_CSR_IFLAGS_H
), the idea is to write on fields of these CSRs throughcm_write_csr
/cm_read_csr
.Removed:
cm_read_*
/cm_write_*
of CSRs, except formcycle
/uarch_cycle
/uarch_halt_flag
.cm_delete_*
functions, now any function that returns a string guarantees to maintain the validity of the string pointer until next API call or machine destruction.cm_new_default_machine_config
cm_reset_iflags_*
/cm_set_iflags_*
/cm_packed_iflags
,iflags
can be read/written using CSR viewsCM_CSR_IFLAGS_PRV
,CM_CSR_IFLAGS_X
,CM_CSR_IFLAGS_Y
,CM_CSR_IFLAGS_H
.machine-c-defines.h
header, its contents was moved tomachine-c-api.h
.Renames:
*_uarch_reset*
->*_reset_uarch*
*_uarch_step*
->*_step_uarch*
cm_machine_run
->cm_run
cm_create_machine
->cm_create
cm_load_machine
->cm_load
,cm_machine_run_uarch
->cm_run_uarch
CM_PROC_CSR
->CM_CSR
cm_create_jsonrpc_machine
->cm_jsonrpc_create_machine
cm_load_jsonrpc_machine
->cm_jsonrpc_load_machine
cm_get_jsonrpc_machine
->cm_jsonrpc_get_machine
cm_create_jsonrpc_mgr
->cm_jsonrpc_create_mgr
cm_delete_jsonrpc_mgr
->cm_jsonrpc_destroy_mgr
Changed:
err_msg
arguments were removed, the latest error message can be retrieved fromcm_get_last_error_message()
.cm_replace_memory_range
, now it takes multiple arguments instead of a struct argument.Changes rationale
First have in mind the goal of the changes was to simplify usage of the C API in other languages, to minimize the binding work and make easier to create binding generators.
_machine
from some methods is that most API are related to the machine yet they don't contain the machine word, so here we make this the standard.cm_jsonrpc_
, this makes it easier for binding generators to categorize and filter jsonrpc related methods.cm_destroy
andcm_jsonrpc_destroy_mgr
.