Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor!: design new simplified C API #253

Closed
wants to merge 5 commits into from
Closed

Conversation

edubart
Copy link
Contributor

@edubart edubart commented Jul 2, 2024

This PR is very WIP, it will be a long road before this is finished and polished. I created the PR before the time so I can document here a changelog of what is being changed while I change them. Also to request for comments on the changes.

You can look around the C API files to see how the new simplified API feels. I have also update each function documentation along the way.

Changes

Added:

  • Added CM_MCYCLE_MAX macro and set to UINT64_MAX
  • Added CM_ACCESS_LOG_TYPE enum with bitwise flags for access log type
  • Added cm_get_last_error_message/cm_jsonrpc_get_last_error_message, returning the error message for the most recent failed call.
  • Added cm_get_default_runtime_config/cm_jsonrpc_get_default_runtime_config, for now it will return just {}, but we might in the future return a value based on current environment variables, like the Cartesi Node does. People also asked for this function.
  • Added views CSRs for HTIF fromhost/tohost for dev/data/cmd (eg. CM_CSR_HTIF_TOHOST_DEV), the idea is to write on fields of these CSRs through cm_write_csr/cm_read_csr.
  • Added views CSRs for IFLAGS PRV/X/Y/H (eg. CM_CSR_IFLAGS_H), the idea is to write on fields of these CSRs through cm_write_csr/cm_read_csr.

Removed:

  • Remove all structs, complex structs are passed as JSON strings now (configs, access logs and proofs).
  • Remove all cm_read_*/cm_write_* of CSRs, except for mcycle/uarch_cycle/uarch_halt_flag.
  • Remove all cm_delete_* functions, now any function that returns a string guarantees to maintain the validity of the string pointer until next API call or machine destruction.
  • Removed cm_new_default_machine_config
  • Removed cm_reset_iflags_*/cm_set_iflags_*/cm_packed_iflags, iflags can be read/written using CSR views
    CM_CSR_IFLAGS_PRV, CM_CSR_IFLAGS_X, CM_CSR_IFLAGS_Y, CM_CSR_IFLAGS_H.
  • Removed the machine-c-defines.h header, its contents was moved to machine-c-api.h.

Renames:

  • Renamed *_uarch_reset* -> *_reset_uarch*
  • Renamed *_uarch_step* -> *_step_uarch*
  • Renamed cm_machine_run -> cm_run
  • Renamed cm_create_machine -> cm_create
  • Renamed cm_load_machine -> cm_load,
  • Renamed cm_machine_run_uarch -> cm_run_uarch
  • Renamed CM_PROC_CSR -> CM_CSR
  • Renamed cm_create_jsonrpc_machine -> cm_jsonrpc_create_machine
  • Renamed cm_load_jsonrpc_machine -> cm_jsonrpc_load_machine
  • Renamed cm_get_jsonrpc_machine -> cm_jsonrpc_get_machine
  • Renamed cm_create_jsonrpc_mgr -> cm_jsonrpc_create_mgr
  • Renamed cm_delete_jsonrpc_mgr -> cm_jsonrpc_destroy_mgr

Changed:

  • All all machine runtime config arguments were removed from verify functions, because seems like they were not being used.
  • All err_msg arguments were removed, the latest error message can be retrieved from cm_get_last_error_message().
  • Changed cm_replace_memory_range, now it takes multiple arguments instead of a struct argument.
  • Reorganized the C API intro different categories in the header, making easy for the reader to navigate through the API.
  • Replaced all definitions with enums, because they are more friendly for binding generators.
  • Polished the API documentation.

Changes rationale

First have in mind the goal of the changes was to simplify usage of the C API in other languages, to minimize the binding work and make easier to create binding generators.

  • Overall the C API decreased from 152 functions to 59 functions.
  • The rationale of removing _machine from some methods is that most API are related to the machine yet they don't contain the machine word, so here we make this the standard.
  • All methods related to jsonrpc were forced to always begin with cm_jsonrpc_, this makes it easier for binding generators to categorize and filter jsonrpc related methods.
  • All structs were removed, because binding structs in some languages can be quite a work, also because we will now use JSON strings for complex objects (configs, access logs and proofs).
  • All methods that return a C string (for JSON example) will guarantee validity of the string until the same method is called again, saving the need of any deallocating string functions. Most bindings will parse the JSON immediately and duplicate its contents into a special JSON format the language cares about anyway, it's not like people will retain the string pointer stored somewhere, even so the documentation explicitly states its lifetime. This also removes the burden of developers to do C memory management.
  • The only methods required to call to free memory are cm_destroy and cm_jsonrpc_destroy_mgr.

@edubart edubart linked an issue Jul 2, 2024 that may be closed by this pull request
5 tasks
@edubart edubart marked this pull request as draft July 2, 2024 20:59
@edubart edubart added this to the v0.19.0 milestone Jul 2, 2024
@edubart edubart self-assigned this Jul 2, 2024
@edubart edubart added the refactor Restructuring code, while not changing its original functionality label Jul 2, 2024
@edubart edubart force-pushed the refactor/new-c-api branch from 5d28c22 to 54b305d Compare July 3, 2024 17:51
src/machine-c-api.h Outdated Show resolved Hide resolved
@edubart edubart force-pushed the refactor/new-c-api branch from 54b305d to 8aa35a1 Compare July 3, 2024 21:30
src/machine-c-api.h Outdated Show resolved Hide resolved
src/machine-c-api.h Outdated Show resolved Hide resolved
src/machine-c-api.h Outdated Show resolved Hide resolved
/// \param major Receives semantic major version.
/// \param minor Receives semantic minor version.
/// \param patch Receives semantic patch version.
/// \param pre_release Receives pre release string,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this behavior or "something being valid until a function overwrites it". It is bug prone.
It's the kind of thing that the C standard has been fighting for decades to get rid of (see _r variants).

I think that the users having to check the manual for the lifetime of string that functions return is worse than telling they always get their own copy that they should free (or not, who cares). I prefer the user to always do the same thing: got a string? free it. It's yours.

There really is no concrete performance issue here that I can see.

The read-memory function is the only one that could potentially be called repeatedly, and that already receives a buffer to write in place, right?

I understand your urge to avoid the need for users to call "free" on the other side, but do we really need to be so paternalistic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand your urge to avoid the need for users to call "free" on the other side, but do we really need to be so paternalistic?

It makes the C API easier to use in my view, because it becomes less boilerplate (less memory management calls imposed for the user to do). It's more work for the user to free something immediately after a function call, and I think it is also bug prone either way in case users forget to free, or try to be clever deferring its deallocation sometime in the future. Users have to read the documentation either way. I have seen people struggle with C memory management, one of the goals here was to minimize the burden to do this.

But what are your suggestions here? Ask for the user to free every returned string in the API? What about error messages? Would they be an exception so we can have cm_get_last_error_message() ? Or you think we should go all the way and keep the same behavior of every function having an err_msg which needs to be freed?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My vote in favor of Implementing cm_get_last_error_message(). It will alleviate 90% of the pain that users face when using the C API.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the API docs states,
"Strings are returned in pointers that remain valid and hold the same value until the next time the same API function is called"
, then the user only needs to read it once.
Not having to free memory will save them a lot of time.
This approach is obviously easier and not more error prone than expecting the user to free memory."

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's for sure, @mpernambuco . cm_get_last_error_message simplifies the use of every other API function, and we are keeping it.

What we are discussing here is basically the management of strings returned by the API. Can the API return a string that points to some internal memory buffer kept by the binding or the machine class, and that will be overwritten or potentially deleted by the binding or machine class at some sort-of-unkown time in the future? Can some other function return a string that points to an internal memory buffer that will remain valid until the machine object is destroyed, and that the user can't modify or that same function might return the (now incorrect) modified string back?

I think that cm_get_last_error_message retains its value even if the string it returns is a copy and the user is forced to free it afterwards. This is because it allows error handling to be separated from the place where a function that failed was called. I.e., it can be done only once and called many times.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using C++ thread_local?

Copy link
Contributor Author

@edubart edubart Jul 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if you have two machines on a thread, the same function call on different machines will overwrite each other?

Correct, for example, in the same thread calling cm_get_initial_config() twice, the second call will overwrite contents returned by the first call. If a user wants to retain the first call contents after the second call, he will need to duplicate the first call string himself.

But as I mentioned, these are JSON strings, most likely these strings will be parsed right-away into some internal language specific JSON data structure that will be able to retain its values. Is not like people will keep holding and working with JSON C strings around, it's a pain to work with JSON as C string, languages usually duplicate them anyway to some other internal JSON object.

The usual case is: Got a JSON string? Parse it immediately into to your language JSON data structure, and forget the JSON C string.

Using C++ thread_local?

Correct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this will be the behavior of the machine class in C++ as well, right? Return const ref to a thread_local string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this will be the behavior of the machine class in C++ as well, right? Return const ref to a thread_local string.

Could be, but it's not necessary. I could do this outside C++ API, just in the C API binding file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we say "overwritten" we mean "invalidated", right. As in, the pointer would now be dangling. I don't think we can avoid that. In that case, we might as well move this thread_local complexity to the C++ class itself, to simplify the C API implementation (which, BTW, could later use some love...)

Copy link
Contributor

@diegonehab diegonehab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the urge to change break iflags apart at the moment, but I have a different proposal that I'd like not to mix with this.

At the moment, iflags has PRV and the X, Y, and H flags.

PRV really is something internal that the host should never mess with.

X, Y, and H, on the other hand, are things the host needs to look at and (in the case of Y, change).

There is never a case in which more than one of these flags is set.

They are always set via HTIF from the inside.

In the case of X and Y, the host will also need to look into htif.tohost.

The obvious place for these flags to live in is in htif.tohost itself.

With some reorganization there, we can make this happen.

We would rename iflags to iprv and make it a full CSR without any packing/unpacking or dedicated read/write methods.

The interpreter loop would check for the equivalent of X, Y and H in htif.tohost. While we are at it, let's rename X to YA (or YO) and Y to YM (or YI).

After a return from run, the host would already have the reason.
The host would read htif.tohost and decide what to do.
There already are many WARL CSRs that prevent certain bits from being changed.
htif.tohost would be one of these. If H is set, it would remain set forever. I think we can even use a write to htif.fromhost to clear Y, saving the need to modify htif.tohost.

X is cleared automatically. But we could we can remove X, right? There is no need for it anyway, now that we return the status in the call to run?

Anyway, the scope of these changes are way outside of the API simplification for binds. I would like to avoid mixing these.

@edubart
Copy link
Contributor Author

edubart commented Jul 4, 2024

Addressed some comments:

  • Avoid splitting IFLAGS, introduce CSR views of its fields instead, to avoid creating unpack/pack methods.
  • Removed addition of cm_write_word.
  • Improved documentation of cm_replace_memory_range
  • Renamed *_uarch_step* -> *_step_uarch*
  • Removed read/write for CSRs pc and uarch_pc.

@diegonehab
Copy link
Contributor

diegonehab commented Jul 4, 2024

Addressed some comments:

* Avoid splitting IFLAGS, introduce CSR views of its fields instead, to avoid creating unpack/pack methods.

* Removed addition of `cm_write_word`.

* Improved documentation of `cm_replace_memory_range`

* Renamed `*_uarch_step*` -> `*_step_uarch*`

* Removed read/write for CSRs `pc` and `uarch_pc`.

I think we should keep reset_iflags_Y at least. And we might want to keep read_iflags_X, read_iflags_Y, and read_iflags_H just because everybody uses them.

@diegonehab
Copy link
Contributor

One thing to consider. Are we doing to offer a “raw” version of these JSON functions in the binds, ones that return/receive JSON and other that receive/return parsed objects (like lua tables) or not?

@edubart
Copy link
Contributor Author

edubart commented Jul 4, 2024

One thing to consider. Are we doing to offer a “raw” version of these JSON functions in the binds, ones that return/receive JSON and other that receive/return parsed objects (like lua tables) or not?

I think I could try making a generic function to translate JSON <-> Lua table, so we don't use raw JSON strings on Lua side. In the worst case if that turns out to be a bit of work or having flaws (such as JSON special values like its NULL), we could have a raw JSON string and on Lua side and use lua-dkjson. For now my concerns is the C API, later I will think more about Lua goodies.

@edubart
Copy link
Contributor Author

edubart commented Jul 4, 2024

I updated the PR adding back some iflags methods and trying to clarify more how string lifetime for JSON will work (thread local storage per function).

/// \param data Response data to send
/// \param length Length of response data.
/// \returns 0 for success, non zero code for error.
CM_API int cm_send_cmio_response(cm_machine *m, uint16_t reason, const unsigned char *data, size_t length);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to have the other way as well, something like this:

CM_API int cm_recv_cmio_request(cm_machine *m, uint16_t *reason,
    size_t max_length, unsigned char *data, size_t *length);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, I think this will be useful.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don’t implement this on the solidity side. So we don’t know what order they will need things in the log. We are also going to simplify this protocol soon, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this as an abstraction of the HTIF + buffers interface we have today. The alternative is for users to pack / unpack HTIF by themselves, not ideal.

We are also going to simplify this protocol soon, right?

I'm not following. Is this about uarch -> risc0 transition?
if so we could drop log and uarch, but Node people will still need to load / retrieve bytes from buffers right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could be useful on its own as a high level API to do IO with the machine through the C API for applications that want to use libcartesi as a C library. As of now after the machine yields you can send data inside with cm_send_cmio_response but there is no easy function to read data from inside, cm_recv_cmio_request would be this function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go for it.

Copy link
Contributor

@mpolitzer mpolitzer Aug 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#257

So not having to deal with iflags, only HTIF. Doesn't seem to affect this discussion.

So in what would be the information is it that we would provide ind the log and in what order?

Assuming: "So what would be the information we provide, and in what order?"

A bit out of my depth, but I don't see the issue. My understanding is that this is already a provable machine since it is the machine immediately before a surgery.

So for instance, we do read_htif(...); + read_memory(TX_BUFFER); in cartesi-machine.lua.
What is the difference from that to having a single function that does both and massages the internals?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we don’t control what this code will look like in solidity. We don’t know anything about the different domains.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The log has to include the places where the info comes from. So the changes impact this.

/// \brief Destroys a machine.
/// \returns 0 for success, non zero code for error.
/// \details The machine is deallocated and its pointer cannot be used after this call.
CM_API int cm_destroy(cm_machine *m);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can set the pointer back to NULL depending on how paranoid you want to be.

CM_API int cm_destroy(cm_machine **m);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meh.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, not doing this hehe

typedef struct cm_machine cm_machine;

/// \brief Machine hash array.
typedef uint8_t cm_hash[CM_MACHINE_HASH_BYTE_SIZE];
Copy link
Contributor

@mpolitzer mpolitzer Aug 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be error prone when used as a function argument.
Prefer something like this.

typedef struct {
    uint8_t data[CM_MACHINE_HASH_BYTE_SIZE];
} cm_hash;

- return pid in jsonrpc fork()
- return address in jsonrpc rebind()
@diegonehab
Copy link
Contributor

Added views CSRs for HTIF fromhost/tohost for dev/data/cmd (eg. CM_CSR_HTIF_TOHOST_DEV), the idea is to write on fields of these CSRs through cm_write_csr/cm_read_csr.

Should we have the REASON field there as well? And make DATA smaller? It's a question I've always had...

@edubart
Copy link
Contributor Author

edubart commented Sep 7, 2024

Should we have the REASON field there as well? And make DATA smaller? It's a question I've always had...

I think we should, I went ahead I did the change on the branch with the actual changes I am working on, 4 bytes for data, 2 bytes for reason, this simplified some fragile manual bit shifting in some lua scripts.

@edubart edubart mentioned this pull request Sep 8, 2024
@edubart
Copy link
Contributor Author

edubart commented Sep 26, 2024

The API mentioned in this PR was implemented at #273 and the work continues there, I am closing this PR to avoid confusion.

@edubart edubart closed this Sep 26, 2024
@edubart edubart deleted the refactor/new-c-api branch December 7, 2024 17:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
refactor Restructuring code, while not changing its original functionality
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Simplify the C API
4 participants