Improve Parser: Use the Clang API #51
Comments
BTW, we probably want to use the C++ API of Clang for this. It is not currently mapped by the presets, so as initial work, we would either have to:
Either way is fine with me. Thanks for your interest in this project and let me know how I can help! |
Sorry, I haven't had the time to work on this lately. My final changes allowed me to parse a lot, but some missing things from the C++ API prevented me finishing it IIRC, so I think option 2 should help us get there (or, as you said, code it in C++, but it's non-trivial :D) |
I've continued a bit on this, and for now, I've decided to write the Clang bindings manually, as I think the surface we need from Clang is not that big (I might be wrong, though). Once we have a working parser, we can generate proper bindings for Clang itself and use them in the generator, closing the circle ;) I have some doubts about how to implement some of the bindings, though, so I'll hit the forums in a few days with my questions :) |
@Arcnor Any progress with this? libclang seems to be getting pretty good for that sort of thing, for example: |
Hi Samuel, No, unfortunately I haven't had the time to continue, not enough incentive for me to do so right now (the project(s) that were using JavaCPP all stopped for one reason or another, mostly priorities). So if they kept the same names on the AST I might even be able to reuse some of the code I made years ago that parsed the unstable AST output of Clang (https://github.com/Arcnor/objc2robovm/blob/master/src/main/java/com/arcnor/objcclang/parser/CLangHandler.java for example). Anyway, unless somebody else is working on this, I'll try to give it another look if it doesn't look too complex to interact with it, as time is limited :). |
Looks good! No one else is looking into this AFAIK, so please do continue
to check it out! The C API is pretty stable BTW. Thanks
|
I'm going to need some help to generate the bindings it seems. I'll put my question(s) here as they are related, but if you need me to use the forum I'll go there instead: The bindings have the following code: typedef struct CXVirtualFileOverlayImpl *CXVirtualFileOverlay;
CXVirtualFileOverlay clang_VirtualFileOverlay_create(unsigned options);
... How can I rename |
So besides that small problem, the whole API seems to work (well, at least compile) with very few manual mappings, which is cool. I'll take more time tomorrow to actually figure out if the stuff I couldn't do ~2 years ago is now possible :). |
Sounds good! BTW, the bindings for libclang are already available here: If there's anything to fix about those though, please send pull requests against the presets config: |
Ahh, I didn't realize that. Thanks, I'll start using those.
The only change I'd like is generating them against 5.0.0 if they're not
already, and making some functions return String instead of BytePointer,
I'll open a PR later when I get the time.
…On Dec 7, 2017 17:24, "Samuel Audet" ***@***.***> wrote:
Sounds good! BTW, the bindings for libclang are already available here:
https://github.com/bytedeco/javacpp-presets/blob/master/llvm/src/main/java/org/bytedeco/javacpp/clang.java
AFAIK, we just need to use those.
If there's anything to fix about those though, please send pull requests
against the presets config:
https://github.com/bytedeco/javacpp-presets/blob/master/llvm/src/main/java/org/bytedeco/javacpp/presets/clang.java
Thanks!!
|
They already work with LLVM 5.0.0 yes: All |
They get converted to both, yeah. Except in the return value. I changed all
return values to be String instead of BytePointer.
The most obvious use is for the getCString method to convert a CXString,
but I'm sure there are others.
…On Dec 7, 2017 18:21, "Samuel Audet" ***@***.***> wrote:
They already work with LLVM 5.0.0 yes:
https://github.com/bytedeco/javacpp-presets/tree/master/llvm
All const char * should already get mapped to String as well as
BytePointer, but if there are char * that should also be mapped to String,
yes please, do let me know! Thanks
|
Right, the problem with return values is that when we need a |
Yeah, I understand, I'm just hoping that all "const char*" returns are for
Strings. Of course, I might be wrong, but it should be more or less
straightforward to check, I'll do that later.
On Dec 7, 2017 19:34, "Samuel Audet" wrote:
Right, the problem with return values is that when we need a Pointer, we
can't get one from the String, but we can get a String from a BytePointer
with getString()...
|
If not, we can add to |
Actually, after calling |
Ah, you're right, I didn't read that properly.
Ok, you win this one ;)
On Dec 8, 2017 00:51, "Samuel Audet" wrote:
Actually, after calling clang_getCString() we need to call
clang_disposeString(), so simply returning a String isn't that convenient.
I've added the helper function I talked about in the commit above.
|
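The point about `clang_getCString()` and `clang_disposeString()` comes down to a copy-then-dispose pattern: the characters have to be copied into a Java `String` before the native string is released. Below is a minimal, self-contained sketch of that pattern. It does not use the real bindings: `NativeString`, `view()` and `toJavaString()` are hypothetical stand-ins made up for illustration, and the helper actually added to the presets may look different.

```java
import java.nio.charset.StandardCharsets;

public class CopyThenDispose {
    /** Hypothetical stand-in for a natively owned string such as CXString. */
    static final class NativeString implements AutoCloseable {
        private byte[] data; // stands in for native memory

        NativeString(String s) {
            data = s.getBytes(StandardCharsets.UTF_8);
        }

        /** Plays the role of clang_getCString(): only valid until disposal. */
        byte[] view() {
            if (data == null) {
                throw new IllegalStateException("already disposed");
            }
            return data;
        }

        /** Plays the role of clang_disposeString(). */
        @Override
        public void close() {
            data = null;
        }

        boolean isDisposed() {
            return data == null;
        }
    }

    /** Copies the contents into a Java String, then releases the native side. */
    static String toJavaString(NativeString s) {
        try (NativeString owned = s) {
            // The copy is made before close() runs at the end of the try block.
            return new String(owned.view(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        NativeString s = new NativeString("main.cpp");
        String copy = toJavaString(s);
        System.out.println(copy + " disposed=" + s.isDisposed());
    }
}
```

Wrapping both steps in one helper means callers never hold a view into memory that has already been released, which is why a plain `String` return type on the getter alone is not enough.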
I've finally checked this properly, and it seems this method was the only one that made sense to have as BytePointer, as it has the dispose. As far as I can see, the others (except Anyway, for now I'll continue as it is, we can always improve things later without many changes. |
I'm now getting crashes (randomly, like 1 for every 5 executions or so) like this one:
Visitor is a class I created that looks exactly like this:
...and I'm just instantiating by calling |
Have you disabled "crash recovery"?
https://github.com/bytedeco/javacpp-presets/tree/master/llvm |
Ahh, nice, will try that. I've had a good run of ~10 without crashes though, so it will be difficult to prove if it worked (unless I get it again :D) |
…XEvalResult` (issue bytedeco/javacpp#51)
I've added |
Let me know if there's anything else missing from the API that would prevent you from making progress. Thanks!! |
Not really, the JavaCPP Presets for LLVM also essentially map the C API only. That's not the problem, the problem is that jextract was designed to work only with C, not C++. It fails miserably at anything that even remotely looks like C++. I think that would be the first thing to "fix" before going forward with that idea.
@mcimadamore @sundararajana might have some more recent insights into what they looked at, why it doesn't work, etc. |
I meant using jextract to bind only the C API of Clang. Clang can then be used to parse C++. Where is the limitation due to jextract? |
jextract also already maps the C API of Clang: jextract doesn't support C++, period. It never has and probably never will. |
Sure, but the C-API of Clang can parse C++. |
Yeah, but it's not going to be any better than the JavaCPP Presets for LLVM. You'll get the exact same thing. The only reason you may want to use jextract is to potentially get support from Oracle... |
And the bootstrapping ? |
I'm not sure I understand what you mean by "bootstrapping", but whatever it is, it's not going to be a bigger problem than supporting C++. Start with getting something working for C++, and if you get that working, the rest isn't going to be a problem. |
I mean the problem of "chicken or the egg": You need the LLVM presets to use the parser, and you need the parser to build the LLVM presets. |
Didn't you just say that you'd use the one from jextract? Just do that, that's fine. |
@HGuillemet I believe you are suggesting to use an approach similar to that used by jextract to e.g. generate libclang bindings which rely on the foreign function API. That part works well, and, assuming a tool only needs the C clang API, that could be good enough. We did some experiments parsing C++ with the C API and these were not successful, as the C API, at the moment, does not expose enough information re. template instantiation (the information is there under the hood, just not exposed in the C API, unfortunately). These same problems were observed in other projects using the C API as well (I seem to recall Rust's bindgen having several workarounds to make C++ sort of work with that API). I do hope that, in the future, the clang C API will be improved to add those missing 2-3 functions which will make handling templates much more manageable. At this point in time I cannot recommend using the clang C API to emit bindings for real-world C++ code. |
Thank you for this information. Yes, that's what I was suggesting. |
If I'm understanding correctly, the "bootstrap problem" is the problem that we would depend on the libclang implementation to create the libclang implementation, similar to how you need GCC to build GCC. We already solved that part, as we already have a stage 1 libclang implementation at https://github.com/bytedeco/javacpp-presets/blob/master/llvm/src/gen/java/org/bytedeco/llvm/global/clang.java made with the old parser which would suffice to build the new javacpp parser. I actually had a go at this some time back, and I seemed to be able to parse some very basic C headers with the libclang API from JavaCPP Presets. If missing Clang C functions is an issue, we can either:
|
IIRC, one of the main missing bits of functionality was being able to retrieve all template instantiations for a given template method/class (as a binder would need to generate special code for all of these). |
@HGuillemet Ah, you were referring to missing functionality from the C API of Clang. We can easily "extend" the C API ourselves, that's not an issue. I thought I mentioned that in this thread, but it's actually in bytedeco/javacpp-presets#475 (comment). So just add anything along that you need, that's not a problem. |
FYI, here's something that looks more useful than Panama since it supports C++ and it's actually able to inline native functions:
@HGuillemet You may want to start looking at that, in addition to Panama. Thanks to @frankfliu for letting me know about that! |
This project is interesting. It aims at providing a full alternative to JavaCPP (and Panama).
However:
|
It doesn't aim to be an alternative to Panama, that one is never going to support C++ or function inlining, it's not part of their goals. Like I explained before, I don't think anyone is going to switch from JNI to Panama, and that project (fastFFI) demonstrates that well. JNI is just fine, it's already fast enough and can be made user-friendly with tools like JavaCPP. However, to increase performance to any meaningful degree, what we need is to bring something like LLVM on the JVM without anything "foreign", which Panama is not willing to do, so in my opinion it's never going to give us anything substantial over JNI. As for being a "full alternative" to JavaCPP, it's possible, but JavaCPP doesn't use Clang or anything like that, so if that's what they have started to work on, I would consider that an evolution over JavaCPP, and we should probably try to collaborate with them instead of redoing the same thing ourselves. @frankfliu What do you think? |
@saudet I agree with you. If their architecture is clean and foundation is solid, improving usability is relatively easier. |
Their component (LLVM4JNI) that uses clang to compile the JNI glue code to bytecode and then translates it to JVM bytecode seems more or less independent and could probably be applied as is to JavaCPP. If they do plan to open-source a C++ parser based on clang, with support for generics, I agree that it would be interesting to know more about it before continuing to work on our own. This project seems quite old in fact. I'd say at least 10 years. They decided to open-source it now, for reasons that would also be interesting to know, along with their plans and available resources. |
Aside from java-port/clank, the C#/Mono/Xamarin crowd also have a lot of experience binding and porting C++ class hierarchies. Both of these projects use the Clang frontend to produce ASTs and port the Clang AST class hierarchy to C# for consumer side codegen APIs:
I think both projects produce their own Clang C bindings and manually port the C++ AST bits they need. They also both have non-trivial C++ code they use to control the Clang frontend. Xamarin project has bindings for most Objective-C libraries on Mac and iOS here: xamarin/xamarin-macios. Would love to understand their process. It has to be one of the largest successful bindings projects ever. I'm sure it's largely automated and my guess is they use Clang's Objective-C frontend... |
SkiaSharp is another example. A large C# binding project for Google's Skia 2D graphics library. They are a mono project used by Microsoft in .NET. In the binding generator module they are using CppAst.NET, which implements a C++ AST in C#. CppAst.NET does not use the C++ library, cppast. They appear to have stolen the name, but cppast claims to expose bits of Clang's AST which are not exposed directly by libclang. If so, that may be useful. Edit: I was mistaken that CppAst.NET binds cppast. It is merely named after the latter. |
What about using clangd ? |
I've taken a cursory look at that. I somewhat like the idea. Clangd depends on understanding the project's build system through Many large bindings projects seem to effectively reproduce parts of the build system, dependency graph, and source file hierarchy of their underlying library anyway. Generating One possible issue: AST access is provided as an LSP protocol extension: https://clangd.llvm.org/extensions#ast.
There is an LSP implementation for Java here: https://github.com/eclipse/lsp4j. I think the protocol is similar to HTTP, so client implementation shouldn't be too bad. Other than providing per-file AST access, clangd provides an index which may be marginally helpful: |
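On the "similar to HTTP" point: the LSP base protocol sends each JSON-RPC message behind an HTTP-style header block, where `Content-Length` gives the body size in bytes and a blank line separates the header from the body. Here is a rough sketch of framing a request for clangd's AST extension using only the JDK. The `textDocument/ast` method name and the `textDocument`/`range` parameter shape are my reading of the extension page linked above and should be checked against it; the class and method names, the file URI and the line range are made up for illustration. In practice lsp4j would handle this framing.

```java
import java.nio.charset.StandardCharsets;

public class LspFrame {
    /** Wraps a JSON-RPC payload in the LSP base-protocol header. */
    static byte[] frame(String json) {
        byte[] body = json.getBytes(StandardCharsets.UTF_8);
        // Content-Length counts bytes of the UTF-8 body, not Java chars.
        byte[] header = ("Content-Length: " + body.length + "\r\n\r\n")
                .getBytes(StandardCharsets.US_ASCII);
        byte[] out = new byte[header.length + body.length];
        System.arraycopy(header, 0, out, 0, header.length);
        System.arraycopy(body, 0, out, header.length, body.length);
        return out;
    }

    /**
     * Builds an AST request body (assumed shape, see the note above).
     * The uri is assumed to contain no characters that need JSON escaping.
     */
    static String astRequest(int id, String uri, int endLine) {
        return "{\"jsonrpc\":\"2.0\",\"id\":" + id
                + ",\"method\":\"textDocument/ast\",\"params\":{"
                + "\"textDocument\":{\"uri\":\"" + uri + "\"},"
                + "\"range\":{\"start\":{\"line\":0,\"character\":0},"
                + "\"end\":{\"line\":" + endLine + ",\"character\":0}}}}";
    }

    public static void main(String[] args) {
        // Hypothetical file URI and range, for illustration only.
        byte[] msg = frame(astRequest(1, "file:///tmp/example.h", 100));
        System.out.print(new String(msg, StandardCharsets.UTF_8));
    }
}
```

A real client would also need the `initialize` handshake and a `textDocument/didOpen` notification before clangd answers requests about a file.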
Hi @saudet, a little update. You said
To which Mcimadamore outlined some possible scenarios where the foreign linker api could lead to better performance than JNI. The news:
And mention future internal use of the vector api. |
FMA is unrelated to Clang or JNI, please see issue #402 |
I'm in the process of doing this right now. Currently, the following issues exist with the approach I'm taking: