Add a list of tokens that are always indexed #288
base: master
Currently, the list consists of arbitrarily selected tokens from c_common_reswords list present in GCC source code. https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/c-family/c-common.cc;h=0341c44a2cd99771c3eeb44878d3df7a9ae816ea;hb=HEAD#l385
update.py
Outdated
@@ -26,7 +26,7 @@
 from threading import Thread, Lock, Event, Condition

 import lib
-from lib import script, scriptLines
+from lib import script, scriptLines, always_indexed_tokens
Maybe use lib.always_indexed_tokens instead and avoid a manual import.
update.py
Outdated
# We only index CONFIG_??? in makefiles
ref_allowed = db.defs.exists(tok) or (tok in always_indexed_tokens)
# We only index CONFIG_??? in makefiles
config_or_not_makefile = tok.startswith(b'CONFIG_') or family != 'M'
Any reason to reverse the conditional order from family != 'M' or tok.startswith(b'CONFIG') to tok.startswith(b'CONFIG') or family != 'M'?
IIRC I called the variable config_or_not_makefile and decided to reverse the order to make it match the name. Assuming that the expression is evaluated left to right, the old order is probably a bit better. I will fix that.
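To illustrate the evaluation-order point: Python's `or` evaluates its operands left to right and short-circuits, so with the cheap `family != 'M'` comparison first, `startswith` is only called for makefile tokens. This is a minimal sketch, not code from the PR; `CountingToken`, `old_order` and `new_order` are hypothetical helpers that exist only to count calls.

```python
class CountingToken(bytes):
    """Hypothetical bytes subclass that counts startswith() calls."""
    calls = 0

    def startswith(self, *args):
        CountingToken.calls += 1
        return super().startswith(*args)


def old_order(tok, family):
    # Cheap comparison first: startswith() only runs when family == 'M'.
    return family != 'M' or tok.startswith(b'CONFIG_')


def new_order(tok, family):
    # startswith() runs for every token, whatever the family is.
    return tok.startswith(b'CONFIG_') or family != 'M'


tok = CountingToken(b'kmalloc')

CountingToken.calls = 0
old_result = old_order(tok, 'C')
old_calls = CountingToken.calls

CountingToken.calls = 0
new_result = new_order(tok, 'C')
new_calls = CountingToken.calls
```

Both orders return the same value; they differ only in how much work is done for tokens outside makefiles.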
I sometimes compare databases by doing
Elixir seems to mostly focus on C now. Handling other languages properly will require a bigger rework of the update script. Let's maybe focus on handling more interesting builtins/extensions used in Linux for now and focus on other languages later.
A Python set has better lookup time (O(1) on average) than a list. I will replace the list with a set.
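A minimal sketch of that change, assuming the tokens are stored as bytes like the `tok` values in update.py. The three tokens below are an illustrative sample, not the list from the PR, and `is_always_indexed` is a hypothetical helper.

```python
# Illustrative sample only; the real list is picked from GCC's c_common_reswords.
always_indexed_list = [b'__attribute__', b'__typeof__', b'_Generic']

# A frozenset gives O(1) average membership tests and cannot be
# modified by accident; a list needs an O(n) scan for each lookup.
always_indexed_tokens = frozenset(always_indexed_list)


def is_always_indexed(tok):
    """Return True if tok should be indexed even without a definition."""
    return tok in always_indexed_tokens
```

With only a handful of tokens the difference is small, but the check runs once per token in every file, so it adds up as the list grows.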
Off-topic, but that's great to hear. I have seen very concerning I/O performance caused by the update script recently. On my VM, the default update.py on many tags causes this:
Sure, it's mostly because the VM is massively underpowered (2GB of RAM), but I noticed better CPU usage after I added 100MB of cache to each database. (200MB was even better, but then I accidentally got the script killed by the OOM killer. Gonna upgrade to 4GB before I continue, I just need this job to finish first...)
Another thing about the update script: it often does lookups in the definitions and references database. I think having our own cache for definitions would improve speed. Postponing key updates until a file is processed (in update_definitions and references) could probably help too. Apparently Berkeley DB supports batch inserts, but unfortunately I'm not sure whether it's possible to use that from the Python wrapper.
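The definitions-cache idea could look roughly like the sketch below. Everything here is an assumption for illustration: `FakeDefsDB` stands in for the real definitions database (only an `exists(key)` method is assumed, as in `db.defs.exists(tok)`), and `CachedDefs` is a hypothetical wrapper, not existing Elixir code.

```python
class FakeDefsDB:
    """Stand-in for the definitions database; counts backend lookups."""

    def __init__(self, keys):
        self._keys = set(keys)
        self.lookups = 0

    def exists(self, key):
        self.lookups += 1
        return key in self._keys


class CachedDefs:
    """Hypothetical read-through cache in front of exists()."""

    def __init__(self, db, max_entries=100_000):
        self._db = db
        self._cache = {}
        self._max_entries = max_entries

    def exists(self, key):
        if key not in self._cache:
            if len(self._cache) >= self._max_entries:
                # Crude eviction: drop everything to keep memory bounded.
                self._cache.clear()
            self._cache[key] = self._db.exists(key)
        return self._cache[key]


db = FakeDefsDB([b'kmalloc'])
defs = CachedDefs(db)
results = [defs.exists(b'kmalloc') for _ in range(3)]
results += [defs.exists(b'missing'), defs.exists(b'missing')]
```

Note that negative results are cached too, so an entry would have to be invalidated whenever a new definition is written. The size bound matters on a small VM, given the OOM kill mentioned above.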
This is good advice, thanks!
There is a PR to add support for other languages, see #254. This is a useful direction for Elixir so we should be taking it into account even if it is not yet supported. The goal is to keep adding new languages as simple as possible. Ideally it would only be a new family and new implementations in About I've addressed |
OK, so should I create a dict with a different list for each family?
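One possible shape for such a dict, as a sketch only. The family letter 'M' for makefiles comes from the diff above; 'C' for the C family, the sample tokens, and the `is_always_indexed` helper are assumptions made for illustration.

```python
# Per-family sets of tokens that are indexed even without a definition.
# Sample entries only; 'C' as the family letter is an assumption.
always_indexed_tokens = {
    'C': frozenset({b'_Generic', b'__typeof__'}),
    'M': frozenset(),  # nothing useful found for makefiles so far
}


def is_always_indexed(tok, family):
    """Look tok up in the set for its family; unknown families match nothing."""
    return tok in always_indexed_tokens.get(family, frozenset())
```

Adding a language would then only mean adding one more key, which fits the goal of keeping new families simple to support.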
Try adding compiler-provided defs for already supported languages (other languages in the
Ideally we would index them as This needs to be looked into for other already-supported languages. I can't think of any (useful to index) compiler-provided idents for Kconfig and Makefiles. But C++ surely has a lot of compiler-provided defs. |
Add a list of always indexed prefixes
@tleb OK, so I started to read through the GCC and Clang docs about C extensions and other reserved keywords. The point is: there are many of them, and the list is getting bigger. Not often, but I still doubt we will keep it up to date in the long term. Of course that doesn't mean we shouldn't try, but we could either limit the scope a bit, or think about a future-proof way to handle many identifiers at once.
After reading this https://gcc.gnu.org/onlinedocs/gcc/C-Extensions.html and this https://clang.llvm.org/docs/LanguageExtensions.html and some GCC source code, I came to the (perhaps obvious) conclusion that there are a lot of identifiers that start with __builtin. Same with __sync, __atomic, __cpp and some others. What do you think about handling all identifiers that start with these prefixes?
Second thing: some standard C functions (quite a long list) don't require headers to be used. See https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
I pushed a commit (not tested too well yet) with the proposal: a list of prefixes and more interesting special identifiers. It's still missing some stuff, for example I would move the list to a different file. Extended attributes and most per-architecture stuff are missing too. (And again, does it make sense to handle that?)
I couldn't find anything interesting to index in the Kconfig and GNU Make docs: https://www.gnu.org/software/make/manual/make.html#Quick-Reference
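The prefix check can be sketched as follows. The four prefixes are the ones named in the comment above; the variable names, the single sample token, and the `is_always_indexed` helper are hypothetical and not taken from the pushed commit.

```python
# Prefixes named in the discussion; bytes.startswith() accepts a tuple,
# so one call tests all of them.
always_indexed_prefixes = (b'__builtin', b'__sync', b'__atomic', b'__cpp')

# Special identifiers that match no prefix (sample entry only).
always_indexed_tokens = frozenset({b'_Generic'})


def is_always_indexed(tok):
    """True for exact special tokens or for anything with a known prefix."""
    return tok in always_indexed_tokens or tok.startswith(always_indexed_prefixes)
```

This way new builtins such as a future `__builtin_*` function are picked up without touching the list, at the cost of also matching project-defined identifiers that happen to share a prefix.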
Sorry for jumping in the middle of the conversation here, but what is the motivation behind indexing those identifiers? To me, Elixir is about navigating in the Linux kernel code (or U-Boot, etc.). Why would I search for __atomic_store or __atomic_exchange or _Generic? The implementation is not in Linux, U-Boot, etc. So why does searching for those identifiers matter?
It seems that some built-in identifiers without definitions are interesting to some users. For example, see #237. Some C extensions are used in the kernel but are not indexed because they have no definitions (that are recognized by ctags at least). See https://elixir.bootlin.com/linux/latest/source/include/linux/overflow.h#L355
@tpetazzoni I see three concrete reasons:
I see, but I'm not extremely convinced that this is really that important, compared to many other things we have planned for Elixir.
Currently, the list consists of arbitrarily selected tokens from c_common_reswords list present in GCC source code.
https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/c-family/c-common.cc;h=0341c44a2cd99771c3eeb44878d3df7a9ae816ea;hb=HEAD#l385
Needs testing and research about gcc/clang extensions.