NIP-50: Adding regex extension #1480
base: master
Also, a question: do we have a way to determine a set of … ? If so, where are they defined? If not, should we discuss them here, perhaps based on current implementations?
Do you intend to build a relay that can do regexes?
@vitorpamplona I'm currently working on the Immortal implementation. It's at an early stage and mostly empty, but NIP-50, and especially regex pattern matching, is part of our plan: we want to support complex queries inside the text fields themselves, in addition to queries over whole events.
Two points:

1. Search engines are not usually regex-friendly. You might want to look at how tools like Lucene, Solr, and Elasticsearch work to get a firmer understanding of how a full-text search engine is built. The key component is indexing, and a regex is too complex to be answered from a pre-built index. You will be left with applying the regex to every event on every new query.
2. Regex is not a standard. Although the basic features work in all languages, there are plenty of grammar variations. Basic features like x*, [abc], and (capture groups) are widely supported. Character classes like \d, \w, and \s are broadly but not universally supported, and other escapes vary too: for example, \z means "strict end of string" in Perl but is unsupported in JavaScript. In C, \d is not supported in GNU regular expressions, and backslash character classes like \w and \s are not supported at all in pure POSIX regexes; you must use [_[:alnum:]], [[:space:]], [[:digit:]], etc. Advanced features like negative lookaheads are the same story: they are supported in Perl-compatible regexes (PCREs) and in JavaScript, but not in POSIX or in command-line tools like grep in its default modes. You might want to define exactly which flavor you want to use, or which features must be enabled.
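One way a relay could act on the "define which features must be enabled" advice is to scan incoming patterns for constructs that not every flavor supports. The sketch below is a hypothetical helper: the name non_portable_features and its feature lists are illustrative assumptions, not part of NIP-50, and it is a simplified scanner rather than a full regex parser.

```python
"""Flag regex features that are not portable across flavors (hypothetical helper)."""

# Anchor-like escapes whose support varies by flavor (e.g. \z exists in Perl/PCRE
# but not in JavaScript or POSIX). Illustrative list, not exhaustive.
NON_PORTABLE_ESCAPES = {"z", "Z", "A", "G"}
# Backslash shorthand classes that pure POSIX regexes do not support.
POSIX_UNSAFE_ESCAPES = {"d", "D", "w", "W", "s", "S"}


def non_portable_features(pattern: str) -> set[str]:
    """Return names of features in `pattern` that not every flavor supports.

    Simplified: tracks backslash escapes and [...] classes, but ignores edge
    cases such as a ']' placed as the first member of a class.
    """
    found: set[str] = set()
    in_class = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt in NON_PORTABLE_ESCAPES:
                found.add("anchor escape \\" + nxt)
            elif nxt in POSIX_UNSAFE_ESCAPES:
                found.add("shorthand class \\" + nxt)
            elif nxt in "123456789" and not in_class:
                found.add("backreference")
            i += 2  # skip the escaped character
            continue
        if in_class:
            if ch == "]":
                in_class = False  # end of a [...] class
        elif ch == "[":
            in_class = True  # '(' and '?' are literals inside a class
        elif pattern.startswith(("(?=", "(?!"), i):
            found.add("lookahead")
        elif pattern.startswith(("(?<=", "(?<!"), i):
            found.add("lookbehind")
        i += 1
    return found
```

For example, non_portable_features(r"^\d+(?!px)") reports both the \d shorthand and the negative lookahead, while r"[abc]+x*" reports nothing. Whether a relay rejects, ignores, or merely logs such patterns is the flavor decision the comment above asks for.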
@vitorpamplona Thanks a lot for the explanation! I'll do more research on the different regex flavors and then make a final decision on the change. I'll let you know once it's updated.
@vitorpamplona I did some research on how the different regex flavors are supported. PCREs are supported by most programming languages, and MySQL and Postgres support them too. Mongo, Redis, and Elastic support regex only partially; as I understand it, their flavors are supersets or simplified versions of PCRE. The POSIX flavor is mostly used on … So I decided to use PCREs here. As for databases that are not regex-friendly, I think domain-specific databases for Nostr events could provide full, high-performance support for this case, and keep in mind that supporting this extension is optional (MAY). For specific cases, mostly chats, I think regex can be very helpful!
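For illustration, here is how a relay might read such an extension out of a NIP-50 search string, assuming a hypothetical regex:<pattern> key (the exact syntax of the proposed extension is not shown in this thread). Python's re module is used as a stand-in: its syntax is Perl-style and close to PCRE but not identical, so a relay committing to PCRE would need an actual PCRE binding. The free-text part is a naive substring check standing in for real full-text search.

```python
import re
from typing import Optional


def split_search(search: str) -> tuple[str, dict[str, str]]:
    """Split a search string into free text and key:value extensions.

    Simplified: tokens are split on whitespace, so a pattern cannot contain
    literal spaces (use \\s), and a token like "http://x" is treated as an
    extension with key "http".
    """
    terms: list[str] = []
    extensions: dict[str, str] = {}
    for token in search.split():
        key, sep, value = token.partition(":")
        if sep and key.isalpha() and value:
            extensions[key] = value  # everything after the first ':' is the value
        else:
            terms.append(token)
    return " ".join(terms), extensions


def matches(event: dict, search: str) -> bool:
    """Naive relay-side check of one event against a search string."""
    text, ext = split_search(search)
    content = event.get("content", "")
    if text and text.lower() not in content.lower():
        return False  # stand-in for real full-text matching
    pattern: Optional[str] = ext.get("regex")  # hypothetical extension key
    if pattern is not None:
        try:
            if re.search(pattern, content) is None:
                return False
        except re.error:
            return False  # invalid pattern: treat as no match
    return True
```

Other keys, such as include:spam, are collected but not acted on here, in line with relays ignoring extensions they do not support.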
Any ideas, guys?
Databases generally use indexes to find things fast. A regex match can't use indexes, so the database has to load every stored record and check it against the regex. Even if MySQL supports it, that is probably what it does underneath, which for a big enough dataset would be prohibitively costly and slow. If you want to support this, sure, it's your choice; we can leave this PR open and merge it later if more people start using it too.
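The cost difference described above can be sketched with an in-memory inverted index (all names here are hypothetical): a word query is a single lookup, while a regex query has to examine every stored event. The optional candidates argument shows a common mitigation, assumed here rather than proposed in the thread: apply the indexed filter fields (e.g. kinds, authors) first and run the regex only over what remains.

```python
import re
from collections import defaultdict
from typing import Iterable, Optional


def build_index(events: list[dict]) -> dict[str, set[int]]:
    """Inverted index: lowercase word -> ids of events containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for i, ev in enumerate(events):
        for word in re.findall(r"\w+", ev["content"].lower()):
            index[word].add(i)
    return index


def word_search(index: dict[str, set[int]], word: str) -> set[int]:
    # One dictionary lookup, independent of how many events are stored.
    return set(index.get(word.lower(), set()))


def regex_search(
    events: list[dict], pattern: str, candidates: Optional[Iterable[int]] = None
) -> tuple[set[int], int]:
    """Return matching ids and how many events had to be examined.

    Without `candidates`, every stored event is scanned; with it, only the
    ids already narrowed down by other (indexed) filter fields are checked.
    """
    rx = re.compile(pattern)
    ids = range(len(events)) if candidates is None else sorted(candidates)
    hits: set[int] = set()
    examined = 0
    for i in ids:
        examined += 1
        if rx.search(events[i]["content"]):
            hits.add(i)
    return hits, examined
```

The examined count grows with the size of the stored (or pre-filtered) set, which is the scaling problem raised above; the index lookup does not.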
@fiatjaf OK, I'll keep it open, wait to see it in action, and see how it goes.
Regular expressions are commonly used for querying and searching, which is a good reason to have them as an extension so clients can make specific and complex search queries.