Is Conditional Regex Supported? #22

Open
afshindavoudy opened this issue Oct 7, 2023 · 9 comments

@afshindavoudy

Hi,
I appreciate your excellent extension! However, I'm having trouble getting a conditional regex to work. I'd like to confirm if this plugin supports conditional regex.

@rioj7
Owner

rioj7 commented Oct 7, 2023

@afshindavoudy Can you give an example?

I haven't found any documentation about conditional constructs in the MDN JavaScript regex pages.

@afshindavoudy
Author

afshindavoudy commented Oct 9, 2023

@rioj7
Hi, thanks for the reply.
Suppose we have this structure: 'foo-bar', foo-bar.

I need to implement a conditional regex as follows:
If a single quote is present, move the cursor to the closing quote using this expression: .(?=(?<=('[^',;:+=]*')).)
If not, move the cursor to the hyphen (if there is one).

@rioj7
Owner

rioj7 commented Oct 10, 2023

@afshindavoudy Does this work?

('[^']*'|[^-]*-)

@afshindavoudy
Author

afshindavoudy commented Oct 10, 2023

I've set it up as a keybinding for execution through the "Select By" extension, as shown below:

    {
        "key": "alt+j", // Keybinding
        "when": "editorTextFocus",
        "command": "moveby.regex", //Run by "Select By" extension
        "args": {
            "regex": "('[^']*'|[^-]*-)",
            "properties": [
                "next",
                "start",
            ]
        },
    },

However, it currently matches all hyphens and does not skip those within single quotes (' '). For example, in the text

'foo-bar', foo-bar, 'foo-bar', foo-bar,

it matches all hyphens, including the ones within the single quotes.

The regular expression I'm attempting to create is more complex than the previous example I provided. I believe it can only be achieved using a conditional regex.
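
The behavior described above can be reproduced outside the editor. The following is a minimal sketch in plain JavaScript that lists every match of the suggested regex on the sample line. It uses only String.prototype.matchAll and does not model how the extension chooses its search start.

```javascript
// Sketch: list every match of the suggested regex on the sample line.
const text = "'foo-bar', foo-bar, 'foo-bar', foo-bar,";
const re = /('[^']*'|[^-]*-)/g;

const matches = [...text.matchAll(re)].map((m) => ({ index: m.index, text: m[0] }));
for (const m of matches) {
  // Print the start offset and the matched text of each hit.
  console.log(m.index, JSON.stringify(m.text));
}
```

Once a search starts at a position that is not an opening quote, the second alternative [^-]*- runs across the next quote and stops at the hyphen inside it, which is why the quoted hyphens are still reached.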

@rioj7
Owner

rioj7 commented Oct 11, 2023

@afshindavoudy

Can you give more detailed examples of where the cursor is and where you expect the cursor to move to after the command?

If a single quote is present

Where is this quote relative to the cursor position?

Can you describe what you actually want to search/move?

In your example

.(?=(?<=('[^',;:+=]*')).)

you use lookahead and lookbehind; both are supported by JavaScript, though as far as I know only fixed-length lookbehind.

Do a web search for regex conditional and you get different options implemented in PCRE and Python.
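
As a quick check, the PCRE/Python-style conditional syntax (?(1)yes|no) can be handed to the JavaScript RegExp constructor to see whether it is accepted. The pattern below is a generic illustration, not one from this thread, and the alternation that follows is one assumed way to express the same intent without a conditional.

```javascript
// Sketch: JavaScript rejects the PCRE/Python conditional syntax (?(1)yes|no).
let conditionalSupported = true;
try {
  new RegExp("(a)?(?(1)b|c)");
} catch (err) {
  // A SyntaxError here means the engine does not know the construct.
  conditionalSupported = !(err instanceof SyntaxError);
}
console.log("conditional supported:", conditionalSupported);

// The same intent ("b after an a, otherwise c") written as a plain alternation.
const emulated = /ab|c/;
console.log(emulated.test("ab"), emulated.test("c"), emulated.test("b"));
```

In simple cases like this one, each branch of the conditional can be spelled out as its own alternative.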

@afshindavoudy
Author

afshindavoudy commented Oct 11, 2023

@rioj7
Indeed, I've been looking for a VSCode extension that can help navigate through all variable names, values, etc. So, this functionality can be assigned to keybindings like "alt+enter" or the "tab" key. Unfortunately, I couldn't find one that perfectly suits this task. As a result, I've chosen to pursue this goal using the "Select by" extension and a carefully crafted regex. Here's a summary of what I've achieved so far:

{
    "key": "alt+enter",
    "when": "editorTextFocus",
    "command": "moveby.regex",
    "args": {
        "regex": "(?<![\\])},:;\"'`])\\s*[\\])}]|(?<![\\s*\\[\\](){},:;\"'`\\-+/*^%&|<>=])\\s*[,:;\\-+/*^%&|<>=]|.(?=(?<=('[^',;:+=]*')).)|.(?=(?<=(\"[^\",;:+=]*\")).)|.(?=(?<=(`[^`,;:+=]*`)).)|(\\s*!=)",
        //        |   [  ])},:; "'`   ws    ])} |       ws  [  ](){},:; "'`  -+/*^%&|<>=   ws  ,:;  -+/*^%&|<>= |        closing '        |          closing "         |         closing `       |     != |
        "properties": [
            "next",
            "start",
        ]
    },
},

Descriptions:

"((?<![\\])},:;\"'`]\\s*)[\\])}])"

Captures any closing brackets ])} that are not preceded by any ])} or ,:; or "' characters, with or without any white space in between.

"(?<![\\s*\\[\\](){},:;\"'`\\-+/*^%&|<>=])\\s*[,:;\\-+/*^%&|<>=]"

Captures any ,:;-+/*^%&|<>= that are not preceded by any [{()}] or ,:; or "' or -+/*^%<>= or &| characters, with or without any white space in between.

".(?=(?<=('[^',;:+=]*')).)"
".(?=(?<=(\"[^\",;:+=]*\")).)"
".(?=(?<=(`[^`,;:+=]*`)).)|(\\s*!=)"

Captures the character right before the above character (The closing "'` itself)

This solution functions perfectly in most scenarios. However, it also captures characters like "-", "+", "&", "%", and so on, when they appear within quotations ("", '', ``) and comments. This is the issue I am currently addressing (and trying to fix with conditional patterns).

@afshindavoudy
Author

afshindavoudy commented Oct 11, 2023

By the way, I've encountered two more challenges:

I need to incorporate regex patterns to detect "end of line" and "the first empty line between brackets". I've tried using the following patterns:

(?=\\n)
(^\\s*$)

However, these patterns interfere with the ability to navigate to the next line when they are detected.
Appreciate any help on these two also. :)

@rioj7
Copy link
Owner

rioj7 commented Oct 13, 2023

@afshindavoudy

You have the word was in the regex comment, most likely a typo for ws

(?<![\])},:;"'`]\s*)[\])}]

I just tried this in a text file (using the Find box of VSC, no need to escape) and I'm amazed that a variable length Negative Lookbehind works.
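
The same observation can be checked in Node.js. This is a minimal sketch, assuming an ES2018 or later engine; the sample text is made up for illustration. The lookbehind contains \s*, so its length is not fixed.

```javascript
// Sketch: a negative lookbehind containing \s* (variable length) in JavaScript.
// It matches a closing bracket unless a closing bracket, separator, or quote
// (optionally followed by whitespace) comes directly before it.
const re = /(?<![\])},:;"'`]\s*)[\])}]/g;
const text = "f(a) ) g[b] ]";

const hits = [...text.matchAll(re)].map((m) => m.index);
console.log(hits); // offsets of the brackets that were accepted
```

On this sample the ) after a and the ] after b are matched, while the brackets that follow "closing bracket plus space" are skipped.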

.(?=(?<=('[^',;:+=]*')).)

This is equivalent to .(?=(?<='[^',;:+=]*').), there is no need to group the content of a lookahead or lookbehind.

What I don't get is the logic of a LookBehind INSIDE a LookAhead

'[^',;:+=]*' matches a series of characters enclosed in '' that does not contain any of ,;:+=

The whole expression matches one character (the first .). At the position after it (X), the lookbehind must find a '' string that does not contain any of ,;:+=, so the first . must be the closing ' of that string. Position X must also be followed by a character that is not a newline (the last .).

Using this explanation an equivalent regex is: (?<='[^,;:+=]*)'(?=.)
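
To compare the two forms, both can be run over the same input and their match offsets listed. This is only a sketch on one made-up sample line, so agreement here illustrates the claim but does not prove it for every input.

```javascript
// Sketch: compare the original pattern with the simplified one on a sample.
const original = /.(?=(?<=('[^',;:+=]*')).)/g;
const simplified = /(?<='[^,;:+=]*)'(?=.)/g;
const text = "x = 'foo-bar', y = 'a' + 'b';";

// Collect the start offset of every match of a global regex.
const indices = (re) => [...text.matchAll(re)].map((m) => m.index);

const a = indices(original);
const b = indices(simplified);
console.log(a, b);
```

On this sample both patterns report the closing quotes of 'foo-bar', 'a' and 'b', and neither reports the quote that opens 'b'.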

You don't want to match the ' + ' like "strings" that are found in:

  • 'foo' + 'bar'
  • myfunc('foo', 'bar')

'foo-bar', foo-bar.

The regex that matches the closing ' or the - is: (?<='[^,;:+=]*)(?=[^,;:+=]*?').*?(')(?=.)|(?<='[^,;:+=]*)'(?=.)|-

It first checks whether the position is inside a string, then tries to find the next string, then tries the -.

The closing ' is put in a capture group (') to be able to locate the group in the big picture and go to the start or end of it.

If you are inside a string, the rest of the string content plus the closing ' is matched. You can't go to the start of the closing ' without extra calculation, and the terminator may have more than one character, like Python """ strings.

I have to modify the extension to make the following properties work:

{
    "key": "alt+enter",
    "when": "editorTextFocus",
    "command": "moveby.regex",
    "args": {
        "regex": "....",
        "properties": ["next", "start"],
        "groups": [1]
    },
},

It checks if group 1 has a match and uses that to determine the start or end position to place the cursor. If not, it uses group 0 (the whole match) for start and end. Alternatively, I could use the property "groups": true to use the first capture group that matches some text, so you don't have to count and it will use a capture group from any alternative that matches.
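
The idea of placing the cursor by capture group can be sketched in plain JavaScript. This is not the extension's code: nextPosition and useGroups are hypothetical names, and the sketch assumes an engine with the d (hasIndices) flag from ES2022 so that capture group offsets are available.

```javascript
// Hypothetical helper, not the extension's code: find the next match at or
// after `from` and return the offset where the cursor would be placed.
function nextPosition(text, from, regex, useGroups) {
  // "g" lets lastIndex set the search start; "d" records capture group offsets.
  const re = new RegExp(regex.source, "gd");
  re.lastIndex = from;
  const m = re.exec(text);
  if (m === null) return -1;
  if (useGroups) {
    // Use the first capture group that matched some text.
    for (let i = 1; i < m.length; i++) {
      if (m[i]) return m.indices[i][0];
    }
  }
  return m.index; // fall back to group 0, the whole match
}

const regex = /(?<='[^,;:+=]*)(?=[^,;:+=]*?').*?(')(?=.)|(?<='[^,;:+=]*)'(?=.)|-/;
const text = "'foo-bar', foo-bar.";

console.log(nextPosition(text, 2, regex, false)); // 2: start of the whole match
console.log(nextPosition(text, 2, regex, true));  // 8: start of the closing '
console.log(nextPosition(text, 9, regex, true));  // 14: the hyphen outside the string
```

Starting inside the string, the whole match begins at the search position, while group 1 points at the closing quote. Outside a string no group matches, so the offset of the whole match (the hyphen) is used.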

However, it also captures characters like -, +, &, %, and so on, when they appear within quotations("", '', ``) and comments.

In the case of 'foo' + 'bar', do you want to match the + because it is not in a string? You don't want to match the ' before bar because it creates a false string (' + '), yet the + is inside that false string.

Comments can be eliminated if you start the regex with: (?<!//.*) if // is the comment designator.
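
A minimal sketch of that prefix in JavaScript, applied to a pattern that matches only -. The sample text is made up. Because . does not match a newline without the s flag, the lookbehind stays on the current line. One assumption to keep in mind: a // inside a string, such as a URL, would also suppress matches after it.

```javascript
// Sketch: skip hyphens that come after a // comment marker on the same line.
const re = /(?<!\/\/.*)-/g;
const text = "a-b // c-d\ne-f";

const hits = [...text.matchAll(re)].map((m) => m.index);
console.log(hits); // offsets of hyphens outside comments
```

The hyphen in c-d is skipped because // precedes it on the same line, while the hyphens in a-b and e-f are matched.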

Regarding the false positives inside strings, how often does that happen, and how bad is it to hit the key a second time to jump to the "next" positive or false hit?

newline

Have you looked at the m flag in Advanced searching with flags?
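
What the m flag changes can be sketched in plain JavaScript. With m, ^ and $ match at every line start and end. The sample text is made up, and the blank-line pattern uses [ \t]* instead of \s* because \s also matches newlines and could span several lines. How the flag is passed to the extension is not shown here.

```javascript
// Sketch: end-of-line and blank-line detection with the m (multiline) flag.
const text = "a = 1\n\n  \nb = 2";

// $ with m matches before each newline and at the end of the text.
const endOfLine = [...text.matchAll(/$/gm)].map((m) => m.index);

// A blank line: only spaces or tabs between line start and line end.
const blankLine = [...text.matchAll(/^[ \t]*$/gm)].map((m) => m.index);

console.log(endOfLine, blankLine);
```

Both patterns can produce zero-length matches, so the match position itself is the cursor target and no character is consumed.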


To get it very smooth you have to use an AST, or the TextMate scope at the cursor position might help to decide which regex to use, based on whether the cursor is in a string, a comment, or something else.

@afshindavoudy
Author

afshindavoudy commented Oct 15, 2023

Hi @rioj7,
Thank you so much for your comprehensive reply, and I apologize for the delay in getting back to you. I greatly appreciate your insights and the time you took to provide feedback.

First, I want to acknowledge the typo you spotted in my regex comment – you're absolutely right; "was" was indeed a typo generated by auto-completion.

Regarding the regex pattern discussion: Your explanation of the logic behind .(?=(?<=('[^',;:+=]*')).) makes perfect sense. I'll certainly consider the more readable pattern you suggested.

I'm also excited about the potential modification of the extension to enhance its functionality. It would be a fantastic addition, and I'm looking forward to giving it a try once it's implemented. Please keep me updated on its progress.

Your other tips and suggestions are invaluable, and I'm keen to incorporate them into my existing regex pattern.

Thanks once again for your assistance and guidance. Please feel free to share any further advice or insights you may have.
