Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial Perl support and some tests #1301

Draft
wants to merge 29 commits into
base: main
Choose a base branch
from

Conversation

Leeft
Copy link

@Leeft Leeft commented Mar 12, 2023

Checklist

@Leeft Leeft requested a review from pokey as a code owner March 12, 2023 01:21
Copy link
Member

@pokey pokey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow you've been busy! Solid progress. I left a bunch of comments; happy to discuss at meet-up if helpful

In addition to the inline comments, might be worth adding tests for the other string types

scopeType: {type: ifStatement}
usePrePhraseSnapshot: true
initialState:
documentContents: if ( $this_is_a_test ) { }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably worth testing "state" on this one as well once that's implemented

packages/cursorless-engine/languages/perl.ts Outdated Show resolved Hide resolved
packages/cursorless-engine/languages/perl.ts Outdated Show resolved Hide resolved
packages/cursorless-engine/languages/perl.ts Outdated Show resolved Hide resolved
packages/cursorless-engine/languages/perl.ts Outdated Show resolved Hide resolved
) {
// heredoc_content does not seem to supported by tree-sitter-perl,
// leaving it anyway since it won't hurt
if (node.type === "string_content" || node.type === "heredoc_content") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it a valid tree-sitter node type? If not I'd drop it

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically again copied from Ruby, though I'm indeed not sure if it will be needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a test like

"aaa (bbb) ccc"
'aaa (bbb) ccc'
q(aaa (bbb) ccc)
qq(aaa (bbb) ccc)
  1. use multiple cursors, with one cursor in middle of each bbb
  2. say "clear round"

should end up with

"aaa  ccc"
'aaa  ccc'
q(aaa  ccc)
qq(aaa  ccc)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added the test (had to use a failure as it wasn't working at all, and manually updated that to a hopefully passing test once the behaviour is fixed). Not yet looked whether I can fix the problem.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed last night, string extractor reverted to default behaviour now, and test reduced to the top two ("basic") strings. Though removed for now, most of them do work, here's take round with multi cursor on the middle b:
image
But until string parsing itself is handled correctly in all cases for Perl, I don't want to deal too much with those including any failing tests for unusual string representations.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add tests for the ones that work. For the one that doesn't, are you certain that behaviour is wrong? Does perl take into account the second ( when determining where the string ends, or does it just blindly look for the next ) that it sees?

pre-commit-ci bot and others added 9 commits March 26, 2023 12:00
Was not yet working after the tree-sitter-perl updates, now fixed to
match the behaviour of the python implementation.
The implemented string parser wasn't working well enough; back to the
default parser which doesn't handle all Perl strings well either, but
it can at least support `round`. Future improvement is likely to happen.
Not perfect yet, some complicated constructs do not work well yet and
I'm not sure yet if it is me or the parser. Here is some exotic,
syntactically valid Perl:

```perl
foo( $bar )->bar();
foo( $bar )->bar( $baz );
```
For the file or a file scoped package declaration, takes the entire
file. For a lexically scoped package, takes just that package.

Should ideally take all of the file but not any packages declared within
to be "correct", but that is not yet possible.
Copy link
Member

@pokey pokey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coming along nicely! Left a few comments below

@@ -0,0 +1,46 @@
languageId: perl
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth adding a test from within the body of both classes

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the top body it doesn't work yet (which is technically correct; the nodes following the package and name nodes are siblings, not children, and thus can't be found). I've added a test for the second one, but the first one will need matcher improvement in some way.

Copy link
Member

@pokey pokey Apr 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be able to just do something like source_file.package_statement.package_name!, similar to what you're already doing for "class" to handle this situation

@@ -0,0 +1,22 @@
languageId: perl
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth adding a "state" test for this one as well

Not perfect yet, I've not been able to yet find a way to get the first
argument when on the opening paren, last argument on the closing paren,
or the arg left of the cursor. But behaviour much improved otherwise
(respects comma's much better now, which was broken before).
Does not yet manage to find the condition for a C-style loop like

```
for ( my $i = 0; $i < 10; $i += 1 ) {}
```

when the cursor is at the keyword at the start of the line.

Also, the tree-sitter-perl fails processing for loops that assign to the
default variable, so can't add all the tests for that yet. Example:

```
for ( 1 .. 10 ) { say "$_" }
```

Taken a note to try and fix that or raise issues for.
Now works from the `for` keyword
tree-sitter-perl doesn't yet differentiate between LHS and RHS of
assignments (lvalue and rvalue) so support for taking the name of a
variable is not implemented yet.
Hash values only; rvalue not yet available in the parse tree, and have
yet to work out how to return just the value node from a return value.
Can't work out how to get the full branch node selected for ternary
operators, so not implemented this just yet. if/elsif/else behaviour
is like other languages that support it though.
Copy link
Member

@pokey pokey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely getting there! Not a ton left here; I believe my comments capture everything remaining to do. Good work!

- anchor: {line: 0, character: 16}
active: {line: 0, character: 16}
marks: {}
thrownError: {name: NoContainingScopeError}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have mixed feelings about adding a test for behaviour that we don't actually want to preserve. any thoughts @AndreasArvidsson? some schools of thought support it; I believe it's called "characterization testing" in the literature

@@ -0,0 +1,34 @@
languageId: perl
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

worth adding support for "funk name" (via functionName scope type), and then just reusing this test with "change funk name"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both "value" tests are good, but worth adding a few more versions of value:

  • return values
  • right-hand side of assignments
  • possibly other things; see other languages, and sometimes can be handy to see if there are any types / fields called "value" in the tree-sitter spec for your language

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Top two are still on my todo list: the 1st one I forgot to ask about in the meetings; the 2nd isn't possible with the parser atm. If Ganesan implements things like I suggested in the issue that I raised then that would become possible.

Other languages, well I don't find the tests particularly easy to read nor to even find (without extensive clicking around in the tree). Can certainly have a look at the grammar for value though.

Copy link
Member

@pokey pokey Apr 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Top two are still on my todo list: the 1st one I forgot to ask about in the meetings; the 2nd isn't possible with the parser atm. If Ganesan implements things like I suggested in the issue that I raised then that would become possible.

👍. Worth linking issue here so we can track it

Other languages, well I don't find the tests particularly easy to read nor to even find (without extensive clicking around in the tree). Can certainly have a look at the grammar for value though.

That's fair. The other one that comes to mind is default value for a function parameter, if such a thing is supported. For example:

function foo(bar = 0) {}

Saying "value" from anywhere in bar = 0 should target 0. Also, in that case, "chuck value" should clean up the =

I also just had a look at the typescript definitions to see if I was missing anything, and the one thing I missed is yield statements in generators. Not sure if Perl supports such a thing. So to sum up:

  • return statements,
  • key-value pairs,
  • default values for function parameters (if they exist)
    • clean up leading =
  • RHS of assignments
    • clean up leading = (kinda extra credit, but can be useful for converting an assignment to a declaration, if such a thing makes sense 🤷‍♂️
  • yield statements, if they exist
  • other things that feel like they should be values in Perl?

@@ -0,0 +1,22 @@
languageId: perl
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

> = {
map: "hash",
list: "array",
condition: matcher(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing ternaries. prob not a showstopper to ship without it; depending how common ternaries are in Perl

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

worth adding some "round" tests using the q- / qq- style strings

"source_file",
],
className: "package_statement.package_name!",
name: ["function_definition[name]", "*[key]"],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • should include the matchers from className here as well
  • "name" should match variable name in assignment statement

className: "package_statement.package_name!",
name: ["function_definition[name]", "*[key]"],
value: ["*[value]"],
branch: cascadingMatcher(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing ternaries. prob not a showstopper to ship without it; depending how common ternaries are in Perl

import { patternFinder } from "../util/nodeFinders";
import { branchMatcher } from "./branchMatcher";

const nodeMatchers: Partial<
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing "state" (statement)

@pokey
Copy link
Member

pokey commented Apr 12, 2023

Btw you might consider using checkboxes in the issue description as your todo list, as that way it's easy for us to stay on the same page about what's left to do

@pokey
Copy link
Member

pokey commented Jun 15, 2023

Just wanted to check in to see where we were with this one; are we still waiting on stuff to be changed in tree-sitter-perl? Fwiw the new query-based approach to defining languages in Cursorless has now merged, so might make it easier to define some complex tree patterns if you want to leverage it. See the updated https://www.cursorless.org/docs/contributing/adding-a-new-language/#2-define-parse-tree-patterns-in-cursorless

@pokey pokey self-assigned this Jun 20, 2024
@pokey pokey added the 30 mins PRs that are very close and just need a couple quick tweaks to merge; maintainer may take it label Aug 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
30 mins PRs that are very close and just need a couple quick tweaks to merge; maintainer may take it
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants