Tags with underscore are not parsed correctly #66

Open
jm-g opened this issue Apr 17, 2023 · 2 comments
Labels: bug (Something isn't working)

Comments

jm-g commented Apr 17, 2023

According to the Org user guide, tags are defined as follows:

Tags are normal words containing letters, numbers, ‘_’, and ‘@’.

However, the parser seems to treat the _ as inline formatting markup (subscript) instead of as part of the tag.

(transform (parse "* Headline :tag_a:\n")) 

;; => {:headlines
 [{:headline
   {:level 1,
    :title
    [[:text-normal "Headline :tag"]
     [:text-sub [:text-subsup-word "a"]]
     [:text-normal ":"]],
    :planning [],
    :tags []}}]}

In my opinion, the correct behavior would be:

(transform (parse "* Headline :tag_a:\n")) 

;; => {:headlines
 [{:headline
   {:level 1,
    :title [[:text-normal "Headline"]],
    :planning [],
    :tags ["tag_a"]}}]}

This is with org-parser 0.1.27 with Clojure on the JVM.

schoettl (Collaborator) commented

Thanks for the report.

I just tried this:

org-parser.core=> (read-str "* foo  :_:")
{:headlines [{:headline {:level 1, :title [[:text-normal "foo"]], :planning [], :tags ["_"]}}]}

But if "_" is followed by a letter, it doesn't work. I don't yet understand why. See:
https://github.com/200ok-ch/org-parser/blob/master/src/org_parser/transform.cljc#L66

schoettl (Collaborator) commented

Oh, I think I got it. The extract-tags function does not receive the raw string but the already-parsed headline text. The "_" causes the headline text to be parsed as normal text followed by a text-subsup-word (subscript text), so the tag pattern no longer sits in a single text node.

I don't currently have time to work on this. Would you like to try fixing it?

The reason why we don't parse the tags directly and instead leave it to transform is documented here:
https://github.com/200ok-ch/org-parser/blob/master/resources/org.ebnf#L37
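
One possible direction, as a minimal sketch only: split the tags off the raw headline string before any inline markup is parsed, so that "_" never gets a chance to be read as subscript. The names trailing-tags-re and split-title-and-tags are hypothetical and not part of org-parser, and the function assumes it receives the headline text without the leading stars. Whether this fits the design rationale linked above is not verified here.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper, NOT part of org-parser.
;; Matches: title, whitespace, then one or more colon-delimited tags at the
;; end of the line. Tag characters here are ASCII letters, digits, '_' and
;; '@'; non-ASCII letters are not covered by this sketch.
(def trailing-tags-re
  #"^(.*?)\s+:((?:[\w@]+:)+)\s*$")

(defn split-title-and-tags
  "Splits a raw headline string (without leading stars) into a map with
  :title (string) and :tags (vector of strings)."
  [raw]
  (if-let [[_ title tags] (re-matches trailing-tags-re raw)]
    {:title title
     :tags  (str/split tags #":")}
    ;; No trailing tag block found: everything is title.
    {:title (str/trim raw) :tags []}))

(split-title-and-tags "Headline :tag_a:")
;; => {:title "Headline", :tags ["tag_a"]}
```

With something like this, only the :title string would need to go through inline parsing, while the tags are taken from the raw text.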

schoettl added the bug label Apr 22, 2023