Bug: null chars in input stream #8

Open
freddieventura opened this issue Jan 29, 2024 · 4 comments

Comments

@freddieventura

I am indexing the Mozilla MDN CSS web documents and generating tags for them. The documentation for the CSS properties and some other topics is in a single file, localdocu.mdncssdan. Each topic starts with a # on its own line, followed by the name of the topic on the next line.

The following is a simplified view of the document.

This page was last modified on Jul 7, 2023 by MDN contributors.

(...)

#
animation-composition

The animation-composition CSS property specifies the composite operation
to use when multiple animations affect the same property simultaneously.
(...)

This page was last modified on Jun 26, 2023 by MDN contributors.



#
animation-delay

The animation-delay CSS property specifies the amount of time to wait
from applying the animation to an element before beginning to perform
(...)

What I have is the following rule

--kinddef-mdncssdantags=t,topic,topics
--mline-regex-mdncssdantags=/^#\n(\w.*)$/\1/t/{mgroup=0}

This works in that the tags are detected, but the pattern recorded as the location of each tag is shifted some lines up:

animation-composition	localdocu.mdncssdan	/^This page was last modified on Jul 7, 2023 by MDN contributors.$/;"	t
animation-delay	localdocu.mdncssdan	/^This page was last modified on Jun 26, 2023 by MDN contributors.$/;"	t

So this is not accurate: when I open, say, animation-delay, it jumps 3 lines up to the "This page was last modified..." line. The biggest issue is that when this line is the same for other topics (say another topic also has "This page was last modified on Jun 26" 3 lines above it), the tag referencing is completely messed up.

$ ctags --version
Universal Ctags 5.9.0, Copyright (C) 2015 Universal Ctags Team
Universal Ctags is derived from Exuberant Ctags.
Exuberant Ctags 5.8, Copyright (C) 1996-2009 Darren Hiebert
  Compiled: Sep  3 2021, 18:12:18
  URL: https://ctags.io/
  Optional compiled features: +wildcards, +regex, +gnulib_regex, +iconv, +option-directory, +xpath, +json, +interactive, +sandbox, +yaml, +packcc, +optscript
@masatake
Member

It seems that your ctags is old.
Here is the output with ctags version 6.0.0 shipped as a binary package of Fedora 39.

$ ctags --version
Universal Ctags 6.0.0, Copyright (C) 2015-2022 Universal Ctags Team
Universal Ctags is derived from Exuberant Ctags.
Exuberant Ctags 5.8, Copyright (C) 1996-2009 Darren Hiebert
  Compiled: Jul 19 2023, 00:00:00
  URL: https://ctags.io/
  Output version: 0.0
  Optional compiled features: +wildcards, +regex, +iconv, +option-directory, +xpath, +json, +interactive, +sandbox, +yaml, +packcc, +optscript
$ cat ./mdncssdantags.ctags 
--langdef=mdncssdantags
--kinddef-mdncssdantags=t,topic,topics
--mline-regex-mdncssdantags=/^#\n(\w.*)$/\1/t/{mgroup=0}

$ ctags --options=NONE --options=./mdncssdantags.ctags --language-force=mdncssdantags -o - localdocu.mdncssdan
ctags: Notice: No options will be read from files or environment
animation-composition	localdocu.mdncssdan	/^#$/;"	t
animation-delay	localdocu.mdncssdan	/^#$/;"	t

If the pattern # is not what you want, use {mgroup=1} instead of {mgroup=0}.


$ sed -e 's/mgroup=0/mgroup=1/'  mdncssdantags.ctags > mdncssdantags2.ctags 
$ ctags --options=NONE --options=./mdncssdantags2.ctags --language-force=mdncssdantags -o - localdocu.mdncssdan
ctags: Notice: No options will be read from files or environment
animation-composition	localdocu.mdncssdan	/^animation-composition$/;"	t
animation-delay	localdocu.mdncssdan	/^animation-delay$/;"	t

See also https://docs.ctags.io/en/latest/man/ctags-optlib.7.html#flags-for-mline-regex-lang-option for details on the {mgroup=N} flag.

@freddieventura
Author

I have updated to Universal Ctags 6.1 and used your parameters (which are identical to mine) with my file, and the misalignment was still there.
With the simplified version of the documentation (as I pasted it before), it works (provided {mgroup=1} is used).
The issue is that my files are generated by a bash script that appends content with >>, which somehow adds null characters between the topics.
A file full of null characters seems to confuse the Universal Ctags engine, which ends up misbehaving in the way I showed in my first post.
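
One way to confirm that a file contains null bytes before running ctags on it is to count them. The following is a minimal sketch; count_nul is a hypothetical helper name, not something provided by ctags, and it assumes a POSIX-style tr and wc are available.

```shell
# count_nul FILE: print the number of NUL (0x00) bytes in FILE.
# Hypothetical helper for illustration; not part of ctags.
count_nul() {
    # Delete every byte except NUL, then count the bytes that remain.
    # The final tr strips the padding some wc implementations add.
    tr -cd '\000' < "$1" | wc -c | tr -d ' '
}

# Example usage (file name taken from this thread):
# count_nul localdocu.mdncssdan
```

A non-zero count means the file contains null bytes.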

I will just pre-process the files to delete those null bytes, but in case you want to look into it (I don't know whether you consider it a bug), I am attaching my documentation file so you can check what I described.
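
The pre-processing step could look roughly like this. It is only a sketch: strip_nul and the localdocu.clean.mdncssdan output name are made up for illustration, and mdncssdantags2.ctags is the option file from the earlier comment.

```shell
# strip_nul SRC DST: copy SRC to DST with every NUL (0x00) byte removed.
# Hypothetical helper for illustration; not part of ctags.
strip_nul() {
    tr -d '\000' < "$1" > "$2"
}

# Example usage (the .clean file name is an assumption):
# strip_nul localdocu.mdncssdan localdocu.clean.mdncssdan
# ctags --options=NONE --options=./mdncssdantags2.ctags \
#       --language-force=mdncssdantags -o - localdocu.clean.mdncssdan
```

Writing to a separate file keeps the original intact, so the two can still be compared afterwards.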

(I don't know why GitHub is not letting me attach text files, so I am using Google Drive.)
https://drive.google.com/file/d/18adrICMqptt5EBzUFIj3k20GMATtLvnT/view?usp=sharing

Thank you

@masatake
Member

Thank you for trying the newer version.

Oh, I see. I am not surprised that ctags cannot handle null chars in its input, because the internals of ctags depend heavily on C strings, that is, byte sequences terminated with '\0'.

Ideally, this should be fixed, but we do not have enough time to fix it.

@masatake
Member

I will change the title of this issue.

@masatake changed the title from "Multiline regex tagging matches 3 lines before the one it is actually matching" to "Bug: null chars in input stream" on Jan 30, 2024
@masatake transferred this issue from universal-ctags/ctags on Feb 2, 2024