Rule request: Use Ascii #1999
Comments
iRon7 added a commit to iRon7/PSRules that referenced this issue on May 1, 2024.
Thanks @iRon7, we'd love more community discussion on this issue.
Summary of the new feature
I remember the days behind my TRS-80, whose first versions only had a 6-bit character set of 64 characters. A few years later, I extended the character set with 32 more (mainly lowercase) characters by soldering an additional chip piggybacked onto the original character set chip.
Nowadays, there are many code page extensions, resulting in thousands of characters. This is nice for human language support where it concerns output and/or comments, but it often causes issues with the code itself. Moreover, some specific non-ASCII characters that end up in code (e.g. smart quotes and em dashes) are generally unintended.
UseBOMForUnicodeEncodedFile
The UseBOMForUnicodeEncodedFile rule is quite useless if the author has no intention of using anything other than ASCII characters. It only mentions that there is a non-ASCII character somewhere in the code, but where it resides is often a mystery.
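To illustrate the "where does it reside" problem, the offending positions can be located with a plain regular expression search. This is only a minimal sketch using the built-in Select-String cmdlet; the sample lines are made up for illustration and are built with [char] casts so that the example itself stays pure ASCII.

```powershell
# Hypothetical sample lines: one clean, one with smart quotes, one with a diacritic.
$sample = @(
    'Write-Host "plain ascii"'
    ('Write-Host ' + [char]0x201C + 'smart' + [char]0x201D)
    ('Write-Host ''co' + [char]0x00F6 + 'perate''')
)

# The character class matches every UTF-16 code unit outside the ASCII range 0x00-0x7F.
# -AllMatches reports every occurrence on a line instead of only the first one.
$hits = $sample | Select-String -Pattern '[^\x00-\x7F]' -AllMatches

foreach ($hit in $hits) {
    foreach ($m in $hit.Matches) {
        # Match.Index is zero-based, so add 1 to get an editor-style column.
        'Line {0}, column {1}: U+{2:X4}' -f $hit.LineNumber, ($m.Index + 1), [int][char]$m.Value
    }
}
```

The same pattern can be pointed at a file with `Select-String -Path .\MyScript.ps1 -Pattern '[^\x00-\x7F]' -AllMatches`, where `MyScript.ps1` is a placeholder name.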
Note that:
- The current version of VSCode highlights some extended characters, but not all (e.g. not double smart quotes and diacritic characters).
- Some characters cause a ParseError, which causes the parser (and anything that relies on it, such as PSScriptAnalyzer) to stop processing.
Human language vs programming language
Where humans might not even notice a difference between certain characters and continue to understand the contents, a parser or a program might react unexpectedly. Take PSScriptAnalyzer with the suggested prototype as an example:
Invoke-ScriptAnalyzer -CustomRulePath .\UseASCII.psm1 -ScriptDefinition "Write-Host 'coöperate'"
Why does this work in PowerShell 7 but throw a "Cannot convert" error in Windows PowerShell?
The argument "the whole file is checked without considering if it's actual code or not" makes some sense, but the main goal of a PowerShell script (.ps1) file is to run a script. Besides, there are several other ways to deal with statements that require non-code characters (usually for output only):
- "co`u{00F6}perate" (from PowerShell version 6) or "co$([char]0x00F6)perate" (from PowerShell version 3)
- placing the text in a separate (.md, .xml) file or referring to it on the web (HelpUri=)
Proposed technical implementation details (optional)
This proposed rule covers rule requests:
Prototype
To capture any non-ASCII character:
- The AST parser might break due to a ParseError caused by specific characters and PowerShell versions.
- The Tokenize method can't be fully used, as it doesn't capture specific control characters (e.g. smart quotes).
In my opinion, this means that the only way to capture all potentially undesired characters in a script is to scan the complete content of the script as text:
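The prototype itself is not reproduced here. As a rough illustration of the text-scanning idea only, the sketch below walks the raw script text line by line and reports every character outside the ASCII range. The function name `Find-NonAsciiCharacter` and its output shape are assumptions made for this example; they are not part of PSScriptAnalyzer or of the proposed rule, and the sketch assumes PowerShell 3 or later.

```powershell
function Find-NonAsciiCharacter {
    # Hypothetical helper for illustration; not the prototype rule and not a PSScriptAnalyzer API.
    param(
        [Parameter(Mandatory)]
        [AllowEmptyString()]
        [string]$ScriptText
    )

    # Split on CRLF, LF or CR so that line numbers match what an editor shows.
    $lines = @($ScriptText -split '\r\n|\n|\r')

    for ($i = 0; $i -lt $lines.Count; $i++) {
        # Each match is a single UTF-16 code unit above 0x7F
        # (characters outside the BMP are reported as two hits).
        foreach ($m in [regex]::Matches($lines[$i], '[^\x00-\x7F]')) {
            [pscustomobject]@{
                Line      = $i + 1
                Column    = $m.Index + 1
                Character = $m.Value
                CodePoint = 'U+{0:X4}' -f [int][char]$m.Value
            }
        }
    }
}

# Made-up two-line script text, built with [char] casts to keep this example ASCII-only.
$text = "Write-Host 'co" + [char]0x00F6 + "perate'`n" +
        '$x = ' + [char]0x201C + 'quoted' + [char]0x201D

$found = Find-NonAsciiCharacter -ScriptText $text
$found | Format-Table -AutoSize
```

Because nothing is parsed or tokenized, this approach does not depend on whether the script is syntactically valid for a given PowerShell version. The trade-off is that it cannot tell code from comments or strings.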
PSUseASCII
"Spot the 10 non-ascii characters:"
Analyzer results
Note that I have commented out the SuppressMessageAttribute in the example PowerShell file. This is because of a known bug, #1686, which causes several of the following errors to occur:
Also for this reason I would like to see a formal (disabled by default) rule for this.
What is the latest version of PSScriptAnalyzer at the point of writing: 1.22.0