Type.TemplateLiteral() with a nested Type.String() doesn't allow newlines in its RegExp #1027

Closed
sirlancelot opened this issue Oct 9, 2024 · 4 comments


sirlancelot commented Oct 9, 2024

I am trying to create a type which accepts a JSON array like so:

import { Type } from "@sinclair/typebox"

const CommandArgs = Type.TemplateLiteral(
  [Type.Literal("["), Type.String(), Type.Literal("]")],
  { default: "[]" }
)

This works as expected until the value it's trying to validate contains newlines \n within it:

import { Value } from "@sinclair/typebox/value"

const works = Value.Parse(CommandArgs, `["--arg1", "value1"]`)

const broken = Value.Parse(CommandArgs, `[
  "--arg1", "value1",
  "--arg2", "value2"
]`)

I tracked the issue down to the RegExp built from PatternString in src/type/patterns/patterns.ts. As it turns out, a RegExp of .* without the s (dotAll) flag does not match newlines \n.

From the comment on #1026, I no longer think it makes sense to change PatternString itself, but what do folks think about relaxing the pattern to [\S\s]* only where it is used within TemplateLiteral()?
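To illustrate the newline behaviour described above, here is a minimal sketch using only built-in JavaScript regular expressions. It does not use TypeBox, and the variable names are made up for illustration; the patterns are simplified stand-ins for whatever TemplateLiteral generates internally.

```typescript
// Plain RegExp behaviour, independent of TypeBox (names are illustrative only).
const singleLine = `["--arg1", "value1"]`
const multiLine = `[\n  "--arg1", "value1"\n]`

// `.` does not match line terminators unless the `s` (dotAll) flag is set.
const dot = new RegExp("^\\[.*\\]$")
// `[\S\s]` matches any character, newlines included, with no flag needed.
const anyChar = new RegExp("^\\[[\\S\\s]*\\]$")
// Same result as `[\S\s]`, this time via the dotAll flag.
const dotAll = new RegExp("^\\[.*\\]$", "s")

console.log(dot.test(singleLine))    // true
console.log(dot.test(multiLine))     // false
console.log(anyChar.test(multiLine)) // true
console.log(dotAll.test(multiLine))  // true
```

The `[\S\s]` form has the advantage of working in a plain pattern string, where flags cannot be attached.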

@sirlancelot
Author

My workaround currently is to write my own string pattern and then cast it to the template literal type but it looks very messy:

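import type { TLiteral, TString, TTemplateLiteral } from "@sinclair/typebox"

const CommandArgs = Type.String({
  // Backslashes are doubled because the pattern is a string literal, not a RegExp literal
  pattern: "^\\[[\\S\\s]*\\]$",
  default: "[]"
}) as any as TTemplateLiteral<[TLiteral<"[">, TString, TLiteral<"]">]>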
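One thing to watch when a pattern is supplied as a string rather than a RegExp literal: the string literal consumes single backslashes before the regex engine ever sees them. The sketch below uses only built-in JavaScript, with variable names invented for illustration, to show the difference.

```typescript
// Single backslashes are consumed by the string literal itself.
const unescaped = "^\[[\S\s]*\]$"
// Doubled backslashes survive into the pattern.
const escaped = "^\\[[\\S\\s]*\\]$"
// String.raw keeps backslashes verbatim, so no doubling is needed.
const rawPattern = String.raw`^\[[\S\s]*\]$`

console.log(unescaped)              // ^[[Ss]*]$
console.log(escaped === rawPattern) // true

const input = `[\n  "--arg1", "value1"\n]`
console.log(new RegExp(escaped).test(input))   // true
console.log(new RegExp(unescaped).test(input)) // false
```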

@sinclairzx81
Owner

sinclairzx81 commented Oct 9, 2024

@sirlancelot Hi,

Matching JSON arrays encoded in strings may be a bit too much for TemplateLiteral, which is only capable of matching fixed-size strings (with substitutable union variants for parts of that string). In cases like this you might want to consider Type.RegExp, which allows for much more expressive patterns (including repeated patterns that would allow the expression to match on multiple array elements).

The following are two possible implementations to match on encoded string arrays. Both use the /m flag for multiline mode (something not supported in standard JSON Schema). Note that these patterns use repeats, so they would be subject to potential ReDoS on large inputs. We can mitigate this by setting the maxLength constraint to 128 characters on each (you can adjust as necessary).

import { Value } from '@sinclair/typebox/value'
import { Type } from '@sinclair/typebox'

// string, number and boolean elements e.g. [1, true, 'hello']
const ValueArrayString = Type.Unsafe<`[${string}]`>(Type.RegExp(/^\[\s*(\d+(\.\d+)?|true|false|'[^']*'|"[^"]*")(\s*,\s*(\d+(\.\d+)?|true|false|'[^']*'|"[^"]*"))*\s*\]$/m, {
  maxLength: 128
}))

// number elements only e.g. [1, 2, 3]
const NumberArrayString = Type.Unsafe<`[${string}]`>(Type.RegExp(/^\[\s*(\d+(\.\d+)?(\s*,\s*\d+(\.\d+)?)*\s*)?\]$/m, {
  maxLength: 128
}))

const R1 = Value.Check(ValueArrayString, `[
  1,
  true,
  'hello'
]`)

console.log(R1)

const R2 = Value.Check(NumberArrayString, `[
  1,
  2,
  3.14
]`)

console.log(R2)

The above is also using Unsafe. The Unsafe type is optional, but is being used to provide the RegExp a more specific inference type of [${string}] rather than the default string.
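The idea behind the maxLength mitigation can be sketched in plain TypeScript: reject oversized input before a pattern with repeats ever runs. The helper name `checkBounded` is hypothetical, and this is not a description of how TypeBox orders its own checks internally. The pattern is the number-array expression from above, without the /m flag.

```typescript
// Hypothetical helper: bound the input length before running a regex with repeats.
function checkBounded(value: string, pattern: RegExp, maxLength: number): boolean {
  if (value.length > maxLength) return false // cheap rejection, regex never runs
  return pattern.test(value)
}

// Number-array pattern from the comment above, anchored to the whole string.
const numberArrayPattern = /^\[\s*(\d+(\.\d+)?(\s*,\s*\d+(\.\d+)?)*\s*)?\]$/

const small = `[\n  1,\n  2,\n  3.14\n]`
const large = "[" + "1,".repeat(100) + "1]" // 203 characters

console.log(checkBounded(small, numberArrayPattern, 128)) // true
console.log(checkBounded(large, numberArrayPattern, 128)) // false (too long)
```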

Does this help?
S

@sirlancelot
Author

Oh that is pretty nice! I'll admit though, I don't actually want to completely validate JSON using Typebox since using JSON.parse() will actually provide better error messages. My goal with the [${string}] TypeScript literal was to provide a fast path for failures as well as a hint to readers of what value is expected.

The reason for my making an issue is actually to point out a discrepancy between Typebox and TypeScript. While TypeScript would allow me to assign values with newlines to the string part of a template literal, Typebox throws an error.
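A rough sketch of the fast-path idea and of the TypeScript side of the discrepancy, using no TypeBox at all. The names `Bracketed`, `isBracketed` and `parseCommandArgs` are hypothetical and exist only for this illustration.

```typescript
type Bracketed = `[${string}]`

// TypeScript accepts newlines inside the ${string} part of a template literal type.
const multi: Bracketed = `[
  "--arg1", "value1"
]`

// Hypothetical fast path: a cheap shape check before handing off to JSON.parse.
function isBracketed(value: string): value is Bracketed {
  return value.startsWith("[") && value.endsWith("]")
}

function parseCommandArgs(value: string): unknown[] {
  if (!isBracketed(value)) throw new Error("expected a string of the form [ ... ]")
  // JSON.parse reports the detailed syntax errors.
  const parsed: unknown = JSON.parse(value)
  if (!Array.isArray(parsed)) throw new Error("expected a JSON array")
  return parsed
}

console.log(parseCommandArgs(multi)) // [ '--arg1', 'value1' ]
```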

@sinclairzx81
Owner

@sirlancelot Hiya,

Thanks for the follow up. Currently the TemplateLiteral implementation is constrained to only a finite subset of regular expressions, and doesn't handle all the cases TypeScript can (tho you may get further by combining Union / Literal values interior to the TemplateLiteral, where newlines are treated the same as any other literal string).

const A = Type.TemplateLiteral([Type.Literal('A'), Type.Literal('\n'), Type.Literal('B')])

const R1 = Value.Parse(A, 'A\nB')

const R2 = Value.Parse(A, `A
B`)

console.log(R1) // ok
console.log(R2) // ok

There is work being done to enhance TemplateLiteral to support more advanced string parsing patterns (which would include supporting multiline patterns), but for now things are limited to what they are, to serve as a basis for future functionality. For now, I would recommend either Type.RegExp or Type.String + pattern for this particular use case until TemplateLiteral gets revised to support additional regex generation.

Will close off this issue for now, but happy to field any additional questions on this thread.
Cheers!
S
