Tailwind Regex #10

Open
gino opened this issue May 27, 2020 · 33 comments
Labels
question Further information is requested

Comments

@gino

gino commented May 27, 2020

Hello,

What regex should I use if I am using TailwindCSS?
I saw an issue with a Tailwind regex, but it used the tw- prefix. I tried removing the prefix from the regex and, unfortunately, it stopped working. I am not using that prefix; I am just using basic Tailwind utilities without any configuration so far.

Thanks.

@sndyuk
Owner

sndyuk commented May 29, 2020

@gino A prefix for CSS classes is required; otherwise the plugin won't work. The prefix must be unique across all your source code, since the plugin replaces strings in the code using a regex.
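
To illustrate why the prefix matters, here is a minimal sketch. It assumes the pattern is applied globally to the whole bundle text; the sample source string and both patterns are made up for this example and are not the plugin's defaults.

```javascript
// Hypothetical bundle text containing both class names and ordinary code.
const source = `const el = '<div class="tw-flex tw-p-4"></div>'; el.hidden = false;`;

// Without a prefix, a generic "class-like" pattern also matches keywords and identifiers.
const unprefixed = /[a-z_-][a-zA-Z0-9_-]*/g;
// With a unique prefix, only the intended class names match.
const prefixed = /tw-[a-z_-][a-zA-Z0-9_-]*/g;

const unprefixedMatches = source.match(unprefixed);
const prefixedMatches = source.match(prefixed);

console.log(unprefixedMatches); // also contains "const", "el", "hidden", "false", ...
console.log(prefixedMatches);   // only "tw-flex" and "tw-p-4"
```

Every word in the first list would be a candidate for replacement, which is how ordinary JavaScript ends up mangled.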

FYI: https://tailwindcss.com/docs/configuration/#prefix

@gino gino closed this as completed May 29, 2020
@gino
Author

gino commented Mar 21, 2021

Hey, I am going to use this for my project again, and this time I will use the prefix as well. What is the latest recommended regex for TailwindCSS? There are a lot of selectors and pseudo-selectors.

@gino gino reopened this Mar 21, 2021
@sndyuk
Owner

sndyuk commented Mar 21, 2021

Does anyone watching the repo have the knowledge?

Anyway, I'll close the issue in a few weeks since it isn't an issue with the plugin itself.

@sndyuk sndyuk added the question Further information is requested label Mar 21, 2021
@gino
Author

gino commented Mar 26, 2021

@sndyuk I am experimenting with a couple of different regex patterns but I was wondering, is it possible to have longer generated classnames instead of just having a b c and instead have actual obfuscation going on, like 1wbh5a2, like Twitter has for example.

Edit: for example, provide some sort of a pattern like localIdentName does: https://github.com/webpack-contrib/css-loader#localidentname

@sndyuk
Owner

sndyuk commented Mar 28, 2021

@gino How about adding the option classGenerator:

module.exports = {
  ...
  plugins: [
    new MangleCssClassPlugin({
      ...,
      // original: original class name
      // opts: options of the plugin
      // context: own context of the class generator(initial value is just an empty object)
      classGenerator: (original, opts, context) => {
        // return custom generated class name.
        // Or return undefined if you want to leave it to the original behavior.
      }
    }),
  ],
};

@gino
Author

gino commented Mar 28, 2021

@sndyuk That’s a really nice approach! Could you also perhaps provide an example how to achieve random strings. But so far this looks very promising! 🤩

@sndyuk
Owner

sndyuk commented Mar 29, 2021

@gino I added the new option in version 4.0.12. Please check the example below from the test case. It replaces class names that start with c- with c{auto generated number}:

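plugins: [
  new MangleCssClassPlugin({
    classNameRegExp: defaultCssClassRegExp,
    log: true,
    classGenerator: (original, opts, context) => {
      // Initialize the counter on first use (context starts as an empty object).
      if (!context.id) {
        context.id = 1;
      }
      if (original.startsWith('c-')) {
        const className = `c${context.id}`;
        context.id++;
        return className;
      }
      // Returning undefined leaves the class to the original behavior.
    },
  }),
],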

@gino
Author

gino commented Mar 29, 2021

@sndyuk Looks really good! I am currently trying it in a Tailwind project and also using the same scenario as you, with an auto generated number, but without the c-:

classGenerator: (original, opts, context) => {
  if (!context.id) {
    context.id = 1;
  }

  const className = `${context.id}`;
  context.id++;

  return className;
},

It does output all the numbers (log) and all the classes in my HTML are having those auto-generated numbers, but all the classes have 0 styles.. even though they all should have a background color. So I am not sure if there is a bug or if I am doing something wrong. I was just trying to get the auto-generated numbers working with all classes. Even though I would love to have just random strings instead of numbers, like how localIdentName handles their obfuscation. I might be able to achieve something like that with some base64 encode stuff, since that's also how they provide it in their pattern (localIdentName: '[path][name]__[local]--[hash:base64:5]', with this base64 hash stuff).

So yeah, looks very good though! But I might be implementing it the wrong way..
If you have a better idea how to achieve strings like "1wbh5a2" for example, let me know!

@sndyuk
Owner

sndyuk commented Mar 31, 2021

@gino

but all the classes have 0 styles..

I think a class name that starts with a number is invalid.

If you have a better idea how to achieve strings like "1wbh5a2" for example, let me know!

You can use 7 random characters. For example: https://stackoverflow.com/questions/1349404/generate-random-string-characters-in-javascript
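
As a rough sketch of that suggestion, the generator below produces 7 random characters and always starts with a letter so the result stays a valid class name. The helper names (`pick`, `randomClassName`) and the caching with `Map`/`Set` are assumptions made for this example; only the `classGenerator(original, opts, context)` signature comes from the plugin option shown earlier in the thread.

```javascript
const LETTERS = 'abcdefghijklmnopqrstuvwxyz';
const ALPHANUMERIC = LETTERS + '0123456789';

// Pick one character uniformly at random from the given alphabet.
const pick = (alphabet) => alphabet[Math.floor(Math.random() * alphabet.length)];

// Build a random name of `length` characters whose first character is a letter.
function randomClassName(length = 7) {
  let name = pick(LETTERS);
  while (name.length < length) name += pick(ALPHANUMERIC);
  return name;
}

// Cache per original class so repeated calls return the same name,
// and retry on the (unlikely) collision with an already used name.
const assigned = new Map();
const used = new Set();
const classGenerator = (original) => {
  if (!assigned.has(original)) {
    let name;
    do {
      name = randomClassName();
    } while (used.has(name));
    used.add(name);
    assigned.set(original, name);
  }
  return assigned.get(original);
};
```

Note that random names differ from build to build, so this only fits setups where every file is mangled in the same run.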

@yuriti

yuriti commented Apr 5, 2021

Please tell me how to use [\\\\]* ?

I use tailwind with jit and I need a similar expression
(([a-z-:]*)[\\\\]*:)*tw-[a-z_-](([.\[\]\%a-zA-Z0-9_-]*)[\\\\]*)*

@yuriti

yuriti commented Apr 5, 2021

@gino

try a more elegant solution https://medium.com/@my_own_grave/recting-css-generated-by-vue-loader-by-using-classnames-shorten-trick-aa1d25d77473

all the magic lies in the createUniqueIdGenerator method

@gino
Author

gino commented Apr 5, 2021

@mr-httdd I am not quite sure if I am using a great regex either and I am not quite sure how to use the approach from that Medium article since I am not using Vue.. I am using Next.js, not sure if I can also use that method in my environment?

@yuriti

yuriti commented Apr 5, 2021

@gino the development environment does not matter here, everything will be done for you by the incstr library

haven't tested, but in your case it will be something like this:

import incstr from 'incstr';

const classNames = {};

const generateClassName = incstr.idGenerator({
        alphabet: 'abcdefghijklmnopqrstuvwxyz0123456789_-'
});

classGenerator: (original, opts, context) => {
        if (classNames[original]) {
            return classNames[original];
        }

        let nextId;

        do {
            // Class name cannot start with a number.
            nextId = generateClassName();
        } while (/^[0-9_-]/.test(nextId));

        classNames[original] = nextId;

        return classNames[original];
},

@gino
Author

gino commented Apr 5, 2021

@mr-httdd Ohh I see! I will test this soon, thank you for your example though. Please let me know if you found a good regex pattern that I can use for Tailwind. I still don't really like that I have to use a prefix in order to make this all work but it's understandable.

@yuriti

yuriti commented Apr 6, 2021

@gino I managed to get it working correctly with Tailwind + JIT (by the way, note that Tailwind has recently merged the JIT repository).

import incstr from 'incstr';

const classNames = {};

const generateClassName = incstr.idGenerator({
  alphabet: 'abcdefghijklmnopqrstuvwxyz'
});

new MangleCssClassPlugin({
    classNameRegExp: '(([a-z-:]*)[\\\\\\\\]*:)*tw-[a-z_-]([\\[\\]\\%a-z0-9-]*([\\\\\\\\]*(\\.|\\[|\\]))*)*',
    classGenerator: (original, opts, context) => {
        if (classNames[original]) {
            return classNames[original];
        }

        let nextId;

        do {
            // Class name cannot start with a number.
            nextId = generateClassName();
        } while (/^[0-9_-]/.test(nextId));

        return classNames[original] = nextId;
    },
})

Unfortunately, going without the prefix is not quite the right approach with this package: it goes through the whole file, and that may give wrong results.

@sndyuk I ran into strange behavior in the classGenerator option: the original argument I get the same thing multiple times. This also caused wrong class generation in my project, but I still managed to work around it by caching the generated class myself.
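
The caching workaround described above can be factored into a small wrapper. This is only a sketch: `memoizeClassGenerator` and the counter-based generator are hypothetical names for this example, not part of the plugin.

```javascript
// Wrap any generator so each original class name is generated only once,
// even if the plugin calls classGenerator repeatedly with the same value.
function memoizeClassGenerator(generate) {
  const cache = new Map();
  return (original, opts, context) => {
    if (!cache.has(original)) {
      cache.set(original, generate(original, opts, context));
    }
    return cache.get(original);
  };
}

// Hypothetical counter-based generator, used only to show the effect.
let counter = 0;
const classGenerator = memoizeClassGenerator(() => `c${++counter}`);

console.log(classGenerator('tw-flex')); // "c1"
console.log(classGenerator('tw-flex')); // "c1" again, the counter is not advanced
console.log(classGenerator('tw-p-4'));  // "c2"
```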

@surjithctly

I was having a similar issue with Next.js as well. It doesn't work without a prefix. It throws an error, and the log shows that it replaced JavaScript objects as well.

I was using the following config:

module.exports = {
  webpack: (config) => {
    const MangleCssClassPlugin = require("mangle-css-class-webpack-plugin");

    config.plugins.push(
      new MangleCssClassPlugin({
        classNameRegExp:
          "(([a-z-:]*)[\\\\\\\\]*:)*[a-z_-]([\\[\\]\\%a-z0-9-]*([\\\\\\\\]*(\\.|\\[|\\]))*)*",
        ignorePrefixRegExp:
          "((hover|focus|active|disabled|visited|first|last|odd|even|group-hover|focus-within|xs|sm|md|lg|xl)(\\\\\\\\\\\\\\\\|\\\\)?:)*",
        log: true,
      })
    );
    return config;
  },
};

@sndyuk
Owner

sndyuk commented Apr 18, 2021

@mr-httdd

the original argument I get the same thing multiple times

I couldn't figure out how that happens, since in that case the generated class name should be reused here:
https://github.com/sndyuk/mangle-css-class-webpack-plugin/blob/master/lib/classGenerator.js#L40-L41
Do you have any idea?

@sndyuk
Owner

sndyuk commented Apr 18, 2021

@surjithctly

It doesn't work without a prefix

Please refer to the note here: https://github.com/sndyuk/mangle-css-class-webpack-plugin#usage

This will replace class name matched regex in HTML, JavaScript, CSS files. Identify the class names not to match unexpected words since it replaces all words that are matched with the classNameRegExp. I suggest that your class names have specific prefix or suffix that identified as a class name.

@yuriti

yuriti commented Apr 19, 2021

@sndyuk according to the logic of the code, everything should be correct, there are suspicions only in a synchronous loop, but in my case, when passing a function to the classGenerator method, everything works well, which should not be in such a scenario.

there is a thought that incstr somehow makes you wait, I have not tested their code, but this is possible, try using it as a basis

@polarathene

polarathene commented Apr 27, 2021

If you have a better idea how to achieve strings like "1wbh5a2" for example, let me know!

@gino If you want deterministic output (the same input always generates the same "random-looking" output), then you want to use a hash function. Don't use djb2 (or the djb2a variant): these are commonly cited, and even styled-components uses it, but they have a higher risk of collision (multiple inputs producing the same computed value, which is usually undesirable).

A simple hash function that is easy to implement and less likely to cause collision issues is FNV1a-32:

const fnvOffset = 2166136261
function fnv1a32(str) {
  // hash value to operate on
  let h = fnvOffset
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i)
    // JS number type is inaccurate at calculating 'h *= fnvPrime',
    // Uses bit-shifts to accurately multiply the prime: '16777619'
    h += (h << 1) + (h << 4) + (h << 7) + (h << 8) + (h << 24)
  }

  // Cast to 32-bit uint
  return h >>> 0
}

// Converts string input to a 32-bit base36 string (0-9, a-z), highest radix `Number.toString()` supports
// Prevent invalid CSS class names in selectors starting with a digit by prefixing with `_`
const getShortKey = (input) =>
  fnv1a32(input)
    .toString(36)
    .replace(/^[0-9]/, `_$&`)

Just call getShortKey('some_class_name') and you'll get an output like _1cp30bg returned, the _ will only be prefixed if the string starts with a digit so you don't have to worry about escaping it for CSS selector usage.


@sndyuk Even without the hash function part, looking at the current default generator code, I think it would simplify the logic a fair bit? (but not be optimal for minification results):

// defaultClassGenerator no longer needed
// ...

if (!newClassName) {
  newClassName = this.newClassSize.toString(36).replace(/^[0-9]/, `_$&`);
}

Your current implementation does use - and _ for characters after the first; that provides a slightly better compression range (where I imagine many users are in the 2 char length at under 1k classes), but more importantly yields better results due to not using numbers for the first char (<=27) and having numbers handled at the end of the encoding charset instead of the start like toString(36) does.

I am curious why A-Z isn't included in the custom charset?

EDIT: It seems the _ prefix for digits can be avoided by using special symbols within the 0xA0 - 0xFF char code range according to the CSS spec:

const subs = `§£¥¢þ°Ææ±¬`
// ...
// If the string starts with a digit, parse it and use that as an index to a string or array for a substitute replacement:
this.newClassSize.toString(36).replace(/^[0-9]/, x => subs[parseInt(x, 10)])

I was wondering, is it possible to have longer generated classnames instead of just having a b c and instead have actual obfuscation going on, like 1wbh5a2, like Twitter has for example.

@gino Is there a specific reason you wanted that btw vs shorter names?

They're likely doing the same as what I've shown above. I needed to generate classnames for a PR that ideally avoid conflicting with user or third-party classnames, while still being short. The code didn't have context of classes outside of its own scope, and it needed to generate the same value on both server-side and client-side.

For those that might have trouble making sense of the code above, it will turn any string length input into a 32-bit number (about 4 billion values), and convert that into base36 via native methods that are much more efficient than the native base64 functions (differs between nodeJS and client/browser JS).

This will return a string anywhere from 1 to 8 characters long: log2(36^6) = ~31 shows that 6 base36 characters cover 31 bits, so if the number is a little bigger it will be 7 characters at the longest (8 for those that start with a digit, due to the _ prefix).
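
The length bounds can be checked directly on the extreme values. The `toKey` helper below is a stand-in written for this example; it applies the same base36 conversion and `_` prefix rule as getShortKey above, but to a plain number instead of a hash.

```javascript
// Same encoding rule as getShortKey: base36, with '_' placed before a leading digit.
const toKey = (n) => n.toString(36).replace(/^[0-9]/, '_$&');

const maxUint32 = 0xffffffff; // largest possible 32-bit hash value

console.log(maxUint32.toString(36)); // "1z141z3" (7 characters)
console.log(toKey(maxUint32));       // "_1z141z3" (8 characters with the prefix)
console.log(toKey(10));              // "a" (1 character, no prefix needed)
console.log(toKey(0));               // "_0"
```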


If speed isn't a concern, and you'd rather slightly better compression then here's my base64 encoding variant of getShortKey():

// Base64 encoding browser + node.js - Expensive to compute this way

// SSR and Client base64 encode methods, expects Uint8Array input
// Used for generating short class names from 32-bit values (hash)
const nodeBtoa = b => Buffer.from(b, `binary`).toString(`base64`)
const clientBtoa = b => btoa(String.fromCharCode(...b))
const base64encode = typeof btoa !== `undefined` ? clientBtoa : nodeBtoa

// NOTE: This method might provide a shorter string, but performance is not great,
// due to 32-bit values needing additional conversion/allocations.
function numToBase64(h) {
  // 32-bit number split into 4 separate bytes for encoding into base64.  32-bits is always <=6 base64 characters long.
  // Unsigned shifts (>>>) keep every entry non-negative, even for hashes with the top bit set.
  const b = [h >>> 24, h >>> 16, h >>> 8, h]
  // Remove leading empty bytes, otherwise they're also encoded. `Uint8Array` type required for node; it also truncates each entry to its low byte.
  const bytes = Uint8Array.from( b.slice(b.findIndex(x => x > 0)) )

  // Replace '+' and '/' values (from base64 charset) which aren't ideal for CSS identifiers (eg class names in selectors).
  // `æ` should be a safe value, it's `0xE6` which is within a single byte but over `0xA0` which meets the spec: https://www.w3.org/TR/CSS21/syndata.html#value-def-identifier
  // '=' chars (base64 padding) are stripped off.
  // '-' & digits are not allowed as the first char of CSS identifiers, prefix with '_'.  
  return (
    base64encode(bytes)
      .replace(/[+/]/g, x => (x === `+` ? `æ` : `-`))
      .replace(/=/g, '')
      .replace(/^-?\d|--/, `_$&`)
  )
}

The final regex there is a bit more complicated to meet the CSS identifiers spec:

In CSS, identifiers (including element names, classes, and IDs in selectors) can contain only the characters [a-zA-Z0-9] and ISO 10646 characters U+00A0 and higher, plus the hyphen (-) and the underscore (_);
they cannot start with a digit, two hyphens, or a hyphen followed by a digit.

So it's covering the extra cases for -. One could instead just opt for taking another character from ISO 10646 (the Universal Coded Character Set). You'd ideally want to choose one whose code point fits in a single byte (at most 0xFF) and is at or above 0xA0, as per the spec requirements, to avoid needing to escape it.

That pretty much leaves you with the Latin-1 Supplement block. Any characters before 0xA0 in that list are control characters; the rest are all valid. Here's a bunch of them, each separated by a space: § © ® ¬ ± ¶ £ ¥ ¢ ø × ÷ ª ¯ ¿ ¡ µ þ « » Æ æ.
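
For a quick sanity check of generated names, the rule as quoted above can be turned into a small test. This is a sketch of that quoted CSS 2.1 wording only, not a full CSS parser, and `isSafeCssIdentifier` is a name made up for this example.

```javascript
// Allowed characters per the quoted rule: [a-zA-Z0-9], '-', '_', and U+00A0 and higher.
// The negative lookahead rejects a leading digit, "--", or "-" followed by a digit.
const SAFE_IDENTIFIER = /^(?!\d|--|-\d)[a-zA-Z0-9_\u00A0-\u{10FFFF}-]+$/u;

const isSafeCssIdentifier = (name) => SAFE_IDENTIFIER.test(name);

console.log(isSafeCssIdentifier('_1cp30bg')); // true
console.log(isSafeCssIdentifier('1cp30bg'));  // false: starts with a digit
console.log(isSafeCssIdentifier('-1abc'));    // false: hyphen followed by a digit
console.log(isSafeCssIdentifier('§abc'));     // true: U+00A7 is above U+00A0
console.log(isSafeCssIdentifier('a+b'));      // false: '+' would need escaping
```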

You should be able to avoid the need for a prefix with numbers and - by substituting all of them with these special characters:

const subs = `§£¥¢þ°Ææ±¬`
base64encode(bytes)
  .replace(/^[0-9]/, x => subs[parseInt(x, 10)])
  .replace(/[+/]/g, x => (x === `+` ? `¶` : `Ø`))
  .replace(/=/g, '')

I don't think the extra logic for base64 is worth it.

log2(64^5) = 30 means 25% (30-bit) of the 32-bit range fits into a string of 5 characters (+1 if we need to prefix), which is good, but the simpler base36 still covers 50% (31-bit) with just 1 extra character. For these 32-bit hashes being generated and encoded into a string, the result is often going to be in the 6 character range regardless, but you can try to squeeze out 1 less character if the tradeoff doesn't pose any real issue.

If you weren't using a hash function and just incrementing, base36 covers 60 million for 5 characters, or up to 1296 values with only 2 characters (ignoring +1 char from prefix for about 30% outputs that begin with digits). ~~If you have more than the ~1300 values to account for, base64 at 2 characters will let you get to 4096 without increasing to 3 characters, otherwise no real benefit.~~

EDIT: It seems I misunderstood base64 a little. Each character (uses 1 byte aka 8 bits) represents a total of 6-bits of input; despite that output characters represent a minimum of 8 bits of input and thus 2 characters minimum. When the 2nd byte is reached (number >255), 16 bits are encoded increasing to 3 characters until that new range is exceeded (65535 + 1) expanding to 4 characters covering 16 to <24-bits. If the number uses the final 8-bits, it goes from 4 characters directly to 6 😞

For the purpose described here it would always be 2-3 characters long for most users (when ignoring any prefix). Likewise for the hash function, the transition from 3 bytes to 4 bytes (32-bit) of input increases characters in the output string from 4 to 6. For base64 method, 4 characters long or less is only 0.4% of the number of values in 32 bits, while the base36 method at 5 characters long or less covers a mere 1.4%.

Thus expect mostly 6 characters and with base36 50% of possible values will be 7.
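
The percentages quoted above can be reproduced with a few lines of arithmetic. The helper names are made up for this sketch, and any `_` prefix is ignored, as in the text.

```javascript
const TOTAL = 2 ** 32; // number of distinct 32-bit hash values

// Share of 32-bit values whose base36 form has at most `chars` characters.
const base36Share = (chars) => Math.min(36 ** chars, TOTAL) / TOTAL;
// Share of values that fit in `bytes` input bytes (base64 encodes whole bytes).
const byteShare = (bytes) => 2 ** (8 * bytes) / TOTAL;

const pct = (x) => (100 * x).toFixed(1) + '%';

console.log(pct(base36Share(5))); // "1.4%"  -> 5 characters or fewer in base36
console.log(pct(base36Share(6))); // "50.7%" -> 6 characters or fewer in base36
console.log(pct(byteShare(3)));   // "0.4%"  -> 3 bytes, i.e. 4 base64 characters or fewer
```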


If emoji comes to mind to anyone, don't do that. It's not as good as it might seem, and despite the visual length reduction, emoji often use 3-4 bytes or more (some are over 20 bytes for a single glyph) so it tends to be much larger under the hood file size wise.

@IRediTOTO

IRediTOTO commented May 11, 2021

Hi @yuriti, I am using your code and it works very well. Thank you :)
But is this an error?
image
The original class is:

<h2 className="hover:tw-text-[#1da1f1] tw-inline tw-text-3xl tw-font-extrabold tw-tracking-tight tw-text-gray-900 sm:tw-block sm:tw-text-4xl">
              Want product news and updates?
    </h2>

image

The hover still works, but I think that is only because the class was generated incorrectly.

@yuriti

yuriti commented May 19, 2021

@IRediTOTO use new regex

(([a-zA-Z-:]*)[\\\\\\\\]*:)*([\\\\\\\\]*!)?tw-[a-zA-Z-]([a-zA-Z0-9-]*([\\\\\\\\]*(\\%|\\#|\\.|\\[|\\]))*)*

@DamianGlowala

Out of curiosity, has anyone managed to combine this webpack plugin with a Nuxt.js project (running in SSR mode)? I can't get it to work properly on both ends (SSR-rendered output and after hydration); one or the other always ends up malformed. If someone has managed to make it work, I'd be grateful for any advice :)

@CRYBOII

CRYBOII commented Sep 5, 2021

Has anyone had the problem that the CSS won't load when the page is accessed for the first time?
Screen Shot 2564-09-05 at 21 14 45

@IRediTOTO

@yuriti do you have a new regex for Tailwind 3.0? It has so many new classes. I think the old regex can't handle all of them.

@Intaria

Intaria commented May 22, 2022

This regex stops working if you apply CSS with custom merged classes, like:

<div className="tw-message tw-message-positive"></div>

.tw-message.tw-message-positive { @apply tw-bg-green; }

@VSKut

VSKut commented Jul 10, 2022

Hey @sndyuk! To begin with, thank you very much for this package!

I use Vue.js + tailwind + webpack
Build command: vue-cli-service build
Dev command: vue-cli-service --env.NODE_ENV=development build-watch --mode development

Webpack config:

new MangleCssClassPlugin({
  classNameRegExp: '((hover|focus|xs|md|sm|lg|xl|dark|before|disabled|group-disabled|group-focus|group-active|group-hover|after|active|checked|slider\-thumb)[\\\\]*:)*tw-[a-z_-][a-zA-Z0-9_-]*',
  ignorePrefixRegExp: '((hover|focus|xs|md|sm|lg|xl|dark|before|disabled|group-disabled|group-focus|group-active|group-hover|after|active|checked)[\\\\]*:)*',
  log: true,
}),

And I have a problem with Animations

When I use DEV - everything is perfect:
.tw-animate-wiggle > .cb, .tw-wiggle > .ue

.cb {
    -webkit-animation: ue 1s ease-in-out infinite;
    animation: ue 1s ease-in-out infinite;
}

But when I use PROD, I have a problem: the keyframes name is not renamed in the styles :(
.tw-animate-wiggle > .y, but .tw-wiggle > .tw-wiggle still the same

y {
    -webkit-animation: tw-wiggle 1s ease-in-out infinite;
    animation: tw-wiggle 1s ease-in-out infinite;
}

But in logs i see this:

Minify class name from tw-wiggle to x
Minify class name from tw-animate-wiggle to y

Can you help me with this?

I also have a suggestion - to make a whitelist for classes that will not be renamed.

@gggglglglg

gggglglglg commented Sep 26, 2022

What about -tw-mx-4 (with a leading minus)?

How do I include these classes in regex as well?

UPDATE:

classNameRegExp: '((hover|focus|active|disabled|visited|first|last|odd|even|group-hover|focus-within|xs|sm|md|lg|xl)[\\\\]*:)*(|-)tw-[a-zA-Z0-9_-]*([\\\\]*\/[0-9]*)?',
ignorePrefixRegExp: '((hover|focus|active|disabled|visited|first|last|odd|even|group-hover|focus-within|xs|sm|md|lg|xl)[\\\\]*:)*',
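
A quick way to check what the classNameRegExp above matches is to compile the same string and run it over sample input. How the plugin compiles the option internally is an assumption here (a plain global `RegExp`), and the sample HTML and CSS strings are hypothetical.

```javascript
// The classNameRegExp string from the comment above, compiled as a global RegExp.
const classNameRegExp = new RegExp(
  '((hover|focus|active|disabled|visited|first|last|odd|even|group-hover|focus-within|xs|sm|md|lg|xl)[\\\\]*:)*(|-)tw-[a-zA-Z0-9_-]*([\\\\]*\/[0-9]*)?',
  'g'
);

// Hypothetical inputs: an HTML attribute, and escaped selectors as they appear in CSS.
const html = '<div class="-tw-mx-4 md:tw-w-1/2 hover:tw-bg-red-500"></div>';
const css = '.-tw-mx-4{} .md\\:tw-w-1\\/2{}';

const htmlMatches = html.match(classNameRegExp);
const cssMatches = css.match(classNameRegExp);

console.log(htmlMatches); // the three class names, including the leading minus
console.log(cssMatches);  // "-tw-mx-4" and the escaped form of "md:tw-w-1/2"
```

The `(|-)` group is what picks up the leading minus, and the trailing optional group covers fraction utilities such as `1/2`.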

@sajadevo

I've experienced the same issue while working with tailwindcss and next.js 13 and here is my workaround, you just need to make your next.config.js file look like this:

/** @type {import('next').NextConfig} */
const nextConfig = {
  swcMinify: true,
  reactStrictMode: true,
  webpack: (config, { dev }) => {
    if (!dev) {
      const MangleCssClassPlugin = require("mangle-css-class-webpack-plugin");

      config.plugins.push(
        new MangleCssClassPlugin({
          classNameRegExp:
            "((hover|focus|active|disabled|visited|first|last|odd|even|group-hover|focus-within|xs|sm|md|lg|xl)[\\\\]*:)*(|-)tw-[a-zA-Z0-9_-]*([\\\\]*/[0-9]*)?",
          ignorePrefixRegExp:
            "((hover|focus|active|disabled|visited|first|last|odd|even|group-hover|focus-within|xs|sm|md|lg|xl)[\\\\]*:)*",
        })
      );
    }

    return config;
  },
};

module.exports = nextConfig;

If you want to have the mangling in development mode as well, you can remove the !dev check.

@ossipov

ossipov commented Mar 5, 2023

Mighty RegEx heroes, please help detect CSS attribute selectors.

With the rules from this thread I get:

Minify class name from tw-select to ce
Minify class name from tw-select[multiple] to fe
Minify class name from tw-select[disabled] to gb

Is it possible to have:

Minify class name from tw-select to ce
Minify class name from tw-select[multiple] to ce[multiple]
Minify class name from tw-select[disabled] to ce[disabled]

@sndyuk
Owner

sndyuk commented Mar 11, 2023

@ossipov I just tried the regex against your sample class names. This should work. Could you try it?

@ossipov

ossipov commented Mar 12, 2023

@sndyuk Thank you 🙏 Indeed I was using outdated regex.
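
For readers hitting the same attribute-selector problem, the sketch below shows the general idea: if `[` and `]` are not part of the character class, the match stops before the attribute selector and all three selectors share one mangled name. The simplified pattern, the sample CSS, and the `c0`-style names are assumptions for this example, not the plugin's actual output.

```javascript
// Simplified pattern: "[" and "]" are deliberately not in the character class.
const pattern = /tw-[a-zA-Z0-9_-]*/g;

// Hypothetical CSS using attribute selectors on the same class.
const css = '.tw-select{} .tw-select[multiple]{} .tw-select[disabled]{}';

// Map each distinct match to a short name, as a stand-in for the plugin's generator.
const names = new Map();
const mangled = css.replace(pattern, (match) => {
  if (!names.has(match)) names.set(match, 'c' + names.size);
  return names.get(match);
});

console.log(mangled); // ".c0{} .c0[multiple]{} .c0[disabled]{}"
```

The tradeoff is that Tailwind arbitrary values such as `tw-text-[#1da1f1]` do need brackets in the pattern, so the two cases have to be told apart, for example by the escaping backslash that precedes the bracket in generated CSS.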

@decoderid

decoderid commented May 11, 2023

SOLUTION

Add classGenerator to the plugin options in the webpack config. The problem is caused by double generation in Next.js; see #46 (comment).

{
  classGenerator: original => btoa(original).replace(/=/g, ''),
}

INTRO

can someone help? weird output

ERROR

error

Expect

nice

next.config.js

    webpack: (config, { dev }) => {
        const MangleCssClassPlugin = require('mangle-css-class-webpack-plugin')

        if (!dev) {
            config.plugins.push(
                new MangleCssClassPlugin({
                    classNameRegExp:
                    "((hover|focus|active|disabled|visited|first|last|odd|even|group-hover|focus-within|xs|sm|md|lg|xl)[\\\\]*:)*(|-)tw-[a-zA-Z0-9_-]*([\\\\]*/[0-9]*)?",
                    ignorePrefixRegExp:
                    "((hover|focus|active|disabled|visited|first|last|odd|even|group-hover|focus-within|xs|sm|md|lg|xl)[\\\\]*:)*",
                })
            );
        }


        return config
    }
