Nearly every example I see of how to generate an element ID from text with JavaScript removes completely valid non-Latin characters: emoji, Cyrillic characters, Latin characters with accent marks, Chinese characters, Kanji, and so on.

Today, I wanted to share the script I use to generate valid ID strings… without the colonialism. Let's dig in!

When and why you'd need this

Kelp has two components that generate anchor links from a heading element's ID. My site generator automatically adds IDs to my headings, but a lot of CMSs don't.
You can't anchor link to an element without an ID. Rather than skip the element, I wanted both components to automatically generate an ID to use if the element doesn't have one already.

Random IDs

My original approach to this problem was to generate a random ID for each element, using a method that produces a long, unique, random string you can use as an ID. Because that string could start with a number, and valid IDs cannot, I added a prefix to all of my IDs.
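Here's a minimal sketch of that random-ID approach. It assumes `crypto.randomUUID()` as the generator (a method of the kind described above, whose output can start with a digit) and a hypothetical `id_` prefix. Neither is confirmed as what Kelp actually used, and `randomID()` and `ensureID()` are illustrative names.

```javascript
// Hypothetical prefix, chosen for illustration only
const ID_PREFIX = 'id_';

// Create a random, prefixed ID
function randomID () {
	// Assumption: crypto.randomUUID() is available (modern browsers in secure
	// contexts, recent Node.js). The fallback is a weaker stand-in for older environments.
	const uuid = typeof crypto !== 'undefined' && typeof crypto.randomUUID === 'function'
		? crypto.randomUUID()
		: `${Date.now().toString(36)}${Math.random().toString(36).slice(2)}`;
	return `${ID_PREFIX}${uuid}`;
}

// Give an element an ID only if it doesn't already have one
function ensureID (elem) {
	if (!elem.id) {
		elem.id = randomID();
	}
	return elem.id;
}
```

Because the prefix always comes first, the ID never starts with a number, no matter what the random part looks like.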
This works great! But as Andrzej Stacherski pointed out in a GitHub issue for Kelp, it's problematic if someone uses one of those anchor links to deep-link to a section on the site. Every time the web component loads, the ID is different, so an anchor link in the URL never works properly.

Generating from the heading text

A more resilient solution is to generate anchor links from the heading text itself. For example, this heading…
Would end up like this…
We can get the text content directly from the element.
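As a rough sketch, and assuming the `textContent` property is the tool for the job, reading the text might look like this. The `getHeadingText()` helper name is hypothetical.

```javascript
// Get the text of an element, without surrounding whitespace
function getHeadingText (elem) {
	return (elem.textContent || '').trim();
}

// In a browser, you might use it like this
// (guarded so the snippet also runs outside the DOM)
if (typeof document !== 'undefined') {
	const heading = document.querySelector('h2');
	if (heading) {
		console.log(getHeadingText(heading));
	}
}
```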
But we need a way to remove the spaces and any invalid characters that aren't allowed in an ID (like quotes and brackets).

Removing invalid characters

I found a lot of great articles and StackOverflow answers on how to do this. All of them use a regex pattern to replace the unwanted characters. The simplest ones strip out everything that's not a basic Latin letter or number.
For something like our simple heading string, this works great! But what about a heading like this, Mandarin written in Chinese characters?
This is the resulting heading with ID…
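To make the problem concrete, here's one common variation of the "strip everything" approach. It's a sketch, not necessarily the exact pattern from any of those answers, and the sample strings and the `naiveID()` name are made up for illustration.

```javascript
// Lowercase the text, then replace every run of characters
// outside a-z and 0-9 with a single dash
function naiveID (text) {
	return text.trim().toLowerCase().replace(/[^a-z0-9]+/g, '-');
}

console.log(naiveID('Hello world')); // "hello-world"
console.log(naiveID('你好世界'));     // "-"
console.log(naiveID('早上好'));       // "-" (a different heading, the same ID)
```

Every heading written entirely in Chinese characters collapses to the same single dash, which is how you end up with duplicate IDs.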
Chinese characters are perfectly valid in an ID, but this regex pattern treats them as "special characters" and strips them out. In addition to being obnoxiously Anglo-centric, it also has a high likelihood of creating multiple elements with the same ID.

A more surgical approach

Taking a step up from the "nuke everything that's not basic Latin characters" approach, I also saw some variation of this quite a bit…
Run on our Mandarin text above, this actually spits out an empty string. Even worse! It also "normalizes" characters with accents. Take a Spanish greeting, like this…
It creates an ID like this…
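A typical version of this approach, sketched under the assumption that it uses `String.prototype.normalize()` to split accented letters into a base letter plus a combining mark, looks something like this. The `normalizedID()` name and the sample strings are hypothetical.

```javascript
function normalizedID (text) {
	return text
		.normalize('NFD')                // split "í" into "i" + a combining accent
		.replace(/[\u0300-\u036f]/g, '') // remove the combining accents
		.toLowerCase()
		.trim()
		.replace(/[^a-z0-9\s-]/g, '')    // drop anything that's not a-z, 0-9, whitespace, or a dash
		.replace(/[\s-]+/g, '-');        // turn runs of whitespace and dashes into one dash
}

console.log(normalizedID('¡Buenos días!')); // "buenos-dias"
console.log(normalizedID('你好世界'));       // ""
```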
That's better than what happens to Mandarin characters, but there's no need to remove the accents. They're perfectly valid characters to have in an ID.

Generating decolonial IDs

If your site is only in English, maybe those other solutions are fine. But for a project like Kelp, with a stated goal of being for everyone, that kind of English-centric, colonial approach to generating IDs doesn't work.

After a lot of digging around, I found something that got me in the ballpark on StackOverflow. It targets anything that's not a letter, number, dash, or underscore, or within a specific Unicode range. That last part is what lets us allow emoji, Cyrillic, Mandarin, Kanji, letters with accents, and more.

A valid identifier can include characters from a broad Unicode range. I modified the pattern I found on StackOverflow to allow for a wider range of Unicode characters and remove double dashes…
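Here's a sketch of what a pattern like that could look like. It is not the exact pattern from the StackOverflow answer or from Kelp. It assumes the allowed range runs from U+00A0 through U+10FFFF, and the `generateID()` name and sample strings are made up for illustration.

```javascript
function generateID (text) {
	return text
		.trim()
		.toLowerCase()
		// Replace runs of anything that's NOT a-z, 0-9, underscore, dash,
		// or a character from U+00A0 up (assumed range) with a dash.
		// The "u" flag lets the range cover emoji and other astral characters.
		.replace(/[^a-z0-9_\u{00A0}-\u{10FFFF}-]+/gu, '-')
		.replace(/-{2,}/g, '-')    // collapse double dashes
		.replace(/^-+|-+$/g, '');  // trim leading and trailing dashes
}

console.log(generateID('Hello, world! 👋')); // "hello-world-👋"
console.log(generateID('Buenos días'));      // "buenos-días"
console.log(generateID('你好世界'));          // "你好世界"
```

Accents, emoji, and Chinese characters all survive, while spaces and ASCII punctuation become single dashes.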
Now, a heading like this…
Gets transformed into this…
That Mandarin heading from earlier? It looks like this…
Simple, resilient, and works for everyone.