[Go Make Things] How to generate an ID from element text

Nearly every example I see of how to generate an element ID from text with JavaScript removes completely valid non-latin characters: emoji, Cyrillic characters, latin characters with accent marks, Chinese characters, Kanji, and so on.

Today, I wanted to share the script I use to generate valid ID strings… without the colonialism. Let's dig in!

When and why you'd need this

Kelp has two components that generate anchor links from a heading element's ID.

The <kelp-toc> component creates a table of contents from the headings on a page, and the <kelp-heading-anchors> component adds anchor links directly to headings (like the ones in this article if you're reading it on my website).

My site generator automatically adds IDs to my headings, but a lot of CMSs don't.

<h2>Hey, universe!</h2>

You can't anchor link to an element without an ID.

Rather than skip the element, I wanted both components to automatically generate an ID to use if the element doesn't have one already.

Random IDs

My original approach to this problem was to generate a random ID for each element using the crypto.randomUUID() method.

It generates a long, unique, random string that you can use as an ID.

Because a UUID can start with a number, and an ID that starts with a number can't be used in a CSS selector without escaping, I prefixed all of my IDs with h_.

// Add an ID to an element if it doesn't have one already
function createID (elem) {
  if (elem.id) return;
  elem.id = `h_${crypto.randomUUID()}`;
}

This works great!

But as Andrzej Stacherski pointed out in a GitHub issue for Kelp, it's problematic if someone uses one of those anchor links to deep-link to a section on the site.

Every time the web component loads, the ID is different, so an anchor link in the URL never works properly.
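Here's a sketch of the random-ID version that makes the problem easy to see outside a browser. The makeUUID parameter is a hypothetical addition (not part of Kelp) so a stub can stand in for crypto.randomUUID(), and because the function only reads and writes elem.id, a plain object works in place of a DOM element.

```javascript
// Sketch: random IDs, with the UUID source injectable for testing.
// makeUUID is a hypothetical parameter; in a browser, the default
// calls crypto.randomUUID() (available in secure contexts).
function createRandomID(elem, makeUUID = () => crypto.randomUUID()) {
  // Leave existing IDs alone
  if (elem.id) return;
  elem.id = `h_${makeUUID()}`;
}

// Simulate the same heading on two separate page loads.
let counter = 0;
const fakeUUID = () => `uuid-${counter++}`;

const firstLoad = { id: '' };
const secondLoad = { id: '' };
createRandomID(firstLoad, fakeUUID);
createRandomID(secondLoad, fakeUUID);

console.log(firstLoad.id);  // "h_uuid-0"
console.log(secondLoad.id); // "h_uuid-1": a link to #h_uuid-0 no longer matches
```

With real UUIDs the mismatch is exactly the same, just harder to spot by eye.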

Generating from the heading text

A more resilient solution is to generate the ID from the heading text itself, so it comes out the same every time the page loads.

For example, this heading…

<h2>Hey, universe!</h2>

Would end up like this…

<h2 id="hey-universe">Hey, universe!</h2>

We can get the text content from the element using the Node.textContent property.

// Add an ID to an element if it doesn't have one already
function createID (elem) {
  if (elem.id) return;
  elem.id = `h_${elem.textContent}`;
}

But we need a way to remove the spaces (an ID can't contain whitespace) and any characters, like quotes and brackets, that make an ID awkward to use in a URL or CSS selector.
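For reference, the HTML specification's hard requirement for an id is small: it must have at least one character and contain no ASCII whitespace. Characters like quotes and brackets pass that test but still need escaping in CSS selectors, which is a good reason to strip them anyway. Here's a minimal checker for the HTML rule; isValidHtmlId is a hypothetical helper, not part of Kelp.

```javascript
// Sketch: the HTML spec's rule for id values is that they must be
// non-empty and must not contain ASCII whitespace (tab, LF, FF, CR, space).
function isValidHtmlId(id) {
  return id.length > 0 && !/[\t\n\f\r ]/.test(id);
}

console.log(isValidHtmlId('h_Hey, universe!')); // false: contains a space
console.log(isValidHtmlId('h_我要一杯咖啡'));      // true
console.log(isValidHtmlId(''));                 // false: empty
```

The h_ plus textContent version fails this check as soon as the heading contains a space.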

Removing invalid characters

I found a lot of great articles and StackOverflow answers on how to do this.

All of them use the String.replace() method with a regex pattern to replace characters that shouldn't be there with dashes (-) or underscores (_).

The simplest ones use \W to replace everything that's not A-Z (case-insensitive), 0-9, or an underscore.

// Add an ID to an element if it doesn't have one already
function createID (elem) {
  if (elem.id) return;
  elem.id = `h_${elem.textContent.replace(/\W/g, '-')}`;
}

For something like our simple heading string, this works well enough: "Hey, universe!" becomes h_Hey--universe-.

But what about a heading like this, Mandarin written in Chinese characters?

<h2>我要一杯咖啡</h2>

This is the resulting heading with ID…

<h2 id="h_------">我要一杯咖啡</h2>

Chinese characters are perfectly valid in an ID, but this regex pattern treats them as "special characters" and replaces each one with a dash.

In addition to being obnoxiously Anglo-centric, it also has a high likelihood of creating multiple elements with the same ID.
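A string-only sketch of the \W approach shows the collision concretely. asciiSlug is a hypothetical name; it takes the heading text directly instead of an element so it can run anywhere. The third heading ("the weather is nice today") is a different six-character sentence that ends up with the same ID.

```javascript
// Sketch: the \W approach, applied to a plain string.
// \W matches anything other than A-Z, a-z, 0-9, and _,
// so every non-ASCII character becomes a dash.
function asciiSlug(text) {
  return `h_${text.replace(/\W/g, '-')}`;
}

console.log(asciiSlug('Hey, universe!')); // "h_Hey--universe-"
console.log(asciiSlug('我要一杯咖啡'));     // "h_------"
console.log(asciiSlug('今天天气很好'));     // "h_------" too: a duplicate ID
```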

A more surgical approach

Taking a step up from the "nuke everything that's not a basic Latin character" approach, I also see some variation of this one quite a bit…

// Add an ID to an element if it doesn't have one already
// @link https://byby.dev/js-slugify-string
function createID (elem) {
  if (elem.id) return;
  const id = elem.textContent
    .normalize('NFKD') // split accented characters into their base characters and diacritical marks
    .replace(/[\u0300-\u036f]/g, '') // remove the accents, which live in the Combining Diacritical Marks block (U+0300 to U+036F)
    .trim() // trim leading or trailing whitespace
    .toLowerCase() // convert to lowercase
    .replace(/[^a-z0-9 -]/g, '') // remove everything except a-z, 0-9, spaces, and hyphens
    .replace(/\s+/g, '-') // replace runs of whitespace with a hyphen
    .replace(/-+/g, '-'); // collapse consecutive hyphens
  elem.id = `h_${id}`;
}
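To see what the first two steps of that chain do, here's a quick sketch on a single word. NFKD splits ñ into a plain n followed by a combining tilde (U+0303), and the accent-stripping regex then deletes the tilde. Chinese characters have no such decomposition, so they pass through untouched until the later a-z filter removes them.

```javascript
// Sketch: what NFKD normalization plus the accent-stripping regex do.
const word = 'se\u00F1or'; // "señor" with a precomposed ñ (U+00F1)

// NFKD decomposes "ñ" into "n" + combining tilde (U+0303)
const decomposed = word.normalize('NFKD');
console.log(word.length, decomposed.length); // 5 6

// Removing the Combining Diacritical Marks block drops the tilde
const stripped = decomposed.replace(/[\u0300-\u036f]/g, '');
console.log(stripped); // "senor"

// Chinese characters have no decomposition, so they pass through unchanged
console.log('我要一杯咖啡'.normalize('NFKD') === '我要一杯咖啡'); // true
```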

Run on our Mandarin text above, this produces an empty string, so the ID ends up as just h_. Every heading written entirely in Chinese characters would get that same ID. Even worse!

It also "normalizes" characters with accents. Take a Spanish greeting, like this…

<h2>Hola señor!</h2>

It creates an ID like this…

<h2 id="h_hola-senor">Hola señor!</h2>
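Here's the same chain applied to plain strings, which reproduces both results. surgicalSlug is a hypothetical name; the steps are the ones from the function above.

```javascript
// Sketch: the "surgical" slugify chain applied to a string.
// surgicalSlug is a hypothetical name used for illustration.
function surgicalSlug(text) {
  const id = text
    .normalize('NFKD')
    .replace(/[\u0300-\u036f]/g, '') // drop combining accents
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9 -]/g, '')     // keep only a-z, 0-9, spaces, hyphens
    .replace(/\s+/g, '-')
    .replace(/-+/g, '-');
  return `h_${id}`;
}

console.log(surgicalSlug('Hola señor!'));  // "h_hola-senor"
console.log(surgicalSlug('我要一杯咖啡')); // "h_": the slug itself is empty
```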

That's better than what happens to Mandarin characters, but there's no need to remove the accents. They're perfectly valid characters to have in an ID.

Generating decolonial IDs

If your site is only in English, maybe those other solutions are fine.

But for a project like Kelp, with a stated goal of being for everyone, that kind of English-centric, colonial approach to generating IDs doesn't work.

After a lot of digging around, I found something that got me in the ballpark on StackOverflow.

It targets anything that's not a letter, number, dash, or underscore, or within a specific Unicode range. That last part is what lets us allow emoji, Cyrillic, Mandarin, Kanji, letters with accents, and more.

In CSS, an identifier can include any character in the Unicode range U+00A0 and higher.
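As a rough sketch of that rule, here's a per-character check. It's simplified: it only covers which characters may appear, not CSS's extra rules about how an identifier may start (no leading digit, for example) or escaping. isIdentChar and allIdentChars are hypothetical helpers.

```javascript
// Sketch: can this character appear in a CSS identifier?
// Allowed: A-Z, a-z, 0-9, hyphen, underscore, and any code point >= U+00A0.
// Simplified: ignores rules about the first character and escapes.
function isIdentChar(ch) {
  return /[A-Za-z0-9_-]/.test(ch) || ch.codePointAt(0) >= 0xA0;
}

function allIdentChars(text) {
  // for...of iterates by code point, so each emoji counts as one character
  for (const ch of text) {
    if (!isIdentChar(ch)) return false;
  }
  return true;
}

console.log(allIdentChars('我要一杯咖啡')); // true
console.log(allIdentChars('Sábado😀'));     // true
console.log(allIdentChars('#^&%@'));        // false
```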

I modified the pattern I found on StackOverflow to allow for a wider range of unicode characters and remove double dashes…

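// Add an ID to an element if it doesn't have one already
// Adapted from https://stackoverflow.com/a/25698970
function createID (elem) {
  if (elem.id) return;
  const id = elem.textContent
    .replace(/[^a-zA-Z0-9-_\u00A0-\uFFEF\s-]/g, '-') // replace anything outside the allowed set with a dash
    .replace(/[\s-]+/g, '-'); // collapse runs of whitespace and dashes into a single dash
  elem.id = `h_${id}`;
}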
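Here's a string-only sketch of that final pattern so the outputs can be checked outside a browser; slugFromText is a hypothetical name. Emoji survive because, without the u flag, the regex sees each one as a pair of UTF-16 surrogate code units (U+D800 to U+DFFF), and both fall inside the allowed \u00A0-\uFFEF range. Note that the pattern doesn't trim, so a heading ending in punctuation gets a trailing dash.

```javascript
// Sketch: the final pattern applied to a plain string.
// slugFromText is a hypothetical name used for illustration.
function slugFromText(text) {
  const id = text
    // Anything that isn't A-Z, a-z, 0-9, -, _, whitespace,
    // or in the U+00A0 to U+FFEF range becomes a dash
    .replace(/[^a-zA-Z0-9-_\u00A0-\uFFEF\s-]/g, '-')
    // Collapse runs of whitespace and dashes into a single dash
    .replace(/[\s-]+/g, '-');
  return `h_${id}`;
}

// textContent flattens the <code> element into plain text
console.log(slugFromText('123 #^&%@.text-small 是不 Sábado 😀🎉'));
// "h_123-text-small-是不-Sábado-😀🎉"
console.log(slugFromText('我要一杯咖啡'));   // "h_我要一杯咖啡"
console.log(slugFromText('Hey, universe!')); // "h_Hey-universe-"
```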

Now, a heading like this…

<h2>123 #^&%@<code>.text-small</code> 是不 Sábado 😀🎉</h2>

Gets transformed into this…

<h2 id="h_123-text-small-是不-Sábado-😀🎉">123 #^&%@<code>.text-small</code> 是不 Sábado 😀🎉</h2>

That Mandarin heading from earlier? It looks like this…

<h2 id="h_我要一杯咖啡">我要一杯咖啡</h2>

Simple, resilient, and works for everyone.

Like this? A Go Make Things membership is the best way to support my work and help me create more free content.

Cheers,
Chris
