Runes, Bytes and Graphemes in Go

In Go, strings are bytes, runes are Unicode code points, and graphemes are what users actually see. Pick the right one before slicing, counting, or reversing text.

Aug 09, 2025

I once ran into the difference between runes, bytes and graphemes while handling Tamil names and emoji in a Go web app: a string that looked short wasn’t, and reversing it produced gibberish. The culprit wasn’t a flaw in Go; it was my assumption about what “a character” means.

Let’s map the territory precisely:

1. Bytes. The raw material Go calls a string

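A Go string is an immutable sequence of bytes. Go source files are UTF-8, so string literals hold UTF-8 text, but a string can carry any bytes, valid UTF-8 or not.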
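A minimal sketch of the byte view: indexing a string yields a byte, a string cannot be modified in place, and a for-range loop decodes UTF-8 into runes along with their starting byte offsets. The helper name runeOffsets is hypothetical, chosen for illustration.

```go
package main

import "fmt"

// runeOffsets returns the byte offset at which each rune of s starts.
// Ranging over a string decodes UTF-8 and yields (byte offset, rune) pairs;
// only the offsets are kept here. Hypothetical helper for illustration.
func runeOffsets(s string) []int {
	var offsets []int
	for i := range s {
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	s := "வணக்கம்"

	// Indexing a string yields a single byte (uint8), not a character.
	fmt.Printf("%T %#x\n", s[0], s[0]) // uint8 0xe0

	// Strings are immutable: `s[0] = 'x'` does not compile.
	// To change bytes, copy them into a []byte; s itself is untouched.
	buf := []byte(s)
	buf[0] = 'x'
	fmt.Println(string(buf) == s) // false

	// Every code point in this word occupies 3 bytes.
	fmt.Println(runeOffsets(s)) // [0 3 6 9 12 15 18]
}
```

The offsets jump by 3 because each Tamil code point is encoded as 3 bytes.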

What we see isn’t what Go handles under the hood.

s := "வணக்கம்"
fmt.Println(len(s)) // 21

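len reports 21 bytes, not visible symbols. Each Tamil code point takes 3 bytes in UTF-8, so the word’s 7 code points come to 21 bytes. Even simple-looking emoji span several bytes: 😀 takes 4, and ❤️ (a heart plus an invisible variation selector) takes 6.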
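To compare byte length and code-point count side by side, here is a small sketch using only len and utf8.RuneCountInString. The helper sizes is a hypothetical name; the samples are written with escapes so the exact code points are unambiguous.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// sizes returns the UTF-8 byte length and the rune (code point) count of s.
func sizes(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	samples := []string{
		"A",            // U+0041: 1 byte, 1 rune
		"\u0B95",       // க TAMIL LETTER KA: 3 bytes, 1 rune
		"\U0001F600",   // 😀 GRINNING FACE: 4 bytes, 1 rune
		"\u2764\uFE0F", // ❤️ heart + VARIATION SELECTOR-16: 6 bytes, 2 runes
	}
	for _, s := range samples {
		b, r := sizes(s)
		// %+q prints the string with non-ASCII runes escaped.
		fmt.Printf("%+q: %d bytes, %d runes\n", s, b, r)
	}
}
```

Neither number is what a reader counts: the heart shows up as one symbol but is 6 bytes and 2 runes.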

2. Runes. Unicode code points

Converting a string to []rune gives you code points, but still not what a human perceives.

rs := []rune(s)
fmt.Println(len(rs)) // 7

Here it’s 7 runes, but some Tamil graphemes combine two of them: “க்” is the letter க (U+0B95) followed by the virama ் (U+0BCD), a combining mark.
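A short sketch that makes the combining marks visible at the rune level, using unicode.M (the Unicode "Mark" category) from the standard library. markIndexes is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"unicode"
)

// markIndexes returns the rune indexes in s that hold combining marks
// (Unicode general category M). Such runes attach to the preceding
// base character instead of standing alone.
func markIndexes(s string) []int {
	var idx []int
	for i, r := range []rune(s) {
		if unicode.Is(unicode.M, r) {
			idx = append(idx, i)
		}
	}
	return idx
}

func main() {
	s := "வணக்கம்"
	rs := []rune(s)
	for i, r := range rs {
		fmt.Printf("%d %U combining=%t\n", i, r, unicode.Is(unicode.M, r))
	}
	fmt.Println(markIndexes(s)) // [3 6]

	// Keeping the first 3 runes drops the virama that belongs to the
	// third letter: the result ends in க instead of க்.
	fmt.Println(string(rs[:3]))
}
```

Both viramas sit at their own rune index, which is why rune-based slicing can separate them from their letters.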

3. Grapheme clusters. The units users actually see

Go’s standard library stops at runes. To work with visible characters, you need a grapheme-aware library, like github.com/rivo/uniseg.

// Requires the third-party module: go get github.com/rivo/uniseg
for gr := uniseg.NewGraphemes(s); gr.Next(); {
    fmt.Printf("%q\n", gr.Str()) // one grapheme cluster per line
}

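For s, that prints the five units a human reads: “வ”, “ண”, “க்”, “க”, “ம்”. Run it on a string containing “❤️” (U+2764 followed by the variation selector U+FE0F) and the heart likewise comes out as a single unit.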
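uniseg is the right tool in production. For intuition about what it does on this word, here is a stdlib-only sketch that deliberately simplifies the Unicode segmentation rules (UAX #29): it attaches combining marks (category M, which includes the Tamil virama U+0BCD and the emoji variation selector U+FE0F) and zero-width-joiner sequences to the preceding rune, and ignores everything else, such as Hangul syllables, regional-indicator flag pairs and prepended marks. The name splitApprox is hypothetical.

```go
package main

import (
	"fmt"
	"unicode"
)

const zwj = '\u200D' // ZERO WIDTH JOINER

// splitApprox splits s into approximate grapheme clusters.
// Simplification, not full UAX #29: a rune joins the previous cluster if it
// is a combining mark (category M), a ZWJ, or directly follows a ZWJ.
func splitApprox(s string) []string {
	var clusters []string
	start := -1 // byte offset of the current cluster; -1 before the first rune
	afterZWJ := false
	for i, r := range s {
		joins := unicode.Is(unicode.M, r) || r == zwj || afterZWJ
		if start < 0 || !joins {
			if start >= 0 {
				clusters = append(clusters, s[start:i]) // close previous cluster
			}
			start = i // open a new cluster at this rune
		}
		afterZWJ = r == zwj
	}
	if start >= 0 {
		clusters = append(clusters, s[start:])
	}
	return clusters
}

func main() {
	for _, c := range splitApprox("வணக்கம்") {
		fmt.Printf("%q\n", c)
	}
	fmt.Println(len(splitApprox("வணக்கம்")))      // 5
	fmt.Println(len(splitApprox("\u2764\uFE0F"))) // 1
}
```

Because clusters are taken as substrings at rune offsets, every piece is valid UTF-8; the approximation only affects where the boundaries fall for scripts and sequences it does not model.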

Why this matters

If your app deals with names, chats, or any multilingual text, indexing by bytes will break things, because a byte offset can land in the middle of a code point. Working with runes helps, but can still split what a user sees as one unit, such as “க்” or “❤️”. Grapheme-aware operations match what users actually expect.

Real bugs I’ve seen: Tamil names chopped mid-character, and emoji reactions breaking because only one code point was taken.

To put it simply

| Task | Approach |
| --- | --- |
| Count code points | utf8.RuneCountInString(s) |
| Count visible units | Grapheme iteration (uniseg) |
| Reverse text | Parse into graphemes, reverse the slice, join |
| Slice safely | Only use s[i:j] on grapheme boundaries |
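The last two rows can be sketched as follows, on top of a simplified grapheme splitter (not full UAX #29: it only keeps combining marks and zero-width-joiner sequences attached to the preceding rune). splitApprox, reverseClusters and truncateClusters are hypothetical helper names; with uniseg you would build the cluster slice from gr.Str() instead and keep the rest unchanged.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// splitApprox is a simplified grapheme splitter (not full UAX #29): a rune
// joins the previous cluster if it is a combining mark, a ZWJ, or follows a ZWJ.
func splitApprox(s string) []string {
	var clusters []string
	start, afterZWJ := -1, false
	for i, r := range s {
		joins := unicode.Is(unicode.M, r) || r == '\u200D' || afterZWJ
		if start < 0 || !joins {
			if start >= 0 {
				clusters = append(clusters, s[start:i])
			}
			start = i
		}
		afterZWJ = r == '\u200D'
	}
	if start >= 0 {
		clusters = append(clusters, s[start:])
	}
	return clusters
}

// reverseClusters reverses s cluster by cluster, so marks stay on their letters.
func reverseClusters(s string) string {
	cs := splitApprox(s)
	for i, j := 0, len(cs)-1; i < j; i, j = i+1, j-1 {
		cs[i], cs[j] = cs[j], cs[i]
	}
	return strings.Join(cs, "")
}

// truncateClusters keeps at most n clusters; the cut always falls on a
// cluster boundary, never inside a letter-plus-mark pair.
func truncateClusters(s string, n int) string {
	cs := splitApprox(s)
	if len(cs) <= n {
		return s
	}
	return strings.Join(cs[:n], "")
}

func main() {
	s := "வணக்கம்"
	fmt.Println(reverseClusters(s))
	fmt.Println(truncateClusters(s, 3)) // வணக்
}
```

Reversing the []rune slice instead would leave a virama at the very start of the result, detached from any letter; working on clusters avoids that.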

Decide what you intend to manipulate (the raw bytes, the code points, or what a user actually reads on screen) and choose the right level.

Ashwin Gopalsamy
