256 Kilobytes

How Does Zalgo Text Work? | A Guide to Using and Misusing Unicode Diacritical Marks

Articles in Computer Software | By Louis J. V. Cicalese

Published Thu, 28 Feb 2019 17:06:38 -0800 | Last updated Fri, 22 Mar 2019 17:36:14 -0700

Beware the end times, for Zalgo is upon us, Arch.


What is Zalgo Text?

You have probably experienced Zalgo text before, though you may not know it by name. For those unfamiliar, Zalgo text is vertically aligned and overlaid text that somewhat resembles a crazy person’s creepy wall-writing.

Seeing as Unicode characters are usually arranged horizontally and do not overlap, it may not make intuitive sense how Zalgo text can exist or how it is made. Let’s take a look at how Unicode functions and how this allows for the creation of Zalgo text.

What is Unicode?

Unicode Precursors

Before Unicode, a system called ASCII was used to represent text on computers. As a 7-bit character set, ASCII was capable of storing 128 characters. This was a fine amount for English speakers, but wholly insufficient when it came to other alphabets.

Below is an ASCII character chart. Please note that “control characters” (aka non-printing characters) are listed in the “0” and “1” columns. Control characters represent special effects rather than written/printable symbols.   
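In case the chart does not display, here is a minimal sketch that rebuilds the same layout: 8 columns (the high 3 bits of the code) by 16 rows (the low 4 bits). The function name ascii_chart and the "ctl" placeholder are made up for this example.

```python
import unicodedata

def ascii_chart():
    """Return the 128 ASCII codes as 16 rows of 8 columns.

    The column is the high 3 bits of the code and the row is the low 4 bits,
    so columns 0 and 1 hold codes 0-31, the control characters.
    """
    rows = []
    for low in range(16):
        row = []
        for high in range(8):
            ch = chr(high * 16 + low)
            # Unicode category "Cc" marks control (non-printing) characters.
            row.append("ctl" if unicodedata.category(ch) == "Cc" else ch)
        rows.append(row)
    return rows

chart = ascii_chart()
print("     " + "  ".join(f"{high:>3}" for high in range(8)))
for low, row in enumerate(chart):
    print(f"{low:X}    " + "  ".join(f"{cell:>3}" for cell in row))
```

Besides columns 0 and 1, the very last code (127, DEL) is also a control character, so it shows up as "ctl" in the bottom-right corner.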

 

Shortly thereafter, it became common to use 8-bit bytes to store characters in memory. This allowed for 256 codes in total (0 through 255), opening up codes 128 through 255 for new characters.

These new codes acted as additions to ASCII, meaning that the original 128 characters were left unchanged. The additional characters were adequate for other Latin-based languages, as they allowed for the inclusion of accented characters and whatnot. Here is the full 8-bit extended ASCII character chart, with new symbols starting at row 128.

However, languages that used other alphabets, such as Russian, Japanese, Hebrew, and so on, were still out of luck, and had to develop their own codes.
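To see why this was a headache, here is a small sketch that decodes the same byte under two different 8-bit encodings. It uses Python's built-in "latin-1" (Western European) and "cp1251" (Cyrillic) codecs as stand-ins for the kind of competing codes described above; the article itself does not name specific code pages.

```python
import unicodedata

raw = bytes([0xE9])  # one byte with value 233, above the 7-bit ASCII range

# The same byte means different things depending on the code page in use.
as_latin1 = raw.decode("latin-1")
as_cp1251 = raw.decode("cp1251")

print(unicodedata.name(as_latin1))  # LATIN SMALL LETTER E WITH ACUTE
print(unicodedata.name(as_cp1251))  # CYRILLIC SMALL LETTER SHORT I

# Plain 7-bit ASCII has no meaning for this byte at all.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
print(ascii_ok)  # False
```

A file written with one code page and opened with another would silently turn into the wrong letters, which is the problem a single worldwide standard set out to solve.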

Finally, Unicode began to be developed in 1987. Unicode would become a worldwide standard which allowed for a code to be assigned to virtually every character in every alphabet.

How Unicode Works

UTF-8, the Unicode encoding predominantly used on the Internet, uses one byte (8 bits) for each of the first 128 code points (which are identical to the 7-bit ASCII characters). It uses up to 4 bytes for other characters, which is enough to cover all 1,114,112 code points in the Unicode code space (far more than are currently assigned).

The first 128 characters (US-ASCII) need one byte.

The next 1,920 characters need two bytes to encode, which covers the remainder of almost all Latin-script alphabets, and also Greek, Cyrillic, Coptic, Armenian, Hebrew, Arabic, Syriac, Thaana and N'Ko alphabets, as well as Combining Diacritical Marks.

Three bytes are needed for characters in the rest of the Basic Multilingual Plane, which contains virtually all characters in common use including most Chinese, Japanese and Korean characters.

Four bytes are needed for characters in the other planes of Unicode, which include less common CJK characters, various historic scripts, mathematical symbols, and emoji (pictographic symbols).

Source: Wikipedia
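The byte counts quoted above are easy to check for yourself. This is a minimal sketch using Python's built-in UTF-8 encoder; the helper name utf8_length and the sample characters are just choices made for this example.

```python
def utf8_length(ch):
    """Number of bytes UTF-8 uses to encode a single character."""
    return len(ch.encode("utf-8"))

samples = [
    ("A", "US-ASCII letter"),
    ("\u00D3", "Latin capital O with acute"),
    ("\u0303", "combining tilde"),
    ("\u4E2D", "common CJK character"),
    ("\U0001F600", "emoji"),
]
for ch, label in samples:
    # Print the code point rather than the character, so the output
    # looks the same in every terminal.
    print(f"U+{ord(ch):04X}  {utf8_length(ch)} byte(s)  {label}")

# The two-byte range runs from U+0080 to U+07FF inclusive.
two_byte_count = 0x7FF - 0x80 + 1
print(two_byte_count)  # 1920
```

Note that the combining tilde (U+0303) lands in the two-byte range, so every diacritic piled onto a Zalgo character costs two more bytes.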

Unicode also includes diacritical marks, which are glyphs such as accents, tildes, and umlauts (to name a few) that can be rendered above, below, or even inside base characters. The most common modifier symbols are stored in a Unicode block called Combining Diacritical Marks. As you can see from viewing the chart, these diacritical marks are all stored as their own unique characters.
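If you would rather poke at the block from code than from a chart, the sketch below looks up a few of its members with Python's standard unicodedata module. The block range U+0300 through U+036F is the Combining Diacritical Marks block; the variable names are made up for this example.

```python
import unicodedata

# The Combining Diacritical Marks block spans U+0300 through U+036F.
BLOCK_START, BLOCK_END = 0x0300, 0x036F

marks = [chr(cp) for cp in range(BLOCK_START, BLOCK_END + 1)]
print(len(marks))  # 112

# Each mark is a character in its own right, with a name and a category.
for mark in ("\u0301", "\u0303", "\u0308"):
    print(f"U+{ord(mark):04X}", unicodedata.name(mark), unicodedata.category(mark))
```

The category "Mn" stands for "Mark, nonspacing": the character takes up no horizontal space of its own and is drawn on the character before it.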

Applying Diacritical Marks to Unicode Characters

A unique aspect of Unicode is that diacritical marks can be used to modify other Unicode characters. These modifiers are often known as combining characters in the context of digital typography. This function allows diacritics to be applied to any other Unicode symbol, even multiple times over if you so wish.

The method for applying diacritics varies depending on what application you are using. For example, Google Docs uses the shortcut Ctrl + Shift + U to directly apply codes.

Let’s say that you want to type an ordinary Unicode character (not using a combining character). You’re in Google Docs and want to type a capital O with an accent over it. There are a couple of other ways to do that, of course, but using Unicode you would first type in Ctrl + Shift + U, which will cause a little underlined “u” to appear. This indicates that you can type in the Unicode number for that symbol, which in this case is “00D3.”

You have no way of knowing if that’s how I did this or not, but voila: Ó

But instead let’s pretend you want to create an “M” with a tilde over it in Google Docs. I don’t know why, but you want to do it. This will require a combining character since that isn’t a standard symbol.

First, type an “M,” then use the shortcut Ctrl + Shift + U followed by the appropriate Unicode number (in this case, 0303):

As will be mentioned below, diacritical modifiers (and Zalgo text, by extension) may not render perfectly in all applications. That was true here as well, which is why the "M with a tilde" is uploaded as a PNG. 
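The difference between the two examples above (a ready-made Ó versus an M that needs a combining tilde) can also be seen in code. This is a minimal sketch using Python's standard unicodedata module; NFC normalization, which merges a base character and its marks into a single precomposed character when one exists, is brought in here for illustration and is not something the steps above rely on.

```python
import unicodedata

o_precomposed = "\u00D3"   # the single code point typed as 00D3
o_combined = "O\u0301"     # capital O followed by a combining acute accent
m_tilde = "M\u0303"        # capital M followed by a combining tilde

# The two spellings of the accented O look the same but are different strings.
print(o_precomposed == o_combined)  # False

# NFC normalization folds the pair into the precomposed character.
o_normalized = unicodedata.normalize("NFC", o_combined)
print(o_normalized == o_precomposed)  # True

# There is no precomposed "M with tilde", so the pair stays as two code points.
m_normalized = unicodedata.normalize("NFC", m_tilde)
print(len(m_normalized))  # 2
```

That second result is the reason the M needs a combining character in the first place: there is simply no single Unicode number to type for it.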

Note: All Unicode numbers can be found through this page. Combiner characters can more easily be found here. Numbers are formatted as “U+0303” but just ignore the “U+” while typing in the number.

Note 2: Apparently Microsoft Word requires that you type the Unicode number first, and then hit Alt + X to make it take effect. Give it a shot, Wordians!  

For other applications, you’ll have to figure out the exact process yourself.

How Does Zalgo Text Work?

Essentially, seemingly corrupted Zalgo text is made possible by the inclusion of combining characters in Unicode. By using multiple diacritics to modify a single character, the text will extend vertically and overlap with other lines of text.
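Stacking is nothing more than repetition. Here is a minimal sketch that piles several combining tildes onto one letter; the helper name stack and the count of six are arbitrary choices for this example.

```python
COMBINING_TILDE = "\u0303"

def stack(base, mark, count):
    """Return the base character followed by `count` copies of a combining mark."""
    return base + mark * count

stacked = stack("M", COMBINING_TILDE, 6)

# ascii() shows the escaped form, which looks the same in every terminal.
# Use print(stacked) instead to see how your own terminal renders the pile.
print(ascii(stacked))
print(len(stacked))  # 7
```

The string is seven code points long, but a renderer treats it as one letter with six marks, each drawn above the last.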

As you can see, that “M” now has a bunch of tildes over it. While you can make Zalgo text by applying combiner characters manually, there is a much faster way if your only goal is to fuck up the words.

N͡o̥̠̗̘͎̣͍ẁ ͏̗̤̺͕͎̠̺y͎͈̻͉̘̲ơ̱u̠̪͜ͅ ͖h͚̲̠͉̰͇͓a͖v̳̱͕̹̬͔e̩͖̗̲̣̜͢ ̻̳̙̣̭

̛̗̞t̬̹͉̤̺̮͓h̪̼̭̳͎͇e̼̬͉̱̜̫͡ ̛̲͈̘̳͎̘̝ạb̙̰͟i̭̯̞͉̫l̝̝̦͞i̹͚͉t̺͇̖͉̭̜͍̀y̯̥̥͙̜̦͝ ̼

̴to͈͝ ̮͙̙ͅm͈a̷̜͔k̘̲̠͖̮͇̪e̦̪̬ͅ ̨͉̥y͚̭o̘̪͍̹͘u̧͖r̢̳̱̣̱̞ ͕̗̟͉̳̤ͅơ̲͚w͍̥̻̬̮̼̞͝n̨̪͔ͅ

͍͔̝͉Źa͚̦l̜͡g̰̣͇̬̝o̦̙͕͖̬̣̬͡ ̞ṯ̷e̵̬̫x̮̻͟t̛̰̲̞̥͇

̭͉̼̮̗̫͞įͅn̼͎̲ͅ ̗̼ͅm̢̖͙ẹ̷̬̗r͙̥͈͠e̼̗̗ ̨̭̳̲̜m͓͎͕̘̞̥o̪̟̮̯̻m̟e͔̗n͍͓̜͖̦t͍͎̬̦͎͡s̮̜̪̣̮̩̗

 

There are many Zalgo text generators available online, saving you the time of making it yourself. This is just one of them.
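For the curious, a generator does not need to be complicated. The sketch below is one guess at the general approach, not the code behind any particular online tool: it appends a random handful of marks from the Combining Diacritical Marks block (U+0300 through U+036F) after every non-space character. The names zalgo, strip_marks, max_marks and seed are all made up for this example.

```python
import random

# Every code point in the Combining Diacritical Marks block.
COMBINING_MARKS = [chr(cp) for cp in range(0x0300, 0x0370)]
MARK_SET = set(COMBINING_MARKS)

def zalgo(text, max_marks=8, seed=None):
    """Append 1 to max_marks random combining marks after each non-space character."""
    rng = random.Random(seed)  # a seed makes the output repeatable
    out = []
    for ch in text:
        out.append(ch)
        if ch.isspace():
            continue  # leave whitespace clean so words stay separated
        for _ in range(rng.randint(1, max_marks)):
            out.append(rng.choice(COMBINING_MARKS))
    return "".join(out)

def strip_marks(text):
    """Undo zalgo() by dropping every character from the combining block."""
    return "".join(ch for ch in text if ch not in MARK_SET)

sample = zalgo("Zalgo text", max_marks=5, seed=2019)
# Escaped form; print(sample) shows the rendered mess if your terminal cooperates.
print(ascii(sample))
print(strip_marks(sample))  # Zalgo text
```

Raising max_marks makes the text taller and messier. Note that strip_marks would also remove legitimate accents typed as combining characters, so it is only safe as an undo for text that started out plain.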

Real-World Application

Meme-orable Origins

Well, to be honest, Zalgo text is probably most widely used for the creation of spooky memes and the like. In fact, Zalgo text gets its name from the following:

Zalgo is an Internet legend about an ominous entity believed to cause insanity, death and destruction of the world, similar to the creature Cthulhu created by H.P. Lovecraft in the 1920s. Zalgo is often associated with scrambled text on webpages and photos of people whose eyes and mouth have been covered in black.

Source: KnowYourMeme.com

Zalgo is closely associated with “messy text” and is first mentioned in the following parody webcomics by Dave “Shmorky” Kelly, a Something Awful “goon.”

Are Applications Afraid of Zalgo?

Different software and applications tend to react differently to Zalgo text, sometimes in strange ways.

In 2018, Robert Bindi, an Italian white hat hacker and a member of the cybersecurity company We Are Segment, discovered a vulnerability that Gmail had to Zalgo text. By sending messages containing Zalgo text with a great number of metacharacters (over 1 million), he was able to cause a Gmail account to shut down. In one instance, the account shutdown lasted for four days.

Luckily, Bindi was a nice guy and told Google about it, waiting for them to fix it before publicizing his findings.

No explanation was ever put forth for why the glitch took place, and the most accepted explanation was that Zalgo text was simply 3spooky5gmail to handle.
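If you run an application that accepts text from strangers, one possible defense is to cap how many combining marks may follow a single base character. The sketch below is purely illustrative and is not a description of whatever fix Google applied; the function name limit_combining_marks and the default cap of 2 are assumptions made for this example.

```python
import unicodedata

def limit_combining_marks(text, max_per_base=2):
    """Keep at most max_per_base nonspacing or enclosing marks after each base character."""
    out = []
    run = 0  # marks seen since the last base character
    for ch in text:
        if unicodedata.category(ch) in ("Mn", "Me"):
            run += 1
            if run > max_per_base:
                continue  # drop the excess mark
        else:
            run = 0
        out.append(ch)
    return "".join(out)

messy = "M" + "\u0303" * 10
cleaned = limit_combining_marks(messy)
print(ascii(cleaned))
print(len(cleaned))  # 3
```

A cap that is too low will mangle languages that legitimately stack several marks on one letter, so the right limit depends on what text you expect to receive.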

