How Does Zalgo Text Work? | A Guide to Using and Misusing Unicode Diacritical Marks
Beware the end times, for Zalgo is upon us, Arch.
What is Zalgo Text?
You have probably experienced Zalgo text before, though you may not know it by name. For those unfamiliar, Zalgo text is vertically stacked and overlaid text that somewhat resembles a crazy person’s creepy wall-writing.
Seeing as Unicode characters are usually arranged horizontally and do not overlap, it may not make intuitive sense how Zalgo text can exist or how it is made. Let’s take a look at how Unicode functions and how this allows for the creation of Zalgo text.
What is Unicode?
Before Unicode, a system called ASCII was used to represent text on computers. As a 7-bit character set, ASCII was capable of storing 128 characters. This was a fine amount for English speakers, but wholly insufficient when it came to other alphabets.
Below is an ASCII character chart. Please note that “control characters” (aka non-printing characters) are listed in the “0” and “1” columns. Control characters represent special effects rather than written/printable symbols.
Shortly thereafter, it became common to use 8-bit bytes to store characters in memory. This allowed for 256 codes in total (0 through 255), opening up codes 128 through 255 for new characters.
These new codes acted as additions to ASCII, meaning that the original 128 characters were left unchanged. The additional characters were adequate for other Latin-based languages, as they allowed for the inclusion of accented characters and whatnot. Here is the full 8-bit "extended ASCII" character chart, with new symbols starting at row 128.
However, languages that used other alphabets, such as Russian, Japanese, Hebrew, and so on, were still out of luck, and had to develop their own codes.
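To see why separate 8-bit code pages were a headache, here is a minimal Python sketch. It decodes one byte value above 127 under three legacy encodings that ship with Python's standard codecs. The byte 0xE9 and the three code pages are picked purely for illustration and are not from the original article.

```python
# One byte value above the 7-bit ASCII range.
raw = bytes([0xE9])

# Plain ASCII only defines codes 0-127, so this byte is not valid ASCII.
try:
    raw.decode("ascii")
    is_ascii = True
except UnicodeDecodeError:
    is_ascii = False

# The same byte maps to a different character in each 8-bit code page.
decoded = {name: raw.decode(name) for name in ("latin-1", "cp1251", "koi8-r")}

print("valid ASCII:", is_ascii)
for name, ch in decoded.items():
    print(f"{name}: {ch}")
```

The same byte comes out as a Western European letter in one code page and as two different Cyrillic letters in the other two, so text was only readable if both sides agreed on the code page.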
Finally, Unicode began to be developed in 1987. Unicode would become a worldwide standard which allowed for a code to be assigned to virtually every character in every alphabet.
How Unicode Works
Unicode, specifically the UTF-8 encoding that is predominantly used on the Internet, uses one byte (8 bits) for the first 128 code points (which are identical to the 7-bit ASCII characters). It uses up to 4 bytes for other characters. The Unicode code space runs from U+0000 through U+10FFFF, which allows for 1,114,112 separate code points.
The first 128 characters (US-ASCII) need one byte.
The next 1,920 characters need two bytes to encode, which covers the remainder of almost all Latin-script alphabets, and also Greek, Cyrillic, Coptic, Armenian, Hebrew, Arabic, Syriac, Thaana and N'Ko alphabets, as well as Combining Diacritical Marks.
Three bytes are needed for characters in the rest of the Basic Multilingual Plane, which contains virtually all characters in common use including most Chinese, Japanese and Korean characters.
Four bytes are needed for characters in the other planes of Unicode, which include less common CJK characters, various historic scripts, mathematical symbols, and emoji (pictographic symbols).
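The byte counts listed above can be checked directly. This is a small Python sketch; the sample characters are chosen for illustration, one from each size class.

```python
# One sample character from each UTF-8 size class.
samples = {
    "A": "ASCII letter",
    "\u00d3": "Latin capital O with acute",
    "\u0303": "combining tilde",
    "\u4e2d": "CJK ideograph",
    "\U0001F600": "emoji",
}

def utf8_length(ch: str) -> int:
    """Return how many bytes UTF-8 needs to encode a single character."""
    return len(ch.encode("utf-8"))

for ch, label in samples.items():
    print(f"U+{ord(ch):04X} {label}: {utf8_length(ch)} byte(s)")
```

Note that the combining tilde falls in the two-byte range, so every diacritic stacked onto a Zalgo character costs two bytes on top of the base letter.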
Unicode also includes diacritical marks, which are glyphs such as accents, tildes, and umlauts (to name a few) that can be rendered above, below, or even inside base characters. The most common modifier symbols are stored in a Unicode block called Combining Diacritical Marks. As you can see from viewing the chart, these diacritical marks are all stored as their own unique characters.
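To confirm that these marks really are standalone characters, you can ask Python's standard unicodedata module about them. This is a minimal sketch; the three code points are arbitrary picks from the Combining Diacritical Marks block, and describe is a hypothetical helper name.

```python
import unicodedata

def describe(ch: str):
    """Return the code point, official name, category, and combining class."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),   # "Mn" means nonspacing mark
        unicodedata.combining(ch),  # nonzero for most combining marks
    )

for cp in (0x0301, 0x0303, 0x0308):
    print(describe(chr(cp)))
```

Each mark has its own code point and name, just like a letter does. The "Mn" category is what tells a renderer to draw it on the preceding character instead of giving it its own space.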
Applying Diacritical Marks to Unicode Characters
A unique aspect of Unicode is that diacritical marks can be used to modify other Unicode characters. These modifiers are often known as combining characters in the context of digital typography. This function allows diacritics to be applied to any other Unicode symbol, even multiple times over if you so wish.
The method for applying diacritics varies depending on what application you are using. For example, Google Docs uses the shortcut Ctrl + Shift + U to directly apply codes.
Let’s say that you want to type an ordinary Unicode character (not using a combining character). You’re in Google Docs and want to type a capital O with an accent over it. There are a couple of other ways to do that, of course, but using Unicode you would first type Ctrl + Shift + U, which will cause a little underlined “u” to appear. This indicates that you can type in the Unicode number for that symbol, which in this case is “00D3.”
You have no way of knowing if that’s how I did this or not, but voila: Ó
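If you would rather skip the keyboard shortcut, the same lookup can be sketched in a few lines of Python. The helper name from_hex is hypothetical. It takes the Unicode number as a hex string, with or without a leading "U+".

```python
def from_hex(code: str) -> str:
    """Turn a Unicode number such as '00D3' or 'U+00D3' into its character."""
    code = code.strip()
    if code.upper().startswith("U+"):
        code = code[2:]
    # The number is hexadecimal, so parse it in base 16.
    return chr(int(code, 16))

print(from_hex("00D3"))
```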
But instead let’s pretend you want to create an “M” with a tilde over it in Google Docs. I don’t know why, but you want to do it. This will require a combining character since that isn’t a standard symbol.
First, type an “M,” then use the shortcut Ctrl + Shift + U followed by the appropriate Unicode number (in this case, 0303):
As will be mentioned below, diacritical modifiers (and Zalgo text, by extension) may not render perfectly in all applications. That was true here as well, which is why the "M with a tilde" is uploaded as a PNG.
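The difference between the accented O and the tilde M can be sketched in Python. The accented O exists as a single precomposed character, while the tilde M only exists as a base letter followed by a combining mark. The example uses NFC normalization from the standard unicodedata module to show this; the variable names are illustrative.

```python
import unicodedata

# Base letter followed by a combining mark: two code points each.
m_tilde = "M" + "\u0303"   # M + COMBINING TILDE
o_acute = "O" + "\u0301"   # O + COMBINING ACUTE ACCENT

# NFC composes a sequence into one code point when a precomposed form exists.
nfc_m = unicodedata.normalize("NFC", m_tilde)
nfc_o = unicodedata.normalize("NFC", o_acute)

print(len(m_tilde), len(nfc_m))  # 2 2: no precomposed "M with tilde"
print(len(o_acute), len(nfc_o))  # 2 1: composes to U+00D3
```

How the two-code-point sequence is drawn is up to the font and the renderer, which is one reason the result can look different from one application to the next.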
Note: All Unicode numbers can be found through this page. Combiner characters can more easily be found here. Numbers are formatted as “U+0303” but just ignore the “U+” while typing in the number.
Note 2: Apparently Microsoft Word requires that you type the Unicode number first, and then hit Alt + X to make it take effect. Give it a shot, Wordians!
For other applications, you’ll have to figure out the exact process yourself.
How Does Zalgo Text Work?
Essentially, seemingly corrupted Zalgo text is made possible by the inclusion of combining characters in Unicode. By using multiple diacritics to modify a single character, the text will extend vertically and overlap with other lines of text.
As you can see, that “M” now has a bunch of tildes over it. While you can make Zalgo text by applying combiner characters manually, there is a much faster way if your only goal is to fuck up the words.
N͡o̥̠̗̘͎̣͍ẁ ͏̗̤̺͕͎̠̺y͎͈̻͉̘̲ơ̱u̠̪͜ͅ ͖h͚̲̠͉̰͇͓a͖v̳̱͕̹̬͔e̩͖̗̲̣̜͢ ̻̳̙̣̭
̛̗̞t̬̹͉̤̺̮͓h̪̼̭̳͎͇e̼̬͉̱̜̫͡ ̛̲͈̘̳͎̘̝ạb̙̰͟i̭̯̞͉̫l̝̝̦͞i̹͚͉t̺͇̖͉̭̜͍̀y̯̥̥͙̜̦͝ ̼
̴to͈͝ ̮͙̙ͅm͈a̷̜͔k̘̲̠͖̮͇̪e̦̪̬ͅ ̨͉̥y͚̭o̘̪͍̹͘u̧͖r̢̳̱̣̱̞ ͕̗̟͉̳̤ͅơ̲͚w͍̥̻̬̮̼̞͝n̨̪͔ͅ
̭͉̼̮̗̫͞įͅn̼͎̲ͅ ̗̼ͅm̢̖͙ẹ̷̬̗r͙̥͈͠e̼̗̗ ̨̭̳̲̜m͓͎͕̘̞̥o̪̟̮̯̻m̟e͔̗n͍͓̜͖̦t͍͎̬̦͎͡s̮̜̪̣̮̩̗
There are many Zalgo text generators available online, saving you the time of making it yourself. This is just one of them.
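A generator like that does not need much code. Here is a minimal Python sketch, not the code behind any particular online tool. It appends a random handful of marks from the Combining Diacritical Marks block (U+0300 through U+036F) after each non-space character. The function name zalgo and its intensity and seed parameters are made up for this example.

```python
import random

# Every code point in the Combining Diacritical Marks block.
COMBINING_MARKS = [chr(cp) for cp in range(0x0300, 0x0370)]

def zalgo(text: str, intensity: int = 5, seed=None) -> str:
    """Append 1..intensity random combining marks after each non-space character."""
    rng = random.Random(seed)  # pass a seed for repeatable output
    out = []
    for ch in text:
        out.append(ch)
        if ch.isspace():
            continue  # leave whitespace clean so words stay separated
        count = rng.randint(1, intensity)
        out.extend(rng.choice(COMBINING_MARKS) for _ in range(count))
    return "".join(out)

print(zalgo("He comes", intensity=8))
```

Raising intensity makes the stack taller. The base letters are never changed, so removing the marks gives back the original text.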
Well, to be honest, Zalgo text is probably most widely used for the creation of spooky memes and the like. In fact, Zalgo text gets its name from the following:
Zalgo is an Internet legend about an ominous entity believed to cause insanity, death and destruction of the world, similar to the creature Cthulhu created by H.P. Lovecraft in the 1920s. Zalgo is often associated with scrambled text on webpages and photos of people whose eyes and mouth have been covered in black.
Are Applications Afraid of Zalgo?
Different software and applications tend to react differently to Zalgo text, sometimes in strange ways.
In 2018, Roberto Bindi, an Italian white hat hacker and a member of the cybersecurity company We Are Segment, discovered a vulnerability that Gmail had to Zalgo text. By sending messages containing Zalgo text with a great number of metacharacters (over 1 million), he was able to cause a Gmail account to shut down. In one instance, the account shutdown lasted for four days.
Luckily, Bindi was a nice guy and told Google about it, waiting for them to fix it before publicizing his findings.
No official explanation was ever put forth for why the glitch took place, so the most accepted explanation is that Zalgo text was simply 3spooky5gmail to handle.
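If you are on the receiving end and would rather not be spooked, one defensive approach is to cap how many combining marks may follow a single base character. The Python sketch below is only an illustration of that idea and says nothing about how Gmail or any other application handles it. The function name limit_marks and the max_marks parameter are made up.

```python
import unicodedata

def limit_marks(text: str, max_marks: int = 2) -> str:
    """Keep at most max_marks consecutive combining marks after each base character."""
    out = []
    run = 0  # number of marks seen since the last base character
    for ch in text:
        # "Mn" = nonspacing mark, "Me" = enclosing mark.
        if unicodedata.category(ch) in ("Mn", "Me"):
            run += 1
            if run > max_marks:
                continue  # drop the excess mark
        else:
            run = 0
        out.append(ch)
    return "".join(out)

cursed = "M" + "\u0303" * 10
print(len(cursed), len(limit_marks(cursed)))
```

A small nonzero cap keeps legitimate accented text readable, since real languages rarely stack more than a couple of marks on one letter, while setting max_marks to 0 strips every mark.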
- Discussion of Combining Characters in the Unicode Standard V.6.2 (Section 2.11)
- Complete Unicode Table
- Combining Diacritical Marks Table