UTF-16, the Glossary
UTF-16 (16-bit Unicode Transformation Format) is a character encoding capable of encoding all 1,112,064 valid code points of Unicode (in fact this number of code points is dictated by the design of UTF-16).[1]
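The figure of 1,112,064 follows from simple arithmetic on the UTF-16 design. Plane 0 contributes 65,536 code points directly, surrogate pairs reach 2^20 more, and the 2,048 code points reserved for the surrogates themselves are subtracted. A small sketch of that calculation (the variable names are illustrative only):

```python
# UTF-16 reaches plane 0 directly and 2**20 further code points via surrogate pairs.
BMP_SIZE = 0x10000                 # 65,536 code points in plane 0
PAIR_RANGE = 1024 * 1024           # 1,024 high surrogates x 1,024 low surrogates
SURROGATES = 0xE000 - 0xD800       # 2,048 code points (U+D800..U+DFFF) reserved as surrogates

total_code_points = BMP_SIZE + PAIR_RANGE             # 1,114,112, i.e. U+0000..U+10FFFF
max_code_point = total_code_points - 1                # 0x10FFFF, the top of the code space
valid_scalar_values = total_code_points - SURROGATES  # code points UTF-16 can encode

print(hex(max_code_point), valid_scalar_values)       # 0x10ffff 1112064
```

This is why the Unicode code space stops at U+10FFFF: it is the highest value a surrogate pair can express.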
Table of Contents
78 relations: API, ASCII, Bengali language, Binary Runtime Environment for Wireless, Byte order mark, C Sharp (programming language), C++, CCSID, CD-ROM, CDMA2000, CESU-8, Character encoding, CJK characters, Code page, Code point, Comparison of Unicode encodings, D (programming language), Devanagari, Disjoint sets, Dollar sign, ECMAScript, Emoji, Endianness, Euro sign, GB 18030, GSM, GSM 03.38, IBM, IBM i, Institute of Electrical and Electronics Engineers, International Components for Unicode, Internet, Internet Assigned Numbers Authority, Internet Engineering Task Force, IPhone, ISO 9660, ISO/IEC 8859-1, ISO/IEC JTC 1/SC 2, Java (programming language), Java Platform, Standard Edition, JavaScript, Microsoft Windows, MySQL, PHP, Plain text, Plane (Unicode), Python (programming language), Qt (software), Regional indicator symbol, Self-synchronizing code, Shift JIS, SMS, Swift (programming language), Symbian, UEFI, UIQ, Unicode, Unicode Consortium, Unicode in Microsoft Windows, Universal Coded Character Set, UTF-16, UTF-32, UTF-8, Variable-width encoding, WHATWG, Widget toolkit, Windows 10, Windows 2000, Windows 7, Windows Embedded Compact, Windows Insider, Windows NT, Windows Vista, Windows XP, Word joiner, Xbox, .NET Framework, 16-bit computing.
- Computer-related introductions in 1991
- Unicode Transformation Formats
API
An application programming interface (API) is a way for two or more computer programs or components to communicate with each other.
See UTF-16 and API
ASCII
ASCII, an acronym for American Standard Code for Information Interchange, is a character encoding standard for electronic communication. UTF-16 and ASCII are both character encodings.
See UTF-16 and ASCII
Bengali language
Bengali, also known by its endonym Bangla (বাংলা), is an Indo-Aryan language from the Indo-European language family native to the Bengal region of South Asia.
See UTF-16 and Bengali language
Binary Runtime Environment for Wireless
Binary Runtime Environment for Wireless (Brew MP, Brew, Qualcomm BREW, or BREW) was an application development platform created by Qualcomm, originally for code division multiple access (CDMA) mobile phones, featuring third-party applications such as mobile games.
See UTF-16 and Binary Runtime Environment for Wireless
Byte order mark
The byte-order mark (BOM) is a particular usage of the special Unicode character code U+FEFF ZERO WIDTH NO-BREAK SPACE, whose appearance as a magic number at the start of a text stream can signal several things to a program reading the text.
See UTF-16 and Byte order mark
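In UTF-16 the BOM is serialized as the bytes FE FF in big-endian order and FF FE in little-endian order, so a reader can sniff the first two bytes to choose a byte order. The function below is a hypothetical helper, not part of any library. Note that FF FE is also how the UTF-32LE BOM (FF FE 00 00) begins, which this sketch deliberately does not distinguish.

```python
from typing import Optional

def detect_utf16_bom(data: bytes) -> Optional[str]:
    """Return a Python codec name if data starts with a UTF-16 BOM, else None.

    Hypothetical helper. It does not rule out the UTF-32LE BOM (FF FE 00 00),
    which begins with the same two bytes as the UTF-16LE BOM.
    """
    if data[:2] == b"\xfe\xff":
        return "utf-16-be"   # U+FEFF stored most significant byte first
    if data[:2] == b"\xff\xfe":
        return "utf-16-le"   # U+FEFF stored least significant byte first
    return None

# Python's "utf-16" codec writes a BOM in the platform's native byte order.
payload = "Hi".encode("utf-16")
codec = detect_utf16_bom(payload)
text = payload[2:].decode(codec)   # skip the 2-byte BOM, then decode the rest
```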
C Sharp (programming language)
C# is a general-purpose high-level programming language supporting multiple paradigms.
See UTF-16 and C Sharp (programming language)
C++
C++ (pronounced "C plus plus" and sometimes abbreviated as CPP) is a high-level, general-purpose programming language created by Danish computer scientist Bjarne Stroustrup.
See UTF-16 and C++
CCSID
A CCSID (coded character set identifier) is a 16-bit number that represents a particular encoding of a specific code page. UTF-16 and CCSID are both topics in character encoding.
See UTF-16 and CCSID
CD-ROM
A CD-ROM (compact disc read-only memory) is a type of read-only memory consisting of a pre-pressed optical compact disc that contains data computers can read but not write or erase.
CDMA2000
CDMA2000 (also known as C2K or IMT Multi‑Carrier (IMT‑MC)) is a family of 3G mobile technology standards for sending voice, data, and signaling data between mobile phones and cell sites.
CESU-8
The Compatibility Encoding Scheme for UTF-16: 8-Bit (CESU-8) is a variant of UTF-8 that is described in Unicode Technical Report #26. UTF-16 and CESU-8 are both character encodings and Unicode Transformation Formats.
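CESU-8 differs from UTF-8 only for supplementary characters: it first splits them into a UTF-16 surrogate pair and then encodes each surrogate as a 3-byte UTF-8-style sequence, giving 6 bytes where UTF-8 uses 4. A minimal sketch, assuming well-formed input without lone surrogates; `cesu8_encode` is a hypothetical name, while `surrogatepass` is Python's standard error handler that allows surrogate code points to be encoded.

```python
def cesu8_encode(text: str) -> bytes:
    """Encode text as CESU-8 (hypothetical helper; assumes no lone surrogates)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x10000:
            out += ch.encode("utf-8")          # BMP characters: identical to UTF-8
        else:
            v = cp - 0x10000                   # 20-bit value
            high = 0xD800 + (v >> 10)          # UTF-16 high surrogate
            low = 0xDC00 + (v & 0x3FF)         # UTF-16 low surrogate
            # Encode each surrogate separately as a 3-byte sequence.
            out += chr(high).encode("utf-8", "surrogatepass")
            out += chr(low).encode("utf-8", "surrogatepass")
    return bytes(out)
```

For U+1F600 this yields ED A0 BD ED B8 80, while standard UTF-8 gives F0 9F 98 80.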
Character encoding
Character encoding is the process of assigning numbers to graphical characters, especially the written characters of human language, allowing them to be stored, transmitted, and transformed using digital computers. UTF-16 is one such encoding.
See UTF-16 and Character encoding
CJK characters
In internationalization, CJK characters is a collective term for graphemes used in the Chinese, Japanese, and Korean writing systems, which each include Chinese characters.
Code page
In computing, a code page is a character encoding and, as such, a specific association of a set of printable characters and control characters with unique numbers. UTF-16 and code pages are both character encodings.
Code point
A code point, codepoint or code position is a particular position in a table, where the position has been assigned a meaning. UTF-16 encodes each Unicode code point as one or two 16-bit code units.
Comparison of Unicode encodings
This article compares Unicode encodings in two types of environments: 8-bit-clean environments, and environments that forbid the use of byte values with the high bit set. UTF-16 is one of the encodings compared.
See UTF-16 and Comparison of Unicode encodings
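A quick way to see the trade-offs is to count the bytes each encoding produces for the same text. This sketch uses Python's built-in codecs; the `-le` variants are chosen because the plain `utf-16` and `utf-32` codecs prepend a BOM, which would inflate the counts.

```python
def encoded_sizes(text: str) -> dict:
    """Byte length of text in UTF-8, UTF-16 and UTF-32, without a BOM."""
    return {
        "UTF-8": len(text.encode("utf-8")),
        "UTF-16": len(text.encode("utf-16-le")),
        "UTF-32": len(text.encode("utf-32-le")),
    }

for sample in ("Hello", "\u6f22\u5b57", "\U0001F600"):  # ASCII, two CJK ideographs, one emoji
    print(repr(sample), encoded_sizes(sample))
```

ASCII text takes half the space in UTF-8 that it takes in UTF-16, CJK ideographs are smaller in UTF-16 (2 bytes versus 3), and a supplementary character costs 4 bytes in all three.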
D (programming language)
D, also known as dlang, is a multi-paradigm system programming language created by Walter Bright at Digital Mars and released in 2001.
See UTF-16 and D (programming language)
Devanagari
Devanagari (देवनागरी) is an Indic script used in the northern Indian subcontinent.
Disjoint sets
In set theory in mathematics and formal logic, two sets are said to be disjoint sets if they have no element in common.
Dollar sign
The dollar sign, also known as the peso sign, is a currency symbol consisting of a capital S crossed with one or two vertical strokes (depending on typeface), used to indicate the unit of various currencies around the world, including most currencies denominated "dollar" or "peso".
ECMAScript
ECMAScript (ES) is a standard for scripting languages, including JavaScript, JScript, and ActionScript.
Emoji
An emoji (plural emoji or emojis; 絵文字) is a pictogram, logogram, ideogram, or smiley embedded in text and used in electronic messages and web pages.
See UTF-16 and Emoji
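Most emoji lie outside the Basic Multilingual Plane, so UTF-16 stores each of them as a surrogate pair of two code units. Environments whose strings are sequences of UTF-16 code units (JavaScript's `length`, for example) therefore count such an emoji as 2, while Python, whose `str` is a sequence of code points, counts it as 1. A small sketch; `utf16_length` is a hypothetical helper:

```python
def utf16_length(text: str) -> int:
    """Number of 16-bit UTF-16 code units needed to store text."""
    return len(text.encode("utf-16-le")) // 2   # 2 bytes per code unit, no BOM

smiley = "\U0001F600"                    # U+1F600 GRINNING FACE
print(len(smiley))                       # 1 code point
print(utf16_length(smiley))              # 2 code units (a surrogate pair)
print(smiley.encode("utf-16-be").hex())  # d83dde00: high surrogate D83D, low surrogate DE00
```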
Endianness
In computing, endianness is the order in which bytes within a word of digital data are transmitted over a data communication medium or addressed (by rising addresses) in computer memory, counting only byte significance compared to earliness.
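UTF-16 code units are 16 bits wide, so serializing them to bytes requires choosing a byte order; that choice is the difference between UTF-16BE and UTF-16LE. The sketch below uses the euro sign (U+20AC, a single code unit) and checks that serializing the code unit by hand with `int.to_bytes` matches Python's codecs.

```python
unit = ord("\u20ac")                  # EURO SIGN: one BMP code unit, 0x20AC

big = unit.to_bytes(2, "big")         # most significant byte first (UTF-16BE)
little = unit.to_bytes(2, "little")   # least significant byte first (UTF-16LE)

print(big.hex(), little.hex())        # 20ac ac20
print(big == "\u20ac".encode("utf-16-be"), little == "\u20ac".encode("utf-16-le"))  # True True
```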
Euro sign
The euro sign is the currency sign used for the euro, the official currency of the eurozone and adopted, although not required to, by Kosovo and Montenegro.
GB 18030
GB 18030 is a Chinese government standard, titled "Information Technology – Chinese coded character set", that defines the required language and character support necessary for software in China. UTF-16 and GB 18030 are both Unicode Transformation Formats.
GSM
The Global System for Mobile Communications (GSM) is a standard developed by the European Telecommunications Standards Institute (ETSI) to describe the protocols for second-generation (2G) digital cellular networks used by mobile devices such as mobile phones and tablets.
See UTF-16 and GSM
GSM 03.38
In mobile telephony GSM 03.38 or 3GPP 23.038 is a character encoding used in GSM networks for SMS (Short Message Service), CB (Cell Broadcast) and USSD (Unstructured Supplementary Service Data).
IBM
International Business Machines Corporation (using the trademark IBM), nicknamed Big Blue, is an American multinational technology company headquartered in Armonk, New York and present in over 175 countries.
See UTF-16 and IBM
IBM i
IBM i (the i standing for integrated) is an operating system developed by IBM for IBM Power Systems.
See UTF-16 and IBM i
Institute of Electrical and Electronics Engineers
The Institute of Electrical and Electronics Engineers (IEEE) is an American 501(c)(3) professional association for electronics engineering, electrical engineering, and other related disciplines.
See UTF-16 and Institute of Electrical and Electronics Engineers
International Components for Unicode
International Components for Unicode (ICU) is an open-source project of mature C/C++ and Java libraries for Unicode support, software internationalization, and software globalization.
See UTF-16 and International Components for Unicode
Internet
The Internet (or internet) is the global system of interconnected computer networks that uses the Internet protocol suite (TCP/IP) to communicate between networks and devices.
Internet Assigned Numbers Authority
The Internet Assigned Numbers Authority (IANA) is a standards organization that oversees global IP address allocation, autonomous system number allocation, root zone management in the Domain Name System (DNS), media types, and other Internet Protocol–related symbols and Internet numbers.
See UTF-16 and Internet Assigned Numbers Authority
Internet Engineering Task Force
The Internet Engineering Task Force (IETF) is a standards organization for the Internet and is responsible for the technical standards that make up the Internet protocol suite (TCP/IP).
See UTF-16 and Internet Engineering Task Force
IPhone
The iPhone is a smartphone produced by Apple that uses Apple's own iOS mobile operating system.
ISO 9660
ISO 9660 (also known as ECMA-119) is a file system for optical disc media.
ISO/IEC 8859-1
ISO/IEC 8859-1:1998, Information technology – 8-bit single-byte coded graphic character sets – Part 1: Latin alphabet No. 1, is part of the ISO/IEC 8859 series of ASCII-based standard character encodings.
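Because each ISO/IEC 8859-1 byte value maps to the Unicode code point with the same number (U+0000 to U+00FF, with the positions the standard leaves unassigned usually filled by the C0 and C1 control codes), converting Latin-1 bytes to UTF-16 needs no lookup table: each byte becomes a code unit whose high byte is zero. A minimal sketch; `latin1_to_utf16be` is a hypothetical name, and Python's own `latin-1` and `utf-16-be` codecs are used only to check the result.

```python
def latin1_to_utf16be(data: bytes) -> bytes:
    """Convert ISO/IEC 8859-1 bytes to UTF-16BE by zero-extending each byte."""
    out = bytearray()
    for b in data:
        out += bytes((0x00, b))   # high byte 0x00, low byte = the Latin-1 value
    return bytes(out)

sample = "caf\u00e9".encode("latin-1")   # b'caf\xe9'
assert latin1_to_utf16be(sample) == "caf\u00e9".encode("utf-16-be")
```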
ISO/IEC JTC 1/SC 2
ISO/IEC JTC 1/SC 2 Coded character sets is a standardization subcommittee of the Joint Technical Committee ISO/IEC JTC 1 of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC), that develops and facilitates standards within the field of coded character sets.
See UTF-16 and ISO/IEC JTC 1/SC 2
Java (programming language)
Java is a high-level, class-based, object-oriented programming language that is designed to have as few implementation dependencies as possible.
See UTF-16 and Java (programming language)
Java Platform, Standard Edition
Java Platform, Standard Edition (Java SE) is a computing platform for development and deployment of portable code for desktop and server environments.
See UTF-16 and Java Platform, Standard Edition
JavaScript
JavaScript, often abbreviated as JS, is a programming language and core technology of the Web, alongside HTML and CSS.
Microsoft Windows
Microsoft Windows is a product line of proprietary graphical operating systems developed and marketed by Microsoft.
See UTF-16 and Microsoft Windows
MySQL
MySQL is an open-source relational database management system (RDBMS).
See UTF-16 and MySQL
PHP
PHP is a general-purpose scripting language geared towards web development.
See UTF-16 and PHP
Plain text
In computing, plain text is a loose term for data (e.g. file contents) that represent only characters of readable material but not its graphical representation nor other objects (floating-point numbers, images, etc.). It may also include a limited number of "whitespace" characters that affect simple arrangement of text, such as spaces, line breaks, or tabulation characters.
Plane (Unicode)
In the Unicode standard, a plane is a contiguous group of 65,536 (2^16) code points.
See UTF-16 and Plane (Unicode)
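Since each plane holds 2^16 code points, the plane number is simply the code point shifted right by 16 bits. Only plane 0, the Basic Multilingual Plane, fits in a single UTF-16 code unit; planes 1 to 16 need surrogate pairs. `plane_of` and `needs_surrogate_pair` are hypothetical helpers:

```python
def plane_of(code_point: int) -> int:
    """Return the Unicode plane (0..16) that contains code_point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    return code_point >> 16   # each plane spans 2**16 = 65,536 code points

def needs_surrogate_pair(code_point: int) -> bool:
    # Anything beyond plane 0 is encoded in UTF-16 as two code units.
    return plane_of(code_point) > 0
```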
Python (programming language)
Python is a high-level, general-purpose programming language.
See UTF-16 and Python (programming language)
Qt (software)
Qt (pronounced "cute" or as an initialism) is a cross-platform application development framework for creating graphical user interfaces as well as cross-platform applications that run on various software and hardware platforms such as Linux, Windows, macOS, Android or embedded systems with little or no change in the underlying codebase while still being a native application with native capabilities and speed.
Regional indicator symbol
The regional indicator symbols are a set of 26 alphabetic Unicode characters (A–Z) intended to be used to encode ISO 3166-1 alpha-2 two-letter country codes in a way that allows optional special treatment.
See UTF-16 and Regional indicator symbol
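The regional indicator letters run from U+1F1E6 (A) to U+1F1FF (Z), all outside the BMP, so a two-letter flag sequence is 2 code points but 4 UTF-16 code units. A hedged sketch of building such a sequence; `flag_sequence` is hypothetical, and whether a given pair actually renders as a flag depends on the font and platform.

```python
REGIONAL_INDICATOR_A = 0x1F1E6   # U+1F1E6 REGIONAL INDICATOR SYMBOL LETTER A

def flag_sequence(country_code: str) -> str:
    """Map a two-letter ISO 3166-1 alpha-2 code to regional indicator symbols."""
    if len(country_code) != 2 or not country_code.isascii() or not country_code.isalpha():
        raise ValueError("expected two ASCII letters")
    # Offset each letter from 'A' into the regional indicator block.
    return "".join(chr(REGIONAL_INDICATOR_A + ord(c) - ord("A")) for c in country_code.upper())
```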
Self-synchronizing code
In coding theory, especially in telecommunications, a self-synchronizing code is a uniquely decodable code in which the symbol stream formed by a portion of one code word, or by the overlapped portion of any two adjacent code words, is not a valid code word.
See UTF-16 and Self-synchronizing code
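UTF-16 is self-synchronizing because its three kinds of code unit occupy disjoint ranges: high surrogates 0xD800 to 0xDBFF, low surrogates 0xDC00 to 0xDFFF, and everything else. A reader that lands at an arbitrary index can therefore find the start of the current character by looking back at most one unit. A sketch over a plain list of code units; `char_start` and the two predicates are hypothetical helpers.

```python
from typing import List

def is_high_surrogate(unit: int) -> bool:
    return 0xD800 <= unit <= 0xDBFF

def is_low_surrogate(unit: int) -> bool:
    return 0xDC00 <= unit <= 0xDFFF

def char_start(units: List[int], i: int) -> int:
    """Index at which the character containing units[i] begins."""
    # A low surrogate preceded by a high surrogate is the second half of a pair,
    # so the character starts one unit earlier; otherwise it starts at i.
    if is_low_surrogate(units[i]) and i > 0 and is_high_surrogate(units[i - 1]):
        return i - 1
    return i

units = [0x0041, 0xD83D, 0xDE00, 0x0042]   # "A", U+1F600 as a surrogate pair, "B"
```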
Shift JIS
Shift JIS (also SJIS, MIME name Shift_JIS, known as PCK in Solaris contexts) is a character encoding for the Japanese language, originally developed by the Japanese company ASCII Corporation in conjunction with Microsoft and standardized as JIS X 0208 Appendix 1.
SMS
Short Message Service, commonly abbreviated as SMS, is a text messaging service component of most telephone, Internet and mobile device systems.
See UTF-16 and SMS
Swift (programming language)
Swift is a high-level general-purpose, multi-paradigm, compiled programming language created by Chris Lattner in 2010 for Apple Inc. and maintained by the open-source community.
See UTF-16 and Swift (programming language)
Symbian
Symbian was a mobile operating system (OS) and computing platform designed for smartphones.
UEFI
Unified Extensible Firmware Interface (UEFI) is a specification that defines the architecture of the platform firmware used for booting the computer hardware and its interface for interaction with the operating system.
See UTF-16 and UEFI
UIQ
UIQ (formerly known as User Interface Quartz) is a discontinued software platform based upon Symbian OS, created by UIQ Technology AB.
See UTF-16 and UIQ
Unicode
Unicode, formally The Unicode Standard, is a text encoding standard maintained by the Unicode Consortium designed to support the use of text in all of the world's writing systems that can be digitized. UTF-16 is one of the encoding forms it defines.
Unicode Consortium
The Unicode Consortium (legally Unicode, Inc.) is a 501(c)(3) non-profit organization incorporated and based in Mountain View, California, U.S. Its primary purpose is to maintain and publish the Unicode Standard which was developed with the intention of replacing existing character encoding schemes that are limited in size and scope, and are incompatible with multilingual environments.
See UTF-16 and Unicode Consortium
Unicode in Microsoft Windows
Microsoft was one of the first companies to implement Unicode in their products.
See UTF-16 and Unicode in Microsoft Windows
Universal Coded Character Set
The Universal Coded Character Set (UCS, Unicode) is a standard set of characters defined by the international standard ISO/IEC 10646, Information technology — Universal Coded Character Set (UCS) (plus amendments to that standard), which is the basis of many character encodings, improving as characters from previously unrepresented typing systems are added.
See UTF-16 and Universal Coded Character Set
UTF-16
UTF-16 (16-bit Unicode Transformation Format) is a character encoding capable of encoding all 1,112,064 valid code points of Unicode (in fact this number of code points is dictated by the design of UTF-16). It is listed under the categories character encoding, computer-related introductions in 1991, encodings, and Unicode Transformation Formats.
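The encoding rule itself is short. A code point below U+10000 is stored as a single 16-bit code unit; for anything higher, 0x10000 is subtracted and the remaining 20 bits are split into a high surrogate (top 10 bits added to 0xD800) and a low surrogate (bottom 10 bits added to 0xDC00). A minimal sketch; `utf16_code_units` is a hypothetical name, and its output can be cross-checked against Python's built-in `utf-16-be` codec.

```python
from typing import List

def utf16_code_units(code_point: int) -> List[int]:
    """Encode one Unicode scalar value as a list of UTF-16 code units."""
    if 0xD800 <= code_point <= 0xDFFF or not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode scalar value")
    if code_point < 0x10000:
        return [code_point]           # BMP: one 16-bit code unit
    v = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (v >> 10)         # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (v & 0x3FF)        # bottom 10 bits -> low (trail) surrogate
    return [high, low]

# Cross-check: serialize big-endian and compare with Python's codec.
units = utf16_code_units(0x1F600)
assert b"".join(u.to_bytes(2, "big") for u in units) == "\U0001F600".encode("utf-16-be")
```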
UTF-32
UTF-32 (32-bit Unicode Transformation Format) is a fixed-length encoding used to encode Unicode code points that uses exactly 32 bits (four bytes) per code point (but a number of leading bits must be zero, as there are far fewer than 2^32 Unicode code points, needing actually only 21 bits). UTF-16 and UTF-32 are both character encodings and Unicode Transformation Formats.
UTF-8
UTF-8 is a variable-length character encoding standard used for electronic communication. UTF-16 and UTF-8 are both character encodings and Unicode Transformation Formats.
See UTF-16 and UTF-8
Variable-width encoding
A variable-width encoding is a type of character encoding scheme in which codes of differing lengths are used to encode a character set (a repertoire of symbols) for representation, usually in a computer. UTF-16 is a variable-width encoding: each code point takes either one or two 16-bit code units.
See UTF-16 and Variable-width encoding
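Decoding shows the variable width in action: the decoder reads one code unit, and if it is a high surrogate followed by a low surrogate it consumes a second unit and recombines the 20 bits. The sketch below is hypothetical. Substituting U+FFFD REPLACEMENT CHARACTER for an unpaired surrogate is one common design choice; other decoders raise an error instead.

```python
from typing import List

def decode_utf16_units(units: List[int]) -> str:
    """Decode UTF-16 code units to a string, consuming one or two units per character."""
    chars = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # Surrogate pair: recombine the two 10-bit halves and add 0x10000 back.
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        elif 0xD800 <= u <= 0xDFFF:
            cp = 0xFFFD   # unpaired surrogate: substitute U+FFFD (assumed policy)
            i += 1
        else:
            cp = u        # ordinary BMP code unit
            i += 1
        chars.append(chr(cp))
    return "".join(chars)
```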
WHATWG
The Web Hypertext Application Technology Working Group (WHATWG) is a community of people interested in evolving HTML and related technologies.
Widget toolkit
A widget toolkit, widget library, GUI toolkit, or UX library is a library or a collection of libraries containing a set of graphical control elements (called widgets) used to construct the graphical user interface (GUI) of programs.
Windows 10
Windows 10 is a major release of Microsoft's Windows NT operating system.
Windows 2000
Windows 2000 is a major release of the Windows NT operating system developed by Microsoft and oriented towards businesses.
Windows 7
Windows 7 is a major release of the Windows NT operating system developed by Microsoft.
Windows Embedded Compact
Windows Embedded Compact, formerly Windows Embedded CE, Windows Powered and Windows CE, is a discontinued operating system developed by Microsoft for mobile and embedded devices.
See UTF-16 and Windows Embedded Compact
Windows Insider
Windows Insider is an open software testing program by Microsoft that allows users globally who own a valid license of Windows 11, Windows 10, or Windows Server to register for pre-release builds of the operating system previously only accessible to software developers.
See UTF-16 and Windows Insider
Windows NT
Windows NT is a proprietary graphical operating system produced by Microsoft as part of its Windows product line, the first version of which, Windows NT 3.1, was released on July 27, 1993.
Windows Vista
Windows Vista is a major release of the Windows NT operating system developed by Microsoft.
Windows XP
Windows XP is a major release of Microsoft's Windows NT operating system.
Word joiner
The word joiner (WJ) is a Unicode format character which is used to indicate that line breaking should not occur at its position.
Xbox
Xbox is a video gaming brand that consists of five home video game consoles, as well as applications (games), streaming service Xbox Cloud Gaming, and online services such as the Xbox network and Xbox Game Pass.
See UTF-16 and Xbox
.NET Framework
The .NET Framework (pronounced as "dot net") is a proprietary software framework developed by Microsoft that runs primarily on Microsoft Windows.
16-bit computing
16-bit microcomputers are microcomputers that use 16-bit microprocessors.
See UTF-16 and 16-bit computing
See also
Computer-related introductions in 1991
- AF/91
- ATM (computer)
- Am386
- Amiga 3000T
- Apple IIe Card
- Architecture Neutral Distribution Format
- Atari MEGA STE
- CDTV
- ChessMachine
- Compaq Portable 486
- Cray C90
- DECpc
- Dubna 48K
- Floptical
- Gateway AnyKey
- General MIDI
- Gopher (protocol)
- HP 95LX
- I486SX
- IBM 386SLC
- IBM PCradio
- IBM PS/2 Model L40 SX
- IBM PS/55 Note
- IrisVision
- Macintosh Classic II
- Macintosh Quadra
- Macintosh Quadra 700
- Macintosh Quadra 900
- Neo Geo (system)
- PowerBook
- PowerBook 100
- PowerBook 100 series
- PowerBook 140
- PowerBook 170
- PowerBuilder
- PowerPC
- Psion Series 3
- QuickTime File Format
- Resource Interchange File Format
- ST Book
- Scorpion ZS-256
- Sharp PC-3000
- UTF-16
- WAV
- WIMG (computing)
- Z-variant
Unicode Transformation Formats
- Binary Ordered Compression for Unicode
- CESU-8
- Comparison of Unicode encodings
- GB 18030
- Popularity of text encodings
- Punycode
- Standard Compression Scheme for Unicode
- UTF-1
- UTF-16
- UTF-32
- UTF-7
- UTF-8
- UTF-EBCDIC
References
[1] https://en.wikipedia.org/wiki/UTF-16
Also known as 16-bit characters, AL16UTF16, Code page 1200, Code page 1201, Code page 13488, CsUTF16, CsUTF16BE, CsUTF16LE, Oracle AL16UTF16, Supplementary character, Surrogate pair, Surrogate pairs, UCS 2, UCS-2, UCS-2BE, UCS-2LE, UCS2, UTF 16, UTF-16/UCS-2, UTF-16BE, UTF-16LE, UTF16, UTF16BE, UTF16LE, Unicode 16, Windows-1200, Windows-1201.