This document outlines the technical principles governing SMS character limits and the mechanism for sending long messages, known as Concatenated SMS.
Oct-13-25 10:32
When an SMS message exceeds the character limit of a single message, it is segmented into multiple parts for transmission and reassembled at the receiving device. This is known as a Concatenated SMS or Long SMS. Billing is typically based on the number of segments.
The character limit is determined by the encoding scheme used, primarily falling into two categories:
GSM-7 Encoding
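Supports a basic character set of 128 characters, including English letters, numbers, and basic Latin-based punctuation. A small extension table (^ { } \ [ ~ ] | €) is also available, but each of these characters is sent as an escape sequence and counts as two characters.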
Single Message Limit: 160 characters.
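For long messages, each segment carries a User Data Header (UDH) that the receiving device uses for reassembly. The standard 6-byte concatenation header takes up the space of 7 characters, leaving 153 characters per segment.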
UCS-2 (UTF-16) Encoding
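Used for messages containing characters outside the GSM-7 character set, such as non-Latin scripts (e.g., Chinese, Cyrillic, Arabic), complex symbols, or emojis. Each character is sent as a 16-bit unit; emojis and other characters outside the Basic Multilingual Plane are sent as UTF-16 surrogate pairs and count as two characters each.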
Single Message Limit: 70 characters.
For long messages, each segment carries a 6-byte UDH for reassembly, which takes up the space of 3 characters, leaving 67 characters per segment.
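The per-segment limits above follow from the fixed 140-byte payload of a single SMS. The sketch below is a minimal illustration, assuming the common concatenation header with an 8-bit reference number (6 bytes in total); the function name `build_concat_udh` and the constant names are hypothetical and chosen only for this example.

```python
# Minimal sketch: how a 6-byte concatenation UDH leads to 153 / 67 characters.
# Assumes the common 8-bit reference variant of the header; names are illustrative.

PAYLOAD_OCTETS = 140  # user-data size of one SMS (160 septets * 7 bits / 8)
UDH_OCTETS = 6        # length byte + 5 bytes of concatenation information


def build_concat_udh(ref: int, total: int, seq: int) -> bytes:
    """Build the 6-byte header placed at the start of each segment."""
    if not 0 <= ref <= 255:
        raise ValueError("ref must fit in one byte")
    if not (1 <= total <= 255 and 1 <= seq <= total):
        raise ValueError("seq must be between 1 and total (max 255)")
    # 0x05: length of the header bytes that follow
    # 0x00: information element "concatenated SMS, 8-bit reference"
    # 0x03: length of the element data (ref, total, seq)
    return bytes([0x05, 0x00, 0x03, ref, total, seq])


# Characters left per segment once the header is in place.
GSM7_PER_SEGMENT = (PAYLOAD_OCTETS - UDH_OCTETS) * 8 // 7  # 7-bit characters
UCS2_PER_SEGMENT = (PAYLOAD_OCTETS - UDH_OCTETS) // 2      # 16-bit characters

if __name__ == "__main__":
    print(build_concat_udh(ref=0x2A, total=3, seq=1).hex())
    print(GSM7_PER_SEGMENT, UCS2_PER_SEGMENT)
```

All segments of one message share the same reference number, which is how the receiving device knows which parts belong together and in what order.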
GSM-7 Encoding (e.g., Pure English Text)
| Long SMS Segments | Total Available Characters | Header Overhead (UDH) | Notes |
|---|---|---|---|
| 1 | 160 | 0 bytes | Standard single SMS. |
| 2 | 306 | 12 bytes | Each segment carries a 6-byte UDH, which takes the space of 7 characters: 160 - 7 = 153 chars per segment. |
| 3 | 459 | 18 bytes | |
| 4 | 612 | 24 bytes | |
| 5 | 765 | 30 bytes | |
UCS-2 Encoding (e.g., Non-English Characters)
| Long SMS Segments | Total Available Characters | Header Overhead (UDH) | Notes |
|---|---|---|---|
| 1 | 70 | 0 bytes | Standard single SMS. |
| 2 | 134 | 12 bytes | Each segment carries a 6-byte UDH, which takes the space of 3 characters: 70 - 3 = 67 chars per segment. |
| 3 | 201 | 18 bytes | |
| 4 | 268 | 24 bytes | |
| 5 | 335 | 30 bytes | |
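
The capacities in the two tables can be reproduced with a few lines of arithmetic. The following is a rough sketch rather than a billing reference; the names `LIMITS`, `segment_count`, and `total_capacity` are hypothetical, and `length` is assumed to be the message length already counted in the units of the chosen encoding.

```python
import math

# (single-message limit, per-segment limit) as given in the tables above.
LIMITS = {"GSM-7": (160, 153), "UCS-2": (70, 67)}


def segment_count(length: int, encoding: str) -> int:
    """Number of segments needed for a message of `length` characters."""
    single, per_segment = LIMITS[encoding]
    if length <= single:
        return 1  # fits in a standard single SMS, no header needed
    return math.ceil(length / per_segment)


def total_capacity(segments: int, encoding: str) -> int:
    """Maximum characters that fit into `segments` parts."""
    single, per_segment = LIMITS[encoding]
    return single if segments == 1 else segments * per_segment


if __name__ == "__main__":
    for enc in LIMITS:
        print(enc, [total_capacity(n, enc) for n in range(1, 6)])
```

Note the jump at the boundary: a GSM-7 message of 161 characters already needs two segments, because every part of a long message gives up space to the header.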
Character Set Reference:
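The GSM-7 default alphabet includes (but is not limited to): @£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !"#¤%&'()*+,-./0123456789:;<=>?¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿abcdefghijklmnopqrstuvwxyzäöñüà^{}\[~]|€
Here \n and \r stand for line feed and carriage return. The trailing characters ^ { } \ [ ~ ] | € belong to the extension table and count as two characters each.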
Any character not found in this set will typically cause the message to be encoded in UCS-2, reducing the per-segment character capacity.
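As an illustration of this rule, the sketch below checks whether a text fits the GSM-7 alphabet and counts how many characters it will consume. It is a simplified sketch under stated assumptions: the helper names `choose_encoding` and `count_units` are hypothetical, the extension characters are counted as two each, and gateways that transliterate or substitute unsupported characters are not modeled.

```python
# Basic GSM-7 alphabet as listed above (line feed and carriage return included).
GSM7_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension-table characters: each is sent as an escape plus one character.
GSM7_EXTENSION = set("^{}\\[~]|€")


def choose_encoding(text: str) -> str:
    """Return "GSM-7" if every character is in the alphabet, else "UCS-2"."""
    if all(ch in GSM7_BASIC or ch in GSM7_EXTENSION for ch in text):
        return "GSM-7"
    return "UCS-2"


def count_units(text: str) -> int:
    """Count the characters a message consumes in its chosen encoding."""
    if choose_encoding(text) == "GSM-7":
        return sum(2 if ch in GSM7_EXTENSION else 1 for ch in text)
    # UCS-2 / UTF-16: count 16-bit units, so an emoji counts as two.
    return len(text.encode("utf-16-be")) // 2


if __name__ == "__main__":
    for sample in ["Hello", "[ok]", "Привет", "Hi 😀"]:
        print(repr(sample), choose_encoding(sample), count_units(sample))
```

A single unsupported character is enough to switch the whole message to UCS-2, so a 160-character English text with one emoji added no longer fits in one segment.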