SMS Encoding and Segmentation

This document outlines the technical principles governing SMS character limits and the mechanism for sending long messages, known as Concatenated SMS.


Oct-13-25 10:32

Core Concepts

When an SMS message exceeds the character limit of a single message, it is segmented into multiple parts for transmission and reassembled at the receiving device. This is known as a Concatenated SMS or Long SMS. Billing is typically based on the number of segments.

The character limit is determined by the encoding scheme used, primarily falling into two categories:

  1. GSM-7 Encoding

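    • Supports a basic character set of 128 characters, including English letters, numbers, and basic Latin-based punctuation. A small extension table (for example ^ { } \ [ ~ ] | €) is also available, but each of those characters is sent as an escape code plus the character and therefore counts as two characters.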

    • Single Message Limit: 160 characters.

    • For long messages, each segment carries 153 characters: the 6-byte User Data Header (UDH) used for reassembly occupies 7 of the 160 seven-bit character slots.

  2. UCS-2 (UTF-16) Encoding

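    • Used for messages containing characters outside the GSM-7 character set, such as non-Latin scripts (e.g., Chinese, Cyrillic, Arabic), complex symbols, or emojis. Each character occupies 16 bits; most emojis need two 16-bit units and therefore count as two characters.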

    • Single Message Limit: 70 characters.

    • For long messages, each segment carries 67 characters: a 6-byte UDH used for reassembly occupies the space of 3 two-byte characters.
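To make the header overhead concrete, here is a minimal Python sketch of the common 6-byte concatenation header (the variant with an 8-bit reference number) and of how many character slots it consumes in each encoding. The function names `build_concat_udh` and `udh_cost_in_chars` are illustrative assumptions, not part of any particular SMS gateway API.

```python
def build_concat_udh(ref: int, total: int, seq: int) -> bytes:
    """Return the 6-byte UDH for one part of a concatenated SMS (8-bit reference)."""
    if not 0 <= ref <= 255:
        raise ValueError("reference number must fit in one byte")
    if not 1 <= seq <= total <= 255:
        raise ValueError("need 1 <= seq <= total <= 255")
    return bytes([
        0x05,   # UDHL: number of header bytes that follow
        0x00,   # IEI: concatenated short message, 8-bit reference
        0x03,   # IEDL: three bytes of data follow
        ref,    # same value in every part of one message
        total,  # total number of parts
        seq,    # index of this part, starting at 1
    ])


def udh_cost_in_chars(udh: bytes, encoding: str) -> int:
    """Character slots consumed by the header in the given encoding."""
    if encoding == "GSM-7":
        # Header bits are padded up to a whole number of 7-bit septets.
        return (len(udh) * 8 + 6) // 7
    if encoding == "UCS-2":
        # Each UCS-2 character is two bytes.
        return (len(udh) + 1) // 2
    raise ValueError(f"unknown encoding: {encoding}")


udh = build_concat_udh(ref=0x2A, total=3, seq=1)
print(udh.hex())                              # 0500032a0301
print(160 - udh_cost_in_chars(udh, "GSM-7"))  # 153
print(70 - udh_cost_in_chars(udh, "UCS-2"))   # 67
```

The same 6 bytes cost 7 character slots in GSM-7 and 3 in UCS-2, which is where the 153 and 67 per-segment limits come from.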


Long SMS Segmentation Details

GSM-7 Encoding (e.g., Pure English Text)

 
 
Segments   Total Available Characters   UDH Overhead (total)   Notes
1          160                          none                   Standard single SMS.
2          306                          14 characters          Each segment carries a 6-byte UDH, which occupies 7 character slots: 160 - 7 = 153 chars per segment.
3          459                          21 characters
4          612                          28 characters
5          765                          35 characters

UCS-2 Encoding (e.g., Non-English Characters)

 
 
Segments   Total Available Characters   UDH Overhead (total)   Notes
1          70                           none                   Standard single SMS.
2          134                          6 characters           Each segment carries a 6-byte UDH, which occupies 3 character slots: 70 - 3 = 67 chars per segment.
3          201                          9 characters
4          268                          12 characters
5          335                          15 characters
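The totals in both tables follow from one rule: a single message uses the full limit, while every part of a long message uses the reduced per-segment limit. A short Python sketch of that arithmetic follows; the `LIMITS` table and the `total_capacity` name are illustrative assumptions.

```python
# (single-message limit, per-segment limit in a concatenated message)
LIMITS = {
    "GSM-7": (160, 153),
    "UCS-2": (70, 67),
}


def total_capacity(segments: int, encoding: str) -> int:
    """Maximum number of characters that fit in the given number of segments."""
    if segments < 1:
        raise ValueError("segments must be at least 1")
    single, per_part = LIMITS[encoding]
    # Only a lone message avoids the reassembly header.
    return single if segments == 1 else segments * per_part


for enc in LIMITS:
    print(enc, [total_capacity(n, enc) for n in range(1, 6)])
# GSM-7 [160, 306, 459, 612, 765]
# UCS-2 [70, 134, 201, 268, 335]
```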

Character Set Reference:

The GSM-7 default alphabet includes (but is not limited to) the characters below; the final group, ^{}\[~]|€, belongs to the extension table:
@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !"#¤%&'()*+,-./0123456789:;<=>?¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿abcdefghijklmnopqrstuvwxyzäöñüà^{}\[~]|€

Any character not found in this set will typically cause the entire message to be encoded in UCS-2, reducing the capacity from 160 to 70 characters for a single message and from 153 to 67 characters per segment for a long message.
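The encoding choice and the resulting segment count can be estimated before sending. The Python sketch below is a simplified estimator under stated assumptions: it uses the default GSM-7 alphabet with no national language shift tables, counts extension-table characters as two slots, counts UCS-2 length in 16-bit units, and ignores the edge case where a two-slot character would straddle a segment boundary (which can occasionally add a segment). The names `GSM7_BASIC`, `GSM7_EXT`, and `count_segments` are illustrative, and a given provider's billing rules may differ.

```python
import math

# Default GSM-7 alphabet (without the escape code itself).
GSM7_BASIC = frozenset(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension table: each character is sent as escape + character (two slots).
GSM7_EXT = frozenset("^{}\\[~]|€\f")


def count_segments(text: str) -> tuple[str, int]:
    """Return (encoding, estimated segment count) for a message body."""
    if all(ch in GSM7_BASIC or ch in GSM7_EXT for ch in text):
        encoding, single, per_part = "GSM-7", 160, 153
        units = sum(2 if ch in GSM7_EXT else 1 for ch in text)
    else:
        encoding, single, per_part = "UCS-2", 70, 67
        # Count 16-bit units so that emojis outside the BMP count twice.
        units = len(text.encode("utf-16-be")) // 2
    if units <= single:
        return encoding, 1
    return encoding, math.ceil(units / per_part)


print(count_segments("a" * 160))  # ('GSM-7', 1)
print(count_segments("a" * 161))  # ('GSM-7', 2)
print(count_segments("€" * 81))   # ('GSM-7', 2)
print(count_segments("你好"))      # ('UCS-2', 1)
print(count_segments("я" * 71))   # ('UCS-2', 2)
```

Note how a single non-GSM character, such as one Cyrillic letter, switches the whole message to the 70/67 limits.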
