EDIFACT Encoding – EDI Character Set Support

For EDIFACT-encoded interchanges, you set the character set for a party through the UNB1.1 party property on the UNB Segment Definition property page, configured for the party as interchange receiver.

The encoding used in an incoming interchange is determined by the value of the UNB1.1 field in the header of the interchange.
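To make that concrete, here is a minimal Python sketch of how the UNB1.1 value (and the syntax version that follows it) could be read from the start of an interchange. This is not how BizTalk Server does it internally. The function name `read_unb_syntax` is made up for illustration, and the sketch assumes the header has already been read as text, which is safe because the service segments use plain ASCII characters.

```python
from typing import Tuple


def read_unb_syntax(interchange: str) -> Tuple[str, str]:
    """Return (syntax identifier, syntax version), e.g. ("UNOC", "3").

    Hypothetical helper for illustration only.
    """
    # Default EDIFACT separators; an optional UNA service string overrides them.
    component_sep, element_sep, terminator = ":", "+", "'"
    text = interchange.lstrip()

    if text.startswith("UNA"):
        # UNA is followed by six service characters: component separator,
        # element separator, decimal mark, release character, reserved,
        # and segment terminator.
        service = text[3:9]
        component_sep = service[0]
        element_sep = service[1]
        terminator = service[5]
        text = text[9:].lstrip()

    if not text.startswith("UNB" + element_sep):
        raise ValueError("interchange does not start with a UNB segment")

    # The first data element after the UNB tag is the syntax composite,
    # for example "UNOC:3".
    parts = text.split(element_sep, 2)
    composite = parts[1].split(terminator, 1)[0]
    components = composite.split(component_sep)

    identifier = components[0]
    version = components[1] if len(components) > 1 else ""
    return identifier, version
```

Called on an interchange that begins with `UNB+UNOC:3+...`, the function returns the identifier `UNOC` and the version `3`.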

This topic indicates which character sets are supported in the EDIFACT features of BizTalk Server:

Encoding Allows
UNOA As defined in ISO 646 (with the exception of letters, lowercase a to z).

  • A to Z
  • 0 to 9
  • . , - ( ) / = (space)
UNOB As defined in ISO 646

  • All of UNOA
  • a to z
  • ' + : ? ! " % & * ; < >
UNOC As defined in ISO 8859-1: Information processing – Part 1: Latin alphabet No. 1.
UNOD As defined in ISO 8859-2: Information processing – Part 2: Latin alphabet No. 2.
UNOE As defined in ISO 8859-5: Information processing – Part 5: Latin/Cyrillic alphabet.
UNOF As defined in ISO 8859-7: Information processing – Part 7: Latin/Greek alphabet.
UNOG As defined in ISO 8859-3: Information processing – Part 3: Latin alphabet No. 3.
UNOH As defined in ISO 8859-4: Information processing – Part 4: Latin alphabet No. 4.
UNOI As defined in ISO 8859-6: Information processing – Part 6: Latin/Arabic alphabet.
UNOJ As defined in ISO 8859-8: Information processing – Part 8: Latin/Hebrew alphabet.
UNOK As defined in ISO 8859-9: Information processing – Part 9: Latin alphabet No. 5.
UNOX Code extension technique as defined by ISO 2022 utilizing the escape techniques in accordance with ISO 2375.

ISO-2022-JP character set

  • This code page allows the escape techniques in accordance with ISO 2375. The text starts in ASCII and switches to Japanese characters through an escape sequence. Each character following the escape sequence is encoded in two bytes.
UNOY ISO 10646-1 octet without code extension technique.
KECA Windows 949 code page

  • A to Z
  • 0 to 9
  • . , - ( ) / = ! " % & * ; < >
  • Korean Syllables (2350 characters)
  • Korean Hanja (4888 characters)
  • Korean Alphabets
  • Characters and numbers enclosed in a circle
  • String lengths are counted in bytes, not characters. A data element of length 3 can therefore hold 3 Latin characters, 1 Korean character, or 1 Korean and 1 Latin character.
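The list above can be sketched in Python as a lookup from the UNB1.1 value to a codec name, with the byte-counting rule built on top of it. The names `EDIFACT_CODECS` and `element_length` are hypothetical, and the mapping is my own approximation, not BizTalk Server's internal table. In particular, `ascii` is only a stand-in for UNOA and UNOB (it accepts more characters than those sets allow), and UNOY is left out because its exact byte encoding is not spelled out here.

```python
# Hypothetical mapping from UNB1.1 values to Python codec names.
# It is an approximation for illustration, not BizTalk Server's own table.
EDIFACT_CODECS = {
    "UNOA": "ascii",       # stand-in: UNOA is a restricted subset of ISO 646
    "UNOB": "ascii",       # stand-in: UNOB is also a subset of ISO 646
    "UNOC": "iso8859_1",
    "UNOD": "iso8859_2",
    "UNOE": "iso8859_5",
    "UNOF": "iso8859_7",
    "UNOG": "iso8859_3",
    "UNOH": "iso8859_4",
    "UNOI": "iso8859_6",
    "UNOJ": "iso8859_8",
    "UNOK": "iso8859_9",
    "UNOX": "iso2022_jp",
    "KECA": "cp949",
}


def element_length(value: str, charset: str) -> int:
    """Length of a data element in bytes under the given UNB1.1 charset.

    Raises KeyError for an unmapped charset and UnicodeEncodeError when
    the value contains a character the codec cannot represent.
    """
    codec = EDIFACT_CODECS[charset]
    return len(value.encode(codec))
```

With this sketch, `element_length("한A", "KECA")` counts one Korean character (two bytes in code page 949) plus one Latin character (one byte), which is why such a value fills a data element of length 3.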
Author: Sandro Pereira

Sandro Pereira lives in Portugal and works as a consultant at DevScope. In the past years, he has been working on implementing Integration scenarios both on-premises and cloud for various clients, each with different scenarios from a technical point of view, size, and criticality, using Microsoft Azure, Microsoft BizTalk Server and different technologies like AS2, EDI, RosettaNet, SAP, TIBCO etc. He is a regular blogger, international speaker, and technical reviewer of several BizTalk books all focused on Integration. He is also the author of the book “BizTalk Mapping Patterns & Best Practices”. He has been awarded MVP since 2011 for his contributions to the integration community.

9 thoughts on “EDIFACT Encoding – EDI Character Set Support”

  1. Thank you for the thorough description of EDI encodings.
    What about different versions? How is UNOC:2 different from UNOC:3?

    1. The number in the next element (2, 3) refers to the syntax version of the message. UNOC:3 means ISO 8859-1 encoding with syntax version 3.

  2. Hello, I am facing an issue processing an EDIFACT file containing European characters. UNB1.1 contains UNOD.
    An output message of the component “Unknown ” in receive pipeline “Microsoft.BizTalk.Edi.DefaultPipelines.EdiReceive, Microsoft.BizTalk.Edi.EdiPipelines, Version=3.0.1.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35” is suspended due to the following error:
    Error encountered during parsing. The Edifact transaction set with id ‘00000269034’ contained in interchange (without group) with id ‘00003’, with sender id ‘O0013000001AB R3A’, receiver id ‘209690’ is being suspended with following errors:
    Error: 1 (Field level error)
    SegmentID: FTX
    Position in TS: 6
    Data Element ID: C10801
    Position in Segment: 5
    Position in Field: 1
    Data Value:
    21: Invalid character found
    ŠKOT: these characters are not processed.

  3. Actually, it is a bit more complicated than that, AFAIK. Syntax versions 1 and 2 support only UNOA and UNOB. Syntax version 3 added support for UNOC to UNOF. According to the VDA specification, which builds on EDIFACT, syntax version 3 should also define UNOE and UNOG, though I couldn't find any freely accessible sources on that matter. All the other encodings, such as UNOW for UTF-8 encoded representations, should only be valid if syntax version 4 is declared.
