BizTalk Guide: EDIFACT Encoding and Character Set Support

EDIFACT Encoding – EDI Character Set Support

Sandro Pereira
Aug 15, 2009
In the world of Electronic Data Interchange (EDI), the way data is encoded is just as important as the data itself. In BizTalk Server, if your pipeline configuration doesn’t match the character set defined in the EDIFACT interchange, your integration will fail before it even reaches your mapping logic.

For EDIFACT-encoded interchanges, you set the character set for a party through the UNB1.1 party property on the UNB Segment Definition property page, with the party acting as interchange receiver.

The encoding used in an incoming interchange is determined by the value of the UNB1.1 field in the interchange header.
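To see which value BizTalk will find there, it can help to inspect the header yourself before submitting a file. Here is a minimal Python sketch of that idea, not BizTalk's own parser: the function name `read_syntax_identifier` is hypothetical, and it assumes the interchange starts with an optional UNA segment followed by UNB and that no release character appears inside UNB1.

```python
from __future__ import annotations


def read_syntax_identifier(interchange: str) -> tuple[str, str]:
    """Return (UNB1.1, UNB1.2) from the start of an EDIFACT interchange."""
    # Default service characters, used when no UNA segment is present.
    component_sep, element_sep, terminator = ":", "+", "'"
    text = interchange.lstrip()

    if text.startswith("UNA"):
        # UNA is followed by six service characters: component separator,
        # element separator, decimal mark, release character, reserved,
        # segment terminator.
        component_sep, element_sep, terminator = text[3], text[4], text[8]
        text = text[9:].lstrip()

    if not text.startswith("UNB" + element_sep):
        raise ValueError("interchange does not start with a UNB segment")

    # Simplification: release (escape) characters are ignored, which is
    # safe for UNB1 because it is the first element of the segment.
    unb_segment = text.split(terminator, 1)[0]
    unb1 = unb_segment.split(element_sep)[1]
    parts = unb1.split(component_sep)
    if len(parts) < 2:
        raise ValueError("UNB1 does not contain a syntax version (UNB1.2)")
    return parts[0], parts[1]


if __name__ == "__main__":
    sample = "UNB+UNOC:3+SENDER+RECEIVER+190408:0927+00003'"
    print(read_syntax_identifier(sample))
```

For the sample header, the first value is the syntax identifier (the character set, UNOC) and the second is the syntax version number (3).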

📝 One-Minute Brief

Processing EDIFACT messages in BizTalk Server requires a precise match between the sender’s character set and the BizTalk EDI configuration. This post covers the support for UNB1.1 (Syntax Identifier) and UNB1.2 (Syntax Version Number), explaining how to handle different encoding levels like UNOA, UNOB, and UNOC. Failure to align these settings often leads to “Invalid Character” errors or serialization failures in the EDI Assembler.

This topic indicates which character sets are supported in the EDIFACT features of BizTalk Server:

Encoding	Allows
UNOA	As defined in ISO 646 (with the exception of lowercase letters a to z): A to Z, 0 to 9, and . , - ( ) / = (space)
UNOB	As defined in ISO 646: all of UNOA, plus a to z and ' + : ? ! " % & * ; < >
UNOC	As defined in ISO 8859-1: Information processing – Part 1: Latin alphabet No. 1.
UNOD	As defined in ISO 8859-2: Information processing – Part 2: Latin alphabet No. 2.
UNOE	As defined in ISO 8859-5: Information processing – Part 5: Latin/Cyrillic alphabet.
UNOF	As defined in ISO 8859-7: Information processing – Part 7: Latin/Greek alphabet.
UNOG	As defined in ISO 8859-3: Information processing – Part 3: Latin alphabet No. 3.
UNOH	As defined in ISO 8859-4: Information processing – Part 4: Latin alphabet No. 4.
UNOI	As defined in ISO 8859-6: Information processing – Part 6: Latin/Arabic alphabet.
UNOJ	As defined in ISO 8859-8: Information processing – Part 8: Latin/Hebrew alphabet.
UNOK	As defined in ISO 8859-9: Information processing – Part 9: Latin alphabet No. 5.
UNOX	A to Z, 0 to 9, and . , - ( ) / = ! " % & * ; < > plus the Windows 949 code page: Korean syllables (2350 characters), Korean Hanja (4888 characters), Korean alphabets, and characters and numbers enclosed in a circle. String length is counted in bytes rather than characters, so a data element of length 3 can hold 3 Latin characters, 1 Korean character, or 1 Korean and 1 Latin character.
UNOY	ISO 10646-1 octet without code extension technique.
KECA	A to Z, 0 to 9, and . , - ( ) / = ! " % & * ; < > plus the Windows 949 code page: Korean syllables (2350 characters), Korean Hanja (4888 characters), Korean alphabets, and characters and numbers enclosed in a circle. String length is counted in bytes rather than characters, so a data element of length 3 can hold 3 Latin characters, 1 Korean character, or 1 Korean and 1 Latin character.
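If you want to check a value against one of these character sets before it reaches BizTalk, a small pre-flight script can help. The sketch below is an approximation in Python, not how the EDI engine validates data. The helper names `invalid_characters` and `byte_length` are hypothetical, and the mapping to Python codec names is my assumption based on the standards listed in the table. UNOY is left out because I am not sure which codec matches it, and for UNOX and KECA only the code page is checked, not the restricted punctuation list.

```python
from __future__ import annotations

# Repertoires for UNOA and UNOB, taken from the table above.
UNOA_CHARS = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.,-()/= ")
UNOB_CHARS = UNOA_CHARS | set("abcdefghijklmnopqrstuvwxyz'+:?!\"%&*;<>")

# Assumption: Python codec names matching the standard listed for each
# syntax identifier. UNOY is intentionally omitted.
CODECS = {
    "UNOC": "iso8859_1", "UNOD": "iso8859_2", "UNOE": "iso8859_5",
    "UNOF": "iso8859_7", "UNOG": "iso8859_3", "UNOH": "iso8859_4",
    "UNOI": "iso8859_6", "UNOJ": "iso8859_8", "UNOK": "iso8859_9",
    "UNOX": "cp949", "KECA": "cp949",
}


def invalid_characters(value: str, syntax_id: str) -> list[str]:
    """Return the characters of value that the character set cannot hold."""
    if syntax_id == "UNOA":
        return [ch for ch in value if ch not in UNOA_CHARS]
    if syntax_id == "UNOB":
        return [ch for ch in value if ch not in UNOB_CHARS]
    if syntax_id not in CODECS:
        raise ValueError(f"no codec known for syntax identifier {syntax_id!r}")
    rejected = []
    for ch in value:
        try:
            ch.encode(CODECS[syntax_id])
        except UnicodeEncodeError:
            rejected.append(ch)
    return rejected


def byte_length(value: str, syntax_id: str) -> int:
    """Length in bytes, which is how UNOX and KECA count element length."""
    return len(value.encode(CODECS[syntax_id]))


if __name__ == "__main__":
    print(invalid_characters("ŠKOT", "UNOC"))  # Š is not in ISO 8859-1
    print(invalid_characters("ŠKOT", "UNOD"))  # Š is in ISO 8859-2
    print(byte_length("한A", "UNOX"))          # 2 bytes + 1 byte
```

The last call illustrates the byte-counting rule from the UNOX and KECA rows: one Korean syllable plus one Latin letter already fills a data element of length 3.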

Character set support is the “fine-tuning” of your EDI solution. By understanding the limitations of each syntax identifier, you can proactively prevent serialization errors and ensure smooth data flow with your international partners.

Hope you find this helpful! If you liked the content or found it useful and would like to support me in writing more, consider buying (or helping to buy) a Star Wars Lego set for my son.

Author: Sandro Pereira

Sandro Pereira lives in Portugal and works as a consultant at DevScope. In recent years, he has been implementing integration scenarios, both on-premises and in the cloud, for various clients, each differing in technical requirements, size, and criticality, using Microsoft Azure, Microsoft BizTalk Server, and technologies such as AS2, EDI, RosettaNet, SAP, and TIBCO. He is a regular blogger, international speaker, and technical reviewer of several BizTalk books, all focused on integration. He is also the author of the book “BizTalk Mapping Patterns & Best Practices”. He has held the MVP award since 2011 for his contributions to the integration community.

9 thoughts on “EDIFACT Encoding – EDI Character Set Support”

useful

Thank you for the thorough description of EDI encodings.
What about different versions? How is UNOC:2 different from UNOC:3?

Tomas says:

April 8, 2019 at 9:27 am

The number in the next component (2, 3) is the syntax version of the message. UNOC:3 means ISO 8859-1 encoding with syntax version 3.

Reply

Thank you for this post!

BTS 2016: Is there any way to set syntax version 4 in an outbound one-way agreement? It is even documented:

https://docs.microsoft.com/en-us/biztalk/core/configuring-charset-and-separators-edifact

I do not see this option in the GUI (Character set and separators). Only UNB1.1 is available to select, not UNB1.2.

Robert says:

June 27, 2019 at 6:19 am

Just an additional comment: this part of the GUI disappeared after applying CU 5 and FP 3.

Reply

Hello, I am facing an issue processing an EDIFACT file containing European characters. UNB1.1 contains UNOD.
An output message of the component “Unknown ” in receive pipeline “Microsoft.BizTalk.Edi.DefaultPipelines.EdiReceive, Microsoft.BizTalk.Edi.EdiPipelines, Version=3.0.1.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35” is suspended due to the following error:
Error encountered during parsing. The Edifact transaction set with id ‘00000269034’ contained in interchange (without group) with id ‘00003’, with sender id ‘O0013000001AB R3A’, receiver id ‘209690’ is being suspended with following errors:
Error: 1 (Field level error)
SegmentID: FTX
Position in TS: 6
Data Element ID: C10801
Position in Segment: 5
Position in Field: 1
Data Value:
21: Invalid character found
ÂŠKOT: these characters are not processed.

Hello, I need to set the encoding UNOW, but I cannot find it in BizTalk. Is there any way I can set it, please?

Thank you

Actually, it is a bit more complicated than that, AFAIK. Syntax versions 1 and 2 only support UNOA and UNOB. Syntax version 3 added support for UNOC to UNOF. According to the VDA specification, which builds on EDIFACT, syntax version 3 should also define UNOE and UNOG, though I couldn’t find any freely accessible sources on that matter. All the other encodings, such as UNOW for UTF-8 encoded representations, should only be valid if syntax version 4 is declared.
