BizTalk Server: Transform text files (Flat Files) into XML–Introduction (Part 1)

Posted: August 18, 2012  |  Categories: BizTalk Schemas

Introduction

Transformations are among the most common components in integration processes. They act as essential translators that decouple the systems being connected. This article aims to help you understand how to transform a text file (also called a Flat File) into an XML document using BizTalk Server Flat File Schemas.

We normally associate document transformations with BizTalk maps, but there are actually two types of transformation: structure transformation (semantics) and representation transformation (syntax). The latter typically occurs at the receive or send ports of BizTalk Server.

This article is intended as an introduction for those taking their first steps with this technology.

One of the oldest and most common ways to represent messages is the text file (Flat File), such as CSV (Comma Separated Values) or TXT files, many of them custom-made for the systems that use them. Over time, however, XML became the standard message format because of its widespread use by major corporations and open source development efforts. Do not be fooled into thinking that flat-file messages are outdated and rarely used: EDI messages, which large companies use extensively, are a good example. It is therefore often necessary to transform text files into XML and vice versa.

FlatFile-sample

While tools like Excel can help us interpret such files, the process is always iterative and requires a few hints from the user so that the software can determine where to separate the fields/columns as well as the data type of each field. An integration platform (Enterprise Application Integration) like BizTalk Server, however, must remove any ambiguity so that these operations can be performed thousands of times with confidence and without resorting to a manual operator.

Map or Schema Annotation?

As mentioned in the introduction, BizTalk has two types of transformations:

  • Semantic Transformations: This type of transformation usually occurs only in BizTalk maps. Here, the document keeps the same syntax (XML) but changes its semantics (data content). This type of transformation is typically one-way: when we add and aggregate small parts of the information that compose the document into a different document, we may lose details that are needed to reconstruct the original.

  • Syntax Transformations: This type of transformation occurs in the receive or send pipelines and aims to transform a document into another representation, e.g. CSV to XML. Here, the document keeps the same data (semantics) but changes the syntax in which it is represented, i.e. we translate the document but typically do not modify its structure. Normally, this type of transformation is bidirectional: since the semantic content is unchanged, we can apply the same transformation logic in reverse and obtain the document in its original format. Other common examples are conversions between HL7 and XML, or EDI and XML.

Note: This article covers only syntax transformations. To learn more about semantic transformations, see the article “BizTalk Server: Basics principles of Maps”.
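
To make the idea of a bidirectional syntax transformation concrete, here is a minimal Python sketch. It is not how BizTalk does it internally; it only shows that the same field rules can turn CSV into XML and back without losing data. The field names, the `Orders`/`Order` element names and the sample values are assumptions made up for this illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical field names; a real flat file schema would define these.
FIELDS = ["Name", "City", "Amount"]


def csv_to_xml(text: str) -> str:
    """Change the representation (CSV -> XML) without touching the data."""
    root = ET.Element("Orders")
    for row in csv.reader(io.StringIO(text)):
        order = ET.SubElement(root, "Order")
        for field, value in zip(FIELDS, row):
            ET.SubElement(order, field).text = value
    return ET.tostring(root, encoding="unicode")


def xml_to_csv(xml_text: str) -> str:
    """Apply the same rules in reverse (XML -> CSV)."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for order in ET.fromstring(xml_text):
        writer.writerow([order.findtext(field, default="") for field in FIELDS])
    return out.getvalue()


flat = "Maria,Porto,10.50\nJohn,Lisbon,7.25\n"
xml_doc = csv_to_xml(flat)
assert xml_to_csv(xml_doc) == flat  # same data, original syntax recovered
```

The round trip works because nothing was aggregated or dropped along the way, which is exactly what separates a syntax transformation from a semantic one.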

How are text files (Flat Files) processed by BizTalk?

Internally, BizTalk “prefers” the XML message type. When messages are in XML format, BizTalk “offers” numerous automatisms that are very useful in these environments, such as message routing based on a particular field (promoted property); tracking and analysis of multidimensional values and dimensions with BAM (Business Activity Monitoring); or making logical decisions within orchestrations (business processes) using elements of the message.

If messaging is the foundation of BizTalk Server, message schemas are the bedrock on which messaging is built. Fortunately, BizTalk supports the conversion of text files to XML in a simple and intuitive manner through “Flat File Schemas”, which are ordinary XML schemas (XSD) with specific annotations. At first glance this may seem strange, because XML Schemas (XSD) are used to describe XML files. However, BizTalk uses them as metadata to describe not only XML documents but also text files (flat files).

The trick is that all the necessary information, such as the delimiter symbols or the element size in a positional file (that is, the parsing rules or transformation rules), is embedded as annotations in the XML Schema (XSD), which makes it simple to reuse these schemas in different parts of the process. At any point, the document can be translated back into a flat file because the definition is declarative and symmetric.
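
The "declarative and symmetric" idea can be illustrated with a small Python sketch. It does not use BizTalk's annotation format; `FieldSpec` and `RecordSpec` are hypothetical stand-ins for the metadata a flat file schema carries (a delimiter, or field lengths for a positional record). The point is that one description drives both directions.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class FieldSpec:
    name: str
    length: int = 0  # used only for positional records


@dataclass(frozen=True)
class RecordSpec:
    """Hypothetical stand-in for the annotations a flat file schema carries."""
    name: str
    fields: tuple
    delimiter: str = ""  # empty string means the record is positional


def parse(spec: RecordSpec, line: str) -> ET.Element:
    """Flat text -> XML element, driven only by the spec."""
    if spec.delimiter:
        values = line.split(spec.delimiter)
    else:
        values, pos = [], 0
        for f in spec.fields:
            values.append(line[pos:pos + f.length].rstrip())
            pos += f.length
    record = ET.Element(spec.name)
    for f, value in zip(spec.fields, values):
        ET.SubElement(record, f.name).text = value
    return record


def serialize(spec: RecordSpec, record: ET.Element) -> str:
    """XML element -> flat text, using the same spec in reverse."""
    values = [record.findtext(f.name, default="") for f in spec.fields]
    if spec.delimiter:
        return spec.delimiter.join(values)
    return "".join(v.ljust(f.length) for f, v in zip(spec.fields, values))


delimited = RecordSpec("Person", (FieldSpec("Name"), FieldSpec("Age")), delimiter=";")
positional = RecordSpec("Person", (FieldSpec("Name", 10), FieldSpec("Age", 3)))

# 'Maria' padded to 10 characters, '34' padded to 3 characters.
fixed_line = "Maria".ljust(10) + "34".ljust(3)

for spec, line in [(delimited, "Maria;34"), (positional, fixed_line)]:
    assert serialize(spec, parse(spec, line)) == line
```

In this sketch the positional round trip is exact only when values carry no trailing spaces of their own, since padding is stripped on the way in and restored on the way out.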

Where can Syntax Transformations occur?

This type of transformation can occur in receive or send pipelines. Text files (Flat Files) are usually processed at runtime as follows:

  • The Flat Files are received by an adapter associated with a receive location (a folder in the file system, for example);
  • A pipeline configured in the receive location will be responsible for transforming the Flat File into its equivalent XML;
  • One or more subscribers interested in the message, such as an orchestration, will subscribe to the XML document, and the message will go through the business process. Note that in a pure messaging scenario there is no need for orchestrations;
  • If and when necessary, BizTalk can send the XML messages out again as text files (Flat Files) by using another pipeline in the send ports, which will be responsible for transforming the XML into its equivalent Flat File.

As the image below shows:

biztalk-flatlfile-runtime-architecture
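
The runtime flow described above (receive, transform to XML, subscribe, transform back on send) can be sketched in a few lines of Python. This is a toy, in-memory model and not BizTalk's actual API: `Message`, `MessageBox`, the `Country` property and the sample data are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable
import xml.etree.ElementTree as ET


@dataclass
class Message:
    body: str                                     # XML text after the receive pipeline
    promoted: dict = field(default_factory=dict)  # values exposed for routing


@dataclass
class MessageBox:
    """Hypothetical, in-memory stand-in for the publish/subscribe store."""
    subscriptions: list = field(default_factory=list)

    def subscribe(self, predicate: Callable[[Message], bool],
                  handler: Callable[[Message], None]) -> None:
        self.subscriptions.append((predicate, handler))

    def publish(self, message: Message) -> int:
        """Deliver to every matching subscriber and return how many matched."""
        matched = 0
        for predicate, handler in self.subscriptions:
            if predicate(message):
                handler(message)
                matched += 1
        return matched


def receive_pipeline(flat: str) -> Message:
    """Flat file -> XML, promoting the Country field for routing."""
    name, country = flat.strip().split(",")
    root = ET.Element("Customer")
    ET.SubElement(root, "Name").text = name
    ET.SubElement(root, "Country").text = country
    return Message(ET.tostring(root, encoding="unicode"), {"Country": country})


def send_pipeline(message: Message) -> str:
    """XML -> flat file on the way out."""
    root = ET.fromstring(message.body)
    return ",".join(child.text or "" for child in root) + "\n"


outbox = []
box = MessageBox()
# A "send port" that subscribes to messages whose promoted Country is PT.
box.subscribe(lambda m: m.promoted.get("Country") == "PT",
              lambda m: outbox.append(send_pipeline(m)))

box.publish(receive_pipeline("Maria,PT\n"))  # matches the subscription
box.publish(receive_pipeline("John,UK\n"))   # no matching subscriber here
assert outbox == ["Maria,PT\n"]
```

Note that the subscriber never sees the flat file: routing decisions are made on the XML message and its promoted values, which is why the conversion happens in the pipelines at the edges.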

The receive pipeline consists of four stages, and syntax transformations may occur in two of them:

receive-pipeline-architecture

  • Decode stage: This stage is used for components that decode or decrypt the message. The MIME/SMIME Decoder pipeline component or a custom decoding component should be placed in this stage if the incoming messages need to be decoded from one format to another. Syntax transformations can occur at this stage through a custom component.
  • Disassemble stage: This stage is used for components that parse or disassemble the message. Syntax transformations should occur at this stage. In the example that will be demonstrated, we will use the “Flat file disassembler” to transform a text file into XML.
  • Validate stage: This stage is used for components that validate the message format. A pipeline component processes only messages that conform to the schemas specified in that component. If a pipeline receives a message whose schema is not associated with any component in the pipeline, that message is not processed. Depending on the adapter that submits the message, the message is either suspended or an error is issued to the sender.
  • ResolveParty stage: This stage is a placeholder for the Party Resolution Pipeline Component.
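
As a rough mental model, the four receive stages can be pictured as functions applied in order. The Python sketch below is only an analogy and not BizTalk's pipeline component interface. The Base64 encoding in the decode stage, the pipe-delimited layout and the `Items`/`Item` element names are assumptions chosen for the example.

```python
import base64
import xml.etree.ElementTree as ET


def decode(raw: bytes) -> str:
    """Decode stage: here, undo a Base64 transport encoding (an assumption)."""
    return base64.b64decode(raw).decode("utf-8")


def disassemble(flat: str) -> ET.Element:
    """Disassemble stage: parse the pipe-delimited text into XML."""
    root = ET.Element("Items")
    for line in flat.splitlines():
        code, qty = line.split("|")
        item = ET.SubElement(root, "Item")
        ET.SubElement(item, "Code").text = code
        ET.SubElement(item, "Quantity").text = qty
    return root


def validate(doc: ET.Element) -> ET.Element:
    """Validate stage: reject documents that break the expected format."""
    for item in doc:
        if not item.findtext("Quantity", default="").isdigit():
            raise ValueError("Quantity must be a whole number")
    return doc


def resolve_party(doc: ET.Element) -> ET.Element:
    """ResolveParty stage: placeholder, passes the message through."""
    return doc


def receive_pipeline(raw: bytes) -> str:
    """Run the four stages in order and return the resulting XML text."""
    message = raw
    for stage in (decode, disassemble, validate, resolve_party):
        message = stage(message)
    return ET.tostring(message, encoding="unicode")


incoming = base64.b64encode(b"A1|3\nB2|5\n")
print(receive_pipeline(incoming))
```

The syntax transformation happens in `disassemble`; the other stages either prepare the bytes for it or check its result.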

Send pipelines consist of three stages, and syntax transformations may also occur in two of them:

send-pipeline-architecture

  • Pre-assemble stage: This stage is a placeholder for custom components that should perform some action on the message before the message is serialized.
  • Assemble stage: Components in this stage are responsible for assembling or serializing the message and converting it to or from XML. Syntax transformations should occur at this stage.
  • Encode stage: This stage is used for components that encode or encrypt the message. Place the MIME/SMIME Encoder component or a custom encoding component in this stage if message signing is required. The syntax transformations can occur at this stage through a custom component.
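
The send side can be pictured the same way, with the stages running in the opposite direction. Again, this Python sketch is an analogy under stated assumptions and not BizTalk's component model: trimming whitespace in the pre-assemble stage, the pipe-delimited output and the Base64 encoding are all choices made for the example.

```python
import base64
import xml.etree.ElementTree as ET


def pre_assemble(doc: ET.Element) -> ET.Element:
    """Pre-assemble stage: custom work before serialization (here, trim text)."""
    for node in doc.iter():
        if node.text:
            node.text = node.text.strip()
    return doc


def assemble(doc: ET.Element) -> str:
    """Assemble stage: serialize the XML into pipe-delimited lines."""
    lines = ["|".join(child.text or "" for child in item) for item in doc]
    return "\n".join(lines) + "\n"


def encode(flat: str) -> bytes:
    """Encode stage: here, Base64 for transport (an assumption)."""
    return base64.b64encode(flat.encode("utf-8"))


def send_pipeline(xml_text: str) -> bytes:
    """Run the three stages in order and return the bytes to transmit."""
    message = ET.fromstring(xml_text)
    for stage in (pre_assemble, assemble, encode):
        message = stage(message)
    return message


outgoing = send_pipeline(
    "<Items><Item><Code> A1 </Code><Quantity>3</Quantity></Item></Items>"
)
assert base64.b64decode(outgoing) == b"A1|3\n"
```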

Author: Sandro Pereira

Sandro Pereira lives in Portugal and works as a consultant at DevScope. In the past years, he has been working on implementing Integration scenarios both on-premises and cloud for various clients, each with different scenarios from a technical point of view, size, and criticality, using Microsoft Azure, Microsoft BizTalk Server and different technologies like AS2, EDI, RosettaNet, SAP, TIBCO etc. He is a regular blogger, international speaker, and technical reviewer of several BizTalk books all focused on Integration. He is also the author of the book “BizTalk Mapping Patterns & Best Practices”. He has been awarded MVP since 2011 for his contributions to the integration community.
