HL7 v2 Encoding & Delimiters

HL7 v2 messages are plain text. Their entire structure comes from a small set of delimiter characters that divide a message into a hierarchy of segments, fields, components, and subcomponents. Every message declares its own delimiters in the MSH header, so a receiver can work out the delimiters from the message itself, without prior agreement on which characters are in use.

HL7 delimiters are the special characters that give a plain-text HL7 v2 message its structure. Five are standard: the field separator (|), component separator (^), repetition separator (~), escape character (\), and subcomponent separator (&). Each message declares its own delimiter set in the first two MSH fields, conventionally written MSH|^~\&.

An HL7 v2 message nests several levels deep:

  1. Message — one MSH segment followed by other segments. Each segment is terminated by a carriage return.
  2. Segment — a three-character segment ID (MSH, PID, OBX, …) followed by fields joined by the field separator (|). Fields are referenced by position: PID-5 is the fifth field of the PID segment.
  3. Field — may carry repetitions joined by the repetition separator (~), and divides into components joined by the component separator (^).
  4. Component — divides into subcomponents joined by the subcomponent separator (&).

For example, the patient-name field PID-5 holding DOE^JOHN^ALEXANDER has three components — family name DOE, given name JOHN, and middle name ALEXANDER.
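The nesting can be sketched in a few lines of Python. This is a minimal illustration, not a full parser: it assumes the conventional delimiters, ignores escape sequences, and uses a made-up PID segment. The function name split_field is hypothetical.

```python
def split_field(field, component_sep="^", repetition_sep="~", subcomponent_sep="&"):
    """Return a field as a nested list: repetitions -> components -> subcomponents."""
    return [
        [component.split(subcomponent_sep) for component in repetition.split(component_sep)]
        for repetition in field.split(repetition_sep)
    ]


# Hypothetical sample segment, for illustration only.
segment = "PID|1||12345^^^HOSP&1.2.3&ISO||DOE^JOHN^ALEXANDER~DOE^JOHNNY"

# Splitting on the field separator puts the segment ID at index 0,
# so PID-5 sits at index 5.
fields = segment.split("|")
name = split_field(fields[5])

# name[0] is the first repetition:  [['DOE'], ['JOHN'], ['ALEXANDER']]
# name[1] is the second repetition: [['DOE'], ['JOHNNY']]

# PID-3 shows subcomponents: its fourth component splits into three parts.
identifier = split_field(fields[3])
# identifier[0][3] == ['HOSP', '1.2.3', 'ISO']
```

The index-equals-field-number shortcut holds for ordinary segments such as PID. It does not hold for MSH, where the field separator itself counts as MSH-1.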

Every message declares the delimiters it uses in the first two fields of the MSH segment:

  • MSH-1 — Field Separator. The single character that immediately follows MSH. Conventionally |. This field defines itself — it is the separator.
  • MSH-2 — Encoding Characters. The next four characters: the component, repetition, escape, and subcomponent separators, in that order. Conventionally ^~\&.

So virtually every HL7 v2 message begins:

MSH|^~\&|...

A conformant receiver reads these characters from each message rather than assuming them. In practice the values above are near-universal, but the standard permits others.
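Reading the delimiters from the header rather than hard-coding them might look like the following sketch. It assumes MSH-2 holds exactly the four encoding characters described above. The names Delimiters and read_delimiters are hypothetical, and the sample headers are made up.

```python
from typing import NamedTuple


class Delimiters(NamedTuple):
    field: str
    component: str
    repetition: str
    escape: str
    subcomponent: str


def read_delimiters(message: str) -> Delimiters:
    """Read MSH-1 and MSH-2 from the start of a message."""
    if not message.startswith("MSH") or len(message) < 8:
        raise ValueError("message does not start with a complete MSH header")
    field = message[3]        # MSH-1: the character right after "MSH"
    encoding = message[4:8]   # MSH-2: the next four characters, in fixed order
    delimiters = Delimiters(field, *encoding)
    if len(set(delimiters)) != 5:
        raise ValueError("delimiter characters must all be different")
    return delimiters


# Conventional header.
standard = read_delimiters(r"MSH|^~\&|APP")
# standard.field == "|", standard.component == "^", standard.subcomponent == "&"

# A header with unconventional delimiters is decoded the same way.
unusual = read_delimiters(r"MSH$*~\@$APP")
# unusual.field == "$", unusual.component == "*", unusual.subcomponent == "@"
```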

MSH-2 declares these four characters:

Character   Name                     Role
^           Component separator      Separates components within a field
~           Repetition separator     Separates repeated values of a single field
\           Escape character         Introduces an escape sequence
&           Subcomponent separator   Separates subcomponents within a component

Every HL7 v2 message opens by declaring its own delimiters. MSH-1 is whatever character follows MSH: the field separator. MSH-2 lists the next four: component, repetition, escape, and subcomponent. Those five characters are all a receiver needs to unpack the message into its segment, field, component, and subcomponent hierarchy.

Each segment is terminated by a single carriage return (<CR>, ASCII 0x0D, written \r in most programming languages), not by a line feed (\n) and not by a CRLF pair.
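A short sketch of segment splitting under that rule. The helper name split_segments and the sample messages are hypothetical.

```python
def split_segments(message: str) -> list[str]:
    """Split a message into segments on carriage returns, dropping empty pieces."""
    return [segment for segment in message.split("\r") if segment]


# Hypothetical two-segment message, each segment terminated by \r.
message = "MSH|^~\\&|APP\rPID|1||12345\r"
segments = split_segments(message)
segment_ids = [segment[:3] for segment in segments]
# segment_ids == ['MSH', 'PID']

# The same text with line feeds instead is not split at all,
# because \n is not the segment terminator.
unsplit = split_segments("MSH|^~\\&|APP\nPID|1||12345\n")
# len(unsplit) == 1
```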

A field value cannot contain a raw delimiter character — it would be parsed as structure. To include a delimiter literally, or to embed formatting, use an escape sequence: the escape character, a short code, and a closing escape character.

Sequence    Meaning
\F\         A literal field-separator character
\S\         A literal component-separator character
\T\         A literal subcomponent-separator character
\R\         A literal repetition-separator character
\E\         A literal escape character
\.br\       Line break, in formatted-text fields
\Xdddd\     One or more characters given as hexadecimal
\H\, \N\    Start and end highlighted text

For example, a company name that contains an ampersand — Smith & Sons — is encoded Smith \T\ Sons, so the & is not misread as a subcomponent separator.
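The delimiter escapes can be sketched as a pair of Python functions. This is a minimal illustration that handles only the five delimiter sequences (\F\, \S\, \T\, \R\, \E\) and leaves \.br\, \Xdddd\, \H\, and \N\ untouched. The function names escape and unescape are hypothetical, and the defaults assume the conventional delimiters.

```python
import re


def escape(text, field="|", component="^", repetition="~", esc="\\", subcomponent="&"):
    """Replace each raw delimiter character in text with its escape sequence."""
    codes = {field: "F", component: "S", subcomponent: "T", repetition: "R", esc: "E"}
    return "".join(f"{esc}{codes[ch]}{esc}" if ch in codes else ch for ch in text)


def unescape(text, field="|", component="^", repetition="~", esc="\\", subcomponent="&"):
    """Replace the five delimiter escape sequences with the literal characters."""
    literals = {"F": field, "S": component, "T": subcomponent, "R": repetition, "E": esc}
    pattern = re.compile(re.escape(esc) + "([FSTRE])" + re.escape(esc))
    # A function replacement keeps backslashes in the result from being
    # reinterpreted as regex replacement escapes.
    return pattern.sub(lambda match: literals[match.group(1)], text)


encoded = escape("Smith & Sons")
# encoded == "Smith \\T\\ Sons", which displays as: Smith \T\ Sons

decoded = unescape(encoded)
# decoded == "Smith & Sons"
```

A receiver would apply unescape only after splitting a field into its components and subcomponents, so that a restored & or ^ is not mistaken for structure.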