HL7 v2 Encoding & Delimiters
HL7 v2 messages are plain text. Their entire structure comes from a small set of delimiter characters that divide a message into a hierarchy of segments, fields, components, and subcomponents. Every message declares its own delimiters in the MSH header, so a receiver can decode it without any prior agreement.
What are HL7 delimiters?
HL7 delimiters are the special characters that give a plain-text HL7 v2 message its structure. Five are standard: the field separator (|), component separator (^), repetition separator (~), escape character (\), and subcomponent separator (&). Each message declares its own delimiter set in the first two MSH fields, conventionally written MSH|^~\&.
Message structure
An HL7 v2 message nests several levels deep:
- Message: one MSH segment followed by other segments. Each segment is terminated by a carriage return.
- Segment: a three-character segment ID (MSH, PID, OBX, …) followed by fields joined by the field separator (|). Fields are referenced by position: PID-5 is the fifth field of the PID segment.
- Field: may carry repetitions joined by the repetition separator (~), and divides into components joined by the component separator (^).
- Component: divides into subcomponents joined by the subcomponent separator (&).
For example, the patient-name field PID-5 holding DOE^JOHN^ALEXANDER has three components — family name DOE, given name JOHN, and middle name ALEXANDER.
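The nesting described above can be sketched as a series of string splits. This is a minimal illustration, not a full parser: the function names `parse_message` and `parse_field` are hypothetical, the sample message is invented, and the code assumes the conventional delimiters and ignores escape sequences.

```python
def parse_field(field, comp_sep="^", rep_sep="~", sub_sep="&"):
    """Split one field into repetitions -> components -> subcomponents."""
    return [
        [component.split(sub_sep) for component in repetition.split(comp_sep)]
        for repetition in field.split(rep_sep)
    ]


def parse_message(message, field_sep="|"):
    """Split a message into segments, each a list of raw field strings."""
    segments = [seg for seg in message.split("\r") if seg]
    return [seg.split(field_sep) for seg in segments]


# Invented sample data, for illustration only.
message = "MSH|^~\\&|SENDER|FACILITY\rPID|1||12345||DOE^JOHN^ALEXANDER\r"
segments = parse_message(message)

pid = segments[1]
# pid[0] is the segment ID, so pid[5] is PID-5.
# (MSH is a special case: MSH-1 is the separator itself, so the
# list index is shifted by one for that segment.)
name = parse_field(pid[5])
family, given, middle = (component[0] for component in name[0])
print(family, given, middle)
```

Each level is a nested list, so `name[0]` is the first repetition and `name[0][0][0]` is the first subcomponent of its first component.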
The delimiter declaration
Every message declares the delimiters it uses in the first two fields of the MSH segment:
- MSH-1, Field Separator: the single character that immediately follows MSH, conventionally |. This field defines itself: it is the separator.
- MSH-2, Encoding Characters: the next four characters, which are the component, repetition, escape, and subcomponent separators, in that order. Conventionally ^~\&.
So virtually every HL7 v2 message begins:
MSH|^~\&|...

A conformant receiver reads these characters from each message rather than assuming them. In practice the values above are near-universal, but the standard permits others.
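Reading the declaration might look like the sketch below. The `Delimiters` type and `read_delimiters` function are hypothetical names, and the check that all five characters are distinct is an added sanity check assumed for this example, not something taken from the text above.

```python
from typing import NamedTuple


class Delimiters(NamedTuple):
    field: str
    component: str
    repetition: str
    escape: str
    subcomponent: str


def read_delimiters(message: str) -> Delimiters:
    """Read MSH-1 and MSH-2 from the start of a message."""
    if not message.startswith("MSH"):
        raise ValueError("message does not start with an MSH segment")
    if len(message) < 8:
        raise ValueError("MSH segment is too short to declare delimiters")
    field = message[3]       # MSH-1: the character right after "MSH"
    encoding = message[4:8]  # MSH-2: the next four characters
    # Assumed sanity check: the five delimiters should not collide.
    if field in encoding or len(set(encoding)) != 4:
        raise ValueError("delimiters must be five distinct characters")
    return Delimiters(field, *encoding)


print(read_delimiters("MSH|^~\\&|SENDER"))   # conventional delimiters
print(read_delimiters("MSH#*!?$#SENDER"))    # unusual but declared in-message
```

A parser built this way would pass the returned values to its splitting logic instead of hard-coding `|` and `^`.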
Encoding characters
MSH-2 declares these four characters:
| Character | Name | Role |
|---|---|---|
| ^ | Component separator | Separates components within a field |
| ~ | Repetition separator | Separates repeated values of a single field |
| \ | Escape character | Introduces an escape sequence |
| & | Subcomponent separator | Separates subcomponents within a component |
The segment terminator
Each segment is terminated by a single carriage return (<CR>, ASCII 0x0D, written \r), not by a line feed (\n) and not by a CRLF pair.
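As a small illustration of the terminator rule, the sketch below serializes and splits segments on the carriage return only. The helper names `join_segments` and `split_segments` are hypothetical, and the segment contents are invented.

```python
CR = "\r"  # carriage return, ASCII 0x0D


def join_segments(segments):
    """Serialize segments, ending each one with a single carriage return."""
    return "".join(segment + CR for segment in segments)


def split_segments(message):
    """Split on carriage returns only, dropping the empty tail after the last one."""
    return [segment for segment in message.split(CR) if segment]


wire = join_segments(["MSH|^~\\&|SENDER", "PID|1||12345"])
print(repr(wire))
print(split_segments(wire))
```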
Escape sequences
A field value cannot contain a raw delimiter character, because it would be parsed as structure. To include a delimiter literally, or to embed formatting, use an escape sequence: the escape character, a short code, and a closing escape character.
| Sequence | Meaning |
|---|---|
| \F\ | A literal field-separator character |
| \S\ | A literal component-separator character |
| \T\ | A literal subcomponent-separator character |
| \R\ | A literal repetition-separator character |
| \E\ | A literal escape character |
| \.br\ | Line break, in formatted-text fields |
| \Xdddd\ | One or more characters given as hexadecimal |
| \H\, \N\ | Start and end of highlighted text |
For example, a company name that contains an ampersand — Smith & Sons — is encoded Smith \T\ Sons, so the & is not misread as a subcomponent separator.
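The delimiter escapes can be sketched as a pair of functions. This is a simplified example under stated assumptions: it hard-codes the conventional delimiters rather than reading them from MSH, the names `escape` and `unescape` are hypothetical, and formatting or hexadecimal sequences such as \.br\ and \Xdddd\ are deliberately left undecoded.

```python
import re

# Escape code -> the conventional delimiter it stands for (assumed defaults).
CODE_TO_CHAR = {"F": "|", "S": "^", "T": "&", "R": "~", "E": "\\"}
CHAR_TO_CODE = {char: code for code, char in CODE_TO_CHAR.items()}


def escape(value: str) -> str:
    """Replace each raw delimiter in a value with its escape sequence."""
    return "".join(
        f"\\{CHAR_TO_CODE[ch]}\\" if ch in CHAR_TO_CODE else ch for ch in value
    )


def unescape(value: str) -> str:
    """Decode the five delimiter escapes; leave other sequences untouched."""
    return re.sub(
        r"\\([FSTRE])\\",
        lambda match: CODE_TO_CHAR[match.group(1)],
        value,
    )


encoded = escape("Smith & Sons")
print(encoded)
print(unescape(encoded))
```

Both directions work in a single pass, so a backslash produced by decoding \E\ is not rescanned as the start of another sequence.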