COBOL - Unstring Statement
The UNSTRING statement causes contiguous data in a sending field to be separated and placed into multiple receiving fields.
The UNSTRING statement is used to parse individual items from within a single string. Any number of items may be parsed. Entire or partial strings may be parsed. As many items as are provided as INTO operands will be parsed.
Syntax:
UNSTRING identifier-1
[DELIMITED BY [ALL] {identifier-2 | literal-1}
[OR [ALL] {identifier-3 | literal-2}] ...]
INTO {identifier-4
[DELIMITER IN identifier-5]
[COUNT IN identifier-6]} ...
[WITH POINTER identifier-7]
[TALLYING IN identifier-8]
[ON OVERFLOW imperative-statement-1]
[NOT ON OVERFLOW imperative-statement-2]
[END-UNSTRING]
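As a first illustration of the syntax, here is a minimal sketch that splits a comma-separated name into three fields. The program name, the WS- data names, and the sample value are hypothetical, chosen only for this illustration.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. UNSTR1.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical names, chosen only for this illustration.
       01  WS-FULL-NAME     PIC X(20) VALUE "JOHN,QUINCY,ADAMS".
       01  WS-FIRST         PIC X(10).
       01  WS-MIDDLE        PIC X(10).
       01  WS-LAST          PIC X(10).
       PROCEDURE DIVISION.
      * Clear the receiving fields, then split on each comma.
           MOVE SPACES TO WS-FIRST WS-MIDDLE WS-LAST
           UNSTRING WS-FULL-NAME DELIMITED BY ","
               INTO WS-FIRST WS-MIDDLE WS-LAST
           END-UNSTRING
           DISPLAY "FIRST : " WS-FIRST
           DISPLAY "MIDDLE: " WS-MIDDLE
           DISPLAY "LAST  : " WS-LAST
           STOP RUN.
```

With this data, WS-FIRST should receive JOHN, WS-MIDDLE should receive QUINCY, and WS-LAST should receive ADAMS, each padded on the right with spaces.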
identifier-1
Represents the sending field. Data is transferred from this field to the data receiving fields (identifier-4).
identifier-1 must reference a data item of category alphabetic, alphanumeric, alphanumeric-edited, DBCS, national, or national-edited.
identifier-2, literal-1, identifier-3, literal-2
Specifies one or more delimiters.
identifier-2 and identifier-3 must reference data items of category alphabetic, alphanumeric, alphanumeric-edited, DBCS, national, or national-edited.
literal-1 or literal-2 must be of category alphanumeric, DBCS, or national and must not be a figurative constant that begins with the word ALL.
identifier-4
Specifies one or more receiving fields.
identifier-4 must reference a data item of category alphabetic, alphanumeric, numeric, DBCS, or national. If the referenced data item is of category numeric, its picture character-string must not contain the picture symbol P, and its usage must be DISPLAY or NATIONAL.
identifier-5
Specifies a field to receive the delimiter associated with identifier-4.
identifier-5 must reference a data item of category alphabetic, alphanumeric, DBCS, or national.
identifier-6
Specifies a field to hold the count of characters that are transferred to identifier-4.
identifier-6 must be an integer data item defined without the symbol P in its PICTURE character-string.
identifier-7
Specifies a field to hold a relative character position during UNSTRING processing.
identifier-7 must be an integer data item defined without the symbol P in the PICTURE string.
identifier-7 must be described as a data item of sufficient size to contain a value equal to 1 plus the number of character positions in the data item referenced by identifier-1.
identifier-8
Specifies a field that is incremented by the number of delimited fields processed.
identifier-8 must be an integer data item defined without the symbol P in its PICTURE character-string.
DELIMITED BY
This phrase specifies delimiters within the data that control the data transfer.
Each identifier-2, identifier-3, literal-1, or literal-2 represents one delimiter.
If the DELIMITED BY phrase is not specified, the DELIMITER IN and COUNT IN phrases must not be specified.
ALL
Multiple contiguous occurrences of any delimiters are treated as if there were only one occurrence; this one occurrence is moved to the delimiter receiving field (identifier-5), if specified.
The delimiting characters in the sending field are treated as an elementary item of the same usage and category as identifier-1 and are moved into the current delimiter receiving field according to the rules of the MOVE statement.
When DELIMITED BY ALL is not specified, and two or more contiguous occurrences of any delimiter are encountered, the current data receiving field (identifier-4) is filled with spaces or zeros, according to the description of the data receiving field.
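The difference that ALL makes can be seen by running the same sending field through both forms. This is a sketch with hypothetical names and data (WS-SEND, WS-F1 to WS-F3); the receiving fields are pre-filled with question marks so that a field that is never acted upon stays visible.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. UNSTR2.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical data: two commas in a row between A and B.
       01  WS-SEND          PIC X(4)  VALUE "A,,B".
       01  WS-F1            PIC X(3).
       01  WS-F2            PIC X(3).
       01  WS-F3            PIC X(3).
       PROCEDURE DIVISION.
      * Without ALL: the second comma ends an empty field, so
      * WS-F2 is filled with spaces and "B" goes to WS-F3.
           MOVE ALL "?" TO WS-F1 WS-F2 WS-F3
           UNSTRING WS-SEND DELIMITED BY ","
               INTO WS-F1 WS-F2 WS-F3
           END-UNSTRING
           DISPLAY "NO ALL: [" WS-F1 "][" WS-F2 "][" WS-F3 "]"
      * With ALL: ",," counts as one delimiter, so "B" goes to
      * WS-F2 and WS-F3 is never acted upon (it keeps "???").
           MOVE ALL "?" TO WS-F1 WS-F2 WS-F3
           UNSTRING WS-SEND DELIMITED BY ALL ","
               INTO WS-F1 WS-F2 WS-F3
           END-UNSTRING
           DISPLAY "ALL   : [" WS-F1 "][" WS-F2 "][" WS-F3 "]"
           STOP RUN.
```

The first DISPLAY should show a blank middle field, while the second should show B in the second field and the untouched question marks in the third.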
Delimiter with two or more characters
A delimiter that contains two or more characters is recognized as a delimiter only if the delimiting characters are both of the following:
- Contiguous
- In the sequence specified in the sending field
Two or more delimiters
When two or more delimiters are specified, an OR condition exists, and each nonoverlapping occurrence of any one of the delimiters is recognized in the sending field in the sequence specified.
For example:
DELIMITED BY "AB" OR "BC"
An occurrence of either AB or BC in the sending field is considered a delimiter. An occurrence of ABC is considered an occurrence of AB.
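The rule that ABC is treated as an occurrence of AB can be checked with a small program. This is a sketch only; the data names and the sample value "1ABC2BC3" are hypothetical. DELIMITER IN is used so that the delimiter that ended each field is visible.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. UNSTR3.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical data: "ABC" appears once, "BC" once more.
       01  WS-SEND          PIC X(8)  VALUE "1ABC2BC3".
       01  WS-F1            PIC X(2).
       01  WS-F2            PIC X(2).
       01  WS-F3            PIC X(2).
       01  WS-D1            PIC X(2).
       01  WS-D2            PIC X(2).
       01  WS-D3            PIC X(2).
       PROCEDURE DIVISION.
           MOVE SPACES TO WS-F1 WS-F2 WS-F3
           UNSTRING WS-SEND DELIMITED BY "AB" OR "BC"
               INTO WS-F1 DELIMITER IN WS-D1
                    WS-F2 DELIMITER IN WS-D2
                    WS-F3 DELIMITER IN WS-D3
           END-UNSTRING
      * "ABC" is matched as "AB", so the "C" stays in the data.
           DISPLAY "[" WS-F1 "] ENDED BY [" WS-D1 "]"
           DISPLAY "[" WS-F2 "] ENDED BY [" WS-D2 "]"
           DISPLAY "[" WS-F3 "] ENDED BY [" WS-D3 "]"
           STOP RUN.
```

The first field should hold 1 (ended by AB), the second C2 (ended by BC), and the third 3. The third field is ended by the end of the sending field, so its delimiter receiving field should contain spaces.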
INTO
This phrase specifies the fields where the data is to be moved.
identifier-4 represents the data receiving fields.
DELIMITER IN
This phrase specifies the fields where the delimiters are to be moved.
identifier-5 represents the delimiter receiving fields.
The DELIMITER IN phrase must not be specified if the DELIMITED BY phrase is not specified.
COUNT IN
This phrase specifies the field where the count of examined character positions is held.
identifier-6 is the data count field for each data transfer. Each field holds the count of examined character positions in the sending field, terminated by the delimiters or the end of the sending field, for the move to this receiving field. The delimiters are not included in this count.
The COUNT IN phrase must not be specified if the DELIMITED BY phrase is not specified.
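The following sketch shows DELIMITER IN and COUNT IN together, and illustrates that the count reflects the characters examined, not the characters that fit in the receiving field. All names and the sample data are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. UNSTR4.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical data: an 8-character item, ";", then "XY".
       01  WS-SEND          PIC X(11) VALUE "ABCDEFGH;XY".
       01  WS-F1            PIC X(4).
       01  WS-D1            PIC X.
       01  WS-C1            PIC 9(2).
       01  WS-F2            PIC X(4).
       01  WS-D2            PIC X.
       01  WS-C2            PIC 9(2).
       PROCEDURE DIVISION.
           MOVE SPACES TO WS-F1 WS-F2
           UNSTRING WS-SEND DELIMITED BY ";"
               INTO WS-F1 DELIMITER IN WS-D1 COUNT IN WS-C1
                    WS-F2 DELIMITER IN WS-D2 COUNT IN WS-C2
           END-UNSTRING
      * WS-F1 holds only 4 characters, yet 8 were examined.
           DISPLAY "[" WS-F1 "] DELIM [" WS-D1 "] COUNT " WS-C1
           DISPLAY "[" WS-F2 "] DELIM [" WS-D2 "] COUNT " WS-C2
           STOP RUN.
```

WS-F1 should receive ABCD with a count of 08, because eight characters precede the semicolon even though only four fit. WS-F2 should receive XY with a count of 02 and a blank delimiter, because it is ended by the end of the sending field.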
POINTER
When the POINTER phrase is specified, the value of the pointer field, identifier-7, behaves as if it were increased by 1 for each examined character position in the sending field.
When execution of the UNSTRING statement is completed, the pointer field contains a value equal to its initial value plus the number of character positions examined in the sending field.
When this phrase is specified, the user must initialize the pointer field before execution of the UNSTRING statement begins.
TALLYING IN
When the TALLYING phrase is specified, the area count field, identifier-8, contains (at the end of execution of the UNSTRING statement) a value equal to the initial value plus the number of data receiving areas acted upon.
When this phrase is specified, the user must initialize the area count field before execution of the UNSTRING statement begins.
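A common use of the POINTER phrase is to pull one item at a time out of a string inside a loop, with each UNSTRING resuming where the previous one stopped. The sketch below assumes hypothetical names (WS-LINE, WS-TOKEN, WS-PTR, WS-TALLY) and sample data; both control fields are initialized before the first UNSTRING, as required.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. UNSTR5.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical data: three words separated by spaces.
       01  WS-LINE          PIC X(20) VALUE "RED GREEN BLUE".
       01  WS-TOKEN         PIC X(8).
       01  WS-PTR           PIC 9(3).
       01  WS-TALLY         PIC 9(3).
       PROCEDURE DIVISION.
      * Both control fields must be set before the UNSTRING.
           MOVE 1 TO WS-PTR
           MOVE 0 TO WS-TALLY
           PERFORM UNTIL WS-PTR > FUNCTION LENGTH (WS-LINE)
               MOVE SPACES TO WS-TOKEN
      * Each pass extracts one word, starting at WS-PTR.
               UNSTRING WS-LINE DELIMITED BY ALL SPACE
                   INTO WS-TOKEN
                   WITH POINTER WS-PTR
                   TALLYING IN WS-TALLY
               END-UNSTRING
               DISPLAY "TOKEN: " WS-TOKEN " NEXT POSITION: " WS-PTR
           END-PERFORM
           DISPLAY "FIELDS ACTED UPON: " WS-TALLY
           STOP RUN.
```

With this data the loop should run three times, leaving WS-PTR at 21 (the length of the sending field plus 1) and WS-TALLY at 3. The first two passes leave data unexamined, which is an overflow condition, but since no ON OVERFLOW phrase is coded, control simply passes to the end of the statement.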
ON OVERFLOW
An overflow condition exists when:
- The pointer value (explicit or implicit) is less than 1.
- The pointer value (explicit or implicit) exceeds a value equal to the length of the sending field.
- All data receiving fields have been acted upon and the sending field still contains unexamined character positions.
When an overflow condition occurs
An overflow condition results in the following actions:
- No more data is transferred.
- The UNSTRING operation is terminated.
- The NOT ON OVERFLOW phrase, if specified, is ignored.
- Control is transferred to the end of the UNSTRING statement or, if the ON OVERFLOW phrase is specified, to imperative-statement-1.
imperative-statement-1
Statement or statements for dealing with an overflow condition.
If control is transferred to imperative-statement-1, execution continues according to the rules for each statement specified in imperative-statement-1.
If a procedure branching or conditional statement that causes explicit transfer of control is executed, control is transferred according to the rules for that statement.
Otherwise, upon completion of the execution of imperative-statement-1, control is transferred to the end of the UNSTRING statement.
When an overflow condition does not occur
When, during execution of an UNSTRING statement, conditions that would cause an overflow condition are not encountered, then:
- The transfer of data is completed.
- The ON OVERFLOW phrase, if specified, is ignored.
- Control is transferred to the end of the UNSTRING statement or, if the NOT ON OVERFLOW phrase is specified, to imperative-statement-2.
imperative-statement-2
Statement or statements for dealing with an overflow condition that does not occur.
If control is transferred to imperative-statement-2, execution continues according to the rules for each statement specified in imperative-statement-2. If a procedure branching or conditional statement that causes explicit transfer of control is executed, control is transferred according to the rules for that statement. Otherwise, upon completion of the execution of imperative-statement-2, control is transferred to the end of the UNSTRING statement.
END-UNSTRING
This explicit scope terminator serves to delimit the scope of the UNSTRING statement. END-UNSTRING permits a conditional UNSTRING statement to be nested in another conditional statement. END-UNSTRING can also be used with an imperative UNSTRING statement.
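The two overflow phrases and the END-UNSTRING terminator can be tried out with a sketch like the one below. The names and data are hypothetical. The same sending field is unstrung twice, first into too few receiving fields and then into enough of them.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. UNSTR6.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical data: three items separated by commas.
       01  WS-SEND          PIC X(5)  VALUE "A,B,C".
       01  WS-F1            PIC X.
       01  WS-F2            PIC X.
       01  WS-F3            PIC X.
       PROCEDURE DIVISION.
      * Only two receiving fields: "C" is never examined, so
      * the overflow condition exists.
           UNSTRING WS-SEND DELIMITED BY ","
               INTO WS-F1 WS-F2
               ON OVERFLOW
                   DISPLAY "TWO FIELDS  : OVERFLOW"
               NOT ON OVERFLOW
                   DISPLAY "TWO FIELDS  : NO OVERFLOW"
           END-UNSTRING
      * Three receiving fields: every position is examined.
           UNSTRING WS-SEND DELIMITED BY ","
               INTO WS-F1 WS-F2 WS-F3
               ON OVERFLOW
                   DISPLAY "THREE FIELDS: OVERFLOW"
               NOT ON OVERFLOW
                   DISPLAY "THREE FIELDS: NO OVERFLOW"
           END-UNSTRING
           STOP RUN.
```

The first statement should report an overflow and the second should not. Because each UNSTRING is closed with END-UNSTRING, the statements that follow are not absorbed into the NOT ON OVERFLOW phrase.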
Tips:
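- The sending field and the delimiters should be non-numeric; only a receiving field may be numeric, and then only with usage DISPLAY or NATIONAL.
- The POINTER and COUNT operands, if any, must be integer items; that is, their pictures should contain only 9's (optionally preceded by S).
- INITIALIZE the receiving items before the UNSTRING, to remove unwanted characters that may be left from a prior operation.
- Use the ON OVERFLOW phrase to detect a pointer value that is out of range, or data left unexamined in the sending field after all receiving fields have been acted upon.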
Example 1:
UNSTRING ID-SEND DELIMITED BY DEL-ID OR ALL "*"
INTO ID-R1 DELIMITER IN ID-D1 COUNT IN ID-C1
ID-R2 DELIMITER IN ID-D2
ID-R3 DELIMITER IN ID-D3 COUNT IN ID-C3
ID-R4 COUNT IN ID-C4
WITH POINTER ID-P
TALLYING IN ID-T
ON OVERFLOW GO TO OFLOW-EXIT.
Note: All the data receiving fields are defined as alphanumeric.
After the UNSTRING statement is executed:
ID-P (pointer) = 21
ID-T (tallying field) = 05
Note: Both fields were initialized to 01 before execution.
Order of execution:
- 3 characters are placed in ID-R1.
- Because ALL "*" is specified, all consecutive asterisks are processed, but only one asterisk is placed in ID-D1.
- 5 characters are placed in ID-R2.
- A ? is placed in ID-D2. The current receiving field is now ID-R3.
- A ? is placed in ID-D3; ID-R3 is filled with spaces. No characters are transferred, so 0 is placed in ID-C3.
- No delimiter is encountered before 5 characters fill ID-R4; 8 is placed in ID-C4, representing the number of characters examined since the last delimiter.
- ID-P is updated to 21, the total length of the sending field + 1. ID-T is updated to 5, the number of fields acted upon + 1. Because there are no unexamined characters in ID-SEND, the OVERFLOW exit is not taken.
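To trace these steps on a compiler, the example can be wrapped in a small program such as the sketch below. The text above does not give the contents of ID-SEND or DEL-ID, nor the field sizes, so the values used here are assumptions chosen only to be consistent with the steps described: a 20-character sending field "123**45678??90ABCDEF", DEL-ID set to "?", and receiving fields of 3, 5, 5 and 5 characters.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. UNSTR7.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Assumed values and sizes; not given in the example text.
       01  ID-SEND          PIC X(20) VALUE "123**45678??90ABCDEF".
       01  DEL-ID           PIC X     VALUE "?".
       01  ID-R1            PIC X(3).
       01  ID-R2            PIC X(5).
       01  ID-R3            PIC X(5).
       01  ID-R4            PIC X(5).
       01  ID-D1            PIC X.
       01  ID-D2            PIC X.
       01  ID-D3            PIC X.
       01  ID-C1            PIC 99.
       01  ID-C3            PIC 99.
       01  ID-C4            PIC 99.
       01  ID-P             PIC 99.
       01  ID-T             PIC 99.
       PROCEDURE DIVISION.
       MAIN-PARA.
      * Pointer and tally are both initialized to 01.
           MOVE 1 TO ID-P ID-T
           MOVE SPACES TO ID-R1 ID-R2 ID-R3 ID-R4
           UNSTRING ID-SEND DELIMITED BY DEL-ID OR ALL "*"
               INTO ID-R1 DELIMITER IN ID-D1 COUNT IN ID-C1
                    ID-R2 DELIMITER IN ID-D2
                    ID-R3 DELIMITER IN ID-D3 COUNT IN ID-C3
                    ID-R4 COUNT IN ID-C4
               WITH POINTER ID-P
               TALLYING IN ID-T
               ON OVERFLOW GO TO OFLOW-EXIT
           END-UNSTRING
           DISPLAY "ID-R1=[" ID-R1 "] ID-D1=[" ID-D1 "] ID-C1=" ID-C1
           DISPLAY "ID-R2=[" ID-R2 "] ID-D2=[" ID-D2 "]"
           DISPLAY "ID-R3=[" ID-R3 "] ID-D3=[" ID-D3 "] ID-C3=" ID-C3
           DISPLAY "ID-R4=[" ID-R4 "] ID-C4=" ID-C4
           DISPLAY "ID-P=" ID-P " ID-T=" ID-T
           STOP RUN.
       OFLOW-EXIT.
           DISPLAY "OVERFLOW EXIT TAKEN"
           STOP RUN.
```

Under these assumptions the displayed values should match the steps listed above: counts of 03, 00 and 08, a pointer of 21, and a tally of 05, with the overflow paragraph not reached.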
Example 2:
........
FILE SECTION.
* Record to be acted on by the UNSTRING statement:
01 INV-RCD.
05 CONTROL-CHARS PIC XX.
05 ITEM-INDENT PIC X(20).
05 FILLER PIC X.
05 INV-CODE PIC X(10).
05 FILLER PIC X.
05 NO-UNITS PIC 9(6).
05 FILLER PIC X.
05 PRICE-PER-M PIC 99999.
05 FILLER PIC X.
05 RTL-AMT PIC 9(6).99.
*
* UNSTRING receiving field for printed output:
01 DISPLAY-REC.
05 INV-NO PIC X(6).
05 FILLER PIC X VALUE SPACE.
05 ITEM-NAME PIC X(20).
05 FILLER PIC X VALUE SPACE.
05 DISPLAY-DOLS PIC 9(6).
*
* UNSTRING receiving field for further processing:
01 WORK-REC.
05 M-UNITS PIC 9(6).
05 FIELD-A PIC 9(6).
05 WK-PRICE REDEFINES FIELD-A PIC 9999V99.
05 INV-CLASS PIC X(3).
*
* UNSTRING statement control fields:
77 DBY-1 PIC X.
77 CTR-1 PIC S9(3).
77 CTR-2 PIC S9(3).
77 CTR-3 PIC S9(3).
77 CTR-4 PIC S9(3).
77 DLTR-1 PIC X.
77 DLTR-2 PIC X.
77 CHAR-CT PIC S9(3).
77 FLDS-FILLED PIC S9(3).
-----
In the PROCEDURE DIVISION, these settings occur before the UNSTRING statement:
- A period (.) is placed in DBY-1 for use as a delimiter.
- CHAR-CT (the POINTER field) is set to 3.
- The value zero (0) is placed in FLDS-FILLED (the TALLYING field).
- Data is read into record INV-RCD, whose format is as shown below.
Column  123456789012345678901234567890123456789012345678901234567890
Data    ZYFOUR-PENNY-NAILS     707890/BBA 475120 00122 000379.50
UNSTRING statement:
* Move subfields of INV-RCD to the subfields of DISPLAY-REC
* and WORK-REC:
UNSTRING INV-RCD
DELIMITED BY ALL SPACES OR "/" OR DBY-1
INTO ITEM-NAME COUNT IN CTR-1
INV-NO DELIMITER IN DLTR-1 COUNT IN CTR-2
INV-CLASS
M-UNITS COUNT IN CTR-3
FIELD-A
DISPLAY-DOLS DELIMITER IN DLTR-2 COUNT IN CTR-4
WITH POINTER CHAR-CT
TALLYING IN FLDS-FILLED
ON OVERFLOW GO TO UNSTRING-COMPLETE.
Because the POINTER field CHAR-CT has value 3 before the UNSTRING statement is performed, the two character positions of the CONTROL-CHARS field in INV-RCD are ignored.
After the UNSTRING statement is performed, the fields contain the values shown below.
Field | Value
DISPLAY-REC | 707890 FOUR-PENNY-NAILS     000379
WORK-REC | 475120000122BBA
CHAR-CT (the POINTER field) | 55
FLDS-FILLED (the TALLYING field) | 6
Execution explanation:
- Positions 3 through 18 (FOUR-PENNY-NAILS) of INV-RCD are placed in ITEM-NAME, left justified in the area, and the four unused character positions are padded with spaces. The value 16 is placed in CTR-1.
- Because ALL SPACES is coded as a delimiter, the five contiguous space characters in positions 19 through 23 are considered to be one occurrence of the delimiter.
- Positions 24 through 29 (707890) are placed in INV-NO. The delimiter character slash (/) is placed in DLTR-1, and the value 6 is placed in CTR-2.
- Positions 31 through 33 (BBA) are placed in INV-CLASS. The delimiter is SPACE, but because no field has been defined as a receiving area for delimiters, the space in position 34 is bypassed.
- Positions 35 through 40 (475120) are placed in M-UNITS. The value 6 is placed in CTR-3. The delimiter is SPACE, but because no field has been defined as a receiving area for delimiters, the space in position 41 is bypassed.
- Positions 42 through 46 (00122) are placed in FIELD-A and right justified in the area. The high-order digit position is filled with a zero (0). The delimiter is SPACE, but because no field was defined as a receiving area for delimiters, the space in position 47 is bypassed.
- Positions 48 through 53 (000379) are placed in DISPLAY-DOLS. The period (.) delimiter in DBY-1 is placed in DLTR-2, and the value 6 is placed in CTR-4.
- Because all receiving fields have been acted on and two characters in INV-RCD have not been examined, the ON OVERFLOW statement is executed. Execution of the UNSTRING statement is completed.