U. S. Patent 5,598,564 - Details, Part One

U. S. Patent 5,598,564 - Jan 28, 1997

SINGLE PASS METHOD OF COMPRESSING DATA TRANSMITTED TO COMMAND DRIVEN TERMINAL

9 DETAILED DESCRIPTION

Turning next to the drawing figures in which like numerals represent like parts and steps, the preferred embodiment of the present invention will now be described. The environment in which the present invention is most likely to be used is one in which a large main frame host computer 10 is operated communicating with a plurality of video terminals that include addressable screen buffers. In Fig.1 a set of k locally connected terminals are represented by terminals 11a-11k. These are connected over bidirectional links 12a-12k. Also shown is attached to host computer 10 n remote terminals 15a-15n. While these may be connected over a wide variety of telecommunications facilities, they are represented by the most common connection configuration, i.e., connection via plurality of modems over the public switch telephone network, leased lines, or through commercial networks. A plurality of local modems 16 are connected to host computer 10 via serial links 17. Each of the local modems 16 is connected to a remote modem 19 via an intervening telecommunications link 18, which is of arbitrary length. Remote modems 19 are connected to remote terminals 15 via serial data links 20.

As noted hereinabove, the value of n often exceeds 100, and can be a very large number for complex systems such as airline reservation computers and the like. Principle purposes of the present invention are to reduce CPU overhead caused by use of compression, to reduce the amount of the memory of host computer 10 that needs to be allocated for an efficient data compression function for the communication between host computer 10 and remote terminals 15, and to reduce the amount of data that flows over data links 17, 18 and 20. The present invention may also be used to reduce the amount of data over local data links 12 but this is less critical as these connections normally operate at a higher bit rate than the connections to the remote terminals.

The method of the present invention is preferably executed by a program running concurrently as a task on host computer 10. The basic logic of the preferred embodiment is illustrated in Figures 3-5. First, some terminology will be addressed. As noted hereinabove, the video terminals with which the present invention was designed to be used have a concise defined command language. These commands control the writing of data into the screen buffer of the remote terminals 15 (Fig.1).

Figures 2A and 2B show block representations of the flow of data through program modules of the environment in which the present invention is used. Fig. 2A shows prior art flow of data and information and Fig. 2B shows flow in a system embodying the method of the present invention. Turning first to FIG. 2A, host computer 10 is shown as running application program 25 that passes data to VTAM software module 26 over bidirectional link 27.

10

VTAM 26 likewise passes data from the terminals back to the application over bidirectional link 27. Bidirectional link 28 represents the bidirectional path for input and output through the VTAM module that connects the application to a terminal controller 29 over bidirectional link 30. Thus, it should be understood that bidirectional link 27 represents a defined interface between the input to and output from an application program and the VTAM software module which accepts and provides such data according to the definition of the interface, and provides terminal specific commands over bidirectional link 30 to controller 29 according to information provided to the VTAM software module 26 about the physical nature of the devices connected to controller 29.

Fig. 2B shows a representation of a system employing the present invention. Data flowing between application 25 and VTAM module 26 is intercepted by bidirectional link 31 and passed over bidirectional link 32 to a program module 35 that embodies the method of the present invention. This module in tum passes data over bidirectional link 36 to a bidirectional link 37 through the VTAM which allows the VTAM to perform its other tasks relative to the data. Bidirectional link 37 is connected to bidirectional link 30 that connects input and output from controller 29 to the software modules.

Thus, the terminal output from the application flows through path 27, 31, and 32. The compressed output data flows from single pass compressor program module 35 over path 36, 37, and 30. Inbound data from the terminals flows over path 30, 37, and 36 to compressor 35. Because compressor 35 strips all MDTs set by the application, as explained in greater detail hereinbelow, the compressor module adds back to the data provided to the application over the path of links 32, 31, and 27, any data that the application is expecting because of application set MDTs. Thus, the single pass compressor maintains internal information about changes it has made to data coming from application 25 to link 30 and compensates for these changes when it passes data from link 30 back to application 25 so that the operation of single pass compressor 35 is transparent to application 25 running on host computer 10.

It should be noted that other prior art data compression methods are implemented by interposing modules between application 25 and VTAM module 26. However, the inventor of the present invention believes this is the only single pass compressor and believes it is the most efficient method of providing a minimum string length data compressor, in terms of CPU overhead, in this environment.

Before proceeding with a detailed description of the implementation of the preferred embodiment, it will be explained in terms of a simplistic example that should be of assistance in understanding its operation and the terminology employed in this specification. A Figures 3A through 3C show a simplified terminal screen that is having one field of data added to it as a result of output from an application. Fig. 3A shows a present command string representation 37 of the screen of Fig. 3B. A set of current command strings 38 are provided from the application in order to write the resultant screen shown in Fig. 3C. The output objects of the preferred embodiment are shown as write command string 39, erase/write command string 40, and the expected state command string 50. Fig. 3B shows three words displayed on a computer screen. While it shows alphabetic data, these may be thought of as suggesting fields for a data base program as might be used by a business extending credit.

11

In Fig. 3A the current command string representation of the screen of Fig. 3B is shown. This is indicated as two Start Field orders and three Set Buffer Address (SBA) orders. The first Start Field order sets all following display (until the next SF order) to have attribute normal. The normal attribute means the display area is unprotected for input and low intensity display. The second SF order sets the protected from operator input attribute, the non-display attribute, and the MDT bit is set on. The Set Buffer Address order sets a pointer in the controller to a particular address in the terminal's buffer and writes the data or order that follows the SBA command into the buffer commencing at the address pointed to by the SBA command. Those skilled in the art will recognize that the normal command format for SBA commands specifies a particular buffer address. However, in Fig. 3A the SBA commands are shown in a row-column format for the sake of allowing an easy understanding of the correspondence between the commands and the output represented in A HREF="Fig3'564.html">Figures 3B and 3C.

Command string representation 37 includes SBA commands indicated by brackets 41 for writing the first SF order and the word "COMPANY" beginning at row 1, column 1, and the word "ADDRESS" at row 2, column 2. A third SBA order, indicated by bracket 42, is for writing the word "PHONE" beginning at row 3, column 5. This output is represented in Fig. 3B. It should be understood that, at the moment represented by command string 37, this is a command string representation that indicates the entire history of what has been written to the screen since the last time the application completely erased the buffer.

The set of current commands is shown at 38 on Fig. 3A. This duplicates the commands to write the first SF order, the words "COMPANY" and "ADDRESS". Additionally, a command to write additional verbiage to the screen, indicated by bracket 45, is present. This indicates that the buffer address should be set to row 2, column 15 and the word "CREDIT" should be written.

Three output objects of the preferred embodiment are indicated at 39, 40, and 50. Output object 39 is the write command string (WRT), output object 40 is the erase/write command string (E/W), and output object 50 is the expected state command string (ESC) which will become the CSR for the next input or output operation. It should be noted that the set of current command strings is sorted into ascending buffer address order, keeping in mind that buffer addresses commence with the address corresponding to row 1, column 1 and increase going left to right across an entire row, then moving to the left hand end of the screen for the next row. The statement that the present invention sorts strings in buffer address order indicates that it sorts them according to the lowest buffer address that is impacted by the command. The details of the sort logic are described hereinbelow.

Viewing command string representation 37 and the set of current command strings 38 together, the operation of the analysis and merger of the present invention can be intuitively appreciated. The orders indicated by brackets 41 in the command string representation 37 are duplicated in the set of current commands 38. Therefore, each of these orders is brought into the erase/write command string and the ESC, as indicated by dashed lines 46a and 46b.

12

The preferred embodiment detects that the command 45 in the set of current commands 38 precedes command 42 in the current command string representation and therefore, the former is brought into the erase/write command string first, as indicated by dashed line 47. Dashed line 48 indicates that the last command of the erase/write command string is command 42 from the current command string representation.

In creating the write command string output object, the present invention notes that the pair of commands indicated by brackets 41 are the first two commands of the set of current command strings simply duplicate commands already present in command string representation 37. Therefore, both of these are eliminated when creating the write command string 39. The preferred embodiment further detects that command 45 is writing new data to a particular portion of the screen buffer starting at row 2, column 15. Therefore, command 45 is the only one that appears in the write command string 39, as indicated by dashed line 49.

As noted hereinabove, after write command string 39 and erase/write command string 40 are created, the preferred embodiment compares the length of the two and transmits the shorter. In the present case, the write command string 39 is the shorter and it is transmitted to the terminal. Viewing the previous state of the screen shown on Fig. 3B, it will be appreciated that the write command string 39 will create the screen display shown in Fig. 3C.

It should further be appreciated that, if the erase/write command string had been sent, preceded by an erase/write command, the identical screen shown in Fig. 3C would be produced.

It should be also noted that the final SF and data in the CSR as shown by bracket 51 are present in neither the WRT nor the E/W command string because the DARK specification prevents the data from displaying and the protected attribute prohibits the terminal operator from modifying the field contents. This data must be reflected in the ESC so the present invention reflects this in the string indicated by bracket 52 in Fig. 3A. An input operation in which data was provided from the terminal back to the application would cause the compressor to insert the INTERNAL DATA along with data from the terminal in the data sent to the application running on the host because the MDT bit is set in the ESC.

Before proceeding to the detailed description, some of the terminology used in this application will be explained. First of all, the particular details of the commands, attribute, bytes, write, control characters, order and the like for the 3270 family of terminal devices made by International Business Machines of White Plains, N.Y. ("IBM") are well known to those skilled in the art. These are documented in a publication by IBM entitled "3270 Control Unit Reference Summary", IBM Document No. GX20-1878-6. The information contained in said publication is well known to those skilled in the art and said publication is hereby incorporated by reference exactly as if set forth herein.

13

The buffer control orders are particular bytes in the order code and have the following mnemonics associated therewith.

SF	START FIELD	an order to start a field at the current buffer address.
SBA	SET BUFFER ADDRESS	an order to set the pointer to a particular buffer address into which the data following the SBA order code will be written.
IC	INSERT CURSOR	an order to insert a cursor at a particular address.
PT	PROGRAM TAB	an order to tab the buffer pointer to the next unprotected field.
RA	REPEAT TO ADDRESS	an order to fill all addresses between the current location and a designated address in the order with the character that is an argument of the order.
SFE	START FIELD EXTENDED	same as START FIELD but includes extended attributes for terminals with color and other more sophisticated attribute characteristics.
EUA	ERASE UNPROTECTED TO ADDRESS	an order to erase all unprotected data from a current address to a specified address.
MF	MODIFY FIELD	an order' to modify attributes and characteristics of a field.
SA	SET ATTRIBUTE	an order to set an attribute characteristic for following character data.

Attributes of screen locations are contained in attribute characters which are transmitted by the application and stored in the terminals buffer. There are several attributes which control the type of data to be displayed. The important attributes that must be dealt with by the present invention are whether the protected attribute flag bit is set and whether the modified data tag (MDT) is set. It should be noted that the first two bits of the attribute character are not used to determine the attributes of a field or buffer location. The preferred embodiment of the present invention takes advantage of this particular feature of the command language for 3270 type terminals, although it does not form an essential part of the underlying invention.

Several mnemonics have been adopted for use in this specification in order to clarify the disclosure and to make the drawing figures more readable. The following list gives the mnemonics used and their significance in this specification.

CSR-COMMAND STRING REPRESENTATION. The current representation, maintained by the program implementing the present invention in 3270 language format that represents the current contents of the buffer. This representation would write the entire state of the buffer, if sent to the terminal buffer after it had been cleared, which state would be expected by the application that has been run in conjunction with the preferred embodiment to generate the command string representation.

CCS-CURRENT COMMAND STRING. The incoming string of commands and orders to the terminal that is generated by the application program running on the host.

CSRP-COMMAND STRING REPRESENTATION POINTER. The pointer to a buffer address with a character or order that is currently being analyzed and merged with CCS by the preferred embodiment. It is the buffer address into which the currently processed character or order would be written if the CSR were transmitted to the buffer.

CCSP-CURRENT COMMAND STRING POINTER. A pointer to a particular buffer address for the data or order in the current command string that is currently being analyzed and merged with the CSR by the preferred embodiment. It points to the buffer address into which the data or order currently being processed would be written if the current command string were transmitted to the buffer.

14

ESC-EXPECTED STATE COMMANDS. One of the output objects of the preferred embodiment. A list of commands in ascending buffer address order that represents the new state of the buffer as a result of the merger of the current command string representation and the current command string. Upon completion of the merger, the expected state commands (ESC) will become the new command string representation (CSR).

E/W-ERASE/WRITE. The erase/write command string that is another output object of the preferred embodiment. It is a string of commands in 3270 command language that would write to the screen all of the data that is visible to the user of the terminal following an erase/write command.

WRT-WRITE COMMAND STRING. The third output object of the preferred embodiment, a string of commands in 3270 command language that would write all necessary data to the screen to provide all necessary output visible to the terminal user using only a write command.

With that background, the preferred embodiment will now be described in detail.

Fig. 4 shows the overall logical structure of the preferred embodiment of the present invention. Fig. 4 shows operation of the preferred embodiment for creating the three output objects in response to receipt of a current command string from the application. It should be noted that the preferred embodiment also handles inbound data from the terminal and can modify and possibly expand same, depending on the current state of the command string representation, so that the application receives its expected data. Logic is used to create the expected state commands (ESC) from merger and analysis of the command string representation and information returned by the terminal is identical to that for updating the ESC by analysis and merger of the CSR with the CCS except the MDTs are set as appropriate and different output objects are created. In both cases, the expected state command output object is created and, upon completion of the analysis and merger, becomes the new command string representation.

Turning now to Fig. 4, the first step is shown at 60 where the current command string from the application running on the host is obtained. Referring for a moment to Fig. 2B, this output from the host comes via links 27, 31, and 32 into the code that is implementing the process of the preferred embodiment. The balance of the steps illustrated on Fig. 3 take place within the code segment 35 that implements the preferred embodiment.

Next, the CCS is sorted by buffer address at a routine indicated by step 61. This is described in detail hereinbelow in connection with Fig. 5. For the moment, it is sufficient to note that the output of the sort is a set of commands in 3270 command format in ascending buffer address order. As will become apparent from reading the balance of the disclosure, it is much more efficient to first sort the current command string and then analyze and merge it with the existing command string representation than to deal with the commands in unsorted order. One of the results of this approach is that all of the output objects are, of necessity, in ascending buffer address order, including the expected state commands that become the new command string representation. Thus, the preferred embodiment inherently maintains its command string representation of the expected buffer contents in ascending buffer address order.

15

At steps 62 and 63, the CCSP and the CSRP are set to the first buffer address that is affected by a command in the CCS and a command in the CSR, respectively. A reentry point A is shown at 65 and is the top of the outer loop of the analysis and merger logic that lies below it in Fig. 4. Thus, once reentry point 65 is reached, the program will operate to analyze and merge the CCS and CSR until all three output objects have been created and it exits to determine the shorter of two of the objects and transmit one onto the terminal buffer.

A plurality of test steps lead to four blocks labeled states 100, 200, 300, and 400. These are the four basic families of submachines in the state machine implementation of the preferred embodiment. The tests that determine which of these four state submachines will be entered are referred to as the state zero tests. The fundamental logic of the preferred embodiment may be appreciated from considering the following and viewing Fig. 4. At procedure 61, the incoming set of current command strings is sorted in ascending buffer address order. As noted above, the command string representation (CSR) is inherently maintained in ascending buffer address order as a result of sorting the CCS and the way that the analysis and merger routine works. The basic logic of the preferred embodiment is to provide the above described pointers to the next address that will be operated on by a CCS command and a CSR command, and to determine the following from these pointer values. The existence of more buffer addresses that are affected by the CSR or the CCS is determined by whether the pointer value is less than a predetermined maximum that lies outside the address range of the buffer. The routine that increments the pointers sets the pointers to this highest address when it discovers that it has no more commands in the respective one of the strings to process.

Once it is determined whether there are more addresses that are affected by the CCS and CCR, the values of the pointer are tested. If they are unequal, then the lesser value pointer points to the next buffer address that needs to be processed for purposes of creating the expected state command string (ESC) and the erase/write string (E/W). Depending on whether the information being written to these objects comes from the command string representation (representing data that is already considered to be on the screen by the application) or whether it is new data from the CCS will determine whether or not that data appears in the write (WRT) output object.

If the pointer values are equal it means that the currently processed data in the CCS is writing over data that has already been written to by a previous command and, according to the CSR, is writing to a buffer address that the application considers to have already been written to by previous output. When this occurs, the present invention tests to see if the data at these locations from the CCS is different from that in the CSR for the same locations. If it is different, it is written to all output objects and if it is the same, it is written to the E/W and the ESC, but not the WRT.

Returning to Fig. 4, the implementation of this logic used in the preferred embodiment is described.

At step 66, the first test is to determine if the command string representation exists. As noted hereinabove, this is done by testing the value of the CSR pointer to see if it has been set to the maximum address beyond the highest valid buffer address. If the CSR is absent, NO branch 67 is taken to test 68 which tests the CCSP to see if there is any data in the current command string, CCS.

16

If there is none, both strings are exhausted and NO branch 69 is taken to exit point 70. When the logic indicated in Fig. 4 is exited, the preferred embodiment tests the relative lengths of the WRT and the E/W and sends the shorter of the two to the terminal.

If the test at step 68 is true, YES branch 71 is taken to the state 100 state machine, indicated as 72 on Fig. 3. As lo stated therein, the state 100 state machine processes data from the current command string into the E/W and ESC output objects until the CCS is exhausted, after which it exits at point 75. Since the state 100 is only reached through branches 67 and 71 (except as noted below), there is no current command string representation to process and thus, it follows that the CCS commands being processed are commands to write to a previously blank screen buffer. Thus, the processing of this data writes it to only the E/W and ESC objects.

If step 66 tests affirmative, YES branch 76 is taken to step 77 which is logically identical to step 68. If test 77 is affirmative, YES branch 78 is taken to terinary test 79 that compares the values of the pointers. Note that if branch 78 is reached, the machine has determined that there is more of both the CSR and the CCS to be processed. In this case, the particular one of the submachine states to be entered depends on the relative values of the pointers. If the pointer for the command string representation is the lesser of the two, branch 80 is taken. If the two are equal, branch 81 is taken, and if the current command string representation pointer is greater, branch 82 is taken from step 79.

At this point, it is appropriate to look ahead to see an advantage of using the finite state machine implemented in the preferred embodiment. The correspondence between some of the steps shown in Fig. 4 and those shown in Figures 6 and 7 will become apparent when the reader finishes with this entire specification. However, at this point, it should be noted that the test at steps 68 and 77 in Fig. 4 exist in only one definition in the source program for the code that implements the preferred embodiment. The result of the test branches on conditions MORE or FINISHED. However, if the MORE result is obtained at step 68, the state machine continues knowing that there is more of the CCS available and that no CSR existed because it got to step 68 only by traveling through a state in the state machine that corresponds to test 66.

The same test and result from step 77 drives a finite state machine with MORE CCS and MORE CSR. Simply by being in step 77 and executing the same tests as was executed at step 68, processing is continued as effectively as if two tests had been performed. Additionally, the test is threaded into the state machine definition at the point it is needed. This obviates the overhead of calling a subroutine but allows easy examination and maintenance of the source language while providing efficient computer operation when the compiled code implementing the machine is run.

If the CSRP is less, it means that the next data to be processed is that contained in the current command string representation. Therefore, branch 80 is taken to state 200 machine 87. Thus, at state 200 CSR data is processed into the E/W and ESC output objects until the command string representation pointer CSRP catches up with or passes the current command string pointer CCSP.

17

In other words, information from the current command string representation is written into the erase/write command output object and the expected state command string (which later becomes the new CSR) until the CCS is writing into the same locations as the CSR or a command in the CCS becomes the only command writing to the next buffer address.

In a similar vein, if the current command string pointer is the lesser of the two, the state 400 submachine 86 is entered that processes the CCS into all three output objects until the current command string pointer CCSP reaches or exceeds the command string representation pointer CSRP. Note that this always writes into all three output objects since by virtue of having reached state 400, the command being processed from the CCS is writing into new territory in the screen buffer.

If the two pointers are equal, and branch 81 is taken to state 300 submachine 87, the data being processed from the current command string is overwriting a buffer location that has previously been written to by the application. Therefore, the state 300 machine determines whether this data is different or the same as that which the application believes already exists in the buffer. It should again be noted that the state of what the application expects to be in the buffer is indicated by the command string representation. The state 300 machine continues this process until the pointers are no longer equal. As noted on Fig. 4, the state 300 machine can write into all three output objects. Naturally, when it detects that data from the CCS being processed at the CCSP location is identical to data from the CSR at the same location, this data is not written into the WRT output object since them is no need to rewrite the data to get the visual image on the screen to conform to the new expected state of the display. Each of the submachines is exited at one of points 88, or 94a or 94 b. The state 300 submachine is exited at point 94a and loops back to point A at 65, to the test for more of the current command string at 77. State submachine 400 is exited at point 94b, which takes it to the state indicated by node B at 88. Thus, state submachines 200 and 400 logically exit to the same state. From here, tests for more of the command string representation or the current command string are conducted at 95 and 96. If either of these exists, control is looped back to point A, as shown at 97, or point C as shown at 98, respectively. If there is no more of either string, analysis and merger of this current command string has terminated and the state machine terminates operation at 99.

Lastly, what happens if NO branch 89 is taken from step 77 should be considered. This leads to a set of routines indicated at 90 as other tests for new data and newly undark data that have alternative exit points. Note that tests 90 are only reached when there is some information in the current command string representation CSR but the current command string CCS has been exhausted. If this is new data, i.e., the application has transmitted an erase command as part of the most recently processed CCS, then the balance of the CSR is obliterated and tests 90 are exited via branch 91 shown in phantom. If tests 90 determine that it is only existing data, but that it has been newly undarkened by an order in the most recently processed CCS, branch 92 is taken that processes data from the command string representation into the E/W and ESC output objects. This is because it is necessary to maintain the integrity of the previously darkened data from the CSR in the output objects.

18

The last branch shown from routines 90 is branch 92b that merges with branch 71 from step 68. It should be understood that, in a manner that will become apparent from the detailed description of the state machine hereinbelow, the machine sets the source of the processed data to the CSR rather than the CCS before it enters submachine 100 via this route. Branch 92b is taken when there is more of the command string representation to be processed and the current command string has been exhausted, and the orders in the most recently processed CCS have not changed the DARK attribute of the locations of the data that is being processed from the CCS. Therefore, remaining data from the CSR that has its visual characteristics unchanged, is processed into the E/W and ESC output objects by the state 100 submachine until it is is exhausted.

One other aspect of the operation of the preferred embodiment should be mentioned here. The significant advantage is speed and reduction in host overhead that is accomplished by the present invention has been described. The present invention also exhibits a slight improvement in compression ratio over some prior art machines designed to generate minimum length strings of data such as that shown in the Harper et al. patent. The preferred embodiment detects a condition when creating its WRT output object that notes when a changed data character is followed by one or two bytes of redundant data, that are in turn followed by additional changed bytes. Rather than slavishly create an SBA command each time a changed byte is encountered, and closing it when it finds redundant data, the preferred embodiment notes when a single SBA command that includes both the first changed byte, the few bytes of redundant data, and additional changed data will be shorter than two separate SBA commands followed by their data. It then generates the shorter one in the WRT output object.

Consider the following example. Assume the screen contains the characters 1 2 3 sp sp, where "sp" represent a space character, and write a command from the application sends instruction to change this portion of the screen so that it reads 1 A 3 4 5. Prior art compressors would generate two SBA commands, the first for the A change. This command would be terminated when the redundant "3" was encountered and a new SBA command would be generated for writing the "4 5" string. Since each SBA command takes three bytes, a total of nine bytes would be transmitted by prior art compressors.

The preferred embodiment of the present invention notes that there is only a single character separating the first new data byte and the second new data byte and generates an SBA command designating the address of the first changed byte that includes the redundant data and the two following new data bytes. Therefore, it generates an SBA command for writing "A 3 4 5" which takes up seven bytes, i.e., two less bytes than the prior art. It is the inventor's belief that this attention to an opportunity to achieve further compression explains why the present invention has been found to have slight improvements in compression ratio, on the order of five percent, over the prior art in addition to its dramatic reduction in processor overhead.

By similar logic, an SF order (two bytes) may be chosen to advance a single position when the terminal already contains that SF order (yielding a savings of one byte) but an SFE (eight bytes) that was already in the terminal buffer would be replaced by an SBA, saving five bytes.

19

The current embodiment also retains pointers to duplicated characters and the most recent non-null, or protected buffer modification. Thus, when the state machine directs the subroutine POSITION to be called for the WRT object, the shortest order of the five possibilities, or a combination thereof, is selected:

SBA order
use one or two existing characters
use an existing SF order
Erase Unprotected to Address
Repeat to Address.

It is believed that the foregoing is sufficient to enable one skilled in the art to practice the method of the present invention. The balance of this disclosure is directed to a more detailed disclosure of the novel method of representing and implementing an algorithmic state machine that is embodied by the preferred embodiment, to provide a sufficient example so that the implementation may be understood, and to provide additional information about the best mode of practicing the current invention as is currently considered by the inventor of the present invention. To accomplish this, first details of the procedure represented by procedure 61 on Fig. 4 are described. Next, the state tables for the state zero tests and the state 100 state machine are provided together with state diagrams of same and an explanation of how these implement these portions of the preferred embodiment of the data compressor of the present invention.

Sort Logic

The physical hardware of the 3270 family of terminals has no restrictions as to the sequence in which screen orders and display data is presented. It is possible to write the entire screen contents in reverse order, although same is highly undesirable from a programming point of view. However, it is quite common to encounter applications that write data to terminal screens in an order other than ascending address sequence. This may result from factors such as the order in which it occurred to the programmer to include the information written to the screen, the order in which procedures or subroutines are called, each of which may write to the screen and how bug fixes or new features have been added to revisions of a program. Irrespective of how that is encountered, the first task of the preferred embodiment is to sort the incoming current command string into ascending buffer address order. The terminals always return data in buffer address order and therefore the son is not necessary when processing data traveling from the terminals to the application.

To avoid moving large amounts of data about during the sort process, the preferred embodiment creates relatively small descriptors of the information in the current command string and sorts these. In particular, the descriptor includes the following:

a pointer to the next string segment (or zero indicating the end)
character attribute at the string beginning
pointer to the actual data in memory
screen offset at the beginning of the string
screen offset at the end of the string
length of the string
sequence number.

The sequence number is the number of the order in which the command was received from the application.

20

This is to handle current command strings that, within the set of command strings, overwrite some of the data written by previous members of the set.

The CCS input is scanned and contiguous segments are described with the string descriptors having the above listed elements. Each time a segment is created, either due to the buffer address being non-contiguous or decreasing in value, the sequence number in the string descriptor is incremented. In this manner, if two portions of the data string place different characters in the same buffer location, the latter of the two to occur will have the higher sequence number, and the preferred embodiment can discard the other data that would have been written to that location had the entire CCS been transmitted to the terminal.

The string descriptors are sorted with a radix search by sequentially sorting on the following parameters of the string descriptor in the following order.

sequence number
screen offset at the end of string
screen offset at the beginning of the string.

Therefore, the screen offset at the beginning has the effect of being the most significant value in the radix sort. As those skilled in the art will know, a radix sort is one which simply sequentially sorts the members of a set from the least significant element to the most significant element. When this is finished, the string descriptors will be in an ordered list in order of screen offset at the beginning of the string. Once this is accomplished, the routine illustrated in Fig. 5 for determining if there is any overlap in the strings, and discarding appropriate portions of same, is executed.

In Fig. 5, the code segment illustrated is entered at step 210, which is the top of a loop. If there is no following string detected at test 260, the routine has completed and exits. The test at step 261 determines if the two strings overlap by seeing if the end of the current string is less than the start of the next string. If not, branch 263 is taken to step 219 which retains the THIS segment in the set and replaces it with the NEXT set, after which it loops back along branch 220 to test 260. If an overlap occurs, branch 264 is taken and the older portion of the overlap is discarded by the merge logic shown on the balance of Fig. 5. At step 211, the starting buffer address for the current descriptor is tested to see if it is equal to the starting buffer address for the next string descriptor. If it is not, NO branch 212 is taken and if it is, YES branch 213 is taken. Considering the NO branch for a moment, it will be appreciated that, by operation of the sort that has previously been performed, the starting address for the current descriptor must be less than that for the next descriptor when this branch is taken. This is because the radix sort, with the starting address is the most significant element, cannot leave the descriptor list in a state in which a higher number start address precedes a lower number start address. Therefore, the next inquiry is whether the current descriptor is newer than the next following descriptor. This is tested at step 215. Note that the current descriptor is for a newer segment if it has a higher sequence number, meaning that it occurred later in the CCS. If the result of this test is negative, NO branch 216 is taken to step 217 that tests to see if the starting buffer address for the next descriptor overlaps any portion of the command represented by the current descriptor. If this test is negative, the current descriptor is for a segment older than the next segment and NO branch 218 is taken to step 219 at which the next descriptor is obtained, and the program loops back to point A at step 220.

21

If there is no next descriptor at step 219, processing of the sorted list has been completed. After this, all the string descriptors are combined into one buffer containing all string segments in linked order of their respective string descriptors. When this is completed, the preferred embodiment moves to step 62 shown on Fig. 4.

From the description so far, it will be appreciated that if the current command string is a well ordered set of commands each of which writes to a unique portion of the screen buffer, the path described heretofore via branches 212, 216, and 218 to step 219 will be followed. This is because the starting addresses will never be equal and there will never be an overlap detected by step 217. In a similar vein, if the current command string consists of a set of commands that all write to unique addresses in the screen buffer, but they were issued by the application in an order other than ascending buffer address order, the results of the above described sort will be such that the path from step 210 through branch 212 will be followed to step 215. From here, for at least one member of the descriptor list, YES branch 221 will be taken from step 215 because at least one descriptor will describe a command that has lower starting and ending buffer addresses but a higher sequence number than its next member. However, so long as they write to unique portions of the screen, test 222 will test negative and NO branch 225 will be taken to step 226.

The split at step 226 splits the next string descriptor starting at a value for the end of the current descriptor plus one. When this occurs, the current descriptor is retained in the list, the left hand portion of the split next descriptor is retained in the list and will become the THIS descriptor the next time step 219 is performed, and the fight portion of the split is placed in the list at its appropriate location, based on the sort sequence. It should be noted that this can operate on a next descriptor that does not overwrite any of the previous descriptor in the list and therefore, the left hand portion will be kept and the fight hand portion will be a null descriptor that will be discarded at step 219. This is what happens when the current command set consists of a set of commands that all write to unique areas of the screen, but come from the host in other than ascending buffer address order. It should be noted that the split function described in the code represented by Fig. 5 at steps 226, 237, and 239 split the descriptor into two contiguous descriptors. References to the left and right results of the split refer, respectively, to the portions having lower buffer addresses and higher buffer addresses.

Step 222 tests whether the end address of the next descriptor in the list is less than or equal to the end address for the current descriptor. If it is not, the current descriptor should remain in the list unamended and the next descriptor is split as described above. If test 222 tests positive, this means that the next descriptor on the list describes an older command that is completely overlapped by the buffer space written to by the present, and newer, descriptor. When this occurs, the next descriptor is discarded at step 240.

Next consider what happens if the starting addresses for two string descriptors are equal. In this case, YES branch 213 is taken to test 227 that checks to see if the end point of the current descriptor and the next descriptor are equal. If they are, YES branch 228 is taken. Naturally, this means that the two descriptors simply describe coextensive strings and the descriptor for the older of the two, which will be the current one as a result of the sort, is discarded at step 229.

22

If the end points are not equal, NO branch 230 is taken from step 227 and it is then necessary to test the sequence numbers to see if the current descriptor represents a string that is older than the next one. This is accomplished by test 231. If the current descriptor represents an older string, YES branch 232 is taken to step 235 at which the descriptor for the current string is discarded. The program then moves to step 219 to obtain the next string descriptor. Note that once branch 232 is taken, it has been determined that the starting points of the strings represented by the descriptors are the same, the end points are different, but the current descriptor is the older. The fact that the two end points are different leads to the conclusion that the end point for the next following descriptor is larger than the end point for the current descriptor since buffer address end point is the next most significant element of the radix sort. This means that the next descriptor completely overlaps the current one and, since test 231 has indicated that the current one is older, it is appropriate to throw away the entire string for the current descriptor at step 235.

If step 231 tests negative, NO branch 236 is taken and the descriptor for the next string is split by step 237. As noted on Fig. 5, this split takes place at the current end point plus one and the left hand portion is discarded.

Again, relying on the result of the radix sort, it will be appreciated that if step 237 is reached, we have descriptors for two strings with the same starting address, the next descriptor has a larger end point address and the next descriptor is the older one.

23

Therefore, the current string descriptor describes a string which overlaps a portion of the string of the next descriptor and the next descriptor represents an older command. Therefore, the overlapping portion for the older command, i.e., the string of the next descriptor, is thrown away and the command for the present descriptor is maintained. The overlapping portions will be the left hand portion that results in the split at step 237 and thus, step 237 indicates that the left hand portion is to be discarded. A new descriptor is created that has as its beginning point the end point of the present and newer command plus one, and the remaining portion to the right (i.e., higher buffer addresses) is maintained in a new descriptor that is placed in the list at the appropriate location, per the radix sort.

A numerical example for descriptors of overlapping strings may make the operation of the program logic illustrated in Fig. 5 more intuitively appealing. Consider two string descriptors having elements 5, 15, and 1, and 10, 20, and 2. This triplet represents, in order, starting buffer address, ending buffer address, and sequence number. Thus, the radix sort will leave these descriptors in the order in which they are stated above.

When the (5,15,1) descriptor is the current descriptor, step 260 will test positive because (10,20,2) follows. Step 261 determines that an overlap exists due to 15 being greater than 10. Step 211 will test negative because the starting addresses are different. This leads to test 215 which will also be negative since the current descriptor has a lower sequence number than the (10,20,2) descriptor. Therefore, NO branch 216 will be taken to step 217. Step 217 tests positive and therefore YES branch 238 is taken to step 239. Step 217 is positive because the next starting address, i.e., 10, is less than or equal to the end of the current segment, that is, 15. Therefore, the current string is split at the value of the next starting address minus 1, i.e., 9. This leaves a descriptor for addresses 5 through 9 (5,9,1) in the list as the left hand portion that is kept, as indicated at step 239. The right hand portion of the split descriptor which will have elements (10,15,1) will become the new next descriptor. This will be discarded as is completely overlapped by a newer string, via a path traveling along branches 213, 230 and 232 on the next pass through the logic. That path will be taken any time there are identical beginning points with the string descriptors and the newer (i.e., higher sequence number) string has an end point that is greater than the end point for the current string.

Next consider two strings having descriptors (5,20,1) and (10,15,2). These will sort in this order. On the first pass through, with the (5,20,1) descriptor as the current one branch 212 will be reached and branch 216 will be taken since the current descriptor is the older of the two. Step 217 will test positive and the current descriptor will be split into two descriptors at step 239, namely (5,9,1) and (10,20,1). The first is kept.

It should be noted here that the split routine executed at step 239 takes the new fight hand portion and inserts it into the linked list of descriptors at its appropriate place according to the sort rules. Since the new right hand descriptor that results from the split, i.e., (10,20,1) should follow the NEXT descriptor (10,15,2), according to the rules of the radix sort, the split routine inserts it in the list below what was the NEXT descriptor in the most recent pass. In this manner, the integrity of the logic of the program segment shown on Fig. 5 that is based on the assumption that the radix sort was properly performed is maintained.

On the next pass through the (10,15,2) descriptor will be the current segment and the (10,20,1) will be the next segment. For this pass, step 211 tests true and step 227 tests false. Step 231 tests false since the (10,15,2) descriptor is newer than the next one. Therefore, the (10,20,1) descriptor is split into (10,15,1) and (16,20,1) at step 237 and the left hand segment is discarded keeping the (16,20,1), which is the proper result.

Back to Patent's Contents