The Compress Data Base File command compresses a single member,
generic members, or all members of a physical data base file, or the
data in a save file, into an output file. The amount of space saved
is very data dependent. Typical reduction ranges from 50 to 80
percent for physical file data. The companion command, DCPDBF,
de-compresses the data back to its original form.
The CPRDBF tool is intended for the case where you are going to
transmit data to another system. In general, it is desirable to
compress the data before transmitting it to keep the transmission
time to a minimum.
The system SAV commands support the capability to compress the data
into a save file with the DTACPR(*YES) option. However, the system
compression technique is not very efficient on commercial data. You
will generally find that CPRDBF will provide much better compression.
The following provides some examples of the compression achieved with
CPRDBF.
Physical data file without keys
-------------------------------
Original data size              569,344
SAVF size - DTACPR(*YES)        413,696     27% savings
CPRDBF size                     217,088     62% savings
RPG source file with 73 members
-------------------------------
Original file size            1,847,296
SAVF size - DTACPR(*YES)        675,840     63% savings
CPRDBF size                     413,696     78% savings
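
The savings figures are the reduction relative to the original size.
The following is a minimal sketch of that arithmetic using the RPG
source file numbers shown above. The function name is illustrative
only and is not part of the tool.

```python
def savings_percent(original_size: int, compressed_size: int) -> float:
    """Return the space saved as a percentage of the original size."""
    if original_size <= 0:
        raise ValueError("original_size must be positive")
    return (1 - compressed_size / original_size) * 100


# Sizes taken from the RPG source file example.
original = 1_847_296
savf_dtacpr_yes = 675_840
cprdbf = 413_696

print(f"SAVF  {savings_percent(original, savf_dtacpr_yes):.0f}% savings")  # 63%
print(f"CPRDBF {savings_percent(original, cprdbf):.0f}% savings")          # 78%
```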
Typical application
--------------------
Assume you want to transmit only data files (either source or data or
both) to another system.
You may compress one or more physical files with one or more members
using CPRDBF. Each file to be compressed requires a separate CPRDBF
command. A typical command would look like:
CPRDBF FROMFILE(xxx) CPRDBFLIB(zzz)
The TAACPRDBF file is created by CPRDBF with the compressed data.
You could add other compressed data to the same TAACPRDBF file with
additional CPRDBF commands. You would then transmit the TAACPRDBF
file to another system.
At the other system, the DCPDBF (De-compress) command would be used
to de-compress the TAACPRDBF data back to the original file names. A
typical command would be:
DCPDBF TOLIB(xxx) CPRDBFLIB(zzz)
The members of the file do not have to exist in the corresponding
files, but the files must exist. A different library could be used.
CPRLIBDBF command
-----------------
A separate command is provided with the tool to compress all physical
files in a library to the same TAACPRDBF file.
TAACPRDBF File
--------------
The TAACPRDBF file is automatically created by either the CPRDBF or
CPRLIBDBF command if it does not exist. The file is defined with a
200 byte record length and does not have any keys.
The file has only a single member but may contain the data for
multiple physical file members or save files. Heading information
exists within the single member to logically separate the data from
multiple physical file members or save files.
The PRTCPRDBF command may be used to print a list of the members that
are contained within the TAACPRDBF file.
Internal checking ensures that every record that is de-compressed
matches the record length of the file where the data is to be
written.
Therefore, you must ensure that the file you de-compress into has the
same record length as the original file.
Spooled file output
-------------------
When CPRDBF is run, an optional spooled file will be created
describing the results. The spooled file will contain one line per
member that has been compressed. A comparison is included of the
original data space size and the compressed size. Data space values
are determined by multiplying the number of records in the file by
the record length. Deleted records are not considered.
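
As a rough illustration of how a data space value is derived, the
sketch below multiplies the number of active (non-deleted) records by
the record length. The function name and the sample numbers are
hypothetical and are not taken from the tool.

```python
def data_space_size(active_records: int, record_length: int) -> int:
    """Data space size in bytes: active records times record length.

    Deleted records are not counted, matching the description above.
    """
    if active_records < 0 or record_length <= 0:
        raise ValueError("invalid record count or record length")
    return active_records * record_length


# Hypothetical member: 2,500 active records of 120 bytes each.
print(data_space_size(2_500, 120))  # 300000
```

Because object overhead and deleted records are excluded, this value
will normally differ from the object size shown by DSPOBJD.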
You may optionally request that a comparison be made of the object
size of the two objects. This level of information is only
meaningful if you compress the data from a single file (it may have
multiple members) or save file. The information is intended to
assist you in evaluating the effectiveness of the compression
technique.
When CPRLIBDBF runs, it uses CPRDBF for each physical file or save
file to be compressed. A separate spooled file is created for each.
An option on CPRLIBDBF can be used to delete the spooled files.
The DCPDBF command also produces an optional spooled file for each
file that was de-compressed.
The PRTCPRDBF command produces a spooled file with one line for each
member or save file in the TAACPRDBF file.
Save file support
-----------------
CPRDBF can be used to compress a save file. However, the compression
technique used by CPRDBF is not effective on save files that have
been saved with DTACPR(*YES). For the best compression, use CPRDBF
directly on physical files.
CPRDBF can be used to compress a save file containing any object
type. The save file could have been created by any save command.
The compression results are very data dependent, but you should avoid
using the DTACPR(*YES) function when saving to the save file if you
are going to use CPRDBF on the save file.
See the later discussion on 'Compression results'.
Using multiple libraries
------------------------
The simplest use of the CPRDBF tool is to use one TAACPRDBF file for
each library containing data to be compressed. This allows the
DCPDBF command to default to de-compress all data to the same
library. The library may differ from the original library, but the
files must exist.
You can add to an existing TAACPRDBF file with data from different
libraries. When you de-compress the data, you must use separate
DCPDBF commands for each library.
Testing
-------
To test the technique used, you could do the following:
** Use CPRDBF to compress a file into QTEMP. The TAACPRDBF file
will be automatically created.
** Use CRTDUPOBJ to create a duplicate of the original file
(without any data) into QTEMP.
** Use DCPDBF to de-compress the data in the TAACPRDBF file into
the newly created duplicate in QTEMP.
** Use a function like the CMPDBF or CMPSRC2 TAA Tools to compare
the two versions of the file.
** To compare save files, you must first convert the data in the
save file to a data base file (such as by use of the
CPYFRMSAVF TAA Tool) and then use CMPDBF.
Compression technique used
--------------------------
The compression technique used is RPG code which looks for the
following types of data:
** A string of blanks or X'00's.
** Packed fields which are all zeros (such as X'000F')
** Packed fields which have a value of 1 (such as X'00001F')
** A string of characters repeated from the previous record
** A repeated character (such as a string of asterisks)
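
The packed field examples may be easier to follow with the packed
decimal layout in mind: each half byte holds one digit and the last
half byte holds the sign. The sketch below decodes such a field. It
is only an illustration of the data format and is not code from the
tool.

```python
def unpack_decimal(field: bytes) -> int:
    """Decode a packed decimal field (two digits per byte, sign last)."""
    if not field:
        raise ValueError("empty field")
    nibbles = []
    for byte in field:
        nibbles.append(byte >> 4)      # high half byte
        nibbles.append(byte & 0x0F)    # low half byte
    *digits, sign = nibbles
    if sign < 0x0A or any(d > 9 for d in digits):
        raise ValueError("not a valid packed decimal field")
    value = 0
    for d in digits:
        value = value * 10 + d
    # X'B' and X'D' are the negative sign codes; the others are positive.
    return -value if sign in (0x0B, 0x0D) else value


print(unpack_decimal(bytes.fromhex("000F")))    # 0
print(unpack_decimal(bytes.fromhex("00001F")))  # 1
```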
Special characters in the range of X'EA' - X'EF' and X'FA' - X'FF'
are placed in the compressed data to signal a compression technique.
The special characters may be followed by a binary count (one byte)
or a character or both. There is no restriction on the use of these
special characters within the data file to be compressed. If one of
the special characters is found in your data, a special character is
output followed by your byte of data.
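
The general idea of marker bytes followed by a count, a character, or
both can be sketched as follows. This is NOT the RPG code used by
CPRDBF. The assignment of X'EA', X'EB' and X'EC' to specific
techniques, the run thresholds, and the function names are all
assumptions made for illustration. The sketch also omits the packed
field and previous-record techniques.

```python
# Bytes reserved as markers, per the ranges given above.
SPECIALS = frozenset(range(0xEA, 0xF0)) | frozenset(range(0xFA, 0x100))
BLANK = 0x40        # EBCDIC blank
M_BLANKS = 0xEA     # assumed: marker + count        -> run of blanks
M_REPEAT = 0xEB     # assumed: marker + count + byte -> run of that byte
M_LITERAL = 0xEC    # assumed: marker + byte         -> one literal byte
MAX_RUN = 255       # the count must fit in one byte


def compress(record: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(record):
        b = record[i]
        run = 1
        while i + run < len(record) and record[i + run] == b and run < MAX_RUN:
            run += 1
        if b == BLANK and run >= 3:
            out += bytes([M_BLANKS, run])
            i += run
        elif run >= 4:
            out += bytes([M_REPEAT, run, b])
            i += run
        elif b in SPECIALS:
            # Data that collides with a marker is escaped, so there is
            # no restriction on the bytes that may appear in the input.
            out += bytes([M_LITERAL, b])
            i += 1
        else:
            out.append(b)
            i += 1
    return bytes(out)


def decompress(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b == M_BLANKS:
            out += bytes([BLANK]) * data[i + 1]
            i += 2
        elif b == M_REPEAT:
            out += bytes([data[i + 2]]) * data[i + 1]
            i += 3
        elif b == M_LITERAL:
            out.append(data[i + 1])
            i += 2
        else:
            out.append(b)
            i += 1
    return bytes(out)


# 50 blanks, "ABC" in EBCDIC, 10 asterisks, two marker-valued data bytes.
record = b"\x40" * 50 + b"\xC1\xC2\xC3" + b"\x5C" * 10 + b"\xEA\xFF"
packed = compress(record)
assert decompress(packed) == record
print(len(record), "->", len(packed))
```

Note that data made up mostly of unique text gains nothing from this
kind of scheme, and data containing many marker-valued bytes can grow,
which is consistent with the results described for save files saved
with DTACPR(*YES).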
Compression results
-------------------
The results from CPRDBF are very data dependent.
CPRDBF works best on typical commercial data in physical files
(either data or source). It would be normal to see results that
reduce the size by 50 to 80%. Files that consist mostly of
non-blank text data do not compress well.
A single small source member as the only member in the TAACPRDBF file
does not compress well because of the amount of overhead for the
TAACPRDBF member and file control blocks. Multiple source members
normally compress well.
CPRDBF is generally not as effective when compressing save file data.
CPRDBF may produce a larger size if you attempt to compress a save
file saved with DTACPR(*YES).
** You will generally see a reasonable gain if a data base file
is saved to a save file with DTACPR(*NO) and then compressed
with CPRDBF. However, the results will not be as good as
using the CPRDBF command directly on the physical files.
** Objects which are not physical files and are saved to a save
file with DTACPR(*NO) generally do not compress as well as
physical file data. The results can vary significantly
depending on the type of object and attribute.
For example, the makeup of a CL program object differs from an
RPG program object. Different internal approaches are used
which can lead to different compression results. The CRTCLPGM
command also defaults to create the program with
ALWRTVSRC(*YES) which means the CL source (minus the comments)
is stored with the program. Since this appears as a string of
unique text, it does not compress very well. A program may
also include observability which has different characteristics
from the instruction stream of a program.
The best solution is generally to try some of your own typical files
and test the differences.
CPRDBF Command parameters *CMD
-------------------------
FROMFILE The qualified file name of the file to be
compressed. The library value defaults to *LIBL.
*CURLIB may also be used.
A physical file (data or source) or a save file may
be specified. A logical file may not be used.
CPRDBFLIB The library which will contain the TAACPRDBF file.
*LIBL is the default, but may not be used if the
TAACPRDBF file is not found on the library list.
*CURLIB may be specified.
If a library is named and the TAACPRDBF file does
not exist, the file is created with a record length
of 200 bytes. The file will contain only a single
member.
If a TAACPRDBF file does exist, it will be used.
You must consider the REPLACE option for the first
use of the file.
FROMMBR The member to be compressed from the FROMFILE. The
default is *FIRST.
A single member may be specified, a generic member
name, or *ALL for all members. If a file has only a
single member, there is no difference between *FIRST
and *ALL.
If the file to be compressed is a save file, a
specific member name may not be used.
REPLACE Whether to replace the data in the existing
TAACPRDBF file. The default is *NO which is used to
make it convenient to compress multiple files into
the same TAACPRDBF file with separate CPRDBF
commands.
REPLACE(*YES) may be specified if you have an
existing TAACPRDBF file containing data and want to
start a new set of files to be compressed.
DLTSPLF Whether to delete the spooled file. A spooled file
is created with one line per member that has been
compressed. It describes the number of records and
the amount of compression achieved.
*NO is the default to retain the spooled file.
*YES may be specified to delete the spooled file.
CMPSAVINGS Whether to print a comparison of the savings. The
default is *NO meaning no comparison is made.
*YES may be specified to cause a comparison. The
object size (as shown by DSPOBJD) of the specified
file is compared with the object size of the
TAACPRDBF file.
This comparison is a better indication of the
savings than that which appears with each member in
the spooled output. Because of this technique, the
comparison is only meaningful when all members of a
file are compressed.
The intent of the parameter is to allow for a simple
means of comparisons when evaluating the
effectiveness of the CPRDBF tool.
DCPDBF Command parameters *CMD
-------------------------
TOLIB The library where the files are to be de-compressed
to. Any files to be de-compressed into must exist
in the library. If the files do not exist, you
should consider creating them with a function like
CRTDUPOBJ or the DUPFILFMT TAA Tool.
The members in the file do not need to exist. If
the member exists, it will be cleared before any
data is added to the member. If the member does not
exist, it will be added using the information stored
within the member header from the TAACPRDBF file.
The approach of adding the members is used to allow
a convenient method of moving source with compressed
files.
If you want to add records to an existing file, you
must create a work version of the file (without any
records) and de-compress to the work version. Then
use a function such as CPYF to add the de-compressed
records to your existing file.
DCPFILE The file name of the file which will be written to.
The default is *ALL meaning all files found in the
TAACPRDBF file.
If *ALL is used, you may specify the DCPMBR
parameter to name a specific member which may appear
in multiple files or a generic member name which may
appear in multiple files.
Any files to be de-compressed to must exist in the
library named in the TOLIB parameter.
The file to be de-compressed to must have the same
record length as the file that was originally
compressed.
DCPMBR The member name of the file to be written to. The
default is *ALL for all members of the file that
exist in the TAACPRDBF compressed file.
A specific member, a generic member name, or the
special value *SAVF may be entered.
The value entered applies to the DCPFILE parameter.
For example, you could de-compress all generically
named members from different files by specifying
DCPFILE(*ALL) and DCPMBR(xxx*).
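
The selection described for DCPMBR can be pictured with the small
sketch below. It only illustrates how a specific name, a generic name
and *ALL would select members. The function name and the member names
are hypothetical, and the special value *SAVF is not modeled.

```python
def member_selected(member: str, dcpmbr: str) -> bool:
    """Return True if 'member' is selected by the DCPMBR value."""
    member = member.upper()
    dcpmbr = dcpmbr.upper()
    if dcpmbr == "*ALL":
        return True
    if dcpmbr.endswith("*"):
        # Generic name: match on the characters before the asterisk.
        return member.startswith(dcpmbr[:-1])
    return member == dcpmbr


# Hypothetical members found across several compressed files.
members = ["ORDHDR", "ORDDTL", "CUSTMAST"]
print([m for m in members if member_selected(m, "ORD*")])
# ['ORDHDR', 'ORDDTL']
```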
If DCPFILE(*ALL) is specified, DCPMBR(*SAVF) may be
used to de-compress only the save files that exist.
See the DCPSAVF parameter to omit save files from
being de-compressed.
CPRDBFLIB The library which contains the TAACPRDBF file.
*LIBL is the default. *CURLIB may be used. The
data in the file must have been written by the
CPRDBF command.
DLTSPLF A *YES/*NO option for whether the spooled file will
be deleted. The default is *NO which retains the
spooled file. The spooled file lists one line per
member that was de-compressed.
*YES may be specified to delete the spooled file.
DCPSAVF A *YES/*NO option for whether any save files should
be de-compressed. The default is *YES which allows
save files to be de-compressed.
*NO may be specified to omit save files.
CPRLIBDBF Command parameters *CMD
----------------------------
LIB The library where the files exist that are to be
compressed.
CPRDBFLIB The library where the TAACPRDBF file exists or will
be created. The default is *LIB meaning the same
library as named in the LIB parameter. *LIBL or
*CURLIB may be used if the TAACPRDBF file already
exists and can be found with the special library
value.
REPLACE A *YES/*NO option for whether the data should be
replaced in the TAACPRDBF file before writing to it.
The default is *YES which would cause the member of
the TAACPRDBF file to be cleared. If multiple files
or members are compressed using CPRLIBDBF, the
TAACPRDBF file is only cleared for the first member
written.
*NO may be specified to add records to the TAACPRDBF
file.
SAVF A *YES/*NO option for whether the data should be
compressed from any save files found in the library.
The default is *NO which bypasses any save files.
*YES may be specified to compress the data from any
save files.
If the save file was saved with DTACPR(*NO), some
reasonable savings can be made by compressing save
file data. If the save file was saved with
DTACPR(*YES), you should avoid compressing the save
file as there will be either little gain or a loss.
DLTSPLF A *YES/*NO option for whether the spooled file
created for each file that is compressed should be
deleted. *NO is the default to retain the spooled
file.
*YES may be specified to delete each spooled file.
PRTCPRDBF Command parameters *CMD
----------------------------
CPRDBFLIB The library where the TAACPRDBF file exists. The
default is *LIBL. A specific name or *CURLIB may
also be used.
Restrictions
------------
Only physical files or save files may be compressed.
Any file to be compressed must have a record length of 9997 or less.
The file to be used for the de-compressed data must have the same
record length as the file where the data was compressed from.
The algorithm used is unique for CPRDBF. The DCPDBF command may only
be used with a file that was compressed by CPRDBF or CPRLIBDBF.
Prerequisites
-------------
The following TAA Tools must be on your system:
CHKGENERC Check generic
EDTVAR Edit variable
FILEFDBCK File feedback
HLRMVMSG HLL Remove message
RTVMBRLST Retrieve member list
RTVSAVFD Retrieve save file description
RTVSYSVAL3 Retrieve system value 3
SNDCOMPMSG Send completion message
SNDESCMSG Send escape message
SNDSTSMSG Send status message
Implementation
--------------
None, the tool is ready to use.
Objects used by the tool
------------------------
Object        Type    Attribute      Src member    Src file
------        ----    ---------      ----------    ----------
CPRDBF        *CMD                   TAADBIC       QATTCMD
DCPDBF        *CMD                   TAADBIC2      QATTCMD
PRTCPRDBF     *CMD                   TAADBIC3      QATTCMD
CPRLIBDBF     *CMD                   TAADBIC4      QATTCMD
TAADBICC      *PGM       CLP         TAADBICC      QATTCL
TAADBICC2     *PGM       CLP         TAADBICC2     QATTCL
TAADBICC3     *PGM       CLP         TAADBICC3     QATTCL
TAADBICC4     *PGM       CLP         TAADBICC4     QATTCL
TAADBICC12    *PGM       CLP         TAADBICC12    QATTCL
TAADBICR      *PGM       RPG         TAADBICR      QATTRPG
TAADBICR2     *PGM       RPG         TAADBICR2     QATTRPG
TAADBICR3     *PGM       RPG         TAADBICR3     QATTRPG
Structure
---------
CPRDBF        Cmd
   TAADBICC      CL pgm
   TAADBICR      RPG pgm
DCPDBF        Cmd
   TAADBICC2     CL pgm
   TAADBICR2     RPG pgm
   TAADBICC12    CL pgm
CPRLIBDBF     Cmd
   TAADBICC4     CL pgm
PRTCPRDBF     Cmd
   TAADBICC3     CL pgm
   TAADBICR3     RPG pgm