TAA Tools
CPRDBF          COMPRESS DATA BASE FILE                TAADBIC

The Compress Data Base File command compresses a single member,
generic members, or all members of a physical data base file, or the
data in a save file, into an output file.  The amount of space saved
depends heavily on the data.  Typical reduction ranges from 50 to 80
percent for physical file data.  The companion command, DCPDBF,
de-compresses the data back to its original form.

The  CPRDBF  tool is  intended  for the  case  where you  are  going to
transmit data  to  another system.   In  general,  it is  desirable  to
compress  the data  before  transmitting it  to  keep the  transmission
time to a minimum.

The  system SAV commands  support the  capability to compress  the data
into a save  file with the  DTACPR(*YES) option.   However, the  system
compression technique is  not very efficient  on commercial data.   You
will generally  find that CPRDBF will provide  much better compression.

The  following provides some examples  of the compression achieved with
CPRDBF.

      Physical data file without keys
      -------------------------------

           Original data size          569,344
           SAVF size - DTACPR(*YES)    413,696   27% savings
           CPRDBF size                 217,088   62% savings

      RPG source file with 73 members
      -------------------------------

           Original file size        1,847,296
           SAVF size - DTACPR(*YES)    675,840   63% savings
           CPRDBF size                 413,696   78% savings
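
The savings percentages can be reproduced from the sizes with simple
arithmetic.  The following is a minimal sketch, not part of the tool;
the function name savings_percent is hypothetical, and rounding to the
nearest whole percent is an assumption.  It uses the figures from the
RPG source file example.

```python
def savings_percent(original_size: int, compressed_size: int) -> int:
    """Return the space saved as a whole-number percentage of the original."""
    if original_size <= 0:
        raise ValueError("original_size must be positive")
    return round(100 * (original_size - compressed_size) / original_size)


# Figures from the RPG source file example.
original = 1_847_296
print(savings_percent(original, 675_840))  # SAVF with DTACPR(*YES)
print(savings_percent(original, 413_696))  # CPRDBF
```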

Typical application
-------------------

Assume you want to transmit  only data files (either source or  data or
both) to another system.

You may  compress one or more  physical files with one  or more members
using  CPRDBF.   Each file  to be compressed  requires a  unique CPRDBF
command.  A typical command would look like:

             CPRDBF    FROMFILE(xxx) CPRDBFLIB(zzz)

The TAACPRDBF  file is  created  by CPRDBF  with the  compressed  data.
You could  add other compressed  data to the  same TAACPRDBF  file with
additional  CPRDBF commands.   You  would  then transmit  the TAACPRDBF
file to another system.

At the other  system, the  DCPDBF (De-compress) command  would be  used
to de-compress the TAACPRDBF  data back to the original file  names.  A
typical command would be:

             DCPDBF    TOLIB(xxx) CPRDBFLIB(zzz)

The files must already exist in the TOLIB library, but their members
do not have to exist; DCPDBF adds any missing members.  The library
may differ from the one the data was compressed from.

CPRLIBDBF command
-----------------

A separate command is provided  with the tool to compress  all physical
files in a library to the same TAACPRDBF file.

TAACPRDBF File
--------------

The TAACPRDBF  file is  automatically created by  either the  CPRDBF or
CPRLIBDBF  command if it  does not exist.   The file is  defined with a
200 byte record length and does not have any keys.

The file  has  only  a single  member  but  may contain  the  data  for
multiple  physical file  members or  save files.   Heading  information
exists  within the single  member to  logically separate the  data from
multiple physical file members or save files.

The PRTCPRDBF command may be used to  print a list of the members  that
are contained within the TAACPRDBF file.

Internal  checking ensures  that  every  record that  is  de-compressed
matches  the  record  length of  the  file  where  the  data is  to  be
written.

Therefore, you must ensure that the file to be de-compressed to has
the same record length as the original file.

Spooled file output
-------------------

When  CPRDBF  is  run,  an   optional  spooled  file  will  be  created
describing  the results.   The spooled file  will contain  one line per
member that  has been  compressed.   A comparison  is  included of  the
original data  space size and the  compressed size.  Data  space values
are  determined by  multiplying the number  of records  in the  file by
the record length.  Deleted records are not considered.
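
As an illustration of how the data space values are derived, here is a
minimal sketch.  The function name and the record counts are
hypothetical; only the rule (active records times record length) and
the 200 byte TAACPRDBF record length come from this document.

```python
def data_space_size(active_records: int, record_length: int) -> int:
    """Data space in bytes: active (non-deleted) records times record length."""
    if active_records < 0 or record_length <= 0:
        raise ValueError("invalid record count or record length")
    return active_records * record_length


# Hypothetical member: 12,500 active records of 92 bytes each.
original_space = data_space_size(12_500, 92)

# Hypothetical result: 1,400 records written to TAACPRDBF (200 byte records).
compressed_space = data_space_size(1_400, 200)

print(original_space, compressed_space)
```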

You may optionally  request that  a comparison  be made  of the  object
size  of  the  two  objects.    This   level  of  information  is  only
meaningful if  you compress the  data from a  single file (it  may have
multiple  members)  or  save  file.   The  information  is  intended to
assist  you  in  evaluating   the  effectiveness  of  the   compression
technique.

When  CPRLIBDBF runs, it  uses CPRDBF  for each  physical file  or save
file  to be compressed.   A separate spooled file  is created for each.
An option on CPRLIBDBF can be used to delete the spooled files.

The DCPDBF  command also  produces an  optional spooled  file for  each
file that was de-compressed.

The PRTCPRDBF  command produces a spooled  file with one line  for each
member or save file in the TAACPRDBF file.

Save file support
-----------------

CPRDBF can  be used to compress a save  file.  However, the compression
technique used  by CPRDBF  is not  effective on  save files  that  have
been saved  with DTACPR(*YES).   For the  best compression, use  CPRDBF
directly on physical files.

CPRDBF  can be  used  to compress  a  save file  containing  any object
type.   The  save file  could have  been created  by any  save command.
The compression results are  very data dependent, but you  should avoid
using the  DTACPR(*YES) function  when saving to  the save file  if you
are going to use CPRDBF on the save file.

See the later discussion on 'Compression results'.

Using multiple libraries
------------------------

The  simplest use of the CPRDBF  tool is to use  one TAACPRDBF file for
each library  containing  data  to  be compressed.    This  allows  the
DCPDBF  command  to  default  to  de-compress  all  data  to  the  same
library.   The library  may differ from  the original library,  but the
files must exist.

You can add  to an  existing TAACPRDBF  file with  data from  different
libraries.   When  you  de-compress the  data,  you must  use  separate
DCPDBF commands for each library.


Testing
-------

To test the technique used, you could do the following:

  **   Use CPRDBF  to compress a  file into QTEMP.   The TAACPRDBF file
       will be automatically created.

  **   Use  CRTDUPOBJ  to  create  a  duplicate  of  the  original file
       (without any data) into QTEMP.

  **   Use DCPDBF to  de-compress the data  in the TAACPRDBF file  into
       the newly created duplicate in QTEMP.

  **   Use a function  like the CMPDBF or CMPSRC2  TAA Tools to compare
       the two versions of the file.

  **   To  compare save files, you  must first convert  the data in the
       save  file  to  a  data  base  file  (such  as  by  use  of  the
       CPYFRMSAVF TAA Tool) and then use CMPDBF.


Compression technique used
--------------------------

The compression technique is implemented in RPG code that looks for
the following types of data:

  **   A string of blanks or X'00's.

  **   Packed fields which are all zeros (such as X'000F')

  **   Packed fields which have a value of 1 (such as X'00001F')

  **   A string of characters repeated from the previous record

  **   A repeated character (such as a string of asterisks)

Special  characters in  the range of  X'EA' -  X'EF' and X'FA'  - X'FF'
are placed in the  compressed data to  signal a compression  technique.
The special  characters may be  followed by a  binary count (one  byte)
or a  character or both.  There  is no restriction on the  use of these
special  characters within the data  file to be compressed.   If one of
the special characters  is found in your  data, a special character  is
output followed by your byte of data.
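
The marker-and-escape idea can be sketched in a few lines.  This is an
illustration only, not the tool's RPG code: the assignment of X'FA',
X'FB' and X'FF' to specific techniques is an assumption, the run
thresholds are assumptions, and the packed field and previous record
techniques are not shown.

```python
BLANK = 0x40        # EBCDIC blank
BLANK_RUN = 0xFA    # assumed marker: followed by a one-byte count
REPEAT_RUN = 0xFB   # assumed marker: followed by a count and the character
ESCAPE = 0xFF       # assumed marker: followed by one literal data byte


def is_marker(b: int) -> bool:
    """True for bytes in the reserved ranges X'EA'-X'EF' and X'FA'-X'FF'."""
    return 0xEA <= b <= 0xEF or 0xFA <= b <= 0xFF


def compress(record: bytes) -> bytes:
    out = bytearray()
    i, n = 0, len(record)
    while i < n:
        b = record[i]
        run = 1  # length of the run of identical bytes (count fits one byte)
        while i + run < n and record[i + run] == b and run < 255:
            run += 1
        if b == BLANK and run >= 3:
            out += bytes([BLANK_RUN, run])
            i += run
        elif run >= 4:
            out += bytes([REPEAT_RUN, run, b])
            i += run
        elif is_marker(b):
            out += bytes([ESCAPE, b])  # data byte that looks like a marker
            i += 1
        else:
            out.append(b)
            i += 1
    return bytes(out)


def decompress(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b == BLANK_RUN:
            out += bytes([BLANK]) * data[i + 1]
            i += 2
        elif b == REPEAT_RUN:
            out += bytes([data[i + 2]]) * data[i + 1]
            i += 3
        elif b == ESCAPE:
            out.append(data[i + 1])
            i += 2
        else:
            out.append(b)
            i += 1
    return bytes(out)


# EBCDIC 'AB', 20 blanks, 10 asterisks (X'5C'), then a data byte X'FF'.
record = b"\xC1\xC2" + b"\x40" * 20 + b"\x5C" * 10 + b"\xFF"
packed = compress(record)
print(len(record), len(packed), decompress(packed) == record)
```

Because a marker byte found in the data is always written behind the
escape marker, the de-compress step never mistakes user data for a
compression signal.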

Compression results
-------------------

The results from CPRDBF are very data dependent.

CPRDBF works best on typical commercial data in physical files
(either data or source).  It would be normal to see results that
reduce the size by 50 to 80%.  Files with a good deal of text data
(non-blanks) do not compress well.

A single small source  member as the only member  in the TAACPRDBF file
does  not  compress well  because of  the  amount of  overhead  for the
TAACPRDBF member  and file  control blocks.   Multiple  source  members
normally compress well.

CPRDBF is generally not  as effective when compressing save  file data.
CPRDBF  may produce a  larger size  if you  attempt to compress  a save
file saved with DTACPR(*YES).

  **   You  will generally  see a reasonable  gain if a  data base file
       is saved  to a save  file with  DTACPR(*NO) and then  compressed
       with  CPRDBF.   However,  the results  will  not be  as good  as
       using the CPRDBF command directly on the physical files.

  **   Objects  which are not  physical files  and are saved  to a save
       file with  DTACPR(*NO)  generally do  not  compress as  well  as
       physical  file  data.     The  results  can  vary  significantly
       depending on the type of object and attribute.

       For example,  the makeup of a CL  program object differs from an
       RPG program  object.   Different  internal approaches  are  used
       which can lead  to different compression results.   The CRTCLPGM
       command   also    defaults   to   create    the   program   with
       ALWRTVSRC(*YES)  which means the CL  source (minus the comments)
       is stored with the program.   Since this appears as a  string of
       unique  text, it does  not compress  very well.   A  program may
       also  include observability which  has different characteristics
       from the instruction stream of a program.

The best solution is  generally to try some  of your own typical  files
and test the differences.

CPRDBF Command parameters                             *CMD
-------------------------

   FROMFILE      The   qualified  file   name  of   the   file  to   be
                 compressed.    The library  value  defaults  to *LIBL.
                 *CURLIB may also be used.

                 A physical file (data  or source) or  a save file  may
                 be specified.  A logical file may not be used.

   CPRDBFLIB     The library  which  will contain  the TAACPRDBF  file.
                 *LIBL  is the  default,  but may  not be  used  if the
                 TAACPRDBF  file  is  not found  on  the  library list.
                 *CURLIB may be specified.

                 If a  library is  named  and the  TAACPRDBF file  does
                 not exist,  the file is  created with a length  of 200
                 bytes.   The file  will contain only  a single member.

                 If a  TAACPRDBF  file does  exist,  it will  be  used.
                 You must  consider the  REPLACE option  for the  first
                 use of the file.

   FROMMBR       The  member to be  compressed from the  FROMFILE.  The
                 default is *FIRST.

                 A single  member may  be specified,  a generic  member
                 name, or *ALL for  all members.  If a file  has only a
                 single member,  there is no  difference between *FIRST
                 and *ALL.

                 If the file to be compressed is a save file, a
                 specific member name may not be used.

   REPLACE       Whether  to   replace   the  data   in  the   existing
                 TAACPRDBF file.   The default is *NO which  is used to
                 make  it convenient  to  compress multiple  files into
                 the  same   TAACPRDBF   file  with   separate   CPRDBF
                 commands.

                 REPLACE(*YES)  may   be  specified  if  you   have  an
                 existing  TAACPRDBF file  containing data and  want to
                 start a new set of files to be compressed.

   DLTSPLF       Whether to delete  the spooled file.   A spooled  file
                 is created  with  one line  per member  that has  been
                 compressed.   It describes  the number of  records and
                 the amount of compression achieved.

                 *NO is the default to retain the spooled file.

                 *YES may be specified to delete the spooled file.

   CMPSAVINGS    Whether  to print  a comparison  of the  savings.  The
                 default is *NO meaning no comparison is made.

                 *YES may be specified to cause a comparison.  The
                 object size (as shown by DSPOBJD) of the specified
                 file is compared with the object size of the
                 TAACPRDBF file.

                 This   comparison  is  a   better  indication  of  the
                 savings than that  which appears  with each member  in
                 the spooled  output.   Because of this  technique, the
                 comparison  is only meaningful  when all  members of a
                 file are compressed.

                 The intent of the parameter  is to allow for a  simple
                 means    of   comparisons    when    evaluating    the
                 effectiveness of the CPRDBF tool.

DCPDBF Command parameters                             *CMD
-------------------------

   TOLIB         The  library where the  files are to  be de-compressed
                 to.   Any  files to  be de-compressed  into must exist
                 in the  library.    If the  files  do not  exist,  you
                 should  consider creating  them with  a function  like
                 CRTDUPOBJ or the DUPFILFMT TAA Tool.

                 The  members in  the file  do not need  to exist.   If
                 the member  exists,  it  will be  cleared  before  any
                 data is added  to the member.  If the  member does not
                 exist, it  will be added using  the information stored
                 within the member header from the TAACPRDBF file.

                 The approach of  adding the members  is used to  allow
                 a convenient method  of moving source  with compressed
                 files.

                 If you  want to add  records to an  existing file, you
                 must  create a work  version of the  file (without any
                 records) and de-compress  to the  work version.   Then
                 use a  function such as CPYF to  add the de-compressed
                 records to your existing file.

   DCPFILE       The  file name of  the file which will  be written to.
                 The default is  *ALL meaning  all files  found in  the
                 TAACPRDBF file.

                 If  *ALL   is  used,   you  may  specify   the  DCPMBR
                 parameter  to name a specific  member which may appear
                 in multiple files or a  generic member name which  may
                 appear in multiple files.

                 Any files  to be  de-compressed to  must exist in  the
                 library named in the TOLIB parameter.

                 The  file to  be  de-compressed  to must  be  the same
                 length as the file that was originally compressed.

   DCPMBR        The  member name  of the file  to be written  to.  The
                 default is  *ALL  for all  members  of the  file  that
                 exist in the TAACPRDBF compressed file.

                 A  specific  member, a  generic  member  name, or  the
                 special value *SAVF may be entered.

                 The  value entered  applies to the  DCPFILE parameter.
                 For example,  you  could de-compress  all  generically
                 named  members  from  different  files  by  specifying
                 DCPFILE(*ALL) and DCPMBR(xxx*).

                 If  DCPFILE(*ALL) is  specified, DCPMBR(*SAVF)  may be
                 used to de-compress  only the save  files that  exist.

                 See  the DCPSAVF  parameter to  omit  save files  from
                 being de-compressed.

   CPRDBFLIB     The  library   which  contains  the   TAACPRDBF  file.
                 *LIBL  is  the default.   *CURLIB  may  be used.   The
                 data  in the  file  must  have  been  written  by  the
                 CPRDBF command.

   DLTSPLF       A *YES/*NO  option for  whether the spooled  file will
                 be  deleted.   The  default is  *NO which  retains the
                 spooled file.   The spooled  file lists  one line  per
                 member that was de-compressed.

                 *YES may be specified to delete the spooled file.

   DCPSAVF       A *YES/*NO  option for  whether any save  files should
                 be  de-compressed.  The  default is  *YES which allows
                 save files to be de-compressed.

                 *NO may be specified to omit save files.
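
The member selection described for DCPMBR (a specific name, a generic
name such as xxx*, or *ALL) can be sketched as follows.  This is an
illustration under assumptions, not the tool's code: the function name
and the sample member names are hypothetical, and the *SAVF special
value is not handled.

```python
def matches_member(selector: str, member: str) -> bool:
    """Return True if member is selected by a name, generic name, or *ALL."""
    selector = selector.upper()
    member = member.upper()
    if selector == "*ALL":
        return True
    if selector.endswith("*"):
        # Generic name: everything before the asterisk must match as a prefix.
        return member.startswith(selector[:-1])
    return member == selector


# Hypothetical member names found in a TAACPRDBF file.
members = ["ORDENT", "ORDINQ", "CUSTMNT", "ORD"]
selected = [m for m in members if matches_member("ORD*", m)]
print(selected)
```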

CPRLIBDBF Command parameters                          *CMD
----------------------------

   LIB           The library  where  the files  exist  that are  to  be
                 compressed.

   CPRDBFLIB     The library  where the TAACPRDBF  file exists  or will
                 be  created.   The default  is  *LIB meaning  the same
                 library  as  named in  the  LIB parameter.    *LIBL or
                 *CURLIB may  be  used if  the  TAACPRDBF file  already
                 exists  and  can be  found  with  the special  library
                 value.

   REPLACE       A  *YES/*NO  option  for whether  the  data  should be
                 replaced in the TAACPRDBF  file before writing to  it.
                 The default  is *YES which  would cause the  member of
                 the TAACPRDBF  file to be cleared.   If multiple files
                 or  members  are   compressed  using  CPRLIBDBF,   the
                 TAACPRDBF file  is only cleared  for the  first member
                 written.

                 *NO may  be specified to add records  to the TAACPRDBF
                 file.

   SAVF          A  *YES/*NO  option  for whether  the  data  should be
                 compressed from any save  files found in the  library.
                 The default is *NO which bypasses any save files.

                 *YES may  be specified to  compress the data  from any
                 save files.

                 If  the  save file  was saved  with  DTACPR(*NO), some
                 reasonable savings  can be  made by  compressing  save
                 file  data.     If  the  save  file   was  saved  with
                 DTACPR(*YES),  you should  avoid compressing  the save
                 file as there will  be either little  gain or a  loss.

   DLTSPLF       A  *YES/*NO  option  for  whether   the  spooled  file
                 created  for each  file that  is compressed  should be
                 deleted.   *NO  is the  default to  retain the spooled
                 file.

                 *YES may be specified to delete each spooled file.

PRTCPRDBF Command parameters                          *CMD
----------------------------

   CPRDBFLIB     The library  where  the TAACPRDBF  file  exists.   The
                 default  is *LIBL.   A  specific name  or  *CURLIB may
                 also be used.

Restrictions
------------

Only physical files or save files may be compressed.

Any  file to be compressed must  have a record length  of 9997 or less.

The file  to be  used for  the de-compressed  data must  have the  same
record length as the file where the data was compressed from.

The algorithm used is  unique for CPRDBF.  The  DCPDBF command may only
be used with a file that was compressed by CPRDBF or CPRLIBDBF.

Prerequisites
-------------

The following TAA Tools must be on your system:

     CHKGENERC       Check generic
     EDTVAR          Edit variable
     FILEFDBCK       File feedback
     HLRMVMSG        HLL Remove message
     RTVMBRLST       Retrieve member list
     RTVSAVFD        Retrieve save file description
     RTVSYSVAL3      Retrieve system value 3
     SNDCOMPMSG      Send completion message
     SNDESCMSG       Send escape message
     SNDSTSMSG       Send status message

Implementation
--------------

None, the tool is ready to use.

Objects used by the tool
------------------------

   Object        Type    Attribute      Src member    Src file
   ------        ----    ---------      ----------    ----------

   CPRDBF        *CMD                   TAADBIC       QATTCMD
   DCPDBF        *CMD                   TAADBIC2      QATTCMD
   PRTCPRDBF     *CMD                   TAADBIC3      QATTCMD
   CPRLIBDBF     *CMD                   TAADBIC4      QATTCMD
   TAADBICC      *PGM       CLP         TAADBICC      QATTCL
   TAADBICC2     *PGM       CLP         TAADBICC2     QATTCL
   TAADBICC3     *PGM       CLP         TAADBICC3     QATTCL
   TAADBICC4     *PGM       CLP         TAADBICC4     QATTCL
   TAADBICC12    *PGM       CLP         TAADBICC12    QATTCL
   TAADBICR      *PGM       RPG         TAADBICR      QATTRPG
   TAADBICR2     *PGM       RPG         TAADBICR2     QATTRPG
   TAADBICR3     *PGM       RPG         TAADBICR3     QATTRPG

Structure
---------

CPRDBF      Cmd
   TAADBICC   CL pgm
     TAADBICR   RPG pgm

DCPDBF      Cmd
   TAADBICC2  CL pgm
     TAADBICR2  RPG pgm
       TAADBICC12  CL pgm

CPRLIBDBF   Cmd
   TAADBICC4  CL pgm

PRTCPRDBF   Cmd
   TAADBICC3  CL pgm
     TAADBICR3  RPG pgm

Added to TAA Productivity tools May 1, 1998

