The Calculate Data Base File Hash (CLCDBFHSH) command determines a
hash value for the data in a data base member. The intent of the
command is to provide a way to compare large files on different
systems without transporting an entire file to make the comparison.
An optional outfile, HASHP, may be written. The CMPDBFHSH command is
provided to compare HASHP files in different libraries.
The model file for the outfile is TAAHSHAP with a format name of
HSHRCD.
To try out the command, use it on a reasonably small file (such as
1,000 records or fewer):
CLCDBFHSH FILE(xxx)
Messages describe the results. You can compare the hash value
manually by running CLCDBFHSH against what is supposed to be a
duplicate file. However, the real power of the function comes from
using the 'block count' function (described later) and the outfile
capability, and then making the comparison with the CMPDBFHSH command.
Assume you have a large FILEA on two systems and want to ensure that
the data matches 100%. On the From system you would enter the
following:
CLCDBFHSH FILE(FILEA) OUTPUT(*OUTFILE)
OUTLIB(xxx)
This creates the HASHP outfile in the named library. The member has
only a single record with the count of records found and the hash
value.
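As a rough illustration of what that single record summarizes, the Python sketch below feeds each record image, in arrival sequence, into one running hash and counts the records. It is a hypothetical model, not the tool's code: `member_hash` is an invented name, and BLAKE2b with a 16 byte digest from Python's hashlib stands in for the RIPEMD-160 C modules the tool uses. It assumes fixed-length record images, as in a data base member.

```python
import hashlib


def member_hash(records):
    """Return (record_count, hex_digest) for an iterable of record images.

    Hypothetical model of CLCDBFHSH with the default BLOCKCNT(*ALL): every
    record image (bytes) is fed, in arrival sequence, into one running hash.
    BLAKE2b with a 16-byte digest is a stand-in for the tool's RIPEMD-160
    code, chosen only because it is always available in hashlib.
    """
    h = hashlib.blake2b(digest_size=16)
    count = 0
    for rec in records:
        h.update(rec)
        count += 1
    return count, h.hexdigest()


# Two copies of a 2-record member that differ by one byte in the last record.
from_system = [b"CUST0001SMITH   ", b"CUST0002JONES   "]
to_system = [b"CUST0001SMITH   ", b"CUST0002JONEZ   "]
print(member_hash(from_system) == member_hash(to_system))  # False
```

The record counts agree (2 and 2), but the hash values do not, which is the situation CMPDBFHSH reports.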
You can review the data in the file with the command:
PRTDB2 FILE(xxx/HASHP)
You would then transfer the HASHP file to the 2nd system that has the
duplicate FILEA, placing it in its own library (xxx). On the 2nd
system you would issue the same CLCDBFHSH command, writing the HASHP
outfile to a different library (yyy).
Then enter the CMPDBFHSH command as:
CMPDBFHSH FROMLIB(xxx) TOLIB(yyy)
An escape message (TAA9893) is sent if the hash values do not match.
A listing is always produced.
Assume that the hash values do not agree, meaning the data is not
identical; for example, one record in a million-record file differs.
The 'block count' function can be used to produce a separate hash
value for each block of records. For example, you could specify
blocks of 50,000:
CLCDBFHSH FILE(FILEA) OUTPUT(*OUTFILE)
OUTLIB(xxx) BLOCKCNT(50000)
A separate record would be written to the outfile for each block of
50,000 records (the last block probably contains fewer than 50,000
records). By doing the same on the 2nd system and then using
CMPDBFHSH, you can determine which block of 50,000 is not identical.
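The block count idea can be sketched as follows: a fresh hash is started every BLOCKCNT records, one result is produced per block, and the last block may be short. This is a hypothetical Python model (the function name and return shape are invented); the keys follow the documented *GEN convention, and BLAKE2b stands in for the tool's RIPEMD-160 code.

```python
import hashlib


def block_hashes(records, block_count):
    """Return a list of (key, record_count, hex_digest), one per block.

    Hypothetical model of BLOCKCNT: records are taken in arrival sequence and
    a new hash is started every block_count records, so the final block may
    hold fewer records. Keys use the *GEN convention (NBR0000001, ...).
    BLAKE2b with a 16-byte digest stands in for the tool's RIPEMD-160 code.
    """
    results = []
    for n, start in enumerate(range(0, len(records), block_count), start=1):
        block = records[start:start + block_count]
        h = hashlib.blake2b(digest_size=16)
        for rec in block:
            h.update(rec)
        results.append((f"NBR{n:07d}", len(block), h.hexdigest()))
    return results


# 12 records in blocks of 5 give three entries holding 5, 5 and 2 records.
records = [f"REC{i:05d}".encode() for i in range(12)]
print([(key, count) for key, count, _ in block_hashes(records, 5)])
# [('NBR0000001', 5), ('NBR0000002', 5), ('NBR0000003', 2)]
```

With a million records and BLOCKCNT(50000), the same logic yields 20 entries, and only the block containing a changed record gets a different hash.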
Assume it was the block of records 300,001 to 350,000. You can then
reduce the block size (for example, to 5,000) and describe that
specific range of records using the FROMRCD and TORCD parameters:
CLCDBFHSH FILE(FILEA) OUTPUT(*OUTFILE)
FROMRCD(300001) TORCD(350000)
OUTLIB(xxx) BLOCKCNT(5000)
This would output 10 records to the HASHP file. Assuming you did the
same on the 2nd system and then used CMPDBFHSH, you could determine
which block of 5,000 records differed.
On each iteration, you can ask for smaller and smaller block counts
as you narrow down where the problem is. Assume you were able to
narrow the problem area to a block of 100 records. You can then
request a block count of 1 to identify the record that is not
identical.
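The narrowing procedure above is essentially a repeated search over block hashes. The hypothetical Python sketch below automates it on in-memory copies of both members: hash blocks on both sides, keep the first block whose hashes differ, then rehash only that range with a smaller block size, much as FROMRCD/TORCD and BLOCKCNT are used by hand. It assumes both copies hold the same number of records (CMPDBFHSH compares record counts separately); the default block-size schedule mirrors the example, and BLAKE2b stands in for RIPEMD-160.

```python
import hashlib


def _digest(block):
    # BLAKE2b (16-byte digest) as a stand-in for the tool's RIPEMD-160 code.
    h = hashlib.blake2b(digest_size=16)
    for rec in block:
        h.update(rec)
    return h.digest()


def find_first_difference(from_recs, to_recs,
                          block_sizes=(50_000, 5_000, 100, 1)):
    """Return the 1-based relative record number of the first differing
    record, or None if no block differs.

    Assumes both lists hold the same number of records and that the last
    entry of block_sizes is 1, so the final range is a single record.
    """
    lo, hi = 0, len(from_recs)  # current search range, 0-based, end-exclusive
    for size in block_sizes:
        for start in range(lo, hi, size):
            end = min(start + size, hi)
            if _digest(from_recs[start:end]) != _digest(to_recs[start:end]):
                lo, hi = start, end  # like a new FROMRCD/TORCD pair
                break
        else:
            return None  # no block in the current range differs
    return lo + 1


from_recs = [f"REC{i:06d}".encode() for i in range(12_000)]
to_recs = list(from_recs)
to_recs[7_777] = b"CHANGED!!"
print(find_first_difference(from_recs, to_recs, (5_000, 500, 10, 1)))  # 7778
```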
CLCDBFHSH uses a good deal of CPU time. If you have large files, you
should run the command at off-peak times of the day. Specifying
'from' and 'to' records allows you to perform the hash in multiple
steps.
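One way to split a long run into steps is to generate a CLCDBFHSH command per FROMRCD/TORCD range. The Python sketch below only builds the command strings; it does not run them. Because each run with KEY(*GEN) starts again at NBR0000001, the sketch gives every step its own KEY value (STEP001, STEP002, ... are hypothetical names) and uses REPLACE(*NO) after the first step so earlier results are kept. BLOCKCNT is left at its default because a named key cannot be combined with a block count.

```python
def step_commands(file, outlib, total_records, step):
    """Build one CLCDBFHSH command string per FROMRCD/TORCD step.

    Hypothetical helper: the first step clears the HASHP member with
    REPLACE(*YES); later steps use REPLACE(*NO) and a distinct KEY value
    (STEP001, STEP002, ...) so the keys stay unique.
    """
    cmds = []
    for n, first in enumerate(range(1, total_records + 1, step), start=1):
        last = min(first + step - 1, total_records)
        replace = "*YES" if n == 1 else "*NO"
        cmds.append(
            f"CLCDBFHSH FILE({file}) FROMRCD({first}) TORCD({last}) "
            f"OUTPUT(*OUTFILE) OUTLIB({outlib}) REPLACE({replace}) "
            f"KEY(STEP{n:03d})"
        )
    return cmds


for cmd in step_commands("FILEA", "XXX", 1_000_000, 250_000):
    print(cmd)
```

Running the same list on the 2nd system (with a different OUTLIB) and then CMPDBFHSH with the default CMPKEY(*ALL) would compare every step.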
Members without data
--------------------
It is valid to use CLCDBFHSH on a member without any data, but the
defaults must be used for FROMRCD and TORCD. TAA9894 is sent as an
escape message if the defaults are not used.
HASHP file
----------
The HASHP file will contain one record for each block that is
requested. The default for BLOCKCNT is *ALL meaning the entire
member is considered as one block and one record would be output.
The key structure for the HASHP file is:
Library
File
Member
Key
The key field is taken from the KEY parameter on CLCDBFHSH. The
default is *GEN meaning the command will generate a key for you using
the naming convention of NBR0000001, NBR0000002, etc. If you request
multiple blocks, each block record would receive a unique key value.
While you can name your own key value, the default should be used in
most cases.
When CMPDBFHSH is used, each record in the From file is used to
chain (a keyed read) to the matching record in the To file, so the
same key convention must be used on both systems. If the To file
record does not exist, an error is noted. Both the hash value and
the record count are compared by CMPDBFHSH.
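The comparison rules above can be modelled in a few lines. In the hypothetical Python sketch below, `HashRecord` and its field names are invented for illustration (they are not the DDS field names in TAAHSHAP); each From record is looked up by Library/File/Member/Key, a missing To record is reported, and both the record count and the hash value are checked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HashRecord:
    # Hypothetical Python view of one HASHP record; the field names are
    # illustrative and are not the DDS field names in TAAHSHAP.
    library: str
    file: str
    member: str
    key: str
    count: int
    hash_value: bytes


def compare_hashp(from_recs, to_recs):
    """Return a list of differences; an empty list means everything matched.

    Each From record is looked up ("chained") in the To records by
    library/file/member/key. A missing To record is reported, and both the
    record count and the hash value are compared.
    """
    to_index = {(r.library, r.file, r.member, r.key): r for r in to_recs}
    diffs = []
    for f in from_recs:
        key = (f.library, f.file, f.member, f.key)
        label = "/".join(key)
        t = to_index.get(key)
        if t is None:
            diffs.append(f"{label}: no matching To record")
            continue
        if f.count != t.count:
            diffs.append(f"{label}: record counts {f.count} and {t.count}")
        if f.hash_value != t.hash_value:
            diffs.append(f"{label}: hash values differ")
    return diffs
```

A non-empty result corresponds to the case in which CMPDBFHSH sends the TAA9893 escape message.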
CMPDBFHSH allows you to compare one or all records in the HASHP file.
You can output multiple files/members to the same HASHP file and
request a comparison on one or all of the files/members.
The default on CLCDBFHSH is REPLACE(*YES) for the output member.
This means the member is cleared first before any output records are
written. When you are only using HASHP to compare one file at a
time, the default works properly.
If you want records from multiple files/members in the same HASHP
file, specify REPLACE(*NO) or REPLACE(*MTN) so that the member is not
cleared when CLCDBFHSH begins.
The special value REPLACE(*MTN) should be considered. It invokes the
MTNHSH command, which deletes any records in the HASHP file for the
FILE and MBR parameters specified on CLCDBFHSH. This allows you to
add new records with the default generated keys of NBR0000001, etc.,
without causing duplicate key errors.
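The interaction between REPLACE and the generated keys can be sketched with a dictionary standing in for one HASHP member. This is a hypothetical Python model: `write_hashp` and `DuplicateKeyError` are invented names, and the exception merely plays the role of escape message TAA9892. It shows why REPLACE(*NO) with KEY(*GEN) fails on a second run for the same file, while REPLACE(*MTN) succeeds.

```python
class DuplicateKeyError(Exception):
    """Stands in for escape message TAA9892 (assigned key is not unique)."""


def write_hashp(hashp, library, file, member, results, replace="*YES"):
    """Add (count, digest) results to a dict modelling one HASHP member.

    hashp maps (library, file, member, key) -> (count, digest). Hypothetical
    sketch of REPLACE: *YES clears the member, *MTN deletes only this
    library/file/member's records (as MTNHSH does), *NO keeps everything.
    Keys are generated with the *GEN convention NBR0000001, NBR0000002, ...
    """
    if replace == "*YES":
        hashp.clear()
    elif replace == "*MTN":
        for k in [k for k in hashp if k[:3] == (library, file, member)]:
            del hashp[k]
    new = {(library, file, member, f"NBR{n:07d}"): r
           for n, r in enumerate(results, start=1)}
    dupes = hashp.keys() & new.keys()  # checked before anything is written
    if dupes:
        raise DuplicateKeyError(sorted(dupes)[0])
    hashp.update(new)
```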
MTNHSH Command
--------------
The MTNHSH command is normally requested by using REPLACE(*MTN) on
the CLCDBFHSH command.
However, you can use MTNHSH at any time to clean up old records in a
HASHP file. You must identify the file and member whose records you
want to delete.
MTNHSH must allocate the HASHP file member. The library/file/member
records specified are deleted, the file is copied to the temporary
file HASHP2 (created in the same library as HASHP), and then the
records are copied back. The HASHP2 file is deleted, and the HASHP
file is de-allocated.
If the HASHP2 file exists when the command starts, it indicates that
the previous use of the command did not complete successfully.
Technique used
--------------
The 'hash' technique is not a CRC (Cyclic Redundancy Check) such as
the system uses for its 'hashing' technique; a CRC provides an 8 byte
value. Instead, the RIPEMD-160 hash functions are used, as provided
by the Department of Electrical Engineering (ESAT/COSIC) at K.U.
Leuven. C language modules are used to provide a 16 byte return
value.
For more information about the technique, refer to the TAAHSHAE1
source member and the RIPEMD-160 software written by Antoon
Bosselaers, available at
http://www.esat.kuleuven.ac.be/~cosicart/ps/AB-9601/
CLCDBFHSH escape messages you can monitor for
---------------------------------------------
TAA9892 The assigned key is not unique in the file
TAA9894 If no records exist, the defaults must be used
Escape messages from the underlying functions are re-sent.
CMPDBFHSH escape messages you can monitor for
---------------------------------------------
TAA9893 Differences were found
TAA9895 No records were found to compare
Check the CMPxxx parameters
Escape messages from the underlying functions are re-sent.
MTNHSH escape messages you can monitor for
------------------------------------------
None. Escape messages from the underlying functions are re-sent.
CLCDBFHSH Command parameters *CMD
----------------------------
FILE The qualified name of the file to generate the hash
value for. The library value defaults to *LIBL.
*CURLIB may also be used.
MBR The member to generate the hash value for. The
default is *FIRST for the first member of the file.
A specific member name may be entered.
FROMRCD The 'from' record in the member to start reading
from. The default is *START meaning the first
record in the file.
A specific relative record number may be entered up
to a maximum of 9,999,999,999. If a specific value
is entered, it must be *LE to the TORCD value and
must be *LE to the number of records in the member.
The file is read in arrival sequence. The value
entered (*START = 1) is used on an OVRDBF command to
begin the reading of the member. The first
non-deleted record is read from that point.
If the BLOCKCNT parameter is other than *ALL, the
blocks are formed within the range of records
defined by FROMRCD and TORCD.
TORCD The 'to' record in the member to end reading on.
The default is *END meaning to the 'end of file'.
If *END is used, the 'end of file' is determined by
the number of records in the member when the command
starts processing. This value determines the last
record to be read. If additional records are added
to the end of file while CLCDBFHSH is in process,
they are not considered.
A specific number may be entered that is *GE to the
value of the FROMRCD parameter. The number entered
will be the last record read unless 'end of file'
occurs prior to that value.
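The FROMRCD/TORCD rules, together with the rule for members without data, can be summarized in a small validation function. This is a hypothetical Python sketch (`resolve_range` is an invented name, and ValueError stands in for the tool's escape messages); `nbr_records` is the member's record count captured when processing starts, which is how TORCD(*END) is described.

```python
def resolve_range(fromrcd, torcd, nbr_records):
    """Return the (first, last) relative record numbers to read, or None.

    Hypothetical sketch of the documented rules: *START means 1, *END is the
    record count captured when processing starts, FROMRCD must be <= TORCD
    and <= the record count, and a TORCD past end of file stops at end of
    file. An empty member requires both defaults (otherwise TAA9894).
    """
    if nbr_records == 0:
        if (fromrcd, torcd) != ("*START", "*END"):
            raise ValueError("TAA9894: if no records exist, the defaults must be used")
        return None  # valid, but there is nothing to read
    first = 1 if fromrcd == "*START" else fromrcd
    last = nbr_records if torcd == "*END" else torcd
    if first > last or first > nbr_records:
        raise ValueError("FROMRCD must be <= TORCD and <= the number of records")
    return first, min(last, nbr_records)


print(resolve_range(300001, 350000, 1_000_000))  # (300001, 350000)
```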
OUTPUT The type of output to be performed. * is the
default meaning that messages are sent to describe
the results.
*OUTFILE may be specified to mean that both messages
and an outfile with the results will be output. If
*OUTFILE is specified, you may also enter the
parameters OUTLIB, OUTMBR, REPLACE, KEY, and
BLOCKCNT.
OUTLIB The library in which the file HASHP will be placed.
The default is *LIBL. If the HASHP file does not
already exist, a library must be named.
OUTMBR The member of the HASHP file to be used. The
default is HASHP. If the member does not exist, it
is added.
REPLACE A *YES, *NO, or *MTN value that determines how
existing records in the HASHP member are handled.
The default is *YES, meaning the member is cleared
before any records are written into it.
*NO may be specified to add records to any existing
data.
*MTN may be specified to invoke the MTNHSH command.
This deletes any existing records for the same
library/file/member before adding any new records
to the file.
KEY The key assigned to the record in the output file.
The default is *GEN meaning a naming convention is
used of NBR0000001, NBR0000002 ... Using the
default is usually the best solution.
The full key in the HASHP file is made up of the
library and file from the FILE parameter, the MBR
parameter, and the KEY parameter (this generates
Library/File/Member/Key). Unique keys are required.
If you use the CMPDBFHSH command, the key structure
is used to access the corresponding record in the
file being compared. You must be consistent (the
default normally provides the best approach). If
BLOCKCNT is specified, you must use the *GEN
default.
BLOCKCNT The block count used. An entry is only valid when
an OUTPUT(*OUTFILE) is specified.
The default is *ALL meaning that one record will be
output with a hash value for all records specified
between the FROMRCD and TORCD values.
A block size may be entered (such as 50000) meaning
that a record will be output for each block of
50,000 records that exist between the FROMRCD and
TORCD values (the last block would normally contain
fewer records than the number specified).
A block count lets you compare smaller and smaller
segments of the file to identify the records that
are not the same.
A block of 1 is valid. The block size cannot exceed
the number of records between the FROMRCD and TORCD
values nor can it exceed the number of records in
the file.
If 1) defaults for FROMRCD/TORCD are used and 2) a
block count is specified and 3) deleted records
exist in the file, the number of records to process
will be the sum of the active and deleted records in
the file.
If a block contains only deleted records, X'00's
will be returned as the hash value.
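Deleted records complicate blocking. The hypothetical Python sketch below models a member in arrival sequence as a list in which None marks a deleted record, and applies the rules above for the case where FROMRCD/TORCD are defaulted: blocks are counted over active and deleted records, the block size must be between 1 and the number of records, and an all-deleted block gets X'00's as its hash. Reporting the active-record count per block is an assumption, as is BLAKE2b in place of RIPEMD-160.

```python
import hashlib


def hash_blocks_with_deletes(slots, block_count):
    """Return [(active_count, digest_bytes)] for each block of slots.

    slots is the member in arrival sequence; None marks a deleted record.
    Blocks are counted over active *and* deleted slots. A block holding
    only deleted records gets X'00's (16 zero bytes) as its hash value.
    BLAKE2b (16-byte digest) stands in for the tool's RIPEMD-160 code.
    """
    if not 1 <= block_count <= len(slots):
        raise ValueError("block size must be 1 to the number of records")
    out = []
    for start in range(0, len(slots), block_count):
        active = [r for r in slots[start:start + block_count] if r is not None]
        if not active:
            out.append((0, bytes(16)))  # X'00's for an all-deleted block
            continue
        h = hashlib.blake2b(digest_size=16)
        for rec in active:
            h.update(rec)
        out.append((len(active), h.digest()))
    return out
```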
CMPDBFHSH Command parameters *CMD
----------------------------
FROMLIB The library containing the HASHP file created by
CLCDBFHSH that has the 'from' data to be compared.
*LIBL or *CURLIB may be used as the library value.
TOLIB The library containing the HASHP file created by
CLCDBFHSH that has the 'to' data to be compared.
*LIBL or *CURLIB may be used as the library value.
FROMMBR The member of the 'from' HASHP file to be used. The
default is *FIRST.
TOMBR The member of the 'to' HASHP file to be used. The
default is *FIRST.
CMPFILE The qualified object name of the file to be
compared. The file name defaults to *ALL meaning
any file name will be compared.
The library defaults to *ALL meaning any library
name will be compared.
CMPMBR The member name of the file to be compared. The
default is *ALL meaning any member name will be
compared.
CMPKEY The assigned key to be compared. The default is
*ALL meaning any assigned key. Either the default
should be used or the value for the KEY parameter
you entered on CLCDBFHSH (assuming you did not take
the default).
MTNHSH Command parameters *CMD
-------------------------
FILE The qualified name of the file to delete records for
in the HASHP file. The library value defaults to
*LIBL. *CURLIB may also be used.
If a special value is used for the library
qualifier, the file must exist and its library name
is used to determine the records to be deleted in
HASHP.
MBR The member to delete records for in HASHP. The
default is *FIRST for the first member of the file.
If the default is used, the file must exist and the
name of the first member will be used to determine
the records to be deleted.
A specific member name may be entered.
HASHPLIB The library containing the HASHP file. The default
is *LIBL. *CURLIB may be specified.
HASHPMBR The member of the HASHP file that contains the
records to be deleted. The default is *FIRST. A
specific member name may be entered.
Restrictions
------------
The maximum record length supported is 32,000 bytes.
Prerequisites
-------------
The following TAA Tools must be on your system:
CHKOBJ3 Check object 3
CVTHEX Convert hex
CVTTIM Convert time
EDTVAR Edit variable
RTVDAT Retrieve date
RTVDBFA Retrieve data base file attributes
RTVSYSVAL3 Retrieve system value 3
SNDCOMPMSG Send completion message
SNDESCMSG Send escape message
SNDHEXMSG Send hex message
SNDSTSMSG Send status message
Implementation
--------------
None, the tool is ready to use.
Objects used by the tool
------------------------
Object Type Attribute Src member Src file
------ ---- --------- ---------- ----------
CLCDBFHSH *CMD TAAHSHA QATTCMD
CMPDBFHSH *CMD TAAHSHA2 QATTCMD
MTNHSH *CMD TAAHSHA3 QATTCMD
TAAHSHAC *PGM CLLE TAAHSHAC QATTCL
TAAHSHAC2 *PGM CLP TAAHSHAC2 QATTCL
TAAHSHAC3 *PGM CLP TAAHSHAC3 QATTCL
TAAHSHAR *PGM
TAAHSHAR3 *PGM TAAHSHAR3 QATTRPG
TAAHSHAR11 *PGM RPGLE TAAHSHAR11 QATTRPG
TAAHSHAR *MODULE RPGLE TAAHSHAR QATTRPG
TAAHSHAE1 *MODULE CLE TAAHSHAE1 QATTPL1
TAAHSHAE2 *MODULE CLE TAAHSHAE2 QATTPL1
TAAHSHAP *FILE PF TAAHSHAP QATTDDS
TAAHSHAQ *FILE PF
TAAHSHAQ is created from the TAAHSHAP source.
Structure
---------
CLCDBFHSH Cmd
TAAHSHAC CL Pgm
TAAHSHAR11 RPG Pgm - Checks for duplicate key
TAAHSHAR RPG Pgm - Does hash function
TAAHSHAR RPGLE *MODULE
TAAHSHAE1 CLE *MODULE
TAAHSHAE2 CLE *MODULE
TAAHSHAR11 RPG Pgm - 2nd use to write to the HASHP file
CMPDBFHSH Cmd
TAAHSHAC2 CL Pgm
TAAHSHAR2 RPG Pgm
MTNHSH Cmd
TAAHSHAC3 CL Pgm
TAAHSHAR3 RPG Pgm