TAA Tools
RPGALTLKP       RPG ALTERNATE LOOKUP                   TAARPGF

The  RPG  Alternate  Lookup  Tool   describes  a  performance  oriented
solution  for the  typical case  where a  batch job  reads a  data base
file,   loads  an  array  using  unique   values  from  the  data,  and
accumulates amounts.  At the  end of the program, the array  is usually
sorted and a report prepared with one line per unique value.

In general, this is a good technique because:

  **   The program  can process the  file in arrival sequence  which is
       much faster than processing by keys.

  **   It avoids physically sorting the data.

  **   It   avoids  many   internal  compares   caused  by   the  LOKUP
       operation.

The sample code  allows you  to produce  a working program  in a  short
period of time.

If you  are only interested  in summary data  and the number  of unique
values is 2000 or less, you should be considering this technique.

Sample code
-----------

The  sample RPG  code  is shown  in the  member TAARPGFR3  in  the file
QATTRPG.  The normal solution would be  to copy all of the source to  a
new member and  then begin making  modifications.  Instructions  in the
source will assist you in how to modify the program.

The  TAARPGFR3 source is  a working program  that is compiled  when the
RPGALTLKP  tool is created.   The program works  on the outfile created
by the DSPOBJD  command.  The program  creates a report in  object type
sequence  for each  unique object  type.   A count  by object  type and
average size is also included along with percentages.

You  can try  the program  by specifying DSPOBJD  on a  typical library
(the outfile name must be DSPOBJDP).

          DSPOBJD     OBJ(xxx/*ALL) OBJTYPE(*ALL) +
                        OUTPUT(*OUTFILE) OUTFILE(DSPOBJDP)
          CALL        PGM(TAATOOL/TAARPGFR3)

Then look at the spooled output for the QPRINT file.

ANZRPGLKP Command
-----------------

To help you in determining  if the Alternate Lookup solution will  have
significant payoff,  an Analyze command is  available that can  be used
on any  externally described data base file.   The Analyze command will
perform the normal LOKUP  operation approach as  well as the  alternate
method and describe both results in a spooled file.

Assume you  created the  DSPOBJDP file  as in  the prior  instructions.
You want  a report that uses  'object type' (the field  name is ODOBTP)
as a unique value.  You could analyze the file with the command:

           ANZRPGLKP   FILE(DSPOBJDP) FIELD1(ODOBTP) +
                         FSTPOS(3) SECPOS(4) THRPOS(5)

The  FSTPOS, SECPOS,  and THRPOS  fields describe  the positions within
the field  that  will be  used to  provide  a random  value  to do  the
lookup.   Because the 'object type'  field values all start  with * and
the  second position  value has several  duplicates, the  3rd, 4th, and
5th positions are used to provide the best random solution.

You  can  try  the  ANZRPGLKP  command  using  different  positions  to
determine  the  best  solution.   The  closer  the  average  number  of
comparisons is to 1.0 per record, the better the solution is.

The  Alternate method  is only  worth  pursuing if  you can  reduce the
number of comparisons by either hundreds of thousands or millions.

The ANZRPGLKP command  is general purpose  in order to  operate on  any
externally described data.   Because of  this, the performance  of both
the  normal lookup  and  the alternate  method  will be  slower  than a
tailored  solution  using  the  sample  provided  by  TAARPGFR3.    The
advantage  of  ANZRPGLKP is  that  you  can  easily  determine  if  the
technique will be effective.

Comments in the sample code
---------------------------

The TAARPGFR3 source has two forms of comments:

  **   Permanent comments.  These define the logic of the program.

  **   Temporary  comments.   These  provide  instructions  for how  to
       modify  the program.   You  scan the source  for +++Req  to find
       these temporary comments

You should not  need to read  or understand all  of the code  provided.
The temporary comments should  tell you what you need  to modify to get
a working result.

The  best solution is  usually to leave  the temporary comments  in the
source until  you  have the  program  working.   When  you  are  ready,
delete the  temporary  comments with  the TAA  tool RMVSRCCMT.   It  is
specifically  provided  to eliminate  all  of the  comments  that begin
with +++.

Other uses of the technique
---------------------------

The  code  provided  for  the  Alternate  method  sorts  the  array  in
ascending sequence.  A  simple change allows descending sequence.   You
can sequence  on the  unique value or  a count you  are keeping  of the
unique  values  or  any other  summary  value.   Your  code  could also
process the  array  when all  the  records  have been  read,  determine
percentages  or averages  and then  sort on  a calculated  field.   The
sort function is optional.

You  can  also use  the  same alternate  lookup  method  for additional
applications.  For  example, an array  is often used  to help  validate
data being input.

Assume  you have  an array  of  the valid  US  State abbreviations  (an
array  of 50+ elements).   If you  check every transaction  against the
array,  you would average 25 or so  compares against the state array to
determine whether a valid state had been specified.

To lower  the number of  compares, you  could have initialization  code
in the  program that loads the  alternate array from a  standard array.
The  TAARPGFR3 program supports the  use of the ZAEXST  field.  If this
is set to *ON, the subsequent lookups  will return a value of EQ or  NE
in the ZASTS  field and the array  is not updated (it  becomes a normal
lookup).

Because  the alternate  method requires large  arrays (it  may approach
65,000  bytes),  the  technique  is  not  appropriate  for  interactive
programs if your interactive pool is memory constrained.

Alternate method technique
--------------------------

The Alternate  Method requires that  you extract three  characters from
the  field that  you want  for a  unique value.   These  characters are
used to provide a 'random value'.

For  example, if you  are going to  process the outfile  of the DSPOBJD
command and want  to summarize  by the unique  value of 'object  type',
you would need to specify three characters from the field ODOBTP.

These  three   characters  should  be   chosen  to  provide   the  best
randomness  in  the  data.    Normally,  the  first  three  bytes of  a
character field or the  last three bytes in  a decimal field provide  a
good solution, but it  is very data dependent.  In the  case of ODOBTP,
every  value  begins with  an  *  so you  would  probably  choose other
characters  (the   3rd,   4th,   and  5th   characters   provide   good
randomness).    You  can determine  the  best  approach  by  using  the
ANZRPGLKP  command.  Your  program must  extract three  characters from
the unique value and place them in standard fields.

The   alternate  method  forces  an  F   zone  on  each  of  the  three
characters, ensures a valid digit  portion, and adds a 1 to the  end of
the  value to  form an  index  value.   For example,  ABC would  become
1231, DEF would become 4561, JKL would become 1231, etc.

Avoiding Exception Address Overflows (EAOs)
-------------------------------------------

The  EAO discussion  should only  be considered for  CISC systems.   If
you are on a RISC system, you may ignore the discussion.

There is a limit on how big  the second array used in the program  (the
ZB array)  can be in  order to avoid  the internal exception  caused by
an  address overflow  (internally these  are  known as  EAOs).   The ZB
array contains both the unique value  to compare against and any  count
or total fields you are accumulating.

You should  limit  the size  of the  second  array to  40,000 bytes  or
less.   For example, this  could be 2000  elements of 20  bytes each or
500  elements of  80 bytes  each.  As  long as  the total  is less than
40,000 bytes and  you do not  have many other fields  or arrays in  the
program, the program should not cause any EAOs.

Internal approach
-----------------

The  following chart  shows the  general concept  of the  approach (the
implementation  specifics  are  slightly  different).    The  generated
random value is used as  a direct index into  the ZA array.  The  value
in the ZA element  is assigned to the next available element  in the ZB
array.    The  ZB  array contains  both  the  unique  value to  compare
against and any count or total fields you are accumulating.

           ZA Array                   ZB Array
       ----------------      ------------------------------
                                         Array value in DS
                                        -------------------
                                        Unique
       Element    Value      Element    Value   Count Total
       -------    -----      -------    -----   ----- -----
           1      Index         1
           2       to           2
           .       ZB           .
           .                 Up to
        9999                  2000

Assume the first value read  is ABC and you  are using the first  three
characters of  the value.   This would randomize  to an index  of 1231.
The  1231st element  in the  ZA array  would be  accessed.   If  it was
zero,  the element has not been used  and the next available element in
ZB would be  assigned (this is  the first record  so the first  element
of ZB would  be assigned).  The assigned element of  the ZB array would
be  moved to a  data structure.   The unique value ABC  would be placed
in one of the sub fields,  a count added, and totals accumulated.   The
data structure is then written back to the ZB array.

Assume the second value  read is DEF.  This would randomize  to a value
of  4561.   The  4561st  element in  the  ZA array  would  be accessed.
Since it is zero, the  next element for the ZB  array (Nbr 2) would  be
assigned and updated.

Assume the  next value  read is  ABC.   This would  randomize to  1231.
When the  1231st element was  read from the  ZA array, it  would have a
value  of 1 so  the 1st element of  the ZB array would  be moved to the
data structure.    A comparison  would  be made  of the  unique  value.
Since they  are both  ABC, the  count and totals  would be  updated and
the data structure written back to the ZB array.

Assume  the next  value read  is JKL.   This  also randomizes  to 1231.
When the unique value is compared  in the ZB array, they are not  equal
so the  next element  in the ZA  array is accessed  (Nbr 1232).   Since
the  value is zero,  it would be  assigned the next  available value in
the ZB array (Nbr 3).

If many unique  values randomize to  the same value, there  would be  a
lot of 'clustering'  and this would  result in more compares  trying to
find  the correct  value.   If the  data does  not  form into  too many
large clusters, the performance will generally be very good.

At the end of these 4 records, the two arrays would look as follows:

           ZA Array                    ZB Array
       ----------------      ------------------------------
                                         Array value in DS
                                        -------------------
                                        Unique
       Element    Value      Element    Value   Count Total
       -------    -----      -------    -----   ----- -----
           1                    1        ABC      2    nnn
           2                    2        DEF      1    nnn
           .                    3        JKL      1    nnn
        1231       1            .
        1232       3         2000
           .
        1241
           .
        4561       2
           .
        9998
        9999

Note  that the ZA array is not used at  the end of the program when you
want to  print the results.   Normally you  would want  to sort the  ZB
array.  If you  want to sort on the unique value, it  must be the first
field  in the  data structure.   The RPG  SORTA operation  sorts on the
entire element value so  you need to define  the data structure  fields
in the sequence you want the array sorted.

You  may want  to sort  on the  unique value,  a count  field, a  total
field,  or an average  field kept in  the array.   If you  are going to
sort on  an  'average'  field,  you are  normally  better  off  from  a
performance  viewpoint to  make  an extra  pass  through the  ZB  array
before  sorting  to  calculate the  average  (or  percent) rather  than
calculating each time the data structure is updated.

Note  that  if you  have a  blank value  as the  unique value,  it will
randomize to 0001.  Blanks will  be written to the unique value  in the
data structure.   When you print  the accumulated values at  the end of
the  program,  you want  to  avoid printing  if the  entire  element is
blank.   A  blank  unique  value  will have  a  count  field  or  other
accumulated values that should be printed.

To reduce  the total number  of bytes in  the first array  (ZA), values
-999  to +999  are used  (this allows a  2 byte  packed value  for 9999
elements).  The values placed in ZA  begin with -999 rather than the  1
shown in  the  example.   A value  of 1000  is  added to  the value  to
determine the element number in the second array (ZB).

ANZRPGLKP Command parameters                          *CMD
----------------------------

   FILE          The  qualified file  name of  the  file.   The library
                 value  defaults to  *LIBL.  *CURLIB  may be specified.

   FIELD1        The first (or  only) field to  be used for the  unique
                 value.   It must be either  character, packed decimal,
                 or  zoned decimal.    The length  of the  field cannot
                 exceed  20  bytes  (this  is  a  restriction   of  the
                 ANZRPGLKP  command and  not  of  the Alternate  Lookup
                 technique).

                 The   parameters  FIELD2  and  FIELD3   let  you  name
                 additional fields that make  up the value to  compare.
                 If different  field  names are  used, the  sum of  the
                 lengths   cannot   exceed   20   bytes  (this   is   a
                 restriction  of the  ANZRPGLKP command and  not of the
                 Alternate Lookup technique).

   FSTPOS        The first  (leftmost)  position  to be  used  for  the
                 random  value.   The  default is  1.   If  you have  a
                 typical  character  field,  you  should probably  take
                 the default for this  parameter and the  corresponding
                 parameters for your  first attempt.  Then  you can try
                 alternative solutions depending on your data.

                 If  you have a  packed decimal  field of LEN(5  0) and
                 you want the low order  positions (3-4-5) to be  used,
                 specify 3.

   FIELD2        The second  field  to be  used for  the unique  value.
                 The default  is *FIELD1 meaning  it is the  same field
                 as  FIELD1.   Specifying a different  field allows you
                 to  have a  combination  of  fields which  make  up  a
                 unique  value.    FIELD2  must  be  either  character,
                 packed decimal, or zoned decimal.

   SECPOS        The  second position to be used  for the random value.
                 The default  is  2.   If  you  have a  packed  decimal
                 field  of  LEN(5  0)  and   you  want  the  low  order
                 positions  (3-4-5) to be  used, specify 4.   The entry
                 refers  to  where  in  FIELD2  you  want  the   second
                 character used for the random value.

   FIELD3        The  third field  to  be used  for  the unique  value.
                 The default  is *FIELD1 meaning  it is the  same field
                 as FIELD1.

   THRPOS        The  third position to  be used for  the random value.
                 The default  is  3.   If  you  have a  packed  decimal
                 field  of  LEN(5  0)   and  you  want  the  low  order
                 positions (3-4-5) to be used, specify 5.

   NBRRCDS       The  number of  records in  the file to  be processed.
                 The default  is 10,000.   You  can normally  determine
                 the value of  the RPG Alternate Lookup  method without
                 processing  an   entire  large  file.    *ALL  may  be
                 specified to process all of  the records in the  file.

   MBR           The member  to be processed.   The default  is *FIRST.

   LOOKUP        Whether  the  normal  lookup  should  be  used.    The
                 default  is *YES.   The normal  lookup is  done by use
                 of a separate program  to provide a comparison.   Once
                 you  determine you  want to  use the  alternate lookup
                 technique  and  are  using  the  ANZRPGLKP command  to
                 determine the best random  characters to use, you  can
                 avoid the separate program by specifying *NO.

Restrictions
------------

The number of unique  values that can be processed  has a maximum limit
of  2000.   The program  may report an  error at  some point  less than
2000 (it is  data dependent).   Use the ANZRPGLKP  command to help  you
determine if the technique will be effective on your data.

There are  restrictions described  for the  ANZRPGLKP command  relative
to  the length of  the field to  compare (20 bytes  or less).   This is
not a restriction relative to the Alternate Method.

You  should limit  the size  of the  second array  (ZB) to  avoid EAOs.
See the previous discussion.

Prerequisites
-------------

The following TAA Tools must be on your system:

     FILEFDBCK    File feedback
     HLRMVMSG     HLL Remove message
     RTVSYSVAL3   Retrieve system value 3
     SNDCOMPMSG   Send completion message
     SNDESCMSG    Send escape message
     SNDSTSMSG    Send status message

Implementation
--------------

None, the tool is ready to use:

Objects used by the tool
------------------------

   Object        Type    Attribute      Src member    Src file
   ------        ----    ---------      ----------    ----------

   ANZRPGLKP     *CMD                   TAARPGF       QATTCMD
   TAARPGFC      *PGM       CLP         TAARPGFC      QATTCL
   TAARPGFR      *PGM       RPG         TAARPGFR      QATTRPG
   TAARPGFR2     *PGM       RPG         TAARPGFR2     QATTRPG
   TAARPGFR3     *PGM       RPG         TAARPGFR3     QATTRPG

Structure
---------

ANZRPGLKP    CMD
  TAARPGFC     CLP
    TAARPGFR      RPG
    TAARPGFR2     RPG

TAARPGFR3   RPG  is the sample code that you should copy
  to one of your members and then modify to use the technique.
					

Added to TAA Productivity tools April 1, 1995


Home Page Up to Top