The UPS Monitor tool provides a solution for achieving a normal
system power down after a power outage occurs and a delay time has
been reached. You must have a UPS attached to the system and
understand its capabilities. You must ensure the UPSMON job is
running.
The system supports several UPS (Uninterruptible Power Supply)
functions. If the defaults are taken, you will generally see an
abnormal system termination when a power outage occurs and thus an
abnormal IPL when the system is powered up.
The UPSMON tool allows for a normal system termination. While this
will generally result in many abnormal job terminations, the tool
provides a solution for allowing a normal IPL and avoids potential
object damage situations.
Testing commands exist to help you test your options. See the
section on 'Testing UPSMON'.
Understanding the support
-------------------------
Determining the delay times to be used
--------------------------------------
You need to know the capabilities of your UPS and how long it will
take to do a PWRDWNSYS function. The difference between the two
times (less a safety margin) is the number of seconds you can delay
when a power outage occurs.
When determining how long it takes to do PWRDWNSYS, you must consider
how long it will take to abnormally end many jobs all at the same
time. This can be considerably longer than ending jobs normally
spread over some time period.
Included in your safety margin should be some consideration for what
happens when power is fluctuating. Each time power is lost, the UPS
device will lose some of its charge. When power is restored, the UPS
device will start recharging, but may not be fully charged for some
time. If power is fluctuating, you may not have as much battery life
as in a normal power outage situation.
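The arithmetic above can be sketched as follows. This is only an illustration: the function name and the sample figures are hypothetical, and you must measure the UPS run time and the PWRDWNSYS time on your own system.

```python
def ride_out_budget(ups_runtime_secs, shutdown_secs, safety_margin_secs):
    """Seconds the system can wait on battery before it must begin powering down.

    ups_runtime_secs   - how long the UPS can carry the load (assumed measured)
    shutdown_secs      - how long PWRDWNSYS takes when many jobs end at once
    safety_margin_secs - reserve for fluctuating power and a partly charged battery
    """
    budget = ups_runtime_secs - shutdown_secs - safety_margin_secs
    # A zero or negative result means there is no time to ride out an outage.
    return max(budget, 0)


# Hypothetical figures: 15 minutes of battery, 6 minutes to power down
# under load, 3 minutes held in reserve.
print(ride_out_budget(900, 360, 180))
```

The result is an upper bound for the total of the delay values you choose, not a recommended setting.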
UPS System Values
-----------------
There are two system values associated with the system UPS support.
** QUPSDLYTIM. The number of seconds to delay before the system
powers down. The shipped default is *CALC which means the
system calculates the value. A specific time value or *NOMAX
may also be specified.
If you are using LPAR, you may need to use the system value
only on the primary partition or on all partitions. This
applies only to QUPSDLYTIM and not QUPSMSGQ. Consult with IBM
or see the Logical Partitions topic in the Information Center
to determine which applies.
** QUPSMSGQ. This allows you to name a message queue that will
receive messages from the system relative to a power outage.
The default is QSYSOPR.
The system has some unique rules when it is running on auxiliary
power. The two major rules are:
** If QUPSDLYTIM is *NOMAX and the message queue specified for
QUPSMSGQ is not allocated to a job, the system is powered down
immediately.
** If PWRDWNSYS *IMMED is specified, the system shuts down as
fast as it can, which will cause an abnormal IPL.
Intent of UPSMON tool
---------------------
The intent of the UPSMON tool is to:
** 'Ride out' temporary outages.
** Provide options to allow you to tailor your specific needs.
** Ensure an orderly shutdown.
** Prevent an abnormal termination which requires a significantly
longer IPL.
To achieve this, the UPSMON tool requires that the QUPSDLYTIM value
be a specific time value. *NOMAX is not supported because of the
exposure of the message queue not being allocated which will cause
the system to power down immediately.
Even if you expect the UPSMON tool to be active all of the time,
there are still situations where the tool cannot be running such as
when you IPL the system or shut down to the restricted state. The
design of the UPSMON tool is to set the QUPSMSGQ to the TAA
requirement only when the UPSMON tool is running. If ENDUPSMON is
used to end the UPSMON job, the message queue is reset to QSYSOPR.
See the later sections such as 'Power outages during IPL'.
CHGUPSMON Command
-----------------
The CHGUPSMON command must be used to establish the environment. An
*ALLOBJ special authority user must enter the command.
** The UPSDLYTIM prompt (default is 99998 seconds) will be used
to set the QUPSDLYTIM system value. A value other than *NOMAX
or *CALC must be entered.
The 99998 default is used because the other CHGUPSMON values
are used to control powering down rather than let the system
value cause the power down.
A diagnostic occurs if the command is run in a secondary LPAR
partition on older hardware, where the system value must be
changed in the primary partition. Newer hardware does not
have a primary or secondary connotation and any partition may
set the system value.
** The TAAUPSMON data area in TAASECURE is set to the values you
specify:
Parameter   Default   Description
---------   -------   -----------
PWRDLY      180       Number of seconds to delay after a
                      power outage occurs. See the later
                      comments about LPAR systems.
ENDSBSDLY   15        Number of seconds to specify for the
                      ENDSBS DELAY parameter
PWRDWNDLY   180       Number of seconds to specify for the
                      PWRDWNSYS DELAY parameter
FLUXMAX     3         Number of times power is allowed to
                      fluctuate (utility power lost)
                      within the FLUXINT number of seconds
FLUXINT     900       Number of seconds for the interval
                      that power is allowed to fluctuate
UPSDLYTIM   99998     Number of seconds to specify for the
                      QUPSDLYTIM system value.
EXIT1DLY    0         Number of seconds to delay before
                      invoking the Exit 1 program
EXITPGM1    *NONE     The program to be called for each
                      power loss message received
EXITPGM2    *NONE     The program to be called prior to
                      ending subsystems
EXITPGM3    *NONE     The program to be called after power
                      has been restored
EXITPGM4    *NONE     The program to be called if an ENDxxx
                      OPTION(*CNTRLD) is used to end
                      the UPSMON job
For a more detailed discussion of these parameters, see other
sections such as 'CHGUPSMON Parameters' and 'UPSMON job'.
CHGUPSMON uses a prompt override program to extract the values from
the TAAUPSMON data area in TAASECURE. This allows you to key over
the existing values.
Any use of CHGUPSMON will take effect when the UPSMON job starts or
receives an expected message (CPF1816 or CPF1817).
Exit Programs
-------------
Exit programs may be specified to allow you to perform some unique
function. For example, you may want a program to send warning
messages or begin an orderly shutdown of some function. If you have
a high availability solution, check with your vendor on what should
be shut down.
The default is that no exit programs will be invoked. You may take
the default or use one or more of the exit points.
** The first exit occurs anytime a power loss message (CPF1816)
occurs. See the EXIT1DLY parameter for specifying the number
of seconds to wait before calling the program. Note that if
power is fluctuating, you may want to set some indication the
first time the program runs so that you do not repeat the
function being performed.
** The second exit program will be called just before ending the
subsystems. At that point the system will definitely be
shut down.
** The third exit program will be called after power has been
restored (the system sent the CPF1817 message).
** The fourth exit program will be called if the UPSMON job is
ended by a command such as ENDJOB OPTION(*CNTRLD) DELAY(nn).
A DELAY value of at least 30 seconds (the default) must be
used to allow the UPSMON job to end normally.
Using OPTION(*CNTRLD) allows the UPSMON job to check for the
'end status' indicator within the job. Several ENDxxx
commands such as ENDSBS and PWRDWNSYS will also set the 'end
status' indicator if OPTION(*CNTRLD) is specified.
If you use the first two exit points, you should consider how long
the programs will take as you calculate what your values should be
for the CHGUPSMON delay parameters.
See the section on 'Testing UPSMON' for how you can include a test
function within your exit programs.
UPSMON job
----------
A job must be continuously running to listen for messages being sent
to the TAAUPSMON message queue. The STRUPSMON command will submit
the UPSMON job to listen for the messages. There are two typical
solutions for submitting the job:
** Place the TAATOOL/STRUPSMON command into your startup job.
** As an *ALLOBJ user, run the STRUPSMON2 command which will add
an autostart job to your controlling subsystem. This uses the
UPSMON job description which will run the TAATOOL/STRUPSMON
command. The autostart job will run under the QPGMR profile.
QPGMR is not authorized to the UPSMON job description until
the STRUPSMON2 command is run.
STRUPSMON has no parameters and submits the UPSMON job to the
TAAUPSMON job queue to run in the TAAUPSMON subsystem. The special
subsystem and job queue are provided by the tool and should not be
used for any other purpose. The UPSMON job can be seen using
WRKACTJOB and will normally be in a MSGW status.
The UPSMON job runs the TAASYTLC12 program. This program adopts the
QSECOFR profile so it is assured of having CHGSYSVAL authority and
*JOBCTL authority to end subsystems, jobs, and to power down the
system.
If you are not familiar with how to modify the startup program to
include the TAATOOL/STRUPSMON command, see the later section 'The
system startup program'.
The UPSMON job retrieves the information from the TAAUPSMON data area
in TAASECURE. If the value for the QUPSDLYTIM system value differs
from what is specified with CHGUPSMON, CHGSYSVAL is used.
The UPSMON job then ensures that the QUPSMSGQ message queue value is
TAAUPSMON in TAATOOL and allocates the message queue. This does not
prevent other jobs from sending messages to the queue.
The message queue is cleared of any existing messages. Note that if
a power loss occurs before the message queue is cleared, the UPSMON
job will not detect the problem. The message queue is cleared to
prevent the job from taking action on a message that occurred in the
past.
The job waits for 15 seconds or for a message to arrive on the
message queue.
If no message is received a timeout occurs. A check is made to see
if the job 'end status' indicator has been set such as by a function
like ENDJOB OPTION(*CNTRLD). If so, the program specified for
EXITPGM4 will be called if one exists (default is *NONE) and the
UPSMON job will then end normally. Note that the 'end status'
indicator is also set if you end the TAAUPSMON subsystem while the
UPSMON job is active.
If the 'end status' indicator is not set, the program loops to wait
again for a message or 15 seconds.
If a message arrives, the message ID and the 'sender' information are
sent to the job log.
If a message other than CPF1816 (power outage) arrives, the message
is ignored.
If CPF1816 arrives, the UPSMON job retrieves the current values for
the TAAUPSMON data area in TAASECURE. This allows you to change the
values with CHGUPSMON while the UPSMON job is running.
The UPSMON job then checks to see if a value has been specified for
the EXIT1DLY parameter. If so, the job delays for that amount of
time. At the end of the time or if EXIT1DLY(0) is specified, the
UPSMON job will call the program specified for the EXITPGM1 parameter
if one exists. The default is *NONE meaning no program is called.
If you have an Exit 1 program, it is called each time CPF1816 is
received after the optional time delay.
After calling your program (if any), the program waits on the message
queue for the time specified for the PWRDLY parameter on CHGUPSMON.
If the CPF1817 (power restored) message does not appear before the
delay time has expired, the system begins to power down.
A check is made for 'end status' each time a message is received or a
timeout occurs. A timeout occurs every 15 seconds within the PWRDLY
time. This allows for a command such as ENDJOB OPTION(*CNTRLD) to
end the job. Prior to ending the job, the QUPSMSGQ value is reset to
QSYSOPR.
Each time a timeout occurs or a message is received that is not
CPF1817, a re-calculation is made of the remaining seconds for the
PWRDLY value.
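The wait described above can be modeled with a small simulation. This is a sketch and not the tool's code: the function name, the event list format, and the return values are hypothetical, and the 'end status' check is omitted for brevity.

```python
POLL_SECS = 15  # timeout interval described in the text


def wait_for_restore(pwrdly, arrivals):
    """Simulate the PWRDLY wait on the message queue.

    arrivals is a time-ordered list of (seconds_since_wait_began, message_id).
    Returns ("restored", seconds) if CPF1817 arrives in time,
    otherwise ("power_down", pwrdly).
    """
    elapsed = 0
    pending = list(arrivals)
    while True:
        remaining = pwrdly - elapsed  # re-calculated on every pass
        if remaining <= 0:
            return ("power_down", pwrdly)
        wait = min(POLL_SECS, remaining)
        if pending and pending[0][0] <= elapsed + wait:
            at, msgid = pending.pop(0)
            elapsed = max(elapsed, at)
            if msgid == "CPF1817":
                return ("restored", elapsed)
            # Any other message is ignored; loop and re-calculate.
        else:
            elapsed += wait  # timeout: no message arrived


print(wait_for_restore(180, [(40, "TAA9891"), (100, "CPF1817")]))
print(wait_for_restore(20, []))
```

The first call models an unrelated message at 40 seconds followed by power being restored at 100 seconds. The second models an outage that outlasts a PWRDLY of 20 seconds.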
Multiple steps are used for powering down. PWRDWNSYS *IMMED cannot
be used because the system interprets this differently when operating
on auxiliary power.
** The RUNPTY of the job is set to 1.
** The program named on the EXITPGM2 parameter is called if it
has been specified. The default is *NONE. This allows you to
end certain functions before subsystems are ended.
** All active subsystems are ended except TAAUPSMON and the
Controlling Subsystem (the Controlling Subsystem cannot be
completely ended by ENDSBS). For each active subsystem, the
command used is:
ENDSBS SBS(xxx) OPTION(*CNTRLD) DELAY(nnn)
where nnn is the value you specified for the CHGUPSMON command
ENDSBSDLY parameter.
** Each active job in the controlling subsystem is ended by:
ENDJOB JOB(xxx) OPTION(*CNTRLD) DELAY(nnn)
where nnn is the value you specified for the CHGUPSMON command
ENDSBSDLY parameter.
** The PWRDWNSYS command is then issued with OPTION(*CNTRLD) and
the PWRDWNDLY value as specified on the CHGUPSMON command.
To account for power fluctuations, the tool will power down the
system if utility power has been lost and restored a specified number
of times within a specified interval. See the section on 'Power
fluctuations'.
A job log is always produced by the UPSMON job that will describe all
the messages received and the action taken.
Messages are sent to the QSYSOPR message queue when utility power is
lost or restored. If a power down will occur, a message is also sent
to QSYSOPR stating the reason.
CHGUPSMON Parameters
--------------------
CHGUPSMON supports several parameters which may best be described by
a time chart:
- - - - - - - - - - - UPS Battery Time - - - - - - - - - - -
*************************************************************
- - PWRDLY - -  EndSbs  ENDSBSDLY  JobsEnd
**************  ******  *********  *******
Optional delay
   *
EXITPGM1        EXITPGM2
   *               *
                - - PWRDWNDLY - -   JobsEnd
                ******************  *******
The UPS Battery Time describes the power capability of your UPS. You
want to provide a safety margin (the full capability of the UPS
should not be used).
The PWRDLY value allows you to 'ride out' a power outage for the
length of time specified. If power is not restored during this
period, the tool will begin to power down the system.
If you have devices or controllers that are not powered by the UPS,
you will probably see other jobs failing during the 'ride out' time.
If you have specified an EXITPGM1 program, it is called each time a
CPF1816 power loss message is received. A delay time EXIT1DLY may be
specified prior to calling the program (the default is 0 seconds).
The intent of this delay is to allow you to 'ride out' some temporary
outages before your exit program takes control. For example, you may
want to hold job queues in the Exit 1 program. The default for Exit
program 1 is *NONE.
The power down sequence begins at the 'EndSbs' label in the time
chart by 1) Calling the program named for the EXITPGM2 parameter.
The default is *NONE. This allows you to end certain critical
functions. 2) Ending subsystems (except for the controlling
subsystem and TAAUPSMON) and 3) Ending jobs in the controlling
subsystem (the system does not allow the controlling subsystem to be
ended, but jobs may be ended). It takes a small amount of time to
issue the required commands, but this does cause the abnormal
termination of jobs at that point.
When the 'EndSbs' function is complete, the tool issues the PWRDWNSYS
command with OPTION(*CNTRLD) and a DELAY time as specified by the
PWRDWNDLY parameter. Note that PWRDWNSYS OPTION(*IMMED) is not used
because the system intercepts this when operating on auxiliary power
and causes a system abnormal termination.
Both the ENDSBS and ENDJOB commands specify OPTION(*CNTRLD) with a
DELAY time as specified by the ENDSBSDLY value. This allows jobs
that are listening for a shutdown to terminate normally. In most
cases, jobs will not be listening and will be ended abnormally when
the ENDSBSDLY time ends. This can take a considerable amount of time
if many jobs are active.
Your ENDSBSDLY time should be reasonably small (such as 15 to 30
seconds) and should not exceed the PWRDWNDLY time. The best solution
is to ensure that most jobs are ended before the system PWRDWNSYS
delay time has expired. Some system supplied subsystems such as
QSERVER and QSYSWRK may have jobs that will remain active even though
you have requested to end the subsystem. The PWRDWNSYS function will
end these remaining jobs.
See the section on 'Power fluctuation' for a discussion of the
FLUXMAX and FLUXINT values.
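Before running CHGUPSMON, it can help to check a planned set of values against the guidance above and the ranges given in the CHGUPSMON parameter descriptions. The following is a minimal sketch; the function name and message wording are hypothetical and the tool itself performs its own validation.

```python
def check_settings(pwrdly=180, endsbsdly=15, pwrdwndly=180, upsdlytim=99998):
    """Return a list of problems found in a planned set of CHGUPSMON values.

    Defaults match the CHGUPSMON defaults listed in this document.
    """
    problems = []
    if not 15 <= pwrdly <= 99999:
        problems.append("PWRDLY must be in the range 15 to 99999")
    if not 0 <= upsdlytim <= 99999:
        problems.append("UPSDLYTIM must be in the range 0 to 99999")
    if endsbsdly > pwrdwndly:
        # Guidance from the text: jobs should be ended before the
        # PWRDWNSYS delay time expires.
        problems.append("ENDSBSDLY should not exceed PWRDWNDLY")
    return problems


print(check_settings())
print(check_settings(endsbsdly=200))
```

An empty list means none of these particular checks failed. It does not mean the values fit your UPS battery time.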
LPAR systems
------------
In an LPAR system, the UPSMON function needs to be running in each
partition.
For older hardware, a power down in the primary partition will cause
a signal to be sent to the secondary partitions. Therefore, the
PWRDLY time for the primary partition should be a longer time than
for the secondaries. You should allow for enough time for the
secondaries to issue the PWRDWNSYS *CNTRLD command before the primary
partition issues the command.
For newer hardware, each partition stands alone. For older hardware,
the QUPSDLYTIM cannot be changed in a secondary partition.
Example 1
---------
Assume you have no exit programs and have specified:
PWRDLY 20
ENDSBSDLY 25
PWRDWNDLY 30
If a power outage message was received, no action would be taken for
20 seconds. If the power restored message was not received, the
system would begin powering down.
All subsystems would be ended with a delay time of 25 seconds. Any
jobs that are listening for a 'controlled cancel' would go into their
ending function. If a job was not listening for a 'controlled
cancel' (most jobs do not), the jobs would begin to end after 25
seconds.
The UPSMON job does not wait for either the subsystems to end or the
subsystem jobs to end. Using ENDSBS causes the system to send
signals to the subsystem and the subsystem jobs. A minimal amount of
processing occurs here. The processing of ending a job abnormally
does not occur within the UPSMON job. As soon as the ENDSBS commands
have completed, the system proceeds with the next CL command.
Any jobs in the controlling subsystem would be ended with a delay
time of 25 seconds. As with ENDSBS, ENDJOB only signals jobs to end.
As soon as the ENDJOB commands (if any) have been issued, the system
proceeds with the next CL command.
A PWRDWNSYS DELAY(30) would be issued. Any jobs that had not been
scheduled for shutdown would be notified. Once again the system
does not wait for jobs to be ended. It sends signals to jobs and
then proceeds with the next CL command (RETURN) which would end the
UPSMON job.
Thus the time line would appear as:
- - - - - - - - - - - UPS Battery Time - - - - - - - - - - -
*************************************************************
No action
for 20 Seconds
                EndSbs DELAY(25)
                End Controlling Sbs Jobs DELAY(25)
                PWRDWNSYS DELAY(30)
As soon as the EndSbs function is started, any jobs listening for a
'controlled cancel' would begin to shut down. Since most jobs do not
listen, the jobs in these subsystems would begin to end after 25
seconds.
The same would occur for the jobs in the controlling subsystem.
The PWRDWNSYS DELAY(30) command occurs next and would signal an end
to any jobs that were still active and had not already been told to
end. The UPSMON job should end normally after issuing PWRDWNSYS.
Note that the 3 steps of ending occur in sequence, but do not wait
for a completion of the previous step. After the command is accepted
by the system, the appropriate jobs are informed of the shutdown.
The individual jobs would probably not end at that instant. It can
be considered as all 3 steps occurring simultaneously. Because many
jobs will be signalled to end at approximately the same time, the
system will be very busy if many jobs are active.
If power is restored during the wait for 20 seconds, the UPSMON job
considers the FLUXMAX and FLUXINT parameter values. If this is the
first outage, the program just returns to wait for another power
outage message.
If power is restored during the 3 step shutdown process, the message
is ignored and powering down continues.
Example 2
---------
Assume you have specified:
PWRDLY 10
ENDSBSDLY 25
PWRDWNDLY 30
EXIT1DLY 15
EXITPGM1 ABC
EXITPGM2 DEF
The time line would appear as:
- - - - - - - - - - - UPS Battery Time - - - - - - - - - - -
*************************************************************
No action
for 15 Secs
             Exit 1
             Pgm
                     PWRDLY
                     10 Secs
                              Exit 2
                              Pgm
                                      3 Steps of powering down
If a power outage message was received, no action would be taken
for 15 seconds (Exit 1 DLY time). Then program ABC would be run.
The PWRDLY wait time of 10 seconds would then occur.
If power is not restored, the system would begin to power down. The
first step would be to call program DEF (the Exit 2 program). When
the program completes, power down begins as in Example 1.
If power is restored any time prior to calling the Exit 2 program
(DEF), the UPSMON job considers the FLUXMAX and FLUXINT parameter
values. If this is the first outage, the program just returns to
wait for another power outage message. If an Exit 3 program had been
specified and the power restored message occurs during the PWRDLY
time, the Exit 3 program would be called before waiting for another
power outage message.
If power is restored after the Exit 2 program (DEF), the message is
ignored and powering down continues.
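The time lines in Example 1 and Example 2 can be expressed as offsets from the power outage message. The sketch below is illustrative only: the function and key names are hypothetical, and it assumes that the run time of the exit programs and of the ENDSBS, ENDJOB, and PWRDWNSYS commands is zero, so the three power down steps start at the same moment.

```python
def timeline(pwrdly, endsbsdly, pwrdwndly, exit1dly=0):
    """Offsets in seconds from the CPF1816 message.

    Assumes exit programs and the ending commands take no time to run.
    """
    exit1 = exit1dly             # Exit 1 program is called (if specified)
    power_down = exit1 + pwrdly  # Exit 2 is called, then the 3 steps begin
    return {
        "exit1_called": exit1,
        "power_down_begins": power_down,
        "endsbs_delay_expires": power_down + endsbsdly,
        "pwrdwnsys_delay_expires": power_down + pwrdwndly,
    }


print(timeline(20, 25, 30))               # values from Example 1
print(timeline(10, 25, 30, exit1dly=15))  # values from Example 2
```

In practice, add the expected run time of your Exit 1 and Exit 2 programs to these offsets when comparing them against your UPS battery time.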
Getting started
---------------
** As an *ALLOBJ special authority user, prompt for the CHGUPSMON
command and change any of the values to fit your environment.
The values are placed in the TAAUPSMON data area in TAASECURE
and are used by the UPSMON job. The QUPSDLYTIM system value
is set to the value you specify.
** Enter the TAATOOL/STRUPSMON command in your startup job or use
STRUPSMON2 to add an autostart job to the controlling
subsystem. The STRUPSMON command has no parameters.
DSPUPSMON Command
-----------------
The DSPUPSMON command allows you to review the settings of the
QUPSxxx system values and the values specified on CHGUPSMON.
Testing UPSMON
--------------
A test function exists to assist you in testing the various UPSMON
conditions. Only an *ALLOBJ user may enter the commands associated
with testing.
Two functions are available:
** The STRUPSTST command will create the TSTUPSMON data area in
TAATOOL. If the data area exists when the UPSMON batch job
runs, it will not power down the system if a power outage or a
simulated power outage occurs. Instead, it ends the job
normally just prior to entering the power down phase of the
job. All exit programs you have specified would be run
depending on the messages received. This allows you to test
with the SNDUPSTST function described later.
When you are finished testing, enter:
ENDUPSTST
which deletes the data area and causes the UPSMON job to act
normally.
The power down phase of the UPSMON job checks for the data
area just prior to powering down. An appropriate message is
sent to the job log if the data area exists.
** SNDUPSTST allows you to send a message to the TAAUPSMON
message queue as a simulation of the system sending a power
related message. The two message IDs that you would normally
send are:
- CPF1816 Utility power has been lost
- CPF1817 Utility power has been restored
For example, to simulate the system sending the CPF1816
message, you would enter:
SNDUPSTST MSGID(CPF1816)
To test the effect of a non-power related message, use the
TAA9891 message ID.
You may use SNDUPSTST with or without the STRUPSTST command
function. If you run without a prior STRUPSTST command, you
could cause the system to be powered down.
Using the STRUPSTST, ENDUPSTST, and SNDUPSTST commands will allow you
to see the actions that the UPSMON job takes. A job log is always
produced.
You may also use the same technique in your Exit programs. For
example, to bypass your normal code for an Exit program, you could
enter:
             DCL        &PGMNAM *CHAR LEN(10)
             DCL        &PGMLIB *CHAR LEN(10)
              .
             CHKOBJ     OBJ(TAATOOL/TSTUPSMON) OBJTYPE(*DTAARA)
             MONMSG     MSGID(CPF9801) EXEC(DO) /* No test *DTAARA */
             RCVMSG     MSGTYPE(*EXCP)
             GOTO       PROCESS
             ENDDO      /* No test *DTAARA */
             RTVPGMNAM  PGMNAM(&PGMNAM) PGMLIB(&PGMLIB)
             SNDPGMMSG  MSG('The Exit program ' *CAT &PGMNAM +
                          *TCAT ' in ' *CAT &PGMLIB *TCAT +
                          ' functions would have been run.')
             RETURN
 PROCESS:    /* Your normal processing */
The message that is sent will appear in the job log of the UPSMON
job.
End Job Abnormal (ENDJOBABN)
----------------------------
If the ENDJOBABN command is run, the next IPL will be considered
abnormal with or without a power failure. The system forces an
abnormal termination to cause the running of several recovery
programs at IPL. A message will exist in QHST (CPI0990) describing
what happened.
Assume an ENDJOBABN command was followed later by a power outage that
was handled by UPSMON so that the system remained active. On the
subsequent IPL, an abnormal termination would occur.
Assume an ENDJOBABN command was followed later by a power outage that
was handled by UPSMON which caused the system to power down. On
the next IPL, an abnormal termination would occur.
The system startup program
--------------------------
If you are using the system default startup program, the system value
QSTRUPPGM will specify QSTRUP in QSYS. The simplest solution would
be to tailor this program to include your unique startup functions.
Use RTVCLSRC to retrieve the source. Add the TAATOOL/STRUPSMON
command before the RETURN command. Then create your own version of
the program in your own library and change the QSTRUPPGM system value
to refer to your program and library. Note that you should not
replace the QSTRUP program in QSYS.
If you already have a unique startup program, add TAATOOL/STRUPSMON
at a location where it is sure to run.
If you end all subsystems to reach the restricted state, the
TAAUPSMON subsystem and UPSMON job will be shut down. When you start
the controlling subsystem, the startup program will automatically
run.
Note that the system supplied startup program may change on each
release. A good tool to compare your version with the shipped system
version is the TAA Tool CMPSTRUP.
An alternative to using the startup program is to use the STRUPSMON2
command to add an autostart job to the controlling subsystem.
Impact of installing a new version of the TAA Productivity Tools
----------------------------------------------------------------
When a new version of the TAA Productivity Tools is installed, the
TAATOOL library will be cleared. Before clearing the library, the
install process will check if the TAAUPSMON subsystem is active. If
so, the TAAUPSMON subsystem will be ended and is automatically
restarted when the install is complete.
The objects in the TAASECURE library (TAAUPSMON data area containing
the delay times) are not cleared. The existing information is
retained.
Ending and Starting the UPSMON job
----------------------------------
If you need to end the UPSMON job, use the ENDUPSMON command. This
will issue an ENDJOB with a delay of 30 seconds and allow the UPSMON
job to end properly. This causes the QUPSMSGQ system value to be
reset to QSYSOPR. ENDUPSMON will also end the TAAUPSMON subsystem.
If you want to resubmit the UPSMON job, enter:
STRUPSMON
If the TAAUPSMON subsystem is not active it will be started.
Power Fluctuations
------------------
The UPSMON job senses when utility power has been restored (CPF1817
message) after a power outage has occurred (CPF1816 message). To
prevent the battery from being depleted, the job checks the number of
times power has been restored. If power is lost n times within a
specified interval, the system is powered down.
You control the number of times and the interval with the FLUXMAX and
FLUXINT parameters on CHGUPSMON. The defaults will cause a power
down if power is lost 3 times within a 15 minute interval. If the
interval has expired since the first power outage, the count value is
reset.
This solution is designed to protect the battery from being depleted
by a series of interruptions. Although the battery will start
recharging when utility power is restored, it may not be at its full
capacity if power is lost again within a few minutes.
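The counting rule can be sketched as below. This is an assumption-laden model, not the tool's implementation: the class and method names are hypothetical, the interval is treated as expired only when more than FLUXINT seconds have passed since the first outage, and the exact point at which the UPSMON job makes the check (the text indicates it follows the power restored message) is not modeled.

```python
class FluxCounter:
    """Counts power losses within an interval (defaults match CHGUPSMON)."""

    def __init__(self, fluxmax=3, fluxint=900):
        self.fluxmax = fluxmax
        self.fluxint = fluxint
        self.first_outage = None
        self.count = 0

    def record_outage(self, lost_at):
        """Record a power loss at lost_at (seconds on any monotonic clock).

        Returns True when the number of losses in the current interval
        has reached FLUXMAX, meaning a power down would be due.
        """
        if self.first_outage is None or lost_at - self.first_outage > self.fluxint:
            # Interval has expired since the first outage: reset the count.
            self.first_outage = lost_at
            self.count = 0
        self.count += 1
        return self.count >= self.fluxmax


counter = FluxCounter()
for lost_at in (0, 300, 600):
    print(lost_at, counter.record_outage(lost_at))
```

With the defaults, the third loss inside 15 minutes returns True. A third loss that arrives after the interval has expired starts a new count instead.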
Power Outages during IPL, the Restricted State, etc.
----------------------------------------------------
If *NOMAX is specified for the QUPSDLYTIM system value and a power
outage occurs, the system checks the message queue specified for the
QUPSMSGQ system value. If the message queue is not allocated to a
job, the system invokes an immediate power down.
For this reason, *NOMAX is not a good choice. The CHGUPSMON default
of 99998 prevents this immediate power down.
If the UPSMON job is running, the message queue (TAAUPSMON in
TAATOOL) will be allocated. However, there are times when the job
will not be running.
** IPL. The startup program will not submit the UPSMON job until
late in the IPL sequence.
** Restricted State. Since all jobs are ended, the UPSMON job
will not be running.
** When you have ended the UPSMON job on purpose or by mistake.
** When the TAA Productivity Tools are being installed.
It is possible to specify a value for QUPSDLYTIM which will allow the
system to 'ride out' a power outage during these periods when UPSMON
is not running.
You may want to consider specifying a time value (or *CALC) when the
UPSMON job is not running. When the UPSMON job is started, it will
reset the QUPSDLYTIM value according to the value specified on
CHGUPSMON.
Ensuring the UPSMON job is running
----------------------------------
The TAA JOBACT tool may be used to ensure the UPSMON job
is running whenever the normal subsystems are up.
You can send a message or cause a command to be run if
the UPSMON job is not active.
Other tools of interest
-----------------------
You may want to consider the following tools in your shutdown and
startup functions:
- ENDJOB2 End job 2
- HLDALLJOBQ Hold all job queues
- HLDALLOUTQ Hold all output queues
- HLDALLWTR Hold all writers
- RLSALLJOBQ Release all job queues
- RLSALLOUTQ Release all output queues
- RLSALLWTR Release all writers
ENDUPSMON escape messages you can monitor for
---------------------------------------------
TAA9891 The UPSMON job is not active
Escape messages from based-on functions will be re-sent.
CHGUPSMON Command *CMD
-----------------
PWRDLY The number of seconds to delay after a power outage
occurs and before powering down the system. The
default is 180 seconds. A number of seconds must be
entered in a range of 15 to 99999.
You must provide enough time to allow the system to
power down normally.
Note that the QUPSDLYTIM system value is set to a
specific time value when CHGUPSMON is run. The
default is 99998, but the other options you specify
on CHGUPSMON control when power down occurs. The
delay time value you enter for PWRDLY is used during
the running of the UPSMON job.
For LPAR systems, the PWRDLY time for the primary
partition should be longer than that used for the
secondary partitions.
See the previous discussions for full details.
ENDSBSDLY The number of seconds to delay when ending
subsystems. The default is 15 seconds. The value
is specified for the ENDSBS command DELAY parameter.
The value is also used for the ENDJOB DELAY
parameter for those jobs ended from the controlling
subsystem.
See the previous discussions for full details.
PWRDWNDLY The number of seconds to use on the PWRDWNSYS DELAY
parameter. The default is 180 seconds.
See the previous discussions for full details.
FLUXMAX The number of times that power may be lost within
the FLUXINT value before powering down the system.
The default is 3.
See the previous discussions for full details.
FLUXINT The interval of time in seconds that will be
considered for the FLUXMAX count before powering
down the system. The default is 900 seconds (15
minutes).
See the previous discussions for full details.
UPSDLYTIM The amount of time to be specified for the
QUPSDLYTIM system value. 99998 is the default to
allow the other options to control shutdown.
A number between 0 and 99999 may be specified. Note
that *NOMAX or *CALC may not be specified (see
previous discussion).
EXIT1DLY The amount of time to delay after the CPF1816 power
loss message has been received and before the Exit 1 program
is called. The default is 0 seconds. The intent of
this parameter is to allow you to wait out a brief
outage before taking any action in your exit
program. If no Exit 1 program exists, no error
occurs.
If a delay time is specified, the PWRDLY time does
not begin until the Exit 1 program has completed.
EXITPGM1 The exit program to be called for every CPF1816
power loss message received. The default is *NONE
meaning no program is called.
A program and qualified library may be described to
allow you to perform some action for each power loss
message received. If power is fluctuating, the
program may be called multiple times.
EXITPGM2 The exit program to be called before ending
subsystems. The default is *NONE meaning no program
is called.
A program and qualified library may be described to
allow you to perform some action prior to ending
subsystems.
EXITPGM3 The exit program to be called after power has been
restored and the CPF1817 message is received. The
default is *NONE meaning no program is called.
A program and qualified library may be described to
allow you to perform some action after power has
been restored. The program is called before
checking the FLUXMAX value. If power is
fluctuating, the program may be called multiple
times.
EXITPGM4 The exit program to be called if an ENDxxx command
such as ENDJOB OPTION(*CNTRLD) DELAY(nn) is used to
end the UPSMON job. The DELAY value must be at
least 30 seconds, which is the DELAY default, so
that the job 'end status' indicator can be set and
the job can end normally. The default for EXITPGM4
is *NONE meaning no program is called.
A program and qualified library may be described to
allow you to perform some action if the job is being
ended in a controlled manner.
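The way the CHGUPSMON timing parameters relate to each other can be sketched in a few lines of Python. This is only an illustrative model, not the tool's CL code: the class and method names are hypothetical, the defaults and ranges are the ones stated above, and it is an assumption that the power down is triggered when the number of power losses inside the FLUXINT window reaches FLUXMAX (see the previous discussions for the actual rule).

```python
from dataclasses import dataclass


@dataclass
class UpsMonSettings:
    """Hypothetical container; fields mirror the CHGUPSMON parameters."""
    pwrdly: int = 180       # PWRDLY default (seconds)
    endsbsdly: int = 15     # ENDSBSDLY default (seconds)
    pwrdwndly: int = 180    # PWRDWNDLY default (seconds)
    fluxmax: int = 3        # FLUXMAX default (count)
    fluxint: int = 900      # FLUXINT default (seconds)
    upsdlytim: int = 99998  # UPSDLYTIM default
    exit1dly: int = 0       # EXIT1DLY default (seconds)

    def validate(self) -> None:
        # Only the two ranges stated in the text are checked here.
        if not 15 <= self.pwrdly <= 99999:
            raise ValueError("PWRDLY must be 15 to 99999 seconds")
        if not 0 <= self.upsdlytim <= 99999:
            raise ValueError("UPSDLYTIM must be 0 to 99999")

    def seconds_until_pwrdly_expires(self, exit1_runtime: int = 0) -> int:
        # PWRDLY does not begin until the Exit 1 delay has elapsed and
        # the Exit 1 program (assumed to exist) has completed.
        return self.exit1dly + exit1_runtime + self.pwrdly

    def fluctuation_limit_reached(self, loss_times, now) -> bool:
        # Count the power losses (CPF1816) seen in the last FLUXINT seconds.
        recent = [t for t in loss_times if 0 <= now - t <= self.fluxint]
        # Assumption: reaching FLUXMAX is what triggers the power down.
        return len(recent) >= self.fluxmax


settings = UpsMonSettings(exit1dly=30)
settings.validate()
print(settings.seconds_until_pwrdly_expires(exit1_runtime=5))   # 215
print(settings.fluctuation_limit_reached([0, 400, 800], 800))   # True
print(settings.fluctuation_limit_reached([0, 400, 1000], 1000)) # False
```

The first result shows that a non-zero EXIT1DLY lengthens the time the UPS must carry the system, so it should be counted against the battery capacity along with PWRDLY and the power down time.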
STRUPSMON Command *CMD
-----------------
The command has no parameters.
STRUPSMON2 Command *CMD
------------------
The command has no parameters.
ENDUPSMON Command *CMD
-----------------
The command has no parameters.
DSPUPSMON Command *CMD
-----------------
The command has no parameters.
STRUPSTST Command *CMD
-----------------
The command has no parameters.
ENDUPSTST Command *CMD
-----------------
The command has no parameters.
SNDUPSTST Command *CMD
-----------------
MSGID The message ID to be sent to the TAAUPSMON message
queue. The typical messages to be sent are CPF1816
(Utility power has been lost) and CPF1817 (Utility
power has been restored). The messages are in the
QCPFMSG file in QSYS.
Any message ID from any message file may be sent to
see the effect of a non-power related message in the
UPSMON job. For a test of a non-power related
message, use message ID TAA9891 in message file
TAAMSGF in TAATOOL.
If a message other than CPF1816, CPF1817, or TAA9891
is sent, no message data will exist when the message
is sent. Therefore, the message in the UPSMON job
log will appear without any replacement variables.
MSGF The qualified name of the message file to send the
message from. The default is QCPFMSG in *LIBL.
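As a rough illustration of the SNDUPSTST parameters, the following Python sketch builds the command strings for the typical test messages and records which message IDs carry message data. The helper names are hypothetical; only the MSGID and MSGF parameters, the message IDs, and the message file names come from the text above.

```python
# Message IDs for which the text says message data is supplied.
IDS_WITH_DATA = {"CPF1816", "CPF1817", "TAA9891"}


def sndupstst_command(msgid: str, msgf: str = "QCPFMSG",
                      lib: str = "*LIBL") -> str:
    """Build the CL command string for one test message."""
    return f"SNDUPSTST MSGID({msgid}) MSGF({lib}/{msgf})"


def has_message_data(msgid: str) -> bool:
    """True if the message appears in the job log with replacement data."""
    return msgid.upper() in IDS_WITH_DATA


# Power lost, power restored, then a non-power related message.
print(sndupstst_command("CPF1816"))
print(sndupstst_command("CPF1817"))
print(sndupstst_command("TAA9891", msgf="TAAMSGF", lib="TAATOOL"))
```

The generated strings are meant to be entered on a command line after STRUPSTST has been run.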
Restrictions
------------
Only a user with *ALLOBJ special authority can use CHGUPSMON,
STRUPSTST, ENDUPSTST, and SNDUPSTST.
Prerequisites
-------------
The following TAA Tools must be on your system:
ADDTIM          Add time
CHKALLOBJ       Check *ALLOBJ special authority
CHKJOBCTL       Check *JOBCTL special authority
CLCTIMDIF       Calculate time difference
CVTTIM          Convert time
CVTWRKACT       Convert WRKACTJOB
CVTWRKSBS       Convert WRKSBS
EDTVAR          Edit variable
RSNLSTMSG       Resend last message
RTVDAT          Retrieve date
RTVIPLSTS       Retrieve IPL status
SNDCOMPMSG      Send completion message
SNDESCINF       Send escape information
SNDESCMSG       Send escape message
Implementation
--------------
The CHGUPSMON command must be used to describe the parameters for
your environment. The command also sets the QUPSxxx system values.
Either the STRUPSMON command must be entered in your startup program,
or an autostart job entry must be made in the controlling subsystem
(see the STRUPSMON2 command).
Objects used by the tool
------------------------
Object        Type      Attribute      Src member     Src file
------        ----      ---------      ----------     ----------
CHGUPSMON     *CMD                     TAASYTL        QATTCMD
STRUPSMON     *CMD                     TAASYTL2       QATTCMD
DSPUPSMON     *CMD                     TAASYTL3       QATTCMD
STRUPSMON2    *CMD                     TAASYTL4       QATTCMD
SNDUPSTST     *CMD                     TAASYTL5       QATTCMD
STRUPSTST     *CMD                     TAASYTL6       QATTCMD
ENDUPSTST     *CMD                     TAASYTL7       QATTCMD
ENDUPSMON     *CMD                     TAASYTL8       QATTCMD
TAASYTLC      *PGM      CLP            TAASYTLC       QATTCL
TAASYTLC2     *PGM      CLP            TAASYTLC2      QATTCL
TAASYTLC3     *PGM      CLP            TAASYTLC3      QATTCL
TAASYTLC4     *PGM      CLP            TAASYTLC4      QATTCL
TAASYTLC5     *PGM      CLP            TAASYTLC5      QATTCL
TAASYTLC6     *PGM      CLP            TAASYTLC6      QATTCL
TAASYTLC7     *PGM      CLP            TAASYTLC7      QATTCL
TAASYTLC8     *PGM      CLP            TAASYTLC8      QATTCL
TAASYTLC11    *PGM      CLP            TAASYTLC11     QATTCL
TAASYTLC12    *PGM      CLP            TAASYTLC12     QATTCL
TAASYTLC13    *PGM      CLP            TAASYTLC13     QATTCL
TAASYTLC22    *PGM      CLP            TAASYTLC22     QATTCL
TAASYTLC23    *PGM      CLP            TAASYTLC23     QATTCL
TAASYTLD      *FILE     DSPF           TAASYTLD       QATTDDS
TAAUPSMON     *SBSD
TAAUPSMON     *JOBQ
TAAUPSMON     *MSGQ
TSTUPSMON     *DTAARA
UPSMON        *JOBD
The TAAUPSMON data area will exist in TAASECURE.
The TSTUPSMON data area is created by STRUPSTST and deleted by
ENDUPSTST.
Structure
---------
CHGUPSMON     Cmd
   TAASYTLC      CL Pgm
   TAASYTLC11    CL Pgm for Command prompt override
STRUPSMON     Cmd
   TAASYTLC2     CL Pgm that submits UPSMON job
   TAASYTLC12    CL Pgm in batch for UPSMON job
   TAASYTLC22    CL Pgm to do ENDSBS
   TAASYTLC23    CL Pgm to do ENDJOB for controlling sbs jobs
DSPUPSMON     Cmd
   TAASYTLC3     CL Pgm
   TAASYTLD      Display file
   TAASYTLC13    CL Pgm Extracts values from TAAUPSMON in TAASECURE
STRUPSMON2    Cmd
   TAASYTLC4     CL Pgm
SNDUPSTST     Cmd
   TAASYTLC5     CL Pgm
STRUPSTST     Cmd
   TAASYTLC6     CL Pgm
ENDUPSTST     Cmd
   TAASYTLC7     CL Pgm
ENDUPSMON     Cmd
   TAASYTLC8     CL Pgm