ECO NUMBER:       ALPSHAD03_071
PRODUCT:          OpenVMS Alpha Operating System
UPDATED PRODUCT:  OpenVMS Alpha Operating System 7.1
APPRX BLCK SIZE:  6858

COVER LETTER

1  KIT NAME:

     ALPSHAD03_071

2  KITS SUPERSEDED BY THIS KIT:

     ALPSHAD02_071

3  KIT DESCRIPTION:

3.1  Version(s) of OpenVMS to which this kit may be applied:

     OpenVMS Alpha V7.1

3.2  In order to receive the full fixes listed in this kit the
     following remedial kits also need to be installed:

     None

3.3  Files patched or replaced:

     o  [SYS$LDR]SYS$SHDRIVER.EXE (new image)
     o  [SYSEXE]SHADOW_SERVER.EXE (new image)
     o  [SYSEXE]SDA.EXE (new image)
     o  [SYSLIB]SDA$SHARE.EXE (new image)
     o  [SYS$LDR]SYS$BASE_IMAGE.EXE (new image)
     o  [SYSEXE]SHOW.EXE (new image)
     o  [SYSEXE]SYSINIT.EXE (new image)
     o  [SYS$LDR]EXCEPTION.STB (new image)
     o  [SYS$LDR]EXCEPTION.EXE (new image)
     o  [SYS$LDR]EXCEPTION_MON.STB (new image)
     o  [SYS$LDR]EXCEPTION_MON.EXE (new image)

-- COVER LETTER -- Page 2 25 July 1997

4  PROBLEMS ADDRESSED IN ALPSHAD03_071 KIT

     o  A potential system crash with SHADDETINCON bugcheck at
        SHDRIVER+12124 during boot from a multi-member shadow set.
        This occurs if the booting member is not the first in the
        member array, and the other member is not yet visible.

     o  SHADDETINCON bugchecks on multiple nodes in cluster during a
        merge operation.

        System crash information
        ------------------------
        Time of system crash: 13-APR-1997 13:21:05.59
        Version of system: OpenVMS (TM) VAX Version V6.2
        System Version Major ID/Minor ID: 1/0
        VAXcluster node: CYV7KE, a VAX 7000-760
        Crash CPU ID/Primary CPU ID: 00/00
        Bitmask of CPUs active/available: 0000003F/0000003F
        CPU 00 reason for Bugcheck: SHADDETINCON, SHADOWING detects
        inconsistent state
        Process currently executing on this CPU: None
        Current IPL: 8 (decimal)
        CPU database address: C9212000
        MPB address: B29B09C0

        CPU 00 Processor stack

        General registers:
          R0     = 00000000   R1     = B67D258C   R2     = B67D2180
          R3     = B6544600   R4     = B35992C0   R5     = B624A340
          R6     = B65447C8   R7     = 00000000   R8     = B67D2180
          R9     = B6544730   R10    = 00000000   R11    = B6544600
          AP     = B65446B8   FP     = 7FE2534C   SP     = C9213DAC
          PC     = B82E42B3   PSL    = 04080000

        Processor registers:
          P0BR   = C9946800   SBR      = 1EF80400   ASTLVL   = 00000004
          P0LR   = 0000018B   SLR      = 003FFF00   SISR     = 00000010
          P1BR   = C9216400   PCBB     = 7F7B0020   ICCS     = 00000000
          P1LR   = 001FF116   SCBB     = 1EF5F000   SID      = 17000201
          LDEV   = 00018002   LBER     = 00000000   LCNR     = 00000001
          LCON0  = DF0007ED   LCON1    = 00000000   TODR     = 44D09B64
          LBECR0 = 0040003A   LBECR1   = 00008060   LMODE    = 000332A4
          LMERR  = 00000000   BIU_STAT = F00E1070   BIU_ADDR = 00000298
          MMESTS = 10004005   TBSTS    = 800001D0   PCSTS    = FFFFF800

          ISP    = C9213DAC   KSP      = 7FFE7800   ESP      = 7FFE9800
          SSP    = 7FFED800   USP      = 7FE2534C

     o  System crashes in SHADDETINCON SYS$SHDRIVER+3D3C0.

        Bugcheck Type:      SHADDETINCON, SHA RBADC2 (Clustered)
        CPU Type:           AlphaServer 2100 4/233
        VMS Version:        V6.2-1H2
        Current Process:    NULL
        Current Image:
        Failing PC:         FFFFFFFF 8025B3C0
        Failing PS:         08000000 00000804
        Module:             SYS$SHDRIVER
        Offset:             0003D3C0
        Boot Time:          15-APR-1997 08:39:31.00
        System Uptime:      5 22:23
        Crash/Primary CPU:  00/00
        Saved Processes:    22
        Pagesize:           8 KByte (8192 bytes)
        Physical Memory:    256 MByte (32768 PFNs)
        Dumpfile Pagelets:  184518 blocks
        Dump Flags:         olddump,writecomp,errlogcomp,dump_style
        EXE$GL_FLAGS:       poolpging,init,bugdump

        Stack Pointers:
          KSP = FFFFFFFF 8A731D88   ESP = FFFFFFFF 8A733000
          SSP = FFFFFFFF 8A72D000   USP = FFFFFFFF 8A72D000

        General Registers:
          R0  = 00000000 00000001   R1  = FFFFFFFF 8162F7E0   R2  = FFFFFFFF 8162F7C0
          R3  = FFFFFFFF 8186EBC0   R4  = 00000000 00000003   R5  = FFFFFFFF 8162F890
          R6  = FFFFFFFF 8186EE80   R7  = 00000000 00000000   R8  = FFFFFFFF 8162F7C0
          R9  = FFFFFFFF 8186EDE8   R10 = 00000000 00000000   R11 = FFFFFFFF 8186EBC0
          R12 = FFFFFFFF 8186ED38   R13 = FFFFFFFF 8710A270   R14 = FFFFFFFF 87084200
          R15 = 00000000 003C60E0   R16 = 00000000 000008B4   R17 = 00000000 00000501
          R18 = 00000000 00000000   R19 = FFFFFFFF 87084200   R20 = 00000000 00000000
          R21 = FFFFFFFF 8162F808   R22 = FFFFFFFF 8710FB20   R23 = 00000000 00000000
          R24 = 00000000 00000001   AI  = 00000000 00000001   RA  = FFFFFFFF 80288928
          PV  = FFFFFFFF 8710A698   R28 = 00000000 00000000   FP  = FFFFFFFF 8A731DE0
          PC  = FFFFFFFF 8025B3C4   PS  = 08000000 00000804

        System Registers:
          Page Table Base Register (PTBR)              00000000 00007FF8
          Processor Base Register (PRBR)               FFFFFFFF 8110A000
          Privileged Context Block Base (PCBB)         00000000 0110A080
          System Control Block Base (SCBB)             00000000 000001B3
          Software Interrupt Summary Register (SISR)   00000000 00000000
          Address Space Number (ASN)                   00000000 00000000
          AST Summary / AST Enable (ASTSR_ASTEN)       00000000 00000000
          Floating-Point Enable (FEN)                  00000000 00000000
          Interrupt Priority Level (IPL)               00000000 00000008
          Machine Check Error Summary (MCES)           00000000 00000000
          Virtual Page Table Base Register (VPTB)      00000002 00000000

        Failing Instruction:
          SYS$SHDRIVER_NPRO+393C0:  BUGCHK

        Instruction Stream (last 20 instructions):
          SYS$SHDRIVER_NPRO+39370:  RET     R31,(R28)
          SYS$SHDRIVER_NPRO+39374:  LDQ_U   R31,(SP)
          SYS$SHDRIVER_NPRO+39378:  SUBQ    SP,#X10,SP
          SYS$SHDRIVER_NPRO+3937C:  STQ     R16,#X0008(SP)
          SYS$SHDRIVER_NPRO+39380:  STQ     R17,(SP)
          SYS$SHDRIVER_NPRO+39384:  LDQ     R17,#XF8E0(R13)
          SYS$SHDRIVER_NPRO+39388:  BIS     R17,#X04,R17
          SYS$SHDRIVER_NPRO+3938C:  BIS     R31,R17,R16
          SYS$SHDRIVER_NPRO+39390:  LDQ     R17,(SP)
          SYS$SHDRIVER_NPRO+39394:  ADDQ    SP,#X08,SP
          SYS$SHDRIVER_NPRO+39398:  BUGCHK
          SYS$SHDRIVER_NPRO+3939C:  HALT
          SYS$SHDRIVER_NPRO+393A0:  SUBQ    SP,#X10,SP
          SYS$SHDRIVER_NPRO+393A4:  STQ     R16,#X0008(SP)
          SYS$SHDRIVER_NPRO+393A8:  STQ     R17,(SP)
          SYS$SHDRIVER_NPRO+393AC:  LDQ     R17,#XF8E0(R13)
          SYS$SHDRIVER_NPRO+393B0:  BIS     R17,#X04,R17
          SYS$SHDRIVER_NPRO+393B4:  BIS     R31,R17,R16
          SYS$SHDRIVER_NPRO+393B8:  LDQ     R17,(SP)
          SYS$SHDRIVER_NPRO+393BC:  ADDQ    SP,#X08,SP
          SYS$SHDRIVER_NPRO+393C0:  BUGCHK
          SYS$SHDRIVER_NPRO+393C4:  HALT
          SYS$SHDRIVER_NPRO+393C8:  BIS     R31,R31,R31
          SYS$SHDRIVER_NPRO+393CC:  BIS     R31,R31,R31
          SYS$SHDRIVER_NPRO+393D0:  SUBQ    SP,#X50,SP

     o  The Volume Shadowing software shipped in OpenVMS Alpha and VAX
        V7.1 and in the CLUSIO remedial kits requires additional
        non-paged pool to improve synchronization. Customers should
        take this into account when they are tuning their systems, and
        be aware that Volume Shadowing is now more sensitive to
        resource problems: systems may crash if non-paged pool is
        exhausted. Shadowing uses approximately 800 bytes of additional
        non-paged pool per concurrent I/O to the virtual unit.

        This remedial kit includes code that avoids system crashes
        when a system exhausts non-paged pool. Please be aware that
        there are still cases in which non-paged pool exhaustion will
        result in a SHADDETINCON bugcheck. This modification reduces
        the probability of such crashes but does not eliminate them
        completely.
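
        As a rough tuning aid, the 800-byte figure quoted above can be
        turned into an estimate of the extra non-paged pool a workload
        may need. The sketch below is illustrative only. The function
        name, the workload numbers, and the assumption that the cost
        scales linearly across several virtual units are not taken from
        this cover letter.

```python
# Figure quoted in this cover letter: approximately 800 bytes of
# additional non-paged pool per concurrent I/O to a virtual unit.
EXTRA_POOL_BYTES_PER_IO = 800


def extra_nonpaged_pool(concurrent_ios_per_vu, virtual_units=1):
    """Return the approximate additional non-paged pool, in bytes.

    ASSUMPTION: the per-I/O cost simply adds up across virtual units.
    """
    if concurrent_ios_per_vu < 0 or virtual_units < 0:
        raise ValueError("counts must be non-negative")
    return EXTRA_POOL_BYTES_PER_IO * concurrent_ios_per_vu * virtual_units


if __name__ == "__main__":
    # Hypothetical workload: 4 virtual units, 50 concurrent I/Os each.
    estimate = extra_nonpaged_pool(50, virtual_units=4)
    print(f"approximately {estimate} additional bytes of non-paged pool")
```

        The result is only a lower bound for planning purposes; it says
        nothing about pool used by other components.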

     o  During internal testing, a system crash indicated that I/Os
        were left outstanding in DUDRIVER after a virtual unit had
        been removed.

     o  An index was missing from a member-valid check in the
        BBR_READ_RECOVERY routine.

     o  There was an "infinite" loop condition at SHCP$START_QUED. The
        code has been modified so that the persistent thread is
        "killed" if the VU for which it was spawned fails.

     o  This remedial kit includes additional error logging
        capabilities to collect additional information when a virtual
        unit is made available. The new LOG_IT macro code has the
        following input parameters:

          o  R0 - value of P4
          o  R1 - value of P5
          o  R2 - address of LW in SHAD containing P6
          o  R3 - VU UCB
          o  R5 - SHAD IRP address with:
               - CDRP$L_BCNT  = P1
               - CDRP$L_MEDIA = P2
               - CDRP$L_PID   = P3

        The implementation makes use of the following cells in the
        errorlog record:

          o  EMB$W_SP_BOFF   - set to %xBADE as TAG
          o  EMB$W_SP_FUNC   - reason code
          o  EMB$L_SP_BCNT   - LW for information
          o  EMB$L_SP_MEDIA  - LW for information
          o  EMB$L_SP_RQPID  - LW for information
          o  EMB$Q_SP_IOSB   - 2 LW for information
          o  EMB$L_SP_CMDREF - LW for information

     o  A process intermittently hangs during dismount of a shadow set
        while waiting for completion of the QIOW in the DO_IO routine.
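
        The errorlog cells used by the LOG_IT macro, listed earlier in
        this section, can be pictured with the small sketch below. It
        is illustrative only. The cell widths follow the usual OpenVMS
        naming convention (W = 16 bits, L = 32 bits, Q = 64 bits), but
        the field order, the packing, and the function name are
        assumptions; this cover letter does not give the real offsets
        within the errorlog record.

```python
import struct

# ASSUMPTION: contiguous little-endian packing in the order the cells are
# listed in the cover letter. The real record offsets are not documented here.
LAYOUT = struct.Struct("<HHIIIQI")
FIELDS = (
    "SP_BOFF",    # word: tag, set to %xBADE by LOG_IT
    "SP_FUNC",    # word: reason code
    "SP_BCNT",    # longword of information
    "SP_MEDIA",   # longword of information
    "SP_RQPID",   # longword of information
    "SP_IOSB",    # quadword: two longwords of information
    "SP_CMDREF",  # longword of information
)
LOG_IT_TAG = 0xBADE


def decode_log_it(record: bytes):
    """Return the cells as a dict, or None if the tag is not %xBADE."""
    values = LAYOUT.unpack(record[:LAYOUT.size])
    entry = dict(zip(FIELDS, values))
    return entry if entry["SP_BOFF"] == LOG_IT_TAG else None


if __name__ == "__main__":
    # Hypothetical record built only for this demonstration.
    sample = LAYOUT.pack(LOG_IT_TAG, 3, 10, 20, 30, 0x1122334455667788, 9)
    print(decode_log_it(sample))
```

        Checking the tag first lets a reader of the error log separate
        LOG_IT entries from other records that reuse the same cells.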

     o  KRNLSTAKNV halt during MOUNT/CLUSTER DSAx:

        Bugcheck Type:      CPUSANITY, CPU sanity timer expired
        Node:               AI84 (Clustered)
        CPU Type:           AlphaServer 8400 Model EV56/440
        VMS Version:        V6.2-1H3
        Current Process:    PM2SKZ
        Current Image:      DSA40:[ZENT410.][EXE]BUS.EXE
        Failing PC:         FFFFFFFF 8001F8D0
        Failing PS:         18000000 00001604
        Module:             SYSTEM_PRIMITIVES_MIN
        Offset:             0000B8D0
        Boot Time:          26-JUN-1997 08:34:37.00
        System Uptime:      1 00:46:34.07
        Crash/Primary CPU:  01/00
        Saved Processes:    26
        Pagesize:           8 KByte (8192 bytes)
        Physical Memory:    2048 MByte (262144 PFNs)
        Dumpfile Pagelets:  999974 blocks
        Dump Flags:         writecomp,errlogcomp,dump_style
        EXE$GL_FLAGS:       poolpging,init,bugdump,pgflfrag

        Stack Pointers:
          KSP = 00000000 7FF91C98   ESP = 00000000 7FF96000
          SSP = 00000000 7FF9C100   USP = 00000000 7EDE4030

        General Registers:
          R0  = 00000000 00000000   R1  = FFFFFFFF 814EA180   R2  = FFFFFFFF 81410000
          R3  = FFFFFFFF 9DE268F8   R4  = 00000000 0000012C   R5  = 00000000 7FF91D40
          R6  = 00000000 7FF445A0   R7  = 08000000 00000200   R8  = FFFFFFFF F7710250
          R9  = 00000000 00000030   R10 = 00000000 00000031   R11 = 00000000 00000001
          R12 = 00000000 00008001   R13 = FFFFFFFF 9DE268F8   R14 = FFFFFFFF 9DE25640
          R15 = FFFFFFFF 9DE04200   R16 = 00000000 00000774   R17 = 00000000 7FF91C38
          R18 = FFFFFFFF 9DE32CE0   R19 = FFFFFFFF 9DE04200   R20 = 00000000 00000000
          R21 = 00000000 272007F0   R22 = FFFFFFFF 9DE04200   R23 = 00000000 00000000
          R24 = FFFFFFFF 9DE04AC0   AI  = 00000000 00000000   RA  = FFFFFFFF 00000000
          PV  = FFFFFFFF FFFFFFFF   R28 = FFFFFFFF 8001F83C   FP  = 00000000 7FF91E10
          PC  = FFFFFFFF 8001F8D4   PS  = 18000000 00001604

        Failing Instruction:
          EXE$HWCLKINT_C+00510:  BUGCHK

     o  The system crashes when a second node attempts to boot a
        system disk shadow set with two members. The following
        SHADDETINCON bugcheck at SHDRIVER+12124 or
        SYS$SHDRIVER_NPRO+449B4 occurs:

          SHADDETINCON, SHADOWING detects inconsistent state

     o  The mount of a shadow set fails.
        The failure report says that the set is already mounted or
        that there is a duplicate unit number.

     o  This kit provides a new SYS$BASE_IMAGE.EXE. The V7.1-1H1
        limited hardware release also provides this image. Both images
        contain support for all of the features in both releases, so
        the order of installation does not matter: ALPSHAD03_071 may
        be installed before or after V7.1-1H1. However, if
        ALPSHAD03_071 is installed after V7.1-1H1, you will see a
        warning message regarding SYS$BASE_IMAGE.EXE. You can safely
        ignore this message.

     o  SDA does not handle relocatable global (non-universal) symbols
        correctly if they are in resident images.

     o  SDA> SHOW POOL can take an excessive period of time.

     o  SHOW POOL gives NOSUCHPOOL errors unnecessarily.

     o  SHOW POOL/SUMMARY counts and space totals do not match.

     o  SHOW POOL cannot always find the range.

     o  When the minimum SYSTEM_PRIMITIVES image is in use, SDA fails
        to work instead of signaling the correct message.

     o  SDA opens the symbol file even when /OVERRIDE is specified
        (and the symbol file is not used).

     o  SDA can get into a loop printing blank lines.

     o  Some of BUGCHECK's messages are confusing.

     o  The Base SVA of buffer objects is only displayed as 32 bits.

     o  An incomplete dump is inaccessible by SDA. With the changes in
        this remedial kit, DUMPINCOMPL is treated as a warning if the
        dump is a selective dump and has progressed far enough to dump
        the first process.

     o  SDA SHOW EXEC does not always display all execlets. READ/EXEC
        does not read all the symbols.

     o  MODIFY DUMP does not work on the dump header, and /CONFIRM
        fails when the field being updated is a byte or a word and the
        original value is negative.

     o  BUGCHECK's two public routines, EXE$BUGCHK_REMOVE_VA and
        EXE$BUGCHK_CANCEL_REMOVE_VA, do not synchronize their
        manipulations with spinlocks.

     o  BUGCHECK fails if the only process is the swapper.

     o  Handling of Halt/Restart crashes when the Halt HWPCB is used
        is faulty.

     o  SHOW DEV MC only allows /HOME but it is documented as
        /HOMEPAGE.

5  PROBLEMS ADDRESSED IN ALPSHAD02_071 KIT

5.1  The three MOUNT problems described in section 6 are not
     addressed in this SHADOWING kit.

     o  On V7.1 systems with ALPSHAD01_071 installed, systems may not
        shut down properly and crash dumps may be lost if a shadowed
        system disk is in use. The error message will be:

          **** Boot driver initialization routine returned failure
          **** Memory dump canceled.
          IOVector = 00000000, Flags = 02016874

        This error occurs because there is a dependency between the
        ALPSHAD01_071 SYS$SHDRIVER and EXCEPTION.EXE; however,
        EXCEPTION.EXE was not distributed with the ALPSHAD01_071 kit.
        This kit simply provides the correct EXCEPTION.EXE. The other
        images are the same as were shipped in ALPSHAD01_071.

6  PROBLEMS NOT ADDRESSED IN ALPSHAD01_071 KIT

     o  The following three MOUNT problems were discovered at a late
        stage in the release of this kit. OpenVMS Engineering is
        working on solutions to these problems, which will be
        available in a future MOUNT ECO kit.

        If a user, either manually or by a command procedure, makes
        one of the following errors, MOUNT may incorrectly add members
        to existing shadow sets.

        - A MOUNT/SHAD with an incorrect volume label will succeed in
          adding the member to the shadow set, for example:

            $ MOUNT/SYSTEM DSA1/SHAD=$4$DUA1 TST1
            $! The shadow set DSA1 is now available with DUA1 as
            $! the only member
            $ MOUNT/SYSTEM DSA1/SHAD=$4$DUA5 TST5
            $! The device $4$DUA5 is wrongly added as a full copy
            $! target.

        - Similarly, a MOUNT/SHAD with an incorrect volume label of a
          shadow set that is MOUNTed elsewhere in the cluster will
          succeed in adding the member to the shadow set on the other
          nodes, but the MOUNT will fail on the local node, for
          example:

            NODE_1> $ MOUNT/SYSTEM DSA1/SHAD=$4$DUA1 TST1
            NODE_1> $ ! The shadow set DSA1 is now available on NODE_1
            NODE_2> $ MOUNT/SYSTEM DSA1/SHAD=$4$DUA5 TST5
            NODE_2> $! The MOUNT correctly fails on NODE_2 with
                    $! INCVOLLABEL error
            NODE_1> $! However, the member $4$DUA5 is wrongly added
            NODE_1> $! to the set DSA1 as a full copy target.

        - MOUNT will incorrectly allow a non-shareable MOUNT/SHADOW of
          a disk that is already mounted on another node as
          "shareable" to succeed. As a result, corruption of the
          disk(s) will take place, for example:

            NODE_1> $ MOUNT/SYSTEM DSA1/SHAD=$4$DUA1 TST1
            NODE_1> $ ! The shadow set DSA1 is now available on NODE_1
            NODE_2> $ MOUNT /NOSHARE DSA5/SHAD=$4$DUA1 TST1
            NODE_2> $! The shadow set DSA5 is (wrongly) now available
                    $! on NODE_2
            NODE_1> $! The shadow set DSA1 is also available on NODE_1

          Corruption of the disk will occur when write operations are
          performed by either node.

7  PROBLEMS ADDRESSED IN ALPSHAD01_071

     o  A SHADDETINCON BUGCHECK will occur in SHD_THREADS when trying
        to terminate a thread that is still a Significant Event.

     o  The Volume Shadowing driver delivered in V7.1 and the V6.2
        Cluster Compatibility kits (xxxCOMPAT_062) did not contain the
        full solution for the 'Bad Block Repair' (BBR) problem. As a
        result, a disk would not be expelled from the shadow set even
        when expulsion was warranted.

     o  An incompatibility has developed between StorageWorks Host
        Based RAID Software and the enhanced volume shadowing provided
        in both OpenVMS 7.1 and in the Cluster Compatibility Kits
        (xxxCOMPAT_062). Because of this incompatibility, RAID
        software can no longer detect that a shadow set state change
        has occurred.

     o  Write protecting a shadow set member which is being added to
        an existing shadow set causes the virtual unit to hang.

     o  System crashes with INVEXCEPTN bugchecks in SHSB$SEND_MESSAGE
        because the UCB address in R5 is zero. The system may also
        crash in IOC_STD$CVT_DEVNAM in IO_ROUTINES when the code tries
        to get a DDB out of a UCB that is bad.
        The problem is caused when the IRP$L_ARB field is not
        correctly set up with the clone error index. Routine
        SH$VP_DEV_DRVERR uses this byte as an index to fetch the UCB
        of the erring device. The value is FF, so an incorrect
        longword is fetched. The bad value occurs when volume
        processing initiates mount verification after a device error
        occurs.

     o  Shadow sets can hang in mount verification for hours after
        encountering a controller failure (DRAB_INT) on an HSJ50
        followed by many 'forced error flagged in last sector read'
        error messages on multiple shadow set member disks.

8  PROBLEMS ADDRESSED IN ALPSYS01_071 KIT

     o  The problem was isolated to a specific $UNWIND call not
        transferring control to the correct PC on OpenVMS Alpha V7.1.

9  KIT INSTALLATION RATING:

     The following kit installation rating, based upon current CLD
     information, is provided to serve as a guide as to which
     customers should apply this remedial kit. (Reference attached
     Disclaimer of Warranty and Limitation of Liability Statement)

     INSTALLATION RATING:

       2 : To be installed by all customers using the following
           feature(s):

           SHADOWING

10  INSTALLATION INSTRUCTIONS:

     Install this kit with the VMSINSTAL utility by logging into the
     SYSTEM account, and typing the following at the DCL prompt:

       @SYS$UPDATE:VMSINSTAL ALPSHAD03_071 [location of the saveset]

     The saveset location may be a tape drive, or a disk directory
     that contains the kit saveset.

     The system should be rebooted after successful installation of
     the kit. If you have other nodes in your VMScluster, they should
     also be rebooted in order to make use of the new image(s).

Copyright (c) Digital Equipment Corporation, 1997 All Rights Reserved.
Unpublished rights reserved under the copyright laws of the United
States.

The software contained on this media is proprietary to and embodies
the confidential technology of Digital Equipment Corporation.
Possession, use, or dissemination of the software and media is authorized only pursuant to a valid written license from Digital Equipment Corporation. DISCLAIMER OF WARRANTY AND LIMITATION OF LIABILITY THIS PATCH IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND. ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR PARTICULAR PURPOSE, OR NON-INFRINGEMENT, ARE HEREBY EXCLUDED TO THE EXTENT PERMITTED BY APPLICABLE LAW. IN NO EVENT WILL DIGITAL BE LIABLE FOR ANY LOST REVENUE OR PROFIT, OR FOR SPECIAL, INDIRECT, CONSEQUENTIAL, INCIDENTAL OR PUNITIVE DAMAGES, HOWEVER CAUSED AND REGARDLESS OF THE THEORY OF LIABILITY, WITH RESPECT TO ANY PATCH MADE AVAILABLE HERE OR TO THE USE OF SUCH PATCH.