坏块检测

编程入门 行业动态 更新时间:2024-10-23 19:23:56

坏块检测

坏块检测

坏块检测

  • 坏块的类型
    • 物理坏块
    • 逻辑坏块
  • 坏块的模拟
    • 物理坏块模拟
    • 逻辑坏块模拟
  • 坏块的检测
    • 物理坏块检测
    • 逻辑坏块检测

数据库坏块是指由于硬件错误等问题导致数据库在写入数据块时出现异常(nologing也可能导致).
通常,我们将坏块分为2类,物理坏块和逻辑坏块。

坏块的类型

物理坏块

物理坏块:通常是由于硬件损坏如磁盘异常导致、内存有问题、存储链路有问题、 IO 有问题、文件系统有问题、 Oracle 本身的问题等

逻辑坏块:可能都是软件问题导致通常是由于oracle bug 导致,比如data block 和 index block 数据不 一致。

物理坏块的场景:
This kind of block corruptions are normally reported by Oracle with error ORA-1578 and the detailed corruption description is printed in the alert log.

Corruption Examples are:
Bad header - the beginning of the block (cache header) is corrupt with invalid values
The block is Fractured/Incomplete - header and footer of the block do not match
The block checksum is invalid
The block is misplaced
Zeroed out blocks Note 1545366.1

逻辑坏块

逻辑坏块的场景:
This is when block contains a valid checksum and the structure below the beginning of the block is corrupt (Block content is corrupt). It may cause different ORA-600 errors.
The detailed corruption description for Logical Corruptions are not normally printed in the alert.log. DBVerify will report what is logically corrupted in the block.
Corruption Examples are:
row locked by non-existent transaction - ORA-600 [4512], etc
the amount of space used is not equal to block size
avsp bad
etc.

oracle根据自己的机制,把数据落盘后,就自己认为这个块是正常的,底层的东西他就不管了。落盘完成后,由于某种原因(存储、网络)等原因,导致磁盘上的块和之前写入磁盘的块不一致了(这里的不一致指的是各种结构和校验值,其实checksum校验值也是对快内容进行的校验,下面实验有验证),有所改动,oracle就会认为这是一个物理坏块。

而逻辑坏块通常就是索引和表不一致,报错ora-600.

坏块的模拟

物理坏块模拟

基本环境准备

SQL> create table t2(id int,name varchar2(10));Table created.SQL> insert into t2 values (1,'aaa');1 row created.SQL> commit;Commit complete.SQL>  select rowid,id from zhuo.t2;ROWID                      ID
------------------ ----------
AAATtTAAFAAAAWUAAA          1SQL> select file_id,extent_id,block_id,blocks from dba_extents where segment_name='T2' and owner = 'ZHUO';FILE_ID  EXTENT_ID   BLOCK_ID     BLOCKS
---------- ---------- ---------- ----------5          0       1424          8
SQL> select SEGMENT_NAME,HEADER_FILE,HEADER_BLOCK from dba_segments where SEGMENT_NAME='T2';SEGMENT_NA HEADER_FILE HEADER_BLOCK
---------- ----------- ------------
T2                   5         1426select dbms_rowid.rowid_relative_fno('AAATtTAAFAAAAWUAAA') file_id,
dbms_rowid.rowid_block_number('AAATtTAAFAAAAWUAAA') block_id
from dual;FILE_ID   BLOCK_ID
---------- ----------5       1428                --真正的数据块
SQL> alter system flush buffer_cache;System altered.

1.Bad header the beginning of the block (cache header) is corrupt with invalid values
在block结构中,kcbh损坏的都算。

测试:

BBED> p kcbh
struct kcbh, 20 bytes                       @0       ub1 type_kcbh                            @0        0x06ub1 frmt_kcbh                            @1        0xa2ub1 spare1_kcbh                          @2        0x00ub1 spare2_kcbh                          @3        0x00ub4 rdba_kcbh                            @4        0x01400594ub4 bas_kcbh                             @8        0x000e0333ub2 wrp_kcbh                             @12       0x0000ub1 seq_kcbh                             @14       0x01ub1 flg_kcbh                             @15       0x06 (KCBHFDLC, KCBHFCKV)ub2 chkval_kcbh                          @16       0x4af9ub2 spare3_kcbh                          @18       0x0000

修改为坏块

BBED> m /x ff offset 14
Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) yFile: /u01/app/oracle/oradata/ZHUO/datafile/o1_mf_zhuo_gxdcfr5s_.dbf (5)Block: 1428             Offsets:   14 to   25           Dba:0x01400594
------------------------------------------------------------------------ff06f94a 00000100 0000533b <32 bytes per line>BBED> sum apply
Check value for File 5, Block 1428:
current = 0x4a07, required = 0x4a07BBED> v
DBVERIFY - Verification starting
FILE = /u01/app/oracle/oradata/ZHUO/datafile/o1_mf_zhuo_gxdcfr5s_.dbf
BLOCK = 1428Block 1428 is corrupt
Corrupt block relative dba: 0x01400594 (file 0, block 1428)
Fractured block found during verification
Data in bad block:type: 6 format: 2 rdba: 0x01400594last change scn: 0x0000.000e0333 seq: 0xff flg: 0x06spare1: 0x0 spare2: 0x0 spare3: 0x0consistency value in tail: 0x03330601check value in block header: 0x4a07computed block checksum: 0x0DBVERIFY - Verification completeTotal Blocks Examined         : 1
Total Blocks Processed (Data) : 0
Total Blocks Failing   (Data) : 0
Total Blocks Processed (Index): 0
Total Blocks Failing   (Index): 0
Total Blocks Empty            : 0
Total Blocks Marked Corrupt   : 1
Total Blocks Influx           : 2
Message 531 not found;  product=RDBMS; facility=BBED

查询报错ORA-1578:

SQL> select * from zhuo.t2;
select * from zhuo.t2*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 5, block # 1428)
ORA-01110: data file 5:
'/u01/app/oracle/oradata/ZHUO/datafile/o1_mf_zhuo_gxdcfr5s_.dbf'

alert日志报错:
Corrupt block relative dba: 0x01400594 (file 5, block 1428)
Fractured block found during multiblock buffer read
Data in bad block:
type: 6 format: 2 rdba: 0x01400594
last change scn: 0x0000.000e0333 seq: 0xff flg: 0x06
spare1: 0x0 spare2: 0x0 spare3: 0x0
consistency value in tail: 0x03330601
check value in block header: 0x4a07
computed block checksum: 0x0

2.The block is Fractured/Incomplete header and footer of the block do not match
block不完整,从block header中获取的信息和block tail中的不一致。

tail值=scn后4位+type+seq

测试:

BBED> p tailchk
ub4 tailchk                                 @8188     0x03330601BBED> d /v count 12File: /u01/app/oracle/oradata/ZHUO/datafile/o1_mf_zhuo_gxdcfr5s_.dbf (5)Block: 1428    Offsets: 8188 to 8191  Dba:0x01400594
-------------------------------------------------------01063303                            l ..3.<16 bytes per line>BBED> p bas_kcbh
ub4 bas_kcbh                                @8        0x000e0333BBED> m /x 01063304 offset 8188File: /u01/app/oracle/oradata/ZHUO/datafile/o1_mf_zhuo_gxdcfr5s_.dbf (5)Block: 1428             Offsets: 8188 to 8191           Dba:0x01400594
------------------------------------------------------------------------01063304 <32 bytes per line>BBED> sum apply
Check value for File 5, Block 1428:
current = 0x4df9, required = 0x4df9BBED> v
DBVERIFY - Verification starting
FILE = /u01/app/oracle/oradata/ZHUO/datafile/o1_mf_zhuo_gxdcfr5s_.dbf
BLOCK = 1428Block 1428 is corrupt
Corrupt block relative dba: 0x01400594 (file 0, block 1428)
Fractured block found during verification
Data in bad block:type: 6 format: 2 rdba: 0x01400594last change scn: 0x0000.000e0333 seq: 0x1 flg: 0x06spare1: 0x0 spare2: 0x0 spare3: 0x0consistency value in tail: 0x04330601check value in block header: 0x4df9computed block checksum: 0x0DBVERIFY - Verification completeTotal Blocks Examined         : 1
Total Blocks Processed (Data) : 0
Total Blocks Failing   (Data) : 0
Total Blocks Processed (Index): 0
Total Blocks Failing   (Index): 0
Total Blocks Empty            : 0
Total Blocks Marked Corrupt   : 1
Total Blocks Influx           : 2
Message 531 not found;  product=RDBMS; facility=BBED

查询报错ORA-1578:

SQL> select * from zhuo.t2;
select * from zhuo.t2*
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 5, block # 1428)
ORA-01110: data file 5:
'/u01/app/oracle/oradata/ZHUO/datafile/o1_mf_zhuo_gxdcfr5s_.dbf'
---如果查询不报错,执行刷盘操作:alter system flush buffer_cache;

alert日志报错:
Corrupt block relative dba: 0x01400594 (file 5, block 1428)
Fractured block found during multiblock buffer read
Data in bad block:
type: 6 format: 2 rdba: 0x01400594
last change scn: 0x0000.000e0333 seq: 0x1 flg: 0x06
spare1: 0x0 spare2: 0x0 spare3: 0x0
consistency value in tail: 0x04330601
check value in block header: 0x4df9
computed block checksum: 0x0
解决:
修改tail值
Set dba 5,131
M /x 23232 offset 8188
Sum apply
3.The block checksum is invalid

测试:
块修改之前的checksum

用 dd 把 5 号 file,1428 号 block 给拷 出来

[oracle@oracle11g datafile]$ dd if=o1_mf_zhuo_gxdcfr5s_.dbf of=/tmp/1428 bs=8192 count=1 skip=1428

可以看到这个block 现在在块头记录的 checksum 值是 F8 4A


现在我们将将ID 为 1 的那条记录的 ID 值由 1 改为 2 ,即将 C1 02 改成 C1 03

SQL> select dump(1,16) from dual;DUMP(1,16)
-----------------
Typ=2 Len=2: c1,2SQL> select dump('aaa',16) from dual;DUMP('AAA',16)
----------------------
Typ=96 Len=3: 61,61,61SQL> select dump(2,16) from dual;DUMP(2,16)
-----------------
Typ=2 Len=2: c1,3


改完后ftp 到 LINUX 上 然后再用 dd 把上述修改过的 block 给拷回去 再 startup force 上述数据库:

[oracle@oracle11g datafile]$ dd of=o1_mf_zhuo_gxdcfr5s_.dbf if=/tmp/1428 seek=1428 count=1 bs=8192 conv=notrunc
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 0.000158362 s, 51.7 MB/s



从上述alert log 里我们可以看出以下两点:
1、块头记录的 checksum 值是 4a f9 oracle 这里做异或操作后的 checksum 值是100
2、 oracle 当发现 checksum 值不对的时候会尝试再次读一下该 block

上述的操作也可以直接用bbed修复offset16,不sum apply来完成。上述演示知识为了根据实际情况,由于硬件,值在存储上面修改了,绕过了oracle自动计算checksum的值,导致错误。
怎样计算出正确的 checksum 值?
1、手工计算
我们现在根据
0x4af9 和 0x100 来计算出上述 block 在修改后正确的 checksum 值
0x4af9 = 100101011111001
0x100 = 000000100000000
根据
异或算法原理 ,这里很容易可以看出 oracle 计算出来的正确的 checksum 值应该是:
100101111111001 ,即 4B F9
2、直接sum(首选)
BBED> sum
Check value for File 5, Block 1428:
current = 0x4af9, required = 0x4bf9 --所需要的值

3、1)需要把 block的16、17位清零,那么从alert log 就可以看到 oracle 计算出来的 checksum 值了,
而且这个时候 alert log 里的 checksum 值就是这个 block 被 oracle 计算出来的正确的 checksum值 。
2)因为oracle在alert log里记录的computed block checksum 值实际上是并不是这个block被oracle计算出来的正确的checksum值,而是这个block被oracle计算出来的正确的checksum值与这个block的第16 、17 位记录的checksum值做异或操作后的值。
3)当你把一个 block 的 16 、 17 位清零后,因为 0 与一个值做异或操作后还是等于那个值,所以在将
16 、17 位清零后,这个时候 alert log 里的 checksum 值就是这个 block 被 oracle 计算出来的正确的 checksum 值了
(computed block checksum)=(check value in block header) XOR (oracle计算出来的正确checksum)
check value in block header=bbed 里面p chkval_kcbh打印出来的值。块头的checksum值。

修复:

BBED> sum apply
Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y
Check value for File 5, Block 1428:
current = 0x4bf9, required = 0x4bf9BBED> p chkval_kcbh
ub2 chkval_kcbh                             @16       0x4bf9SQL> alter system flush buffer_cache;System altered.SQL> select * from zhuo.t2;ID NAME
---------- ----------2 aaa      --1已经修改为2

总结下:如果block的checksum有问题,bbed直接sum 可以看出正确值,sum apply直接就可以修复
可以看出,当数据落盘后,由于人为修改了块里具体的值,导致oracle读取的时候,重新计算的checksum值和块头存储的checksum值不一致,导致物理坏块。所以物理坏块也会校验块内容。
4.The block is misplaced

oracle检测到读取的数据库包含的内容属于另一个block,同时checksum正常
测试:

BBED> copy file 5 block 132 to file 5 block 131File: /u01/app/oracle/oradata/ZHUO/datafile/o1_mf_zhuo_gxdcfr5s_.dbf (5)Block: 131              Offsets:    0 to  511           Dba:0x01400083
------------------------------------------------------------------------06a20000 84004001 3ca20b00 00000204 12e40000 01000000 e23a0100 36a20b00 00000000 03003200 80004001 ffff0000 00000000 00000000 00000000 00800000 36a20b00 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00015300 ffffb800 0c045403 54030000 5300311f e21e931e 421ef11d a11d4f1d f71ca91c 591c091c b51b5a1b 081bb31a 621a0b1a b4196119 0f19ad18 4c18f817 a4175317 fb16ad16 5d160b16 b9156715 1315bf14 70140e14 ad135b13 0e13bf12 70122012 ce117711 1e11cb10 7c102510 d70f860f 360fe80e 990e440e eb0d930d 380de30c 910c3e0c e80b870b 360be30a 8f0a3b0a e4098209 2109c708 6d081b08 c6077207 1e07c706 70061906 c2056b05 1405bc04 64040c04 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 <32 bytes per line>BBED> set file 5 block 131FILE#           5BLOCK#          131BBED> v
DBVERIFY - Verification starting
FILE = /u01/app/oracle/oradata/ZHUO/datafile/o1_mf_zhuo_gxdcfr5s_.dbf
BLOCK = 131Block 131 is corrupt
Corrupt block relative dba: 0x01400083 (file 0, block 131)
Bad header found during verification
Data in bad block:type: 6 format: 2 rdba: 0x01400084last change scn: 0x0000.000ba23c seq: 0x2 flg: 0x04spare1: 0x0 spare2: 0x0 spare3: 0x0consistency value in tail: 0xa23c0602check value in block header: 0xe412computed block checksum: 0x0DBVERIFY - Verification completeTotal Blocks Examined         : 1
Total Blocks Processed (Data) : 0
Total Blocks Failing   (Data) : 0
Total Blocks Processed (Index): 0
Total Blocks Failing   (Index): 0
Total Blocks Empty            : 0
Total Blocks Marked Corrupt   : 1
Total Blocks Influx           : 0
Message 531 not found;  product=RDBMS; facility=BBEDBBED> 
BBED> set file 5 block 132FILE#           5BLOCK#          132BBED> v
DBVERIFY - Verification starting
FILE = /u01/app/oracle/oradata/ZHUO/datafile/o1_mf_zhuo_gxdcfr5s_.dbf
BLOCK = 132DBVERIFY - Verification completeTotal Blocks Examined         : 1
Total Blocks Processed (Data) : 1
Total Blocks Failing   (Data) : 0
Total Blocks Processed (Index): 0
Total Blocks Failing   (Index): 0
Total Blocks Empty            : 0
Total Blocks Marked Corrupt   : 0
Total Blocks Influx           : 0
Message 531 not found;  product=RDBMS; facility=BBED

5.Zeroed out blocks/ORA 8103 模拟

逻辑坏块模拟

1、oracle bug 也可能导致逻辑坏块的产生 . 特别是 parallel dml . 例如
Bug 5621677 Logical corruption with PARALLEL update
Bug 6994194 Logical corruption from UPDATE DML
Bug 15980234 ORA1400 / logical corruption from direct path INSERT ALL(fail withORA 1400, in direct path / PDML) 。
2、多数情况下逻辑坏块可能都是软件问题导致,当然数据库异常也可能导致。比如掉电的情况下,
就可能导致块内数据不一致。

坏块的检测

物理坏块检测

逻辑坏块检测



逻辑坏块的分类

参考:
Physical and Logical Block Corruptions. All you wanted to know about it. (Doc ID 840978.1)
Master Note for Handling Oracle Database Corruption Issues (Doc ID 1088018.1)

更多推荐

坏块检测

本文发布于:2024-03-12 17:03:14,感谢您对本站的认可!
本文链接:https://www.elefans.com/category/jswz/34/1731981.html
版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,我们将在24小时内删除。
本文标签:

发布评论

评论列表 (有 0 条评论)
草根站长

>www.elefans.com

编程频道|电子爱好者 - 技术资讯及电子产品介绍!