ges resources暴长导致 ORA

编程入门 行业动态 更新时间:2024-10-04 15:36:50

<a href=https://www.elefans.com/category/jswz/34/1763685.html style=ges resources暴长导致 ORA"/>

ges resources暴长导致 ORA

1.概述
2015年5月6日09:16 两实例均分别出现ORA-04031应用无法连接,其中信用3A继ORA-04031之后触发了ORA-00600错误,最终由ORA-481中断该实例;信用4A为了及时恢复业务及时对实例进行了重启;后经分析此次故障是人工执行预付费的账号的一个过程所致,该过程使用游标的方式对表进行大量的UPDATE操作加上未使用绑定变量致使用数据库出现ORA-04031;最终触发了BUG;

2.故障现象
数据库出ORA-04031、ORA-00600 [kksscl-inf-inl-loop]最后由ORA-00481中断实例,信用4a应用报ORA-04031数据库日志未出现告警;

--日志报错信息

Wed May  6 09:16:34 2015
Errors in file /oracle/app/admin/oraxy3/udump/oxy3a_ora_1483130.trc:
ORA-04031: unable to allocate 4216 bytes of shared memory ("shared pool","insert into CREDIT_EVENT_ERR...","sga heap(3,0)","library cache")
ORA-04031: unable to allocate 4216 bytes of shared memory ("shared pool","insert into CREDIT_EVENT_ERR...","sga heap(3,0)","library cache")
Errors in file /oracle/app/admin/oraxy3/udump/oxy3a_ora_492244.trc:
ORA-00600: internal error code, arguments: [kksscl-inf-inl-loop], [2500], [3], [323], [652], [652], [], []
Wed May  6 09:17:00 2015
Errors in file /oracle/app/admin/oraxy3/udump/oxy3a_ora_635918.trc:
ORA-00600: internal error code, arguments: [kksscl-inf-inl-loop], [2500], [3], [323], [652], [652]
Errors in file /oracle/app/admin/oraxy3/bdump/oxy3a_lmon_1422378.trc:
ORA-00600: internal error code, arguments: [kjrfr:!owner], [0x7000001E9DA5890], [], [], [], [], [], []
Wed May  6 09:23:06 2015
Errors in file /oracle/app/admin/oraxy3/bdump/oxy3a_lmon_1422378.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00600: internal error code, arguments: [kjrfr:!owner], [0x7000001E9DA5890], [], [
Errors in file /oracle/app/admin/oraxy3/bdump/oxy3a_pmon_1503680.trc:
ORA-00481: LMON process terminated with error
Wed May  6 09:25:32 2015
PMON: terminating instance due to error 481
Wed May  6 09:25:32 2015
Errors in file /oracle/app/admin/oraxy3/bdump/oxy3a_lms4_1343952.trc:
ORA-00481: LMON process terminated with error

3.   故障分析
3.1 数据库负载分析

08:30-09:00数据库每秒执行的事物高达7439.33次相当于平时的165倍,从这里可以看出数据库当时业务量非常高;

DB Name         DB Id    Instance     Inst Num Release     Cluster Host
------------ ----------- ------------ -------- ----------- ------- ------------
ORAXY4        3696212656 oxy4a               1 9.2.0.6.0   YES     sxy4a


              Snap Id     Snap Time      Sessions Curs/Sess Comment
            --------- ------------------ -------- --------- -------------------
Begin Snap:    186816 06-May-15 08:30:01      681      10.2
  End Snap:    186833 06-May-15 09:00:01      686      10.2
   Elapsed:               30.00 (mins)

 

Load Profile
~~~~~~~~~~~~                            Per Second       Per Transaction
                                   ---------------       ---------------
                  Redo size:          3,460,589.76                465.17
              Logical reads:            151,302.03                 20.34
              Block changes:             36,679.13                  4.93
             Physical reads:              7,794.99                  1.05
            Physical writes:                716.58                  0.10
                 User calls:              2,380.18                  0.32
                     Parses:                664.15                  0.09
                Hard parses:                  3.03                  0.00
                      Sorts:                205.90                  0.03
                     Logons:                  3.17                  0.00
                   Executes:             24,984.70                  3.36
               Transactions:              7,439.33  ===========》每秒执行的事物高达7439.33次,说明数据库当时事物量非常高


3.2 Shared Pool 使用情况分析

从Shared Pool分析发现,其中GES RESOURCES组件分别在5月5日18:30-19:00之间由45M增长到80M,5月6日08:30-09:30之间由80M增长到1.7G;

SGA breakdown difference for DB: ORAXY4  Instance: oxy4a  Snaps: 186816 -186833

Pool   Name                                Begin value        End value  % Diff
------ ------------------------------ ---------------- ---------------- -------
java   free memory                         218,103,808      218,103,808    0.00
large  free memory                         218,103,808      218,103,808    0.00
shared 1M buffer                               528,384          528,384    0.00
shared Checkpoint queue                    271,427,840      271,427,840    0.00
shared DML lock                              1,885,064        1,885,064    0.00
shared FileIdentificatonBlock                2,247,328        2,247,328    0.00
shared FileOpenBlock                        20,380,456       20,380,456    0.00
shared KCL extra lock elements               1,728,000        1,728,000    0.00
shared KCL name table                        1,483,512        1,483,512    0.00
shared KGK heap                                 64,600           64,600    0.00
shared KGLS heap                             2,152,904        1,950,760   -9.39
shared KQR L PO                              2,505,432        2,070,176  -17.37
shared KQR M PO                             29,642,864       19,847,288  -33.05
shared KQR S SO                                  5,136            6,928   34.89
shared KQR X PO                                115,944          115,944    0.00
shared KSXR pending messages que               853,952          853,952    0.00
shared MTTR advisory                           352,576          352,576    0.00
shared PL/SQL DIANA                          1,573,960        1,260,880  -19.89
shared PL/SQL MPCODE                        89,231,648        2,981,424  -96.66
shared PL/SQL PPCODE                            24,664           12,344  -49.95
shared PLS non-lib hp                            3,632            3,632    0.00
shared branch                                  788,560          788,560    0.00
shared db_block_hash_buckets                 2,851,520        2,851,520    0.00
shared db_files                                371,128          371,128    0.00
shared db_handles                            2,320,000        2,320,000    0.00
shared dictionary cache                      3,229,952        3,229,952    0.00
shared enqueue                               3,877,288        3,877,288    0.00
shared enqueue resources                     1,525,720        1,525,720    0.00
shared errors                                   56,584           41,408  -26.82
shared event statistics per sess            24,942,960       24,942,960    0.00
shared fixed allocation callback                 3,672            3,672    0.00
shared free memory                         217,359,432      799,474,760  267.81
shared gcs resources                        39,716,184       39,716,184    0.00
shared gcs shadows                          26,385,752       26,385,752    0.00
shared ges big msg buffers                   8,866,088        8,866,088    0.00
shared ges enqueues                         71,480,032       71,480,032    0.00
shared ges process array                     2,561,280        2,561,280    0.00
shared ges regular msg buffers               2,064,248        2,064,248    0.00
shared ges reserved msg buffers              2,096,008        2,096,008    0.00
shared ges resource hash table               4,325,376        4,325,376    0.00
shared ges resources                        80,605,512    1,716,473,160 ####### ====》 从这里可以看出shared_pool大分部被ges resources占用; shared pool共2G,该组件就占用了1.7G
shared joxs heap init                              488              488    0.00
shared library cache                       957,054,656      379,577,984  -60.34
shared miscellaneous                       206,828,912      206,838,528    0.00
shared parameters                           10,241,824          237,752  -97.68
shared partitioning d                        2,880,912        1,696,736  -41.10
shared processes                             2,608,000        2,608,000    0.00
shared sessions                              5,962,320        5,962,320    0.00
shared set_descriptor_array                    685,184          685,184    0.00
shared sim memory hea                          690,232          690,232    0.00
shared sim trace entries                     5,210,112        5,210,112    0.00
shared sql area                            920,218,568       34,797,528  -96.22
shared table definiti                           10,928           13,544   23.94
shared transaction                           2,619,192        2,619,192    0.00

3.3   影响GES RESOURCE组件增加原因

ges resource解释:
Depending on application work loads, some components in a RAC environment can temporarily consume a lot of SGA memory for "ges enqueues" and this can fragment the shared pool sufficiently to cause ORA-4031 type errors

通过ges resources可以判断当时数据库正在做大量的word loads,从这里也正好验证了执行的事物高达7439.33次的事实

SQL ordered by Gets for DB: ORAXY4  Instance: oxy4a  Snaps: 186816 -186833
-> End Buffer Gets Threshold:     10000
-> Note that resources reported for PL/SQL includes the resources used by
   all SQL statements called within the PL/SQL code.  As individual SQL
   statements are also reported, it is possible and valid for the summed
   total % to exceed 100
                                                     CPU      Elapsd
  Buffer Gets    Executions  Gets per Exec  %Total Time (s)  Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
     53,270,084   13,304,986            4.0   19.6   934.22   3161.89 1945535319
Module: PL/SQL Developer
SELECT VALUE FROM ACCOUNT_ATTR B WHERE B.KEY_ID = :B1 AND B.ATTR
_ID = 7
     15,017,455    6,793,977            2.2    5.5  1207.08   1120.64 2588995081
Module: PL/SQL Developer
UPDATE ACCOUNT_ATTR633 C SET C.FLAG = 1 WHERE C.ROWID = :B1
     14,589,255    6,488,773            2.2    5.4  1025.90    916.91  646745035
Module: PL/SQL Developer
UPDATE ACCOUNT_ATTR543 C SET C.FLAG = 1 WHERE C.ROWID = :B
     13,386,305   13,326,843            1.0    4.9   407.27    728.89 1053795750
Module: SQL*Plus
COMMIT
     10,007,055        1,883        5,314.4    3.7    37.08     61.03 2162479642
Module: exec_proc@c4cbeg02 (TNS V1-V3)
SELECT ROWID NROWID, ACCTID ACCOUNTID, PROCESSTIME INTIME, DECOD
E(OLDSTATUS,'FC17','FC121','IFC12','FC01','IFC121','FC01','IFC13
1','FC01','IFC132','FC01','IFC133','FC01','IFC30','FC30','IFC20'
,'FC20','IFC00','FC00','IFC01','FC01',OLDSTATUS) OLDSTATUS, DECO
DE(STATUS,'FC17','FC121','IFC12','FC12','IFC121','FC121','IFC131
      9,337,971            1    9,337,971.0    3.4   243.89    364.25 1852514272
Module: PL/SQL Developer
declare   v_value varchar2(10); begin   for i in (select a.*, ro
wid rid from account_attr633 a where mod(a.key_id,10)=9 and a.fl
ag is null) loop     begin       select value         into v_val
ue         from account_attr b        where b.key_id = i.key_id
         and b.attr_id = 7;       if v_value = '0' then
      9,068,352            1    9,068,352.0    3.3   243.69    367.38 1116525411
Module: PL/SQL Developer
declare   v_value varchar2(10); begin   for i in (select a.*, ro
wid rid from account_attr633 a where mod(a.key_id,10)=8 and a.fl
ag is null) loop     begin       select value         into v_val
ue         from account_attr b        where b.key_id = i.key_id
         and b.attr_id = 7;       if v_value = '0' then
      8,572,295            1    8,572,295.0    3.1   240.85    370.84 3382822563
Module: PL/SQL Developer
declare   v_value varchar2(10); begin   for i in (select a.*, ro
wid rid from account_attr633 a where mod(a.key_id,10)=7 and a.fl
ag is null) loop     begin       select value         into v_val
ue         from account_attr b        where b.key_id = i.key_id
         and b.attr_id = 7;       if v_value = '0' then
      8,121,479            1    8,121,479.0    3.0   242.36    387.30 3652706703

从上述报告可以看出,有人在使用PL/SQL Developer来执行大批量的update语句,通过该语句我们可以定位是一个存储过程; 

3.4 通过监听日志定位操作人员

从上述分析导致该问题的原因是由于人工执行一过程所致,那么如何知道谁执行此操作,我们可以通过监听日志得到一些相关的信息;

ORT=60899)) * establish * oraxy3 * 0
05-MAY-2015 17:37:20 * (CONNECT_DATA=(SERVICE_NAME=oraxy3)(CID=(PROGRAM=D:\oracle\PLSQL Developer\plsqldev.exe)(HOST=SD4ACITRIX13)(USER=4A_yuanjuju))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.19.194.25)(PORT=60904)) * establish * oraxy3 * 0
05-MAY-2015 17:38:17 * (CONNECT_DATA=(SERVICE_NAME=oraxy3)(CID=(PROGRAM=D:\oracle\PLSQL Developer\plsqldev.exe)(HOST=SD4ACITRIX13)(USER=4A_yuanjuju))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.19.194.25)(PORT=60995)) * establish * orax
y3 * 0
05-MAY-2015 17:39:28 * (CONNECT_DATA=(SERVICE_NAME=oraxy3)(CID=(PROGRAM=D:\oracle\PLSQL Developer\plsqldev.exe)(HOST=SD4ACITRIX13)(USER=4A_wangzhen2))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.19.194.25)(PORT=61100)) * establish * oraxy3 * 0
05-MAY-2015 17:39:28 * (CONNECT_DATA=(SERVICE_NAME=oraxy3)(CID=(PROGRAM=D:\oracle\PLSQL Developer\plsqldev.exe)(HOST=SD4ACITRIX13)(USER=4A_wangzhen2))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.19.194.25)(PORT=61101)) * establish * oraxy3 * 0

从监听日志里我们可以看出用户USER=4A_yuanjuju通过plsqldev.exe执行了该过程;

3.5 数据宕机原因定位

-- Trace File 分析  
oxy3a_ora_1483130.trc
通过对TRACE文件分析发现56%的资源被GES RESOURCES组件占用,通过Trace file内容可以判断该次故障命中Bug 3046725

参考文档:

Bug 3046725 - ORA-4031 due to shared_pool fragmented with high ges resources & enqueues ( Doc ID 3046725.8 )

Bug 5722472 - ORA-600[KKSSCL-INF-INL-LOOP] OCCURS IN HIGH LOAD TEST.

FileComment
-----------------
======================================
Allocation Request Summary Information
======================================
Current information setting: 00654fff
Dump Interval=300 seconds SGA Heap Dump Interval=3600 seconds
Last Dump Time=05/06/2015 09:16:34
Allocation request for: library cache
Heap: 700000000041e00, size: 4216
******************************************************
HEAP DUMP heap name="sga heap(3,0)" desc=700000000041e00
extent sz=0xfe0 alt=200 het=32767 rec=9 flg=-126 opc=0
parent=0 owner=0 nex=0 xsz=0x1

Subpool 1: total 989855744, top 5 components:

Allocation Name Size Ratio(%)
___________________________ ____________ ________
"ges resources " 552448736 55.81
"free memory " 182546912 18.44
"miscellaneous " 88791648 8.97
"Checkpoint queue " 64656160 6.53
"ges enqueues " 27310904 2.76

Subpool 2: total 989855744, top 5 components:

Allocation Name Size Ratio(%)
___________________________ ____________ ________
"ges resources " 552475400 55.81
"free memory " 189370904 19.13
"Checkpoint queue " 64976240 6.56
"miscellaneous " 63793920 6.44
"FileOpenBlock " 29265560 2.96

Subpool 3: total 973078528, top 5 components: <<<<<<<

Allocation Name Size Ratio(%)
___________________________ ____________ ________
"ges resources " 552211440 56.75
"free memory " 183526328 18.86
"miscellaneous " 80548568 8.28
"Checkpoint queue " 64976240 6.68
"ges enqueues " 27266248 2.80

Shared Pool: total 2952790016, top 5 components:

Allocation Name Size Ratio(%)
___________________________ ____________ ________
"ges resources " 1657135576 56.12
"free memory " 555444144 18.81
"miscellaneous " 233134136 7.90
"Checkpoint queue " 194608640 6.59
"ges enqueues " 82139704 2.78


---BUG定位

Bug 3046725 - ORA-4031 due to shared_pool fragmented with high ges resources & enqueues ( Doc ID 3046725.8 )

Versions confirmed as being affected

9.2.0.2
9.2.0.7
10.1.0.4

This issue is fixed in

10.2.0.1 (Base Release)

Description

Depending on application work loads, some components in
a RAC environment can temporarily consume a lot of SGA
memory for "ges enqueues" and this can fragment the
shared pool sufficiently to cause ORA-4031 type errors.

Workaround:
Pre-reserve plenty of space for GES resources in
the shared pools to prevent the fragmentation effect
of a sudden jump in requirements.
This can be achieved by setting the hidden parameters
_lm_locks / _lm_ress .

4. 故障处理

该此故障发生是由人为执行的存储过程里对一表执行大批量的UPDATE操作,且未合理使用绑定变量给数据库带来严重负载压力,触使GES RESOURCE资源由45M暴长到1.7G,占用了85% SHARED POOL大小。致使用应该连接报ORA-04031错误,触发了BUG;

5. 解决方案

5.1 临时方案

(1) 部署监控脚本,及时发现问题;

(2) 重启数据库释放内容、目前3、4区已在问题发生时进行了重启,2区未重启;需要安排进行进行重启;

(3) 修改存储过程,合理使用绑定变量

--原过程

declare
  v_value  varchar2(10);
begin
  for i in (select a.*, rowid rid
              from account_attr531  a
             where mod(a.key_id,10)=0 and a.flag is null) loop
    begin
      select value
        into v_value
        from account_attr b
       where b.key_id = i.key_id
         and b.attr_id = 7;
      if v_value = '0' then
        update account_attr531 c set c.flag = 0 where c.rowid = i.rid;
      else
        update account_attr531 c set c.flag = 1 where c.rowid = i.rid;
      end if;
    exception
      when others then
        update account_attr531 c set c.flag = 3 where c.rowid = i.rid;
    end;
  end loop;
  commit;
end;

--整改后:

declare
  v_value  varchar2(10);
  v_num    number(5);
begin
  v_num:=0;
  for i in (select a.*, rowid rid
              from account_attr531  a
             where mod(a.key_id,10)=0 and a.flag is null) loop
    v_num:=v_num+1;             
    begin
      select value
        into v_value
        from account_attr b
       where b.key_id = i.key_id
         and b.attr_id = 7;
      if v_value = '0
' then
        --update account_attr531 c set c.flag = 0 where c.rowid = i.rid;
        execute immediate 'update account_attr531 c set c.flag = 0 where c.rowid =:r' using i.rid;
      else
        --update account_attr531 c set c.flag = 1 where c.rowid = i.rid;
        execute immediate 'update account_attr531 c set c.flag = 1 where c.rowid =:r' using i.rid;
      end if;
      if v_num >=500 then
      v_num:=0;
        commit;
      end if;
    exception
      when others then
        --update account_attr531 c set c.flag = 3 where c.rowid = i.rid;
        execute immediate 'update account_attr531 c set c.flag = 1 where c.rowid =:r' using i.rid;
    end;
  end loop;
  commit;
end;

 

5.2 长期方案

(1)权限控制,禁止在生产系统上操作

(2)升级数据库到10G以上


来自 “ ITPUB博客 ” ,链接:/,如需转载,请注明出处,否则将追究法律责任。

转载于:/

更多推荐

ges resources暴长导致 ORA

本文发布于:2024-02-14 14:34:13,感谢您对本站的认可!
本文链接:https://www.elefans.com/category/jswz/34/1763471.html
版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,我们将在24小时内删除。
本文标签:ges   resources   ORA

发布评论

评论列表 (有 0 条评论)
草根站长

>www.elefans.com

编程频道|电子爱好者 - 技术资讯及电子产品介绍!