最近在客户系统作升级演练时候遇到一些dg问题,之前19c基本都是新库搭建好dg环境,然后再create新的业务pdb,或者是有一些业务pdb然后再搭建好高可用的dg环境,基本没问题。但这次由于需求,在搭建好的dg环境,进行remote hot clone pdb进入主库,备库上面的新clone的pdb出现问题,以此记录一下。
-- standby查看主库新clone进来的pdb,一切看似正常,同步延时为0,熟不知后台已经报错了(cdb mount和open都一样会有问题) -- 因为备库的这个故障pdb被认定为offline不再进行recover同步,所以cdb全局的同步延时显示为0,后台alert很明显提示了 SQL> show pdbs
CON_ID CON_NAME OPEN MODE RESTRICTED ---------- ------------------------------ ---------- ---------- 2 PDB$SEED READ ONLY NO 3 TEST_PDB MOUNTED
SQL> select value from v$dataguard_stats;
VALUE ---------------------------------------------------------------- +00 00:00:00 +00 00:00:00 +00 00:00:00.000
SQL> select open_mode from v$database;
OPEN_MODE -------------------- READ ONLY WITHAPPLY
1 2 3 4 5 6
-- 打开备库pdb开始报错 SQL> alter pluggable database all open; alterpluggabledatabaseallopen * ERRORat line 1: ORA-01111: namefordatafile40isunknown - renameto correct file
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
# 查看alert(mrp正常启动,但故障pdb无法恢复,导致不能打开) 2020-08-29T09:17:30.653161+08:00 stopping change tracking Warning: Datafile 40 (+DATADG/ORCLDG/ADF2FB440E69A7B3E0538387040A6468/DATAFILE/system.344.1049672939) is offline during full database recovery and will not be recovered Warning: Datafile 41 (+DATADG/ORCLDG/ADF2FB440E69A7B3E0538387040A6468/DATAFILE/sysaux.345.1049672939) is offline during full database recovery and will not be recovered Warning: Datafile 42 (+DATADG/ORCLDG/ADF2FB440E69A7B3E0538387040A6468/DATAFILE/undotbs1.325.1049672939) is offline during full database recovery and will not be recovered Warning: Datafile 43 (+DATADG/ORCLDG/ADF2FB440E69A7B3E0538387040A6468/DATAFILE/users.342.1049672937) is offline during full database recovery and will not be recovered Warning: Datafile 44 (+DATADG/ORCLDG/ADF2FB440E69A7B3E0538387040A6468/DATAFILE/undotbs2.323.1049672939) is offline during full database recovery and will not be recovered Warning: Datafile 45 (+DATADG/ORCLDG/ADF2FB440E69A7B3E0538387040A6468/DATAFILE/test_pdbdata.328.1049672941) is offline during full database recovery and will not be recovered Warning: Datafile 46 (+DATADG/ORCLDG/ADF2FB440E69A7B3E0538387040A6468/DATAFILE/test_pdbdata.329.1049672941) is offline during full database recovery and will not be recovered Warning: Datafile 47 (+DATADG/ORCLDG/ADF2FB440E69A7B3E0538387040A6468/DATAFILE/test_pdbdata.338.1049672943) is offline during full database recovery and will not be recovered Warning: Datafile 48 (+DATADG/ORCLDG/ADF2FB440E69A7B3E0538387040A6468/DATAFILE/test_pdbdata.334.1049672943) is offline during full database recovery and will not be recovered Warning: Datafile 49 (+DATADG/ORCLDG/ADF2FB440E69A7B3E0538387040A6468/DATAFILE/test_pdbdata.327.1049672941) is offline during full database recovery and will not be recovered Warning: Datafile 50 (+DATADG/ORCLDG/ADF2FB440E69A7B3E0538387040A6468/DATAFILE/test_pdbdata.326.1049672939) is offline during full database recovery and will not be recovered Warning: Datafile 51 (+DATADG/ORCLDG/ADF2FB440E69A7B3E0538387040A6468/DATAFILE/test_pdbdata.324.1049672939) is offline during full database recovery and will not be recovered Warning: Datafile 52 (+DATADG/ORCLDG/ADF2FB440E69A7B3E0538387040A6468/DATAFILE/test_pdbdata.339.1049672939) is offline during full database recovery and will not be recovered Warning: Datafile 53 (+DATADG/ORCLDG/ADF2FB440E69A7B3E0538387040A6468/DATAFILE/test_pdbindex.322.1049672939) is offline during full database recovery and will not be recovered PR00 (PID:124906): Media Recovery Waiting for T-1.S-382 (in transit)
# 回头寻找主库clone pdb时,standby的相关alert日志发现: # 由于主库clone时源库为rw,或者是pdb refresh clone,或者主库clone时指定了standbys=none参数,这些特殊情况导致standby上面这个pdb为nostandby或者tablespace和datafile状态为offline,从而pdb无法进行正常恢复,filename也未正常convert进入asm,而是指向dbs下面的UNNAME文件。 2020-08-28T23:54:22.012162+08:00 Recovery created pluggable database TEST_PDB TEST_PDB(3):Tablespace-SYSTEM during PDB create skipped since source is in r/w mode or this is a refresh clone TEST_PDB(3):File #40 added to control file as 'UNNAMED00040'. Originally created as: TEST_PDB(3):'+DATADG/ORCL/ADF2FB440E69A7B3E0538387040A6468/DATAFILE/system.341.1049672939' TEST_PDB(3):because the pluggable database was created with nostandby TEST_PDB(3):or the tablespace belonging to the pluggable database is TEST_PDB(3):offline.
二、修复standby上面由于主库有问题的pdb
1 2 3 4 5 6 7 8
-- standby停止mrp进程和故障pdb disable recovery conn / as sysdba show parameter standby-- 正常一般是设置为auto recovermanagedstandbydatabasecancel; ho ps -ef|grep mrp
# 开始恢复故障pdb # 此时cdb为mount或者open都行,pdb是mount状态因为故障无法打开,需要恢复 # from service 指定主库的 net service name,12c以后可以这样restore rman target / run { allocate channel ch00 type disk; allocate channel ch01 type disk; allocate channel ch02 type disk; allocate channel ch03 type disk; allocate channel ch04 type disk; allocate channel ch05 type disk; set newname for pluggable database test_pdb to '+DATADG/ORCLDG/ADF2FB440E69A7B3E0538387040A6468/DATAFILE/%U'; restore pluggable database test_pdb from service orcl; switch datafile all; } # 重启db shu immediate startup mount
-- 6. 需要被clone的nocdb或者pdb源库打开为read only模式 shu immediate; statup mount; alterdatabaseopenreadonly; -- or -- alter pluggable database orcl open read only;
-- 7. 主库克隆目标pdb进入cdb库(备库也是从source通过dblink克隆,一般会比主库clone完成慢) createpluggabledatabase orcl_test from orcl@dblink_cdb1522;
自己刚开始走错了路,导致耽搁了不少时间,因为备库alert日志太多,一个临时库未配置清理,alert有上千万行,一直在刷,而且看到UNNAME感觉很熟悉,可能是convert参数有问题,或者什么路劲有问题导致无法正确转换主库dbfile。于是打算standby create修复一下,但意外的是$ORACLE_HOME/dbs/下面并没有控制文件里面记录的UNNAME文件,咨询了2个大牛,决定还是create看看能不能apply恢复成功,create as花了很久,因为300多G的库。可能create需要存在实际的unname数据文件,然后复制过去,如果unname不存在强行create as 后并不能apply恢复,也不能restore或者recover from service恢复,可能只是一个空的文件。我后面是把asm文件直接手动删除了,去主库手动备份过来restore才好了。所以出问题还是要先看日志,找到为什么出问题的原因,而不能看到表象就直接上手处理,很可能会把简单问题复杂化甚至恢复不好。