Tuesday, June 23, 2009

NetBackup Error: Watch your policies or else..


I have been using Veritas NetBackup for Windows 6.5 for about 12 months and I really enjoy using it. This product is a huge package of various
interconnected components that allow you to backup every machine in your enterprise environment.

My NetBackup Topolgy Map

The machines can be located anywhere in the world, as long as you have the right plumbing in place. Even better is that there was VCB support released in 6.5.X So now I can perform LAN free, off-host, hot-backups by using VCB to mount my VMDKs and stream to tape. I can also stream to disk and it performs very well.
On a couple of physical servers, I use the NetBackup agent installed locally to backup to tape. I also have a VM that is running a backup
agent and is connected via iSCSI to an AX-150i NTFS volume used as a share by everybody in my organisation. See below.
iSCSI shared drive in Disk Management

I have performed weekly data restore tests with no problems for a long time. Today is different, I had a problem ! It looked like this...

the restore failed to recover the requested files(5)

Aaargggh. Not a good message, so I decided to delve deeper into this. The last thing I needed was this message appearing on a production resto !

The offending job has a square around it.

(This scenario could actually could be worse because the test data in this example is a female staffers WEDDING photo's. Surely a great use of expensive storage :)
So I checked Activity Monior and saw this:

24/06/2009 2:50:16 PM - begin Restore
24/06/2009 2:50:18 PM - restoring image XXXXXXX_1245225000
24/06/20092:50:18 PM - requesting resource Fibre Transport
24/06/2009 2:50:18 PM - Info nbjm(pid=2752) NBU status: 800, EMM status: Fibre Transport resources are not available
24/06/2009 2:50:18 PM - Info nbjm(pid=2752) NBU status: 800, EMM status: Fibre Transport resources are not available
24/06/2009 2:50:25 PM - connecting
24/06/2009 2:50:30 PM - connected; connect time: 00:00:0524/06/2009 2:50:30 PM - started process bptm (3004)
24/06/2009 2:50:33 PM - Critical bptm(pid=3004) image open failed: error 2060013: no more entries

So the jobs took about 15 secs to crash. Fibre Transport resources are not available? Image open failed?

I clicked on the Troubleshoot button for more info and was told to go to http://support.veritas.com/nbucode/5

Just as well there was nobody breathing down my neck to get this fixed, there were 42+ pages on this error, so being a believer in
persistence, I then decided to try a restore from last night, and it worked.

So I compared the successful and unsuccesful logs and noticed differences.

24/06/2009 3:39:23 PM - begin Restore
24/06/2009 3:39:26 PM - 1 images required
24/06/2009 3:39:26 PM - media VDLY04 required
24/06/2009 3:39:31 PM - restoring image XXXX_1245743401
24/06/2009 3:39:38 PM - connecting
24/06/2009 3:39:42 PM - started process bptm (3216)24/06/2009 3:39:42 PM - mounting VDLY04
24/06/2009 3:39:43 PM - connected; connect time: 00:00:05
24/06/2009 3:39:40 PM - requesting resource VDLY04
24/06/2009 3:39:40 PM - granted resource VDLY04
24/06/2009 3:39:40 PM - granted resource IBM.ULT3580-TD4.000
24/06/2009 3:40:26 PM - mounted; mount time: 00:00:44
24/06/2009 3:40:29 PM - positioning VDLY04 to file 119
24/06/2009 3:42:59 PM - positioned VDLY04; position time: 00:02:30
24/06/2009 3:42:59 PM - begin reading
24/06/2009 3:43:16 PM - end reading; read time: 00:00:17
24/06/2009 3:43:18 PM - restored image XXXXX_1245743401 - (the requested operation was successfully completed(0)); restore time 00:03:47
24/06/2009 3:43:22 PM - end Restore; elapsed time: 00:03:59

the requested operation was successfully completed(0)


Notice the difference in the first and second lines in the logs?


The top log says:


24/06/2009 2:50:16 PM - begin Restore

24/06/2009 2:50:18 PM - restoring image XXXXXXX_1245225000

24/06/2009 2:50:18 PM - requesting resource Fibre Transport


The second log says:


24/06/2009 3:39:23 PM - begin Restore

24/06/2009 3:39:26 PM - 1 images required

24/06/2009 3:39:26 PM - media VDLY04 required


I realised that VDLY04 is a daily tape, so the image the failed job was trying to restore was from a disk. I then remembered I was experimenting with disk storage for a Policy, but I deleted the folder used for backup images last night.


Veritas could not find the images, so it exited the job. As it should.


I immediately suspended the Disk policy and let tape do all the work.
Be careful when using multiple policies and storage units for the same data.


Cheers

1 comment:

  1. I got the same or similar message from a disk image restore BUT I had all media servers in the "override" option on the master. After I removed them, rebooted, the restore worked BUT still showed this message:
    7/29/2009 8:20:47 AM - Info nbjm(pid=256) NBU status: 800, EMM status: Fibre Transport resources are not available.

    However, this may just be a message from NBJM indicating it tried to use fiber transport but couldn't and used LAN instead.

    ReplyDelete