Survive yet another system outage


 

Entries

More work awaits 2 years ago

..but the worst is over. We should have our DR sorted out by the end of Jan, plus an improved approach to backup & recovery.

While one should never say “never”, I hope I don’t have to go through a similar nightmare again.



Luckily.. 2 years ago

..the manual transport worked, thanks to the consultant we got on board. But our work is far from over. Apparently, the transport needs to be reconfigured, but we're deferring that until after the year-end stock count is done.

More good news: my budget for disaster recovery was verbally approved. It's merely a formality from here on; if all goes well, our first-ever DR infrastructure should be in place within a month, if not sooner. Indeed, this is a major silver lining after the nightmarish episode we've had to go through. So true.. you don't know what you've got, or what you're willing to spend money on, until it's gone/down/busted!



At the rate we're going 2 years ago

..I don't know if I'll ever mark it done. It frustrates the hell out of me, but I'm not onsite, so I try to be supportive of the guys who are there trying to work things out. The bleedin' transport protocol is not working, and there's a new module that needs to be tested this Monday. So by hook or by crook, this transport issue needs to be sorted out before then. The guys are activating our last resort, which is a system restore. I hope this works; the last time we tried it, the freakin' OS got corrupted. I suspect the backup was already spoilt: the OS must have had problems by then. I hope to God this works out. The guys have been working on this non-stop for the last few days, and I can't afford to have them dropping like flies. Please.. please let this work. And I'm gonna activate my last resort.. prayer.



The worst is over 2 years ago

..but we're stuck with a hopeless backup config. For some reason, after we had to reinstall and reconfigure the database server, the tape drive's throughput has been appallingly low! This being a production db server, a good backup is the last "get out of jail" card if you're ever in a jam. So until that's settled, I'll be keeping this goal around. But I'm glad the system is up. Now if only those folks in Admin would stop giving me dagger looks. Nah.. just kiddin', we're good.



Pushing away exhaustion 2 years ago

I don't think I've recovered from the weekend rafting trip. But I had to be here to supervise things and, hopefully, see them through to a positive end. I'm killing time catching up on work, but no cheers for me, unfortunately. I've got Muse keeping me company while the guys go out for a drink. I'm praying hard that the restoration will be successful so that we can bring the database back online smoothly. I'm trying not to think of my comfy bed at home, or of my son, who was fussing and trying not to cry when I had to leave him with my dad for the night. I feel so bad for having to do this, but duty calls. Might as well stop feeling torn, hope for the best and get some actual work done.



Despite my growing anxiety 2 years ago

I am going to choose..

  • faith and trust – knowing I have some of the best people working on this system outage, trying to retrieve data in order to bring the database back online
  • patience – knowing that being pushy wouldn't help and could put unwanted stress on an already weary team
  • calm – knowing that we have a last resort; while it's hardly ideal, it means we can move forward, although it will require patience, diligence and cooperation from the user groups.

Each time this happens, there is a natural instinct to wish it all away, to curse one's fate and ask: why me, why now and why this? But I've lived through similar episodes, and each time there's always something new that we all learn. No doubt everyone wishes we didn't have to learn these things the hard way, but at times it is the only way we'll learn and grow.

I am praying hard that the data retrieval can be completed successfully. If it works out, half the battle is won. If not, we'll have to rebuild the system using the last good backup we have. I know some people will be upset and annoyed if it comes down to that, but at this point I seriously don't see any other way. Guess I'll have to make peace with this, even if some feel it's not exactly good enough.




 
