Axe-Fx III 16.00 Beta 10 "Cygnus" Firmware - Public Beta #7

Status
Not open for further replies.
It can be very stressful. Especially if you're one of the deep technical resources, which I am. The more you know, the more people come to you for help even when it's not your problem...

Fortunately I have a good group of co-workers...

But overall, IT is a huge number of people across many disparate departments. And some people should not be doing what they do :(
I have similar war stories to the veterans commenting here and agree with all. It is fun (now) to swap war stories but it wasn't quite so funny at the time :). So much depends on variables, environments and consequences. Doing a new release late on a Friday, just when everyone is going home and the tech resources are on standby is a good thing in some cases, because the consequences of problems are not as dire since fewer people/transactions are impacted (smaller blast radius). On the other hand, if you were planning a long weekend away, you probably don't want to do a new release just before you leave :cool: . This can quickly become the wife's least favorite maneuver :). I'm thinking that Scott Adams (Dilbert cartoons) or Jeff Allen (Dry Bar) probably have some sketches about this. There used to be a "Programmers Credo" floating around some decades ago - it might be fun to update that in the Cloud era.
 
That's wild. I can't even imagine the pressure. One big chip recall could tank a semiconductor company.
Do you remember the Pentium floating point error back in 1994? I wasn't there at the time, but they replaced many thousands of processors at a cost of about $475 million, even though most people would likely never have noticed it. Many of the engineers wanted the returns but management decided to destroy them. Likely not true, but I heard they paved a road over them at one of the Intel facilities. Talk about the information highway.
 

Should've given those chips the Firebird X treatment. 😉
 
What if you drove this ship? 😬

View attachment 80614
No, thankfully. However, I was a nuke electrician on this 35,000 hp, 6,500-ton pig, the USS Los Angeles (SSN-688), and had the luck of being Throttleman during an overhaul/refueling at Mare Island Naval Shipyard. I got to op test the shaft out of the yards, from Ahead Flank to Full Astern as fast as I could. It was like riding a bucking bronco and it was amazing how fast these things stop. This is only allowed 3 times and then the shaft is replaced ... imagine that thing breaking and coming out?
 
View attachment 80618
It's hard to appreciate just how enormous those things are. Wow.
 
What one of my co-workers used to call "a resume generating event" :D

I have a story I share with the young ones on the job periodically about what has become known as "the Got Root? event" (I've literally been there and have the shirt ;)).

Yes, it's a different mood telling it years afterwards than when I experienced it in real time. But it still makes my heart go to my throat at parts...

But my team back then was excellent. We came together in a marathon overnight session to do massive remediation of the damage I had done that Friday night.

I was almost physically ill when I discovered it... Another co-worker that lived in the same town said "I'm coming to pick you up - you're not in the right mental state to drive!".

He also helped me to come to grips with things by explaining to me that "only you could have ever made this mistake because nobody else here would even have the ability to take on the change to begin with".

I was also a contractor at the time... Probably 15 years ago. I remained there as a contractor for 17 years before accepting an employee position almost 4 years ago.

Good times!

Anyway... Back to the awesome Cygnus thread ;)
 
Monstrous. And that one is still "small" compared to the Ohio-class and the crazy Russian Typhoon-class. :oops:
The physical space available to humans inside is about that of a 3-story house, and the crew was 130 men with 1/3 asleep at a time, on an 18-hour day cycle. The boat was 360 feet long and 33 feet in diameter while surfaced and 30 feet at 1,000 feet of depth. It was a miracle of human engineering. All the decks had gaps that would tighten the deeper we dove, and everything was sound isolated.

I was on watch, inside the big black beast. You can see the torpedo doors if you look closely, and the cap covers a huge disco ball covered with hundreds of transducers that have enough energy to vaporize the water within 1 foot of its surface.

It was hilarious, the first time we submerged out of the yards, there was a leak from the periscope. The XO thought it would seat so he kept asking for more depth. The Helmsman and Planesman had a fire hose of water shooting down on them. Turned out a shipyard worker forgot a seal :)
 
I’d be interested to hear that story UG... while we’re comfortably OT, I have a couple of similar stories that still give me a twinge in my guts. One involves having a dev and a live SQL window open at once, and another one involves a misconfigured null check in a global replication routine... I hung my head.
 

These are huge mistakes that can cost lives and a lot of money. My biggest mistake so far has been to get married :D
 

Non tech people should probably stop reading here! :)

The short version is this:

I was supporting the Unix systems for SAP. There were a number of other Unix teams within the company.

Each team had a unique "domain" (using NIS) for managing Unix users and groups. In Unix (and Linux) the user ID number and group ID numbers are what the OS cares about - names are merely "labels" for humans. Each group was assigned specific ranges of UID and GID numbers even though each had their own domain.
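
To make the "names are merely labels" point concrete, here is a minimal Python sketch (not part of the original scripts, which were Perl). It reads the numeric owner ID stored with a file and only then tries to look up a name for it. The helper name `describe_owner` is made up for illustration.

```python
import os
import pwd
import tempfile

def describe_owner(path):
    """Return (uid, name_or_None) for the owner of path."""
    uid = os.stat(path).st_uid  # the number is what is stored with the file
    try:
        name = pwd.getpwuid(uid).pw_name  # the name is a lookup done afterwards
    except KeyError:
        name = None  # a UID with no user database entry is still a valid owner
    return uid, name

# Example: inspect a scratch file created by the current process.
with tempfile.NamedTemporaryFile() as tmp:
    print(describe_owner(tmp.name))
```

If two domains hand out the same number to different people, the file system cannot tell them apart, which is why the ranges had to stay unique.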

The company decided to implement Active Directory single sign-on for Unix by incorporating an AD extension to include Unix user and group attributes into the central Active Directory domain... But, you need to maintain unique user and group IDs, and not all groups were correctly sticking to their assigned ranges over time.

So, as part of this multi-group project many of us needed to deal with conflicting user or group IDs by reassigning the values used by certain users and then updating the assigned values to all files with ownership by either the group or user across the entire system landscape for the team.

I wrote a series of Perl scripts using the File::Find module, which (basically) emulates the find command but in a programmatic fashion. I was able to find very specific file metadata attributes to drive these changes across all our systems from a single point.

For every file or directory that was owned by a conflicting user or group ID, it would report that information and update the values.
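
A rough sketch of that kind of report-and-update pass, in Python rather than the Perl actually used. Everything here is an assumption for illustration: the function name `remap_ownership`, the `uid_map`/`gid_map` dictionaries (old ID to new ID), and the `dry_run` flag are not from the original scripts.

```python
import os

def remap_ownership(root, uid_map, gid_map, dry_run=True):
    """Walk everything below root and remap owner/group IDs listed in the maps.

    Returns a list of (path, old_uid, old_gid, new_uid, new_gid) for every
    entry that would be (or was) changed. Entries whose IDs appear in
    neither map are left untouched. The root directory itself is not examined.
    """
    changes = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: inspect symlinks themselves, not their targets
            new_uid = uid_map.get(st.st_uid, st.st_uid)
            new_gid = gid_map.get(st.st_gid, st.st_gid)
            if (new_uid, new_gid) == (st.st_uid, st.st_gid):
                continue  # not a conflicting owner or group: skip
            changes.append((path, st.st_uid, st.st_gid, new_uid, new_gid))
            if not dry_run:
                os.lchown(path, new_uid, new_gid)  # usually requires root
    return changes
```

With `dry_run=True` the function only reports, so the list of planned changes can be reviewed before anything is modified.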

I (naturally) did a large amount of testing before the real event. No issues encountered.

The morning of the event (which was scheduled to begin that evening) I thought of another corner case, added logic and did more testing. Everything was great.

That night, I kicked off the first phase and was closely monitoring and seeing no issues. After a while I triggered a second, parallel stage. Also going great.

After a while, I stopped monitoring because there were no issues.

About an hour later, I got a call from the monitoring team, as I was also the on-call support that weekend. They had noticed a few of the big weekend backups had issues. We had a lot of backups so that was pretty typical.

As I was investigating and not seeing the typical errors, I got another call: all active backups had failed! Uh oh, that's not good.

Since I wasn't the expert on the backups, I called the guy who was - he was also my boss ;)

He wasn't home so I was "driving by cell phone" and describing what I was seeing... Nothing was standing out to him, either, but suddenly it occurred to me that the issues we were experiencing could be caused by incorrect file ownership.

I was "talking thru" this when I confirmed that this was in fact the reason and that it must be related to my (still running) change. I immediately cancelled all the processes.

I got light headed and my heart was in my throat... I thought I might throw up, or cry, or both!

My boss (who was always very cool and pragmatic) says nothing. Hello? Turns out he lost connection right before I revealed the smoking gun :eek:

I had to call him back and explain the whole thing - again!

So, Mr calm and cool says: I'm going to get a bridge line set up and call these people and you call those people. Let's figure out what to do.

After the bridge call, we agree to meet at the office and start working to understand the scope and undo it.

That's when my coworker from the previous post told me he was coming to get me.

About 8-10 of us spent 14 hours, all through the night, using various methods to correct permissions and get everything back up and running... Because it was ALL down!

So what happened?

All of the files for conflicting users and groups that should have been changed were changed exactly as they should have been.

However, every other file was changed to be owned by user "root" (the Unix super user).

Hence, the Got Root?(tm) incident

The cause: the last-minute change I made that morning was to add a condition to a multi-level bit-wise Boolean expression. I didn't include it inside a parenthesised grouping of tests properly.

When I tested the changes, I only tested positive test case files but not negative test cases... Bad thing to overlook as that would have immediately shown my mistake :(
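
For anyone who hasn't been bitten by this, here is a small Python illustration of that class of mistake. It is not the actual expression from the Perl script; the condition and all names (`matches_buggy`, `matches_fixed`, `bad_uids`, `bad_gids`) are invented for the example. `and` binds tighter than `or`, so a condition added outside the parentheses changes what the whole test means.

```python
def matches_buggy(uid, gid, is_file, is_dir, bad_uids, bad_gids):
    # Parses as: (conflict and is_file) or is_dir
    # so EVERY directory matches, conflicting or not.
    return (uid in bad_uids or gid in bad_gids) and is_file or is_dir

def matches_fixed(uid, gid, is_file, is_dir, bad_uids, bad_gids):
    # The added condition sits inside its own grouping.
    return (uid in bad_uids or gid in bad_gids) and (is_file or is_dir)

bad_uids, bad_gids = {1001}, {2001}

# Positive cases (entries that SHOULD match): both versions agree.
print(matches_buggy(1001, 50, True, False, bad_uids, bad_gids))  # True
print(matches_fixed(1001, 50, True, False, bad_uids, bad_gids))  # True

# Negative case (a directory with no conflicting IDs): only this exposes the bug.
print(matches_buggy(7, 7, False, True, bad_uids, bad_gids))  # True (wrong)
print(matches_fixed(7, 7, False, True, bad_uids, bad_gids))  # False
```

Testing only with files that should match would pass both versions, which is exactly why the negative cases matter.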

I fully expected to be fired. I wasn't... Everyone was very understanding.

And on top of that, my boss bought me a Got Root? shirt that still hangs on my cubicle wall.

Teachable Moment!
 
I can't believe it!!! A Friday has passed without a new firmware update. WHAT THE HELL is going on at Fractal? This is outrageous. Cliff, WAKE UP!!! Haha, just kidding.
Aside from all the war stories, which I am very much enjoying, some of us are still processing Cygnus in beta form. (I'll try to teach you guys on the wrong side of the Atlantic how that's pronounced some time. ;) ) For me, yes, it's cool, but a bit mind blowing at the same time. If I was playing out 3 nights a week I would definitely be looking for some downtime to check presets. Fortunate byproduct of current reality is that many of us get time to rethink everything. Cygnus is very cool, and couldn't have arrived at a better time for me.

Liam

[Oh and unix-guy, I get to enjoy SAP, and occasionally difficult Magento integration on a daily basis. Just glad I don't have to take responsibility for writing the code. The code itself is seldom the problem, but it's the easiest thing to blame.]
 