The Phoenix Project: My takeaways as a QA

Mkhitar Mkrtchyan
13 min read · Dec 25, 2023

--

Opening: All of us, I believe, do our daily jobs and make decisions based on our experience and knowledge projected onto the current situation. Whether it’s hard or not, we know that ‘THIS’ is the way. Then sometimes, out of the blue, someone or, in this case, something changes your vision. Like adjusting focus, gut feelings crystallize and find their way into the outer world in the form of expressions that can be shared and that act like a huge magnet, aligning the wandering needles of disoriented compasses toward a common destination. This was the case with the book… “The Phoenix Project” was written in 2013 🤯 by Gene Kim, Kevin Behr, and George Spafford.

Before proceeding to the highlights, it’s worth noting that the story is told through the prism of a DevOps guy, with a corresponding attitude toward developers, product people, security people, and QAs…

About people…

Erik — The one who knows and has the brain to digest every situation. Basically, the mentor and the lighthouse.

Sarah — senior VP of retail operations. She’s more ambitious than skilled and always blames others for her mistakes. Chris, the VP of app development, calls her ‘teflon’ because she never takes responsibility. I’ve started using that term too for people who avoid accountability like she does.

John — chief information security officer. Erik described his kind of people “with their heads in but”, who are willing to see the company burn down to ashes just to prove they were right. With a proper approach, they can transform from being a problem to being a solution.

Bill, Wes, and Patty — They collaborate well, take responsibility on their own, and don’t get involved in office politics. You can spot them easily because they stick together, focus on improving, and work effectively.

Brent — A uniquely gifted individual, effortlessly communicates with systems, seamlessly rectifying issues in a flow of pure, conscious thought.

About the work…

There are four categories of work —

  1. Business projects — User stories, features…
  2. Internal projects — Automated test suites, middleware.
  3. Changes — Maintenance of existing business and internal projects.
  4. Unplanned work — This should be feared and avoided at any cost.

The only thing that can displace planned work is unplanned work, which is basically anti-work.

It’s crucial to identify all four, prioritize the first three types, and mercilessly eliminate the fourth.

Every work center is made up of four things: the machine, the man, the method and the measures.

This sentence helped me to rethink the evaluation of the work. If something is dysfunctional I have to understand where the problem is.

Highlights of the book…

Show me a developer who isn’t crashing production systems, and I’ll show you one who can’t fog a mirror, or more likely is on vacation

This one amuses me because even if we become an interstellar species, there will still be developers with a high level of confidence (thanks to the Dunning–Kruger effect) who mess with the production code or configs, causing trouble for the company.

There are newborn babies dropped off at church doorsteps with more operating instructions than what they’re giving us

Documentation… The Achilles’ heel of any company (except HashiCorp). Imagine if Moses had brought tablets with Lorem Ipsum text instead of the Ten Commandments. Even the people who complain about the lack of documentation are not eager to write it, including me… However, these perfectly formed sentences make me discipline myself and start documenting whatever is possible…

The best way to kill everyone’s enthusiasm and support is to prevent them from doing what they need to do. I doubt we’ll get a second chance to get this right.

In a highly enthusiastic and united atmosphere where people are eager to do what’s right, waiting or slowing down can be a big mistake. There might not be another opportunity. There isn’t an ideal moment or ideal circumstances…

Work in progress is a silent killer

True… So true and obvious that it is hard to admit… I’m trying to do everything people ask of me and working hard on many tasks. Nevertheless, most of my work is still unfinished. People get impatient and start to complain, which adds stress and leaves me feeling unappreciated by the very people who benefit from my work. Not a single benefit from glorified multitasking…

It’s necessary to identify the bottleneck. Any improvement before the bottleneck is an illusion, and any improvement after the bottleneck is useless, because it will always remain starved, waiting for work from the bottleneck.

In addition to this axiom, in my opinion, improvements before the bottleneck are not just an illusion: they also put more pressure on the bottleneck, and if the bottleneck is an employee, well, expect burnout…
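To make the point concrete, here is a minimal sketch of a three-stage pipeline. The stage names (dev, review, QA), the capacities, and the `simulate` function are all made up for illustration; they are not from the book. Each stage can process a fixed number of tickets per tick, and whatever it cannot process waits in its queue.

```python
def simulate(capacities, ticks, arrivals_per_tick):
    """Push work through sequential stages.

    capacities: tickets each stage can process per tick (hypothetical numbers).
    Returns (finished, queues), where queues[i] is the work still waiting
    in front of stage i after the last tick.
    """
    queues = [0] * len(capacities)
    finished = 0
    for _ in range(ticks):
        queues[0] += arrivals_per_tick
        # Walk the stages from last to first so a ticket advances
        # at most one stage per tick.
        for i in reversed(range(len(capacities))):
            moved = min(queues[i], capacities[i])
            queues[i] -= moved
            if i + 1 < len(capacities):
                queues[i + 1] += moved
            else:
                finished += moved
    return finished, queues


# Stages: dev, review (the bottleneck), QA.
baseline = simulate([5, 2, 4], ticks=10, arrivals_per_tick=5)
faster_dev = simulate([10, 2, 4], ticks=10, arrivals_per_tick=10)
faster_qa = simulate([5, 2, 8], ticks=10, arrivals_per_tick=5)
faster_review = simulate([5, 4, 4], ticks=10, arrivals_per_tick=5)

for name, result in [
    ("baseline", baseline),
    ("faster dev", faster_dev),
    ("faster QA", faster_qa),
    ("faster review", faster_review),
]:
    print(name, "finished:", result[0], "queues:", result[1])
```

In this toy model, speeding up dev or QA leaves the number of finished tickets unchanged, and the faster dev stage only makes the pile in front of review grow. Only raising the capacity of the review stage increases the output.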

You’re going to say that IT is pure knowledge work, and so therefore, all your work is like that of an artisan. Therefore, there’s no place for standardization, documented work procedures, and all that high-falutin’ ‘rigor and discipline’ that you claimed to hold so near and dear.

Generative AI has grounded the developers who regard themselves as irreplaceable masterminds who build everything, like Atlas carrying the heavens on his shoulders so that the sky doesn’t crash onto the earth and the commoners… Still, we’re just little gears in a sophisticated system, and none of us is irreplaceable!

If you think IT Operations has nothing to learn from Plant Operations, you’re wrong. Your job as VP of IT is to ensure the fast, predictable, and uninterrupted flow of planned work that delivers value to the business while minimizing the impact and disruption of unplanned work, so you can provide stable, predictable, and secure IT service.

A predictable, uninterrupted flow of planned work… One sentence that did to me what Mike Tyson did to Larry Holmes in 1988. I have to admit… I often fail at making accurate predictions, frequently interrupt tasks due to context switching, and consistently face challenges in planning effectively… The only thing that cheers me up and keeps me from giving up is the quote “The first step in solving a problem is recognizing there is one”.

Remember, outcomes are what matter, not process, not controls, or, for that matter, what work you complete.

Whenever I give negative feedback on an employee’s performance, I often hear about their huge effort, extensive overtime, and personal sacrifices.

…no matter the delay, they technically hit the deadline. There’s something really wrong with the definition of what a “completed project” is. If it means “Did Chris get all his Phoenix tasks done?”, then it was a success. But if we wanted Phoenix in production that fulfilled the business goals, without setting the entire business on fire, we should call it a total failure.

Hitting the deadline was mentioned by Chris, the VP of application development… And I hate this because it’s so true. Though I’m blaming developers, it’s a human thing: people with early access to the task will consume the majority of the time. Not only in IT but in every group project. Developers even invented the term “Dev Done”, which translates as: the code is written, but there is no documentation, no test, and no deliverable! So what is done? It was made to cover the soft asses and fragile minds of developers during sprint reviews…

Half the time, you break more than you fix. Worse, you screw up the work schedules of everyone who’s actually doing important work. You are like the plumber who doesn’t even realize that you’re servicing an airplane, let alone the route you’re flying, or the business condition of the airline.

You win when you protect the organization without putting meaningless work into the IT system. And you win even more when you can take meaningless work out of the IT system. The biggest risk for Parts Unlimited is going out of business, and you seem hell-bent on making it even faster.

A bitter pill to swallow, but building something is more valuable than testing it. The quote above comes from a conversation with the security guy, who acts like the accountant Terry from Pixar’s “Soul”. He kept warning the company about security threats that were, in fact, handled very effectively further down the production line, and he was waiting for the auditors to punish the organization, fine it, or even sue it. He got frustrated when the auditors did nothing, because all the issues were handled at some point in the pipeline, by the system or by people. If you’re a QA who warned about a threat and was ignored, and then the threat becomes an issue in production, harming both the company and the clients, you’ll feel some guilty pleasure, won’t you?

The 3 ways…

The First Way: The Principles of Flow: This way focuses on optimizing the flow of work through the entire system. It emphasizes the importance of minimizing bottlenecks, reducing waste, and streamlining processes to ensure that work moves efficiently from development to operations to the customer.

Here’s an example of broken flow, accumulated from my bad experiences. New tickets appear in the To-Do column during the sprint, jeopardizing my planning. A bunch of tickets sit for days in the In Review column, then one day, usually the last day of the sprint, they all appear in the QA column. Or, more frequently, the ticket is in the QA column but the functionality cannot be seen in any accessible environment because of build failures, an unavailable environment, or some other excuse provided by the developer. And when QA does provide feedback promptly, it results in a focus switch for the developer.

The Second Way: The Principles of Feedback: The second way emphasizes the importance of feedback loops at all stages of the development and deployment process. It encourages organizations to gather feedback from various sources, including monitoring and testing, and use that feedback to make continuous improvements. This way helps detect and address issues early in the process.

There’s a saying, “Strike While the Iron Is Hot”. The most effective approach I’ve experienced is for the developer to first deliver the functionality. While QA tests it, the developer writes unit tests. By the time these tests are done, QA has already provided the feedback.

The Third Way: The Principles of Continual Learning and Experimentation: The third way promotes a culture of continual learning and experimentation. It encourages teams to learn from their mistakes, experiment with new ideas, and foster a culture of innovation and improvement. This way is about creating an environment where employees feel empowered to take risks and learn from their experiences.

Encourage a culture of innovation by empowering individuals to be both the sculptor and the statue. In this environment, there are only two types of mistakes to avoid: 1) Fatal mistakes, which are irreversible and cannot be rectified, and 2) Repetitive mistakes, where lessons are not learned from past errors. Other incidents should not be seen as mistakes, but rather a part of the learning process.

If you can’t out-experiment and beat your competitors in time to market and agility, you are sunk. Features are always a gamble. If you’re lucky, ten percent will get the desired benefits. So the faster you can get those features to market and test them, the better off you’ll be.

Agree, but proceed with caution. The book ‘Great By Choice’ suggests an approach of ‘shooting bullets before cannonballs.’ This means making small investments to test and gather feedback. Once you have positive results, only then commit to larger, high-stakes decisions.

In ten years I’m certain every COO worth their salt will have come from IT. Any COO who doesn’t intimately understand the IT systems that actually run the business is just an empty suit relying on someone else to do their job

Now, we’re witnessing this prophecy unfold. Remember, the book was written in 2013. During these ongoing waves of layoffs, the rule is clear: you either contribute or leave!

Underestimating capacity before taking the work leads to taking shortcuts, which leads to a more fragile system, which leads to more unplanned work…

Nothing to add! It’s like going to the toilet and starting to poop without checking whether there is any toilet paper.

Like financial debt, the compounding interest costs grow over time. If an organization doesn’t pay down its technical debt, every calorie in the organization can be spent just paying interest, in the form of unplanned work, and unplanned work is not free, quite the opposite. It’s very expensive because unplanned work comes at the expense of planned work.

As with financial debt, it’s a matter of prioritization: first “pay” the debt with the highest interest, and then go down the list.

If the process is broken, there’s a leak, and additional resources won’t do much.

They decide that Brent’s work should be documented so thoroughly that others can replicate it. Incidentally, until you do this, no matter how many more Brents you hire, Brent will always remain the constraint. Anyone you hire will just end up standing around.

Brent is the genius guy, eager to help everybody, which results in the misuse of his magic. Another thing I appreciate here, and in general, is the recognition that it’s not right to solve problems by throwing a lot of resources at them. If there is a lack of resources, the work should move slowly, but it should move. It doesn’t matter whether it’s server capacity, DB connection pool size, or human resources: no more resources should be poured in before the existing process is fixed with the resources already available.

Improving daily work is more important than doing daily work. Mike Rother says that it almost doesn’t matter what you improve, as long as you’re improving something. Why? Because if you are not improving, entropy guarantees that you’re actually getting worse, which ensures that there is no path to zero errors, zero work-related accidents, and zero loss.

The exact moment when my skills and career started to grow steadily was when I decided not to set deadlines, but to set daily time limits for learning something new. No matter whether I read 40 pages during those 2 hours or get stuck on page 1, line 3, it should be 2 hours. During these stormy times in IT, this approach has helped me the most…

Operational risks posed by IT need to be managed just like any other business risk. In other words, they’re not IT risks, they’re business risks.

Though it’s not an obvious “a-ha” thing, later I realized that people are also considered to be a business risk.

In manufacturing, we have a measure called takt time, which is the cycle time needed in order to keep up with customer demand. If any operation in the flow of work takes longer than the takt time, you will not be able to keep up with customer demand.

For me, as an automation QA engineer, customer demand is the need of the team. For example, I had cases when there were lots of endpoints to automate, with new ones coming in every sprint. So the job is to make a proper estimate and establish the takt, to keep up with the customer (team) demand.
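As a rough sketch of that estimate: takt time is the available working time divided by the demand in that period. The numbers below (60 hours of automation time per sprint, 15 new endpoints per sprint, 5 hours per endpoint) are invented for illustration, and the function names are my own.

```python
def takt_time(available_hours, demand_units):
    """Hours available per unit of demand (here: per endpoint)."""
    if demand_units <= 0:
        raise ValueError("demand_units must be positive")
    return available_hours / demand_units


def backlog_growth(available_hours, demand_units, cycle_hours):
    """Units of demand left unserved per period at the given cycle time.

    A positive result means the backlog grows every period;
    zero or negative means the work keeps up with demand.
    """
    if cycle_hours <= 0:
        raise ValueError("cycle_hours must be positive")
    completed = available_hours / cycle_hours
    return demand_units - completed


# Hypothetical sprint: 60 hours for automation, 15 new endpoints arriving.
takt = takt_time(available_hours=60, demand_units=15)
growth = backlog_growth(available_hours=60, demand_units=15, cycle_hours=5)
print("takt time (hours per endpoint):", takt)
print("backlog growth per sprint:", growth)
```

With these made-up numbers the takt is 4 hours per endpoint. Since automating one endpoint takes 5 hours, which is longer than the takt, the backlog grows by 3 endpoints every sprint until either the cycle time drops or the demand does.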

It is necessary to create a deployment pipeline. That’s your entire value stream from code check-in to production. That’s not an art. That’s production. You need to get everything in version control. Everything. Not just the code, but everything required to build the environment.

It’s like an assembly line, or a power plant, where there is a multitude of separate moving parts, yet the target is the same. And this paragraph made me think about a monorepo…

Don’t be the idiot that fails because he didn’t ask for help.

These words remind me of the allegory of a long spoon. In the story, people in hell struggle to feed themselves with long spoons, while those in heaven feed each other under the same circumstances. This teaches the value of mutual help. I’ve observed people, myself included, nodding in conversations without full understanding, planning to search the keywords later. Eventually, I overcame my pride and began asking questions, humbly seeking knowledge like an ocean collects pure water from springs.

Final Thoughts…

I’m planning to reread this book after a while, and then, maybe other ideas will spark the flame…
