Here are some cultural hints which I have been refining throughout my career. After sharing them in 1-1 conversations with my teams, and particularly with new hires or people early in their careers, I found that people found them useful.
Sharing them more widely turned out to be a great move, as they can now be improved via crowdsourcing; they are now just as much of a community effort as they are mine.
These hints are not meant to replace our cultural guidelines, performance guidelines, leadership principles, etc – but instead to clarify or add detail and suggestions for behaviors and mental models that you may need in hard situations or to become better. They are also intended to be recipes for you to create muscle memory about how to think about the job you do and how to act to be successful.
If you like what you find here, please share with others… If you have comments for improvement or just downright disagree, get in touch and let’s talk.
Enjoy!
~ Mark
We take our job of building quality products and services for our customers seriously in every way. Rigor of thought, speech, action, and writing (“go memo-guidelines”) at every level is the only way to do that reliably and efficiently at scale. This means that we have to be the right people to ourselves and to each other to have the right outputs for our teams, our company, and our customers.
Don’t reach out to others unless you have tried a reasonable number of things first. Not only will this keep you from forcing others to task-switch, but it will slowly build up your expertise in our tools and procedures. Most of the time, you’ll find the answer to your question by the time you’ve tried that third thing – without bothering somebody else – and learn something along the way. This is our equivalent of “LMGTFY”. Why three? It’s a guideline only (TFT wouldn’t have been as memorable). In general, the point is to avoid going overboard and disappearing into a black hole of learning everything yourself – that would be a waste of time.
As a (prior) CEO of mine once said, “We have goals. Those goals have dates. But be very careful about letting the dates become the goals. You will go down paths you regret if you do that.”
Remember that we’re here to produce quality experiences for our customers and we’re here for the long term. Sure, there are some high-profile dates that actually have a meaningful impact on the company, such as a conference or a key funding date or partnership commitment. But most other dates are just somebody’s best guess 18 months ago – and your estimate now is probably a lot more informed and thus accurate.
The best dates are given either with 50% achievability (internal and team goals) or 70% achievability (public and business goals). Of course we want to get things done, and dates are part of that (see DDD). We move fast here. But I will never tell you to produce buggy software to make a date. You personally own this reality for your own work, your manager owns it for their team’s work – and so on fractally everywhere in the team. If you break it at your point in the chain and tell untruths about dates, it’s broken at every level above and beside you. And it won’t get better with time, as you’re creating a debt of unrealistic expectations which only gets harder and harder to pay back. Be strong, say what needs to be said about your confidence in the date and whether it’s even a date/goal we should be focused on.
What’s the best definition of “Teamwork” you’ve ever heard? What about the rather non-intuitive “Get your job under control before reaching out to help others”? People who offer to help others while their job is not under control can create a spiral where nobody in an organization is actually on top of their work, but they’re all being oh-so-helpful to each other. Ignoring your own work and trying to cover by helping others is also a refuge for people who are struggling in their own job; everybody wants to find something they can be viewed as being valuable at. This doesn’t mean to be a jerk when people ask – we should all help each other. But meter that help against whether you’re getting your own job, your own special individual contribution, done well. Note that there are things that only you can do, and make sure you do that before focusing too much time on work that many people can do.
When you write an email, re-read it before you send it, through the eyes of your readers. If you see an obvious question, so will your readers. Analyse the problem enough to answer that new question and then answer it in your email. Repeat this until you think you’ve answered the questions at a reasonable level of depth. For example, by sending an email (especially a project status email), you are spending a lot of people’s time for them to read and understand it. Partially, they want to know the actual details of the project. But for the most part, they really just want to know that you’re on top of it. If you write a wishy-washy summary, full of half-statements and ambiguities, you will get responses that sound like they are requests for details. But what they most likely are is votes of no-confidence on your running of the project. By answering all reasonable follow-on questions to a reasonable level of depth, you inform effectively, cut down on email churn, and show your readers you are managing your work rigorously.
Everything should have a date. Don’t send an email that says you’re “working with the widget team to figure out how to make their new API work with our code” unless you have a date of when that will be done – and include that date. Or a date for when you think it will be done. Or even a date for when you think you’ll know a date for when it will be done. Emails or updates or decks that end without dates can be a huge waste of time. AAA/DDD go together, because many things that are unanswered in poor status reports are what the next step is and when it will be done. At the end of every interaction with another employee, I advise you to summarize AIs (Action Items), PICs (People in Charge), and Dates. This immediately creates accountability.
Working together in Tech is actually a lot about working with people. People work best together when they feel respected, both for their intent and for their capabilities. As we grow as leaders and contributors, we often find that we retain early instincts we had in our career – that somebody who holds a different point of view or who even outright disagrees with us must not be as bright as we are, or must have poor intentions. Surprisingly, it’s actually irrelevant whether that’s true or not. If you treat the person you’re working with as if they aren’t brilliant and well-intentioned, your chances of success with whatever you are asking of them approach zero. On the other hand, if you treat people as if they are brilliant and well-intentioned (and you must really believe it, or your body language and tone will give you away), you have a chance to get things done. Given these ground rules, you can align on what’s best to delight our customers, respect our stakeholders, and fulfill our employees. Escape clause: As a last resort, try to align on both of you being well-intentioned for our customers, and try to work back from there to the internal issues/priorities you were trying to resolve. A friend gave me this article on the Principle of Charity as a way of thinking about this. Another article I’ve found useful is Hanlon’s Razor.
We all have a lot of things to do. And we get asked to do more things “urgently”, and “ASAP”, all the time. Sadly, many of us (including yours truly) accept these items onto our personal backlog without the required consideration. The problem is that if you accept an ASAP (i.e. “Yet another Priority 1”) item silently or incorrectly, you’re betraying everybody around you to whom you gave prior commitments. I would advise you to think about asks like this: When somebody asks you to do something, either 1) know that you can do it, accept it onto your backlog, and make an appropriate commitment, 2) know that you need to think about it, and give a date for when that will happen, or 3) decline the item and ask the requester how important it is compared to your other priorities. If you accept it onto your backlog, you must know where it falls in priority or you’re just fooling yourself. Side note: One of the most annoying problems as a leader or teammate is that if you give things to somebody who is unreliable and they commit to doing them, you now 1) are pretty sure they won’t get done, and 2) can’t give them to somebody else. (Think Brent in The Phoenix Project.)
When bad things happen, that’s OK. What’s not ok is not changing things so it doesn’t happen again. This doesn’t mean we go after every single thing that happens in the code or fleet once, but it sure means that when it happens a second time, you look into it. We have a lot of folks who have never worked on either enterprise or distributed software, or only did it at school. So they think blaming the subsystem is actually a root cause. Don’t settle for “the storage volume was stuck”, or “Service XXX threw a 500.” Ticket the storage system team and follow up, even if it eventually lands you talking to Leslie Lamport about TLA and the unprovability of algorithms that depend on 2-member quorum or even debugging a kernel mode device driver. Don’t just close the ticket. Go get the service code and look into it – or get somebody who does know the code involved. The definition of a complete Root Cause Analysis is that there is no more investigation needed to have somebody start figuring out how to solve the problem. Sometimes, a cultural problem with RCAs is that “nobody has time”. That’s a false choice – if a problem is going to come back time and time again, somebody sometime is going to have to fix it – you’re saving time by starting to solve it now. If doing a good RCA causes you not to be able to do other parts of your job, talk to your manager about how deep you should go or whether it should be given to somebody else – but don’t drop it silently.
This principle ties together with RCA. As much as we all like to blame things on gamma rays, one-off timing considerations that will never happen again, cloud service infrastructure upgrades, etc, there is a real cause with a real solution to every single thing that happens to our software. At our scale, not only will most one-off things actually happen again, but as we get bigger, more things that used to be rare become commonplace, overwhelming our ability to tease them apart and fix them. Software systems, like chemical reactions between molecules, are both deterministic and hard to monitor precisely. Just because you can’t figure out why a variable got set to something doesn’t mean that an instruction didn’t set it in a piece of code that looks just fine. If you get to the end of a problem where you have nowhere else to go, add monitoring, alarming, try/catch, etc. so that next time you can get closer to the root cause. But never ever write off a problem to random chance – there are truly no ghosts in computer science. That’s why it’s called a science.
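As one possible illustration of "add monitoring, alarming, try/catch" when you have run out of leads, here is a minimal Python sketch. The helper name `with_diagnostics` and the logger name are hypothetical, not part of any existing tool; the idea is only to record enough context that the next occurrence gets you closer to the root cause, and then to re-raise so the failure is never silently swallowed.

```python
import logging
from typing import Any, Callable

# Hypothetical logger name, chosen for illustration only.
logger = logging.getLogger("no_ghosts")


def with_diagnostics(fn: Callable[[], Any], **context: Any) -> Any:
    """Run fn; on failure, log the full traceback plus caller-supplied
    context, then re-raise so the error is still visible upstream."""
    try:
        return fn()
    except Exception:
        # logger.exception records the stack trace along with the message.
        logger.exception("Unexplained failure; context=%r", context)
        raise
```

A call might look like `with_diagnostics(lambda: process(order), order_id=order.id)`, where `process` and `order` stand in for your own code.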
There are always problems, and that’s OK. If it was easy, would they pay us these salaries and have free soup packets in the kitchen? There are even repeated occurrences of the same problem. People often have good intentions to fix them – but good intentions don’t work over the long haul or at scale. When something is bad enough to require your attention multiple times, isn’t it bad enough to make you want to take the time to build a self-correcting feedback-based mechanism so that this will get better over time and then stay within an acceptable operating range without supervision? Whether this is something as simple as weekly emails to raise the visibility of bad things happening or code that actually initiates rollbacks and blocks further deployments whenever it sees a watched metric go into alarm, we must have mechanisms. The overall goal is to put something in place that continues to fix a problem and make it better without humans being involved. Over time, the technical debt of manual operations and the resulting problems can result in an unacceptable slowing of velocity in an org.
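To make the "code that blocks deployments and initiates rollbacks" idea concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: `WatchedMetric`, `gate_deployment`, and the `deploy`/`rollback` callables are hypothetical names, and a real mechanism would hook into your actual deployment and alarming systems.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class WatchedMetric:
    """A hypothetical metric with a simple upper alarm threshold."""
    name: str
    read: Callable[[], float]  # returns the current value of the metric
    threshold: float           # the metric is in alarm above this value

    def in_alarm(self) -> bool:
        return self.read() > self.threshold


def gate_deployment(metrics: List[WatchedMetric],
                    deploy: Callable[[], None],
                    rollback: Callable[[], None]) -> str:
    """Block the deployment if any watched metric is already in alarm;
    roll back if a metric goes into alarm right after deploying."""
    if any(m.in_alarm() for m in metrics):
        return "blocked"
    deploy()
    if any(m.in_alarm() for m in metrics):
        rollback()
        return "rolled_back"
    return "deployed"
```

The point of the sketch is that no human has to remember to check the metric: the check runs on every deployment, which is what turns a good intention into a mechanism.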
The truth is that most things you think are going well in your group probably aren’t going as well as you wish – and the things you know aren’t going well are probably going a lot worse than you care to admit. What does this mean? Whenever you smell the smoke of a possible fire, don’t judge the urgency of the issue based on the smoke – realize that the fire is probably much worse than the wisp of smoke that made it to you. And very few software fires go out on their own. Projects described to you as yellow status are probably red, and most greenish projects are actually yellow. Work to get projects to “bright green” in order to deliver projects that will delight your customers – even if you have to deliver fewer of them. To be more worried sooner is simply to accept the realities of the complex world we live in – and fight back.
When you’re asked whether you can do something for our customers, the company, or your teammates, and you can’t do it, your job is not to say “No”. Instead, your job is to remove the bounding constraints and explore what it would take to get to “Yes”. Maybe those constraints are impossible – but maybe they aren’t. Discussions which start with “No” shut down innovation and creative thought. The leaders who are asking you for a deliverable that is really hard (and may well be impossible) don’t have the same information on the ground that you do – and a short answer that only contains “No” sure doesn’t change that. So rather than just saying “No”, you add value to the room by giving all the possibly crazy ways you can get to “Yes”. Most of the time this ends up in us getting to a very different “Yes” than the impossible crazy thing that was asked – but something that will solve the problem. Sometimes, it does end with everybody in the room understanding that “No” is not only the choice one person is making, but the right choice for the business and our customers. It’s your job not to close the door on that opportunity without everybody peeking inside. We can hire lots of people who can say “No”. Instead, add value to the company and our customers by figuring out the right “Yes.”
Post-Mortems are our way of acknowledging that something went wrong and the mechanism for how to get it fixed. As Jeremiah, one of the most senior engineers I ever met at Amazon, liked to say: “At Amazon, one of the only times that you get to call out everything you’d like to fix and actually be rewarded for it is in connection to a COE (Amazon’s equivalent of a Post-Mortem).” Most other times when you try to fix tech debt, you’ll get either grudging approval or the dreaded “how important is this to revenue/adoption-producing feature X?” question, which as all engineers and scientists know is an unanswerable question posed by project managers to trick us into shipping crappy software. So when things go wrong, open a post-mortem. Be as vocally self-critical as you need to, but focus more on the actions needed to fix the problem. We make lots of mistakes because we move fast and because we’d probably make almost as many if we moved slowly. But the ones that end up having a customer-impacting negative effect are the worst. Use the company culture to your advantage to produce the code and service and customer experience you will be proud of, in a group that has the culture you want.
Ownership is a hard thing to fully understand. Think of walking down the street in a city you care about and seeing a piece of garbage. Ownership is picking it up, even though you were on your way to a meeting, and putting it into the trash. In my house, this often means the trash can is now full, so now I have to take it out – all because of a piece of trash I didn’t leave lying around. In other words, once a problem is identified, it shouldn’t be let go of again. Think about it as a wild tiger, a dangerous beast – you’re awesome and you’ve managed to grab the tiger by the tail. But it often turns out to be a tiger that you can’t tame or subdue. So do you let it go, assuming somebody else will grab it? No – a true owner wouldn’t let the tiger’s tail go – it will just go hurt somebody else; a true owner would find somebody who can actually fix the problem and hand the problem (the tiger’s tail) to them carefully and rigorously. So ownership includes both the initial finding of a problem and a reasonable degree of following through. When you see something wrong, you owe to everybody else what you expect from them – that you’ll be on top of it, or find somebody else to own it, so that it gets fixed, and entropy doesn’t win.
Amazon has a principle that guides people to stand up when needed or to back down appropriately as well – in the best way for the company. This principle is called “Have Backbone; Disagree and Commit”. The key to this principle is to balance on the semi-colon. Nobody wants sheep – people who lean too fast to the right of the semi-colon. But we also can’t have people who stay to the left of the semi-colon too much – and have backbone unproductively and perhaps even disrespectfully. If you do fall to the right of the semi-colon, commit wholly; give the decision a fair chance, with all your energy. A good book for you if you’re having problems here is “Crucial Conversations” – if you can’t have productive discussions about high-stakes things, you can’t get this skill right. Note that in some cultures, you are expected to say what needs to be said, even if it gets in the way of social cohesion; note that even in those “radical candor” cultures it’s even better if you manage to say what needs to be said productively. In our culture of heart and humility, we bias strongly towards being respectful – and that’s awesome – but that’s not an excuse for not getting your job done by communicating effectively. In other words, in our more quiet and non-confrontational culture, we need to have a higher bar for figuring out how to say the important things that need to be said and saying them productively and respectfully.
Everyone deserves to be treated with respect. It’s not uncommon in organizations for people to value someone’s input or idea more (or less) based on their title or seniority, which means we discount someone’s idea if they aren’t “important.” But we’ve all been in situations where someone new asks a key question or makes a suggestion that can change the conversation in an important way – if we are really listening to what they say. Moreover, organizations in which people find time to talk to each other, listen to each other, and respect each other are organizations in which people feel empowered to think about how the organization can improve and feel good about their contributions to the team. This principle also applies to someone who is struggling in their job or even someone leaving the business. We should take care of people as well on the way down or out as on the way up or in. We should treat people with respect during PIPs (Performance Improvement Plans), meeting with them regularly and going above and beyond to make sure that we are helping, not hampering, their chance to be successful in their next role. And if, for some reason, we or someone on our team decide this company or group isn’t the right place for them, we should consider publicly announcing their departure (to the appropriate audience), including encouraging words to make sure that everybody knows that transitions out are just as important as transitions in. Not only are you being a better person (hooray!) but when you show respect even in the situation where your position or the rules don’t seem to require it, you set an important cultural bit for everybody who hears about or sees the interaction – that we are a culture of honor and respect. You are demonstrating that everyone truly does matter.
In the absence of people making things better, they will get worse. It’s just a fact. Pages will get outdated, method signatures will become convoluted, regression tests won’t work, etc. It is the expectation of every employee that they put some reasonable amount of effort into fixing things. So when you visit a wiki page that is valuable to you but see problems with it, leave it slightly better. When you see code that could use some comments, add them. This small tax (5%? 10%?) more than pays for itself if you live in a culture where most people do this. Sometimes people don’t fix things because they think their managers or product managers won’t prioritize this work; but teach them the value of great code, great docs, and great products – and the lack of surprises that come from well maintained artifacts.
We need to always make sure we prioritize our requirements correctly. In order, we think about Security, Durability, Correctness, Availability, Scalability via Scale-out, Operability, Features, Performance via Scale-up, and Efficiency. What this means is that each item is more important than the items to its right. For example, if we had to, we would give up availability of the data (shut the servers down) rather than risk it being deleted (loss of durability). The ordering that is most often debated is that of Scalability and Performance. The reason I place them in this order is that if we have the ability to scale out arbitrarily, we won’t be at risk of having our workload become impossible to run – but if we only have single-node scale-up, we could be trapped in a solution that doesn’t scale to fit our needs. Of course we want both. Notice that customers will only want to talk about Features, Performance, and Efficiency – it’s because they just expect all the things to the left of those three to be there. But protecting our customers who have decided to base their businesses on us is our first order of business. We need to take the harder and more holistic view and make sure all of them are addressed.
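One way to keep this ordering from being re-argued in every design review is to write it down as data. The following is only a sketch: the list mirrors the order given above, while the names `PRIORITIES` and `more_important` are hypothetical helpers invented for this illustration.

```python
# The ordering from the text, most important first.
PRIORITIES = [
    "Security",
    "Durability",
    "Correctness",
    "Availability",
    "Scalability",   # via scale-out
    "Operability",
    "Features",
    "Performance",   # via scale-up
    "Efficiency",
]


def more_important(a: str, b: str) -> str:
    """Return whichever requirement wins when the two conflict,
    i.e. the one that appears earlier (further left) in PRIORITIES."""
    return min(a, b, key=PRIORITIES.index)
```

For the example in the text, `more_important("Availability", "Durability")` returns `"Durability"`: we would shut the servers down rather than risk losing data.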
You’re often asked to pull in dates to be sooner, without losing quality or features or adding people, etc. Think about it – if you’re doing the right things, in the right order, with the right level of quality, with the right people, and you’re giving them the right support, then the date is the date. If you want to change the date, then figure out which of the 5 sides of the pentagram to change. Please ask your leaders to not “Just ask for a new date”. Educate them that they’ll get a new date, but it will be wrong. Help brainstorm through the constraints of the project rather than the optical date at the end. (Note that this is a shameless extension of the Iron Triangle…)
I find that so many projects get lots of energy and time and then peter out a bit at the end. Crossing the finish line with a high bar is important. In Tech, as an IC, this is about fixing that last bug, going over the document one more time, or adding that last test case. As a leader, it’s often about change initiatives – most change initiatives dribble into nothing before they have taken root, not only wasting the time of everybody involved, but often branding the leader as a person who thrashes others. In everything we do, we need to have a clear Definition of Done (often called “Done, Done, Done” by Tech nerds) and either get to it or intentionally decide not to. To a large degree, I find that failing to follow this principle is the reason I can’t promote somebody I otherwise would.
One of the worst things a senior member of the team can do is misrepresent something they hope is true as something they think is true or even worse, know is true. Other members of the team will make decisions based on that surety – and those decisions will likely be wrong. In order for us to make good decisions, everybody needs to be clear about their level of confidence of every statement that will affect a decision. Don’t over-represent your optimism about an architecture or project, and don’t under-represent it either. Not only is it OK to expose your level of uncertainty, it’s required, especially on the high-stakes decisions you find it most uncomfortable to be uncertain about.
The power of delighted mistakes is huge. When I was at Amazon, when we messed up sending somebody a package, sent it to the wrong place, or sent them the wrong thing, and they contacted us, we could fix it. That’s the low bar – anybody can do that. But we could do better. We could respond to the “call me now” button in 15 seconds or less with the customer’s messed up order already up on our screen and make them feel cared for, even though we messed up. We could fix it in a way that absolutely delights our customer: we could tell them to keep the incorrect item, we could refund them the whole purchase price, and we could send them the correct one overnight express – for free. We could send an apology letter from Amazon to the person who didn’t get their gift on time. Correctly handled, a mistake is actually one of our best opportunities to create long-term loyalty from our customers. For our part of the world, this can be hard. When you’ve cost a customer the ability to run their business, and cost them credibility with their customers by having our products fail, it’s almost impossible to delight them again. But we can do the best we can: by responding quickly, by refunding or crediting them without being asked, and by making them feel like they are the single most important customer we have. Delight is the bar, even (especially?) when you’ve messed up. At least for me, that makes my job more meaningful, and I hope it does for you as well.
I hope these have been useful to you; if you’ve gotten this far with an investment of your time, maybe drop me a quick note. Note that since others have taken this text for their own use, they’ve asked about how they can/cannot use it. Thus, I have decided to license this content under Apache 2 – Copyright – 2020 until further notice. Thanks for being a part of us all becoming better leaders together!
– Mark