I don’t consider myself particularly stupid, and I really enjoy DevOps, but every year I try to read the Accelerate State of DevOps Report (SoDR) and every year it just makes me feel dumb. The language used removes me from the gold within. Take the following sentence for example: “Contextualizing the research is possible when practitioners have conversations about how work is completed today”. I don’t fully understand that sentence on a first read-through. I find it muddy, unclear. The language, combined with the design of the PDF, leaves a plate of dark-tinted glass over the entire thing.

Other technical documents - AWS Whitepapers, for example - don’t do this to me, or not so much anyway. Perhaps it’s because the focus of these whitepapers is singular, or at least much less abstract. Either way, I think sentences like the example above are just a bit daft. (“We can understand the research better when people talk about how they currently work,” perhaps?)

Anyway, I…

  1. Broke the PDF up into single pages
  2. Converted each page from PDF to a .txt file
  3. Ran each page through the Anthropic API, telling the LLM to reformat the text into pretty Markdown
  4. Generated a simple-language summary at the bottom of each page
  5. Combined all the pages into a single document, below.
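Here’s roughly what that looks like as a Python script. This is a sketch rather than a record of exactly what I ran; the file names, prompt wording, and model choice are illustrative:

```python
# Sketch of the pipeline: pull each page out of the PDF, ask the model to
# reformat it as Markdown with a simple-language summary, then stitch the
# pages back together. Illustrative only.
from pypdf import PdfReader
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

PROMPT = (
    "Reformat the following report page as clean Markdown, then add a "
    "short summary in simple language at the end.\n\n{page_text}"
)

reader = PdfReader("state-of-devops-2023.pdf")
pages_out = []
for page in reader.pages:                # steps 1-2: single pages -> text
    text = page.extract_text()
    message = client.messages.create(    # steps 3-4: Markdown + summary
        model="claude-3-opus-20240229",
        max_tokens=4096,
        messages=[{"role": "user", "content": PROMPT.format(page_text=text)}],
    )
    pages_out.append(message.content[0].text)

with open("sodr-2023.md", "w") as f:     # step 5: combine into one document
    f.write("\n\n".join(pages_out))
```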

The following is the output of that process. It brought me no closer to the goal - it’s actually a little awful - but the journey was fun, and that’s what matters, right? (The goat diagrams are the best part).


Accelerate State of DevOps Report 2023

Introduction

The Accelerate State of DevOps Report 2023 is an annual report that provides insights into the current state of DevOps practices and their impact on organizational performance. The report is based on a survey of professionals from various industries and aims to identify the key factors that contribute to successful DevOps implementations.

Key Findings

  1. High-performing organizations continue to outperform their peers in terms of deployment frequency, lead time for changes, time to restore service, and change failure rate.
  2. Automation and continuous delivery practices are strongly associated with improved performance outcomes.
  3. Psychological safety and a culture of learning and experimentation are critical for fostering innovation and improving performance.
  4. The use of cloud technologies and modern architectures, such as microservices and containers, is increasing among high-performing organizations.
  5. The adoption of site reliability engineering (SRE) practices is growing, with a focus on balancing reliability and innovation.

Conclusion

The Accelerate State of DevOps Report 2023 highlights the importance of embracing DevOps practices, fostering a culture of collaboration and experimentation, and leveraging modern technologies to drive organizational performance. By adopting these practices and principles, organizations can improve their ability to deliver value to their customers and maintain a competitive edge in an increasingly digital world.

Executive Summary

Prelude

This research aims to provide leaders and practitioners with insights on where they can make an impact. The research explored three key outcomes and the capabilities that contribute to achieving them:

| Outcome | Description |
| --- | --- |
| Organizational performance | The organization should produce not only revenue, but value for customers and the extended community. |
| Team performance | The ability for an application or service team to create value, innovate, and collaborate. |
| Employee well-being | The strategies an organization or team adopts should benefit the employees—reduce burnout, foster a satisfying job experience, and increase productivity. |

The research also explored performance measures that are often talked about as ends-in-themselves:

| Performance measure | Description |
| --- | --- |
| Software delivery performance | Teams can safely, quickly, and efficiently change their technology systems. |
| Operational performance | The service provides a reliable experience for its users. |

For nearly a decade, the DORA research program has been investigating the capabilities and measures of high-performing technology-driven organizations. They have heard from more than 36,000 professionals from organizations of every size and across many different industries.

DORA tries to understand the relationship between ways of working (capabilities) and outcomes, which are meaningful accomplishments that are relevant across an organization and to the people in it. This research uses rigorous statistical evaluation and is platform-agnostic.

Summary

The DORA research program has been studying high-performing technology organizations for almost 10 years. They look at the relationship between how organizations work (their capabilities) and the outcomes they achieve, like organizational performance, team performance, and employee well-being. The research also looks at measures like software delivery performance and operational performance. The goal is to give leaders insights into where they can make changes to improve their organization’s performance and the well-being of their employees.

Summary:

The key findings emphasize the importance of establishing a healthy culture, focusing on users, and balancing technical capabilities to drive organizational performance and employee success. Some key points:

  1. Generative cultures lead to 30% higher organizational performance.
  2. A user focus results in 40% higher organizational performance.
  3. Faster code reviews are linked to 50% higher software delivery performance.
  4. High-quality documentation greatly amplifies the impact of technical capabilities on organizational performance.
  5. Cloud computing and infrastructure flexibility lead to 30% higher organizational performance.
  6. Balancing delivery speed, operational performance, and user focus yields the best results and improves employee well-being.
  7. Underrepresented groups and women tend to experience higher levels of burnout, likely due to taking on more repetitive work. Ensuring a fair distribution of work is crucial.

Applying DORA Insights in Your Context

To get the most out of the DORA research, it’s important to consider it in the context of your own team and users. For example, while teams with faster code reviews have 50% higher software delivery performance, your performance may not improve if code reviews are already fast but speed is constrained elsewhere in the system. Contextualizing the research requires conversations about how work is completed today, which can lead to improved empathy, collaboration, and understanding of each participant’s motivations.

Improvement work is never done. The process involves finding a bottleneck in your system, addressing it, and repeating the process. The most important comparisons come from looking at the same application over time, rather than comparing to other applications, organizations, or industries.

Metrics and Measurements

Metrics and dashboards help teams monitor their progress and correct course. While practitioners and leaders strive for organizational performance, team performance, and well-being, measurement itself is not the goal, just as delivering software is not the goal.

Fixating on performance metrics can lead to ineffective behaviors. Instead, investing in capabilities and learning is a better way to enable success. Teams that learn the most improve the most.

You Cannot Improve Alone

We can learn from each other’s experiences. The DORA Community site (https://dora.community) is an excellent forum for sharing and learning about improvement initiatives.

Summary: This page discusses how to apply insights from the DORA (DevOps Research and Assessment) research in your own context. It emphasizes the importance of contextualizing the research through conversations about current work processes, focusing on continuous improvement by addressing bottlenecks, and avoiding fixation on performance metrics. Instead, investing in capabilities and learning is encouraged. The page also highlights the value of learning from others’ experiences through the DORA Community site.

Concepts and Measures

This section describes the concepts DORA tries to measure, which form the foundation of both the report and the models used. It is important for the authors to be clear and consistent about these concepts.

Key points:

  • Multiple indicators are often used to capture multifaceted concepts
  • Exploratory and confirmatory factor analysis is used to evaluate the success of capturing these concepts (more details in the Methodology section)
  • Scores are scaled from 0 to 10, with 0 representing the complete lack of a concept and 10 representing its maximum presence
  • This standardizes how the concepts are discussed and allows for comparing data across years

Each concept is presented with:

  • An icon for easier reference
  • The average score (mean) for the concept in the sample
  • The boundaries of the interquartile range (25th and 75th percentiles) to show the spread of responses
  • The median value, which if dramatically different from the mean, may indicate skewed data
  • A description of the concept and how it is measured
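As a made-up example of how those presentation statistics are produced, the mean, median, and interquartile range of a handful of 0-10 responses can be computed like so (the numbers here are fabricated, not from the report):

```python
# Illustrative only: computing the per-concept statistics described above
# from made-up survey responses on the report's 0-10 scale.
import numpy as np

responses = np.array([3.3, 5.0, 6.7, 6.7, 7.8, 8.3, 8.9, 10.0])  # made up

print("mean:", responses.mean())
print("median:", np.median(responses))             # middle of the data
print("IQR:", np.percentile(responses, [25, 75]))  # 25th-75th percentiles
```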

Summary: This page introduces the key concepts DORA measures and explains how they are presented in the report. Scores are standardized on a 0-10 scale for consistency and comparison across years. Each concept is accompanied by an icon, average score, interquartile range, median, and description for clarity and ease of reference.

Key Outcomes

The key outcomes are the goals that people, teams, or organizations strive to reach or avoid. These measures are important for self-evaluation and assessing performance at various levels.

The key outcomes discussed are:

| Outcome | Mean | Median | IQR | Description |
| --- | --- | --- | --- | --- |
| Organizational performance | 6.3 | 6.3 | 5-8 | High performing organizations have more customers, higher profits, and more relative market share for their primary product or service. |
| Team performance | 7.6 | 8 | 6.6-9 | High performing teams adapt to change, rely on each other, work efficiently, innovate, and collaborate. |
| Software delivery performance | 6.3 | 6.4 | 5.1-7.8 | Measured by deployment frequency, lead time for changes, change failure rate, and failed deployment recovery time. |
| Operational performance | 6.2 | 6.3 | 5-7.5 | The extent to which a service meets user expectations, including availability and performance. |
| Job satisfaction | 6.08 | 7.1 | 5.7-7.1 | A single-item question that asks respondents to rate how they feel about their job as a whole. |
| Burnout | 4.1 | 4 | 2-6 | The psychological and physical toll of work, and how one appraises the value and meaning of their work. Burnout causes cynicism. |
| Productivity | 7.5 | 7.9 | 6.7-8.8 | A productive person does work that aligns with their skills, creates value, and lets them work efficiently. |
| Reliability targets | 7 | 7.5 | 5-7.5 | The extent to which a service meets its stated goals for measures like availability, performance, and correctness. |

Well-being is a composite of burnout, productivity, and job satisfaction.

Summary:

This page discusses the key outcomes that people, teams, and organizations strive to achieve or avoid. These outcomes include organizational performance, team performance, software delivery performance, operational performance, job satisfaction, burnout, productivity, and reliability targets. The page provides a brief description of each outcome and presents their mean, median, and interquartile range (IQR) values. Additionally, it mentions that well-being is a combination of burnout, productivity, and job satisfaction.

Summary:

This page discusses various reliability practices, processes, and technical capabilities that teams use to improve the operational performance of services. These include:

  1. Artificial intelligence contribution: The role of AI in contributing to technical tasks (mean: 3.3, median: 2.4)
  2. Documentation: The quality of written content used in daily work (mean: 5.8, median: 6.25)
  3. Code review speed: The time it takes from pull request to code change review (mean: 6.5, median: 6)
  4. Trunk-based development: Making small, frequent changes regularly merged into the main code branch (mean: 5.6, median: 5.6)
  5. Continuous integration: Automatically building and testing software changes (mean: 6.9, median: 7.8)
  6. Loosely coupled architecture: Software that can be written, tested, and deployed independently (mean: 6.4, median: 6.7)
  7. Continuous delivery: Getting changes into production safely, quickly, and sustainably (mean: 7.0, median: 7.3)
  8. Flexible infrastructure: Scalable infrastructure that is elastic, accessible, and measured (mean: 6.6, median: 7.3)

The page provides the mean, median, and interquartile range (IQR) for each practice, giving a sense of how widely adopted and important each one is considered to be.

Culture Aspects

Defining culture is challenging, but it can be described as the prevailing norms (such as flexibility), the prevalent orientation (such as user-centricity), and the ambience (such as organizational stability) of the workplace.

Key Culture Aspects

| Aspect | Mean | Median | IQR | Description |
| --- | --- | --- | --- | --- |
| User-centrism | 7.8 | 7.8 | 5.6-8.3 | Understanding and incorporating users’ needs and goals to make products and services better. |
| Knowledge sharing | 6.4 | 6.7 | 5.0-8.3 | How ideas and information spread across an organization. Team members answer questions once and make the information available to others. People don’t have to wait for answers. |
| Westrum organizational culture | 7.3 | 7.8 | 6.1-8.6 | How an organization tends to respond to problems and opportunities. There are three types of culture: generative, bureaucratic, and pathological. |
| Job security | 5.9 | 6.7 | 3.3-8.3 | A single-item measure that asks people how often they worry about their job security. Higher scores equal less worry. |
| Work distribution | 5.8 | 5.8 | 3.8-7.9 | Formal processes to help employees distribute tasks equitably within a team. |
| Organizational stability | 7.2 | 8.3 | 6.7-8.3 | A single-item measure that asks how stable or unstable the work environment is for employees. |
| Flexibility | 7.7 | 8.3 | 6.6-8.9 | How, where, and when a person works on tasks. |

Summary

This page discusses various aspects of organizational culture and their importance in the workplace. Key aspects include user-centricity, knowledge sharing, Westrum organizational culture, job security, work distribution, organizational stability, and flexibility. The table provides an overview of each aspect, including its mean, median, and interquartile range (IQR) values, along with a brief description. These aspects collectively contribute to the overall culture of an organization and can significantly impact employee satisfaction, productivity, and the success of the organization as a whole.

Chapter 1 Takeaways

The first step in improving performance is to set a baseline for an application’s current software delivery performance, operational performance, and user-centricity. These measures help teams evaluate how they’re doing and provide a good signal for how things are changing over time.

These measures, though, are not the means by which a team will improve. With this baseline, it is important to assess a team’s strength across a wide range of people, processes, and technical capabilities to identify which might be holding back progress. Next, teams need the time and space to align, experiment, and reassess. Repeating this process will help teams adopt a mindset and practice of continuous improvement.

Watch out for these and other pitfalls when using these comparisons:

  • Unlike comparisons. Comparing applications based solely on these clusters is not likely to be useful. Doing so discards the context of each application in ways that might be detrimental to the goal of improving.
  • Setting metrics as a goal. Ignoring Goodhart’s law and making broad statements like “every application must demonstrate ’elite’ performance by year’s end” increases the likelihood that teams will try to game the metrics.
  • One metric to rule them all. Attempting to measure complex systems with the “one metric that matters.” Use a combination of metrics instead to drive deeper understanding.
  • Narrowly scoped metrics. People tend to measure what is easiest to measure, not what is most meaningful.
  • Using industry as a shield against improving. For example, some teams in highly regulated industries might use regulations as a reason not to disrupt the status quo.

Goodhart’s law: when a measure becomes a target it ceases to be a good measure.

Summary

This chapter discusses the importance of setting a baseline to measure an application’s current performance across software delivery, operations, and user-centricity. However, these measures alone will not drive improvement. Teams need to assess their capabilities, experiment, and continuously improve. The chapter warns against pitfalls like comparing applications solely based on performance clusters, setting metrics as goals which encourages gaming, relying on a single metric, measuring only what’s easy vs meaningful, and using industry regulations as an excuse not to improve. Goodhart’s law states that when a measure becomes a target, it is no longer a good measure.

Introduction

  • Cluster analyses are performed annually to identify common trends across applications.
  • Teams should use these analyses for understanding their performance, but avoid fixating on comparisons with other applications.
  • The best comparisons are those made over time for the same application.
  • Teams that focus on user needs are better equipped to build the right software in the right way.

Results

Software delivery performance is assessed using the following measures:

  • Change lead time: Time taken for a change to go from committed to deployed.
  • Deployment frequency: How often changes are pushed to production.
  • Change failure rate: How often a deployment causes a failure requiring immediate intervention.
  • Failed deployment recovery time: Time taken to recover from a failed deployment.

Reducing the batch size of changes for an application is a common approach to improve all four measures. Smaller changes are easier to reason about, move through the delivery process, and recover from if there’s a failure. Teams should aim to make each change as small as possible to ensure a fast and stable delivery process. This approach contributes to both change velocity and change stability.
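To make the four measures concrete, here is a toy calculation over a small, made-up deployment log. This is my own illustration, not anything from the report:

```python
# Toy illustration of the four software delivery measures over a made-up
# deployment log. My own sketch, not anything from the report.
from datetime import datetime, timedelta
from statistics import median

# (committed, deployed, failed?, recovered)
deploys = [
    (datetime(2023, 6, 1, 9), datetime(2023, 6, 1, 15), False, None),
    (datetime(2023, 6, 2, 10), datetime(2023, 6, 2, 11), True,
     datetime(2023, 6, 2, 11, 40)),
    (datetime(2023, 6, 5, 14), datetime(2023, 6, 6, 9), False, None),
]

lead_times = [deployed - committed for committed, deployed, _, _ in deploys]
span_days = max((deploys[-1][1] - deploys[0][1]).days, 1)
failures = [(deployed, rec) for _, deployed, failed, rec in deploys if failed]

print("change lead time (median):", median(lead_times))
print("deployment frequency:", len(deploys) / span_days, "per day")
print("change failure rate:", len(failures) / len(deploys))
print("failed deployment recovery time:",
      sum((rec - deployed for deployed, rec in failures), timedelta())
      / len(failures))
```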

Summary: This page discusses the importance of cluster analyses in understanding software delivery performance trends across applications. It emphasizes the significance of focusing on user needs and reducing the batch size of changes to improve key performance measures such as change lead time, deployment frequency, change failure rate, and failed deployment recovery time. By making changes as small as possible, teams can enhance both the speed and stability of their delivery process.

| Performance level | Deployment frequency | Change lead time | Change failure rate | Failed deployment recovery time | % of respondents |
| --- | --- | --- | --- | --- | --- |
| Elite | On demand | Less than one day | 5% | Less than one hour | 18% |
| High | Between once per day and once per week | Between one day and one week | 10% | Less than one day | 31% |
| Medium | Between once per week and once per month | Between one week and one month | 15% | Between one day and one week | 33% |
| Low | Between once per week and once per month | Between one week and one month | 64% | Between one month and six months | 17% |

This table shows the software delivery performance of the survey respondents, categorized into four levels: Elite, High, Medium, and Low. The performance levels are based on four key metrics:

  1. Deployment frequency: How often the organization deploys code changes.
  2. Change lead time: The time it takes from code commit to running in production.
  3. Change failure rate: The percentage of changes that result in degraded service or require remediation.
  4. Failed deployment recovery time: The time it takes to restore service when a change fails.

Elite performers deploy on demand, have less than one day change lead time, a 5% change failure rate, and recover from failures in less than an hour. On the other hand, Low performers deploy between once per week and once per month, have a change lead time of one week to one month, a 64% change failure rate, and take between one to six months to recover from failed deployments.

The majority of respondents fall into the High (31%) and Medium (33%) categories, while 18% are Elite performers and 17% are Low performers.

Operational performance

We assessed operational performance by asking respondents how frequently their service does the following:

  • Receives reports about end users being dissatisfied with the reliability of the system.
  • Is unavailable, performs slower than expected, or performs incorrectly.

For an exploration of how operational performance predicts organizational performance, see Chapter 5 - Reliability unlocks performance.

User-centricity

A user-centric application or service is built with the end user in mind. Building a product like this requires a good sense of what users need and incorporating that into the product’s roadmap. We assessed respondents’ user-centricity by asking them the extent to which the following are true:

  • Their team has a clear understanding of what users want to accomplish.
  • Their team’s success is evaluated according to the value they provide to their organization and to the users of the application.
  • Specifications (for example, requirements planning) are continuously revisited and reprioritized according to user signals.

Here’s a view into how this year’s survey respondents are doing with user-centricity:

[Chart: distribution of user-centricity scores (0-10) across respondents; median 7.8, interquartile range 5.6-8.3]

Here’s a view into how this year’s survey respondents are doing with operational performance:

[Chart: distribution of operational performance scores (0-10) across respondents; median 6.3, interquartile range 5-7.5]


Summary

This page discusses two key metrics used to assess the performance of software development teams: operational performance and user-centricity.

Operational performance measures how often a service has reliability issues or performs poorly. User-centricity evaluates how well a team understands and prioritizes user needs when building applications.

The data shows the distribution of scores across respondents. For user-centricity, the middle 50% scored between 5.6 and 8.3 out of 10. For operational performance, the middle 50% scored between 5 and 7.5 out of 10.

Understanding these metrics can help teams identify areas for improvement in delivering reliable, user-focused software. The full report explores how these factors ultimately impact an organization’s overall performance.

| Team type | Software delivery performance | Operational performance | User-centricity |
| --- | --- | --- | --- |
| User-centric | 7.5 | 7.5 | 7.5 |
| Feature-driven | 5.0 | 5.0 | 5.0 |
| Developing | 2.5 | 2.5 | 2.5 |
| Balanced | 10.0 | 10.0 | 10.0 |

[GOAT diagram: the four team types (User-centric, Feature-driven, Developing, Balanced) plotted against software delivery performance, operational performance, and user-centricity]

Summary: This page discusses four types of teams based on their software delivery performance, operational performance, and user-centricity. The team types are User-centric, Feature-driven, Developing, and Balanced. The table shows the mean scores for each team type across the three dimensions on a scale of 0-10. The GOAT diagram visually represents where each team type falls in relation to the three dimensions. Balanced teams score the highest across all dimensions, while Developing teams score the lowest. User-centric and Feature-driven teams fall in between, with User-centric teams leaning more towards operational performance and user-centricity, and Feature-driven teams leaning more towards software delivery performance.

Think of the performance metrics we’ve been discussing as dials that an organization or team can adjust to change the organizational performance, team performance, and the well-being of the individuals on the team.

The graphs below show the performance outcomes predicted by each team type.

Each team type has unique characteristics, makes up a substantial proportion of our respondents, and has different outcomes. Your own team likely does not fit cleanly into only one, nor would we expect your team type to remain constant over time.

How do you compare?

| Team type | Predicted burnout | Predicted job satisfaction |
| --- | --- | --- |
| User-centric | 5.0 | 6.5 |
| Feature-driven | 5.5 | 6.0 |
| Developing | 6.0 | 5.5 |
| Balanced | 6.5 | 5.0 |
[Figures: predicted performance outcomes for each team type, plus several pages of accompanying charts; the PDF-to-text conversion reduced these to unreadable character soup]

Faster code reviews are one of the benefits of loosely coupled teams, leading to significant improvements to software delivery performance and operational performance. There are several paths to improving the efficiency of code reviews:

  • When the code being reviewed only affects the scope of the team’s architecture, the reviewer has a better understanding of the impact the code will have on the system.
  • The smaller the code review, the easier it is for the reviewer to understand the implications of the change.
  • Working in small batches improves the feedback cycle, efficiency, and focus for the team.
  • Pair programming is a practice that can reduce code review time regardless of current architecture and integration practices.

Additionally, these capabilities and processes don’t show a detrimental impact on the well-being of the individuals doing the work. In fact, most of these predict improvements to the individual’s well-being.

Summary

This page discusses how evaluating and improving your code review process can lead to better software delivery performance. Factors like review duration, batch size, number of teams and locations involved can impact review effectiveness. Improving code review speed builds upon and enhances other technical capabilities like code maintainability, learning culture, and generative culture. Strategies to improve code reviews include having reviewers familiar with the architecture, keeping reviews small in scope, working in small batches, and pair programming. Importantly, improving these capabilities and processes does not negatively impact developer well-being and can actually improve it.

Technical Capabilities Predict Performance

The use of loosely coupled architecture, continuous integration, and efficient code reviews enables teams to improve their organizational outcomes while maintaining and sometimes improving their well-being. When teams have the autonomy to improve and maintain a reliable system that delivers value to their users, they experience improved job satisfaction, team performance, and software delivery performance.

Architecture plays a significant role in a team’s ability to focus on the user and improve their software delivery. By starting small and focusing on the user, teams saw significant improvements across trunk-based development, loosely coupled architecture, continuous integration, continuous delivery, and SRE.

To improve your technical capabilities, provide opportunities for team experimentation and continuous improvement.

| Technical capabilities and processes | Effect on burnout* | Effect on job satisfaction | Effect on productivity |
| --- | --- | --- | --- |
| AI | Minor decrease | Minor increase | Minor increase |
| Continuous integration | No effect | Minor increase | No effect |
| Code review speed | Substantial decrease | Minor increase | Minor increase |
| Loosely coupled architecture | Substantial decrease | Substantial increase | Substantial increase |
| Trunk-based development | Substantial increase | No effect | No effect |

*You might notice how the color scheme is flipped for burnout. This is because reducing burnout is a good thing!

Summary

This page discusses how technical capabilities and processes can impact team performance and well-being. Loosely coupled architecture, continuous integration, and efficient code reviews are shown to improve organizational outcomes while maintaining or improving team well-being. The autonomy to improve and maintain a reliable, user-focused system leads to better job satisfaction, team performance, and software delivery performance.

The table provided shows the effects of various technical capabilities on burnout, job satisfaction, and productivity. Notably, loosely coupled architecture substantially decreases burnout while substantially increasing job satisfaction and productivity. The page emphasizes the importance of team experimentation and continuous improvement in enhancing technical capabilities.

Technical capabilities predict performance

The fundamental tenet of continuous delivery (CD) is to work so that our software is always in a releasable state. To achieve this, we need to work with high quality. That way, when we detect a problem, it is easy to fix, so that we can recover to releasability quickly and easily.

To keep our software in that golden, releasable state, we need to work to establish fast feedback and recover from failures very quickly.

As a reader of this year’s report, I imagine that these ideas are sounding familiar. The metrics of Stability (change failure rate and failed deployment recovery time) are all about quality, and the metrics of Throughput (change lead time and deployment frequency) are all about feedback and ease of detection of any problem.

If you practice CD, then you will be scoring highly on Stability & Throughput. If you have high scores on Stability & Throughput, it is hard to imagine that you aren’t also practicing CD in order to achieve those high scores.

This year’s analysis includes a look at how capabilities drive performance by looking for mediators of each capability. CD—the ability to release changes of all kinds on demand quickly, safely, and sustainably—is a substantial mediator of many technical capabilities. In other words, these capabilities work because they create an environment that makes CD possible. The practice of CD, in turn, provides the mechanism through which these capabilities can predict stronger software delivery performance.

Releasability is an important standard to meet in general for software development, which is why CD emphasizes it. Releasability matters because it is a subjective but definite, context-applicable statement of quality. The degree of rigor that defines releasability might be different if we’re working on safety-critical systems than if we’re writing software for a cake shop. But in both cases, releasability defines that we’ve done everything that we deem necessary to say that this code is ready, good enough, and safe enough for release into the hands of users.

So optimizing to keep our changes releasable is also optimizing for a context-specific definition of minimum acceptable quality for our system.

Teams that prioritize getting and acting on high-quality, fast feedback have better software delivery performance.

Author: Dave Farley

Summary

This page discusses how technical capabilities like continuous delivery (CD) predict software delivery performance. The key points are:

  • CD aims to keep software always in a releasable state by working with high quality
  • Fast feedback and quick recovery from failures are critical to maintaining releasability
  • The Accelerate metrics of Stability and Throughput align with the goals of CD
  • Teams practicing CD will score highly on these metrics
  • CD mediates the impact of many technical capabilities on performance by enabling releasable, high-quality changes to be delivered quickly and safely
  • Releasability is a context-specific definition of the minimum acceptable quality needed to deliver software to users
  • Teams that prioritize fast feedback and action on that feedback perform better at software delivery

In essence, technical practices that enable continuous delivery lead to better performance by allowing teams to frequently release high-quality software.

Technical Capabilities Predict Performance

The author expresses surprise that continuous integration (CI) and trunk-based development did not have a bigger impact on software delivery performance, as they seem foundational. The author wonders how high scores on Throughput and Stability can be achieved without these practices, and suggests that CI is a key mediator of software delivery performance.

The table below shows the effect of various technical capabilities and processes on software delivery performance, and whether this effect is mediated through continuous delivery (a toy sketch of what "mediation" means follows the table):

| Technical capability | Effect on software delivery performance | Mediation through continuous delivery |
|---|---|---|
| Continuous integration | Minor increase | Completely mediated |
| Code review speed | Substantial increase | Partially mediated |
| Loosely coupled architecture | Minor increase | Partially mediated |
| Trunk-based development | Minor increase | Completely mediated |
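For readers unfamiliar with mediation analysis, the following is a minimal sketch of the classic two-regression intuition on synthetic data; it is a toy illustration, not the report's actual statistical model. The effect of CI on performance shrinks toward zero once CD is controlled for, which is what "completely mediated" means:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Synthetic data in which CI enables CD, and CD (not CI directly)
# drives software delivery performance.
ci = rng.normal(size=n)                          # continuous integration capability
cd = 0.8 * ci + rng.normal(scale=0.5, size=n)    # CD is driven by CI
perf = 0.9 * cd + rng.normal(scale=0.5, size=n)  # performance is driven by CD

def ols(y, *xs):
    """Least-squares slope estimates for y ~ intercept + xs."""
    X = np.column_stack([np.ones_like(y)] + list(xs))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1:]  # drop the intercept

(total_effect,) = ols(perf, ci)        # CI -> performance, ignoring CD
direct_effect, _ = ols(perf, ci, cd)   # CI -> performance, controlling for CD

print(f"Total effect of CI:  {total_effect:.2f}")   # ~0.72 = 0.8 * 0.9
print(f"Direct effect of CI: {direct_effect:.2f}")  # ~0, i.e. fully mediated by CD
```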

Optimizing Organizational Processes and Capabilities

The author discusses how culture drives success, and what in turn drives culture. From a practitioner perspective, improving day-to-day work practices positively impacts cultural elements like sharing risk, increasing cooperation, and establishing psychological safety.

From a leadership perspective, culture starts with awareness and education on its importance. Transformational leadership can foster a blameless environment that encourages experimentation, learning, and gives trust and voice to practitioners. Engineers need visibility into the business and autonomy to take action in order to solve complex problems.

The author concludes that culture is downstream from leadership, and the best results come from looking at culture from both top-down and bottom-up perspectives.

Summary: This page discusses how technical capabilities like continuous integration and trunk-based development impact software delivery performance, and how this effect is mediated through continuous delivery. It also explores how day-to-day work practices and leadership approaches drive organizational culture, which in turn drives success. The key takeaway is that optimizing both technical processes and organizational culture, from practitioner and leadership perspectives, yields the best outcomes.

Technical capabilities predict performance

Artificial intelligence (AI)

Some analysts and technologists hypothesize that AI will make software teams more performant without negatively affecting professional well-being. So far our survey evidence doesn’t support this. Our evidence suggests that AI slightly improves individual well-being measures (such as burnout and job satisfaction) but has a neutral or perhaps negative effect on group-level outcomes (such as team performance and software delivery performance). We speculate that the early stage of AI-tool adoption among enterprises might help explain this mixed evidence. Likely, some large enterprises are testing different AI-powered tools on a trial basis before making a decision about whether to use them broadly.

There is a lot of enthusiasm about the potential of AI development tools, as demonstrated by the majority of people incorporating at least some AI into the tasks we asked about. This is shown in the graph below. But we anticipate that it will take some time for AI-powered tools to come into widespread and coordinated use in the industry.

Importance of AI

AI contribution to technical tasks

For the primary application or service you work on, how important is the role of Artificial Intelligence (AI) in contributing to each of the following tasks today?

[Bar chart: for each task, the percentage of respondents who selected "Extremely important" and the percentage who did not select "Not at all important". The numeric labels did not survive extraction; tasks are listed roughly from the largest to the smallest role for AI.]

  • Analyzing data
  • Writing code blocks or data functions
  • Analyzing security
  • Learning new skills
  • Optimizing code
  • Analyzing logs
  • Monitoring logs
  • Identifying bugs
  • Writing tests
  • Organizing user feedback
  • Writing documentation
  • Making decisions
  • Scaling running services
  • Collaborating with my teammates
  • Responding to incidents
  • Managing projects
  • Managing my coding environment
  • Solving file path issues
  • Recovering from incidents

*Intervals shown for each estimate represent an 89% credibility interval, provided to show the inherent uncertainty in our estimates.
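If "89% credibility interval" is unfamiliar: it is simply the central 89% of a posterior distribution. A minimal sketch with made-up posterior samples (the real posteriors come from the report's Bayesian model, not this toy distribution):

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up posterior samples for "share of respondents who consider AI
# extremely important for a given task".
posterior = rng.beta(20, 80, size=10_000)

low, high = np.percentile(posterior, [5.5, 94.5])  # central 89%
print(f"89% credibility interval: [{low:.1%}, {high:.1%}]")
```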

We are very interested in seeing how adoption grows over time and the impact that growth has on performance measures and outcomes that are important to organizations.

Summary

This page discusses the impact of artificial intelligence (AI) on software development teams. Early survey evidence suggests AI slightly improves individual well-being like reducing burnout, but has a neutral or negative effect on team performance. This may be because enterprises are still in early stages of testing AI tools.

There is enthusiasm for AI in development, with most people using at least some AI for various tasks. However, it will likely take time for AI tools to become widely adopted in a coordinated way across the industry.

The most common tasks where AI is seen as extremely important are analyzing data, writing code, and analyzing security. Tasks like recovering from incidents and solving file path issues ranked lowest in AI importance.

The authors are interested to track how AI adoption grows over time and the impact it has on key performance measures for organizations.

Documentation is foundational

Takeaways

  • Quality documentation is foundational. It drives the successful implementation of technical capabilities and amplifies the impact those capabilities have on organizational performance.
  • Documentation has a positive impact on outcomes, such as team performance, productivity, and job satisfaction.
  • However, increasing documentation quality doesn’t lead to better well-being for everyone: as the quality of documentation increases, some respondents report increased levels of burnout.

Introduction

This year we look deeper at internal documentation—the written knowledge that people in the organization use day-to-day. We investigate the impact of documentation on technical capabilities and on key outcomes.

To measure documentation quality, we measured the degree to which documentation is:

  • Reliable
  • Findable
  • Updated
  • Relevant

We then calculate one score for the entire documentation experience. We’re not evaluating documentation page-by-page, but as a whole.
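As a rough illustration of such a single score, the sketch below averages the four dimensions from hypothetical 1-7 survey responses. The field names, scale, and equal weighting are assumptions for illustration; the report derives its score from a statistical model rather than a simple mean.

```python
# Hypothetical responses, one per respondent, scoring each dimension of
# documentation quality on a 1-7 scale.
responses = [
    {"reliable": 6, "findable": 4, "updated": 5, "relevant": 6},
    {"reliable": 3, "findable": 2, "updated": 4, "relevant": 5},
]

DIMENSIONS = ("reliable", "findable", "updated", "relevant")

def doc_quality_score(response: dict) -> float:
    """One score for the whole documentation experience: the mean of the
    four dimensions, rescaled from the 1-7 scale onto 0-1."""
    mean = sum(response[d] for d in DIMENSIONS) / len(DIMENSIONS)
    return (mean - 1) / 6

team_score = sum(doc_quality_score(r) for r in responses) / len(responses)
print(f"Team documentation quality: {team_score:.2f}")
```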


Summary

This chapter highlights the importance of quality documentation in driving successful implementation of technical capabilities and amplifying their impact on organizational performance. Good documentation also positively impacts team outcomes like productivity and job satisfaction. However, the chapter notes that increasing documentation quality can lead to increased burnout for some individuals. The key takeaway is that documentation is foundational, but organizations need to be mindful of potential negative impacts on well-being as they improve their documentation practices.

Results

Documentation is foundational: it drives and amplifies technical capabilities. As found in previous years, documentation quality continues to drive the successful implementation of every single technical capability studied.

The table below shows how documentation quality amplifies the impact of each technical capability on organizational performance:

| Technical capability | Amplification of impact on organizational performance |
|---|---|
| Continuous integration | 2.4x |
| Continuous delivery | 2.7x |
| Trunk-based development | 12.8x |
| Loosely coupled architecture | 1.2x |
| Reliability practices | 1.4x |
| Artificial intelligence contribution | 1.5x |

Quality documentation also drives key outcomes, affecting team performance, organizational performance, and operational performance:

| Key outcome | Effect of quality documentation |
|---|---|
| Team performance | Substantial increase |
| Organizational performance | Substantial increase |
| Software delivery performance | No effect |
| Operational performance | Substantial increase |

In addition to improving technical capabilities, quality documentation has a positive impact on an individual’s well-being:

| Aspect of well-being | Effect of quality documentation |
|---|---|
| Burnout | Substantial decrease |
| Job satisfaction | Substantial increase |
| Productivity | Substantial increase |

Some of this impact is attributed to quality documentation increasing knowledge sharing. It’s easier to get work done and less frustrating when knowledge is readily available.

Summary

This page highlights the importance of quality documentation in driving and amplifying technical capabilities, as well as its positive impact on key outcomes and individual well-being. The data shows that documentation quality substantially increases team performance, organizational performance, and operational performance, while reducing burnout and increasing job satisfaction and productivity. The amplification effect of documentation quality on various technical capabilities is also quantified, with trunk-based development seeing the highest amplification at 12.8x. Surprisingly, there is no observed effect of quality documentation on software delivery performance for the second year in a row.

What’s behind this positive effect on the three key outcomes?

For readers, clear documentation is an obvious benefit, but the writing process might be a factor as well. Creating high-quality documentation requires teams to decide on processes in the first place. Documentation can force teams across an organization to explicitly discuss and align on what to do and how to do it.

Quality documentation also acts as a repository for team knowledge, even as people come and go. It helps knowledge scale, both throughout the organization and through time.

Is documentation tied to decreased well-being for some people?

We noticed an unexpected trend when we looked at respondents who identify as underrepresented. For this group, documentation quality is tied to an increase in burnout.

For this finding, we also looked at gender, and were surprised to find no effect. Respondents who identified as male, female, or self-described their gender all saw a significant reduction in burnout with high-quality documentation. However, people who identified as underrepresented, regardless of gender identity, noted a higher rate of burnout in the presence of quality documentation.

The following graph shows simulated predictions based on our data. In the lower set, we see burnout decrease for the majority of respondents as documentation quality increases. However, in the higher set, we see burnout significantly increase for individuals who identify as underrepresented.

[Graph: predicted levels of burnout (y-axis) against documentation quality (x-axis), with separate bundles of simulated regression lines for respondents who did and did not identify as underrepresented.]

This graph shows 1,000 simulated lines for each group. More densely packed lines mean that the slope is more likely given our data.
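The "simulated lines" technique itself is easy to reproduce: draw many slope and intercept pairs from a fitted model's posterior and plot one faint line per draw, so that dense bundles indicate likely slopes. A sketch with made-up draws (the slopes, intercepts, and spreads are invented, not the report's estimates):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
doc_quality = np.linspace(0, 10, 50)

# Made-up posterior draws: a tight negative slope for the majority of
# respondents, a wider positive slope for underrepresented respondents.
groups = [
    ("did not identify as underrepresented", 5.0, rng.normal(-0.3, 0.05, 1000), "tab:blue"),
    ("identified as underrepresented", 4.0, rng.normal(0.4, 0.15, 1000), "tab:orange"),
]

for label, intercept, slopes, color in groups:
    for m in slopes:  # one faint line per posterior draw
        plt.plot(doc_quality, intercept + m * doc_quality, color=color, alpha=0.01)

plt.xlabel("Documentation quality")
plt.ylabel("Predicted level of burnout")
plt.savefig("simulated_lines.png")
```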

Summary

This page discusses the positive effects of high-quality documentation on key outcomes in organizations. It helps align teams, acts as a knowledge repository, and allows knowledge to scale.

However, the data shows an unexpected trend for underrepresented groups. While documentation quality reduces burnout for most respondents, it is tied to increased burnout for underrepresented individuals. This effect is seen regardless of gender identity.

The graph illustrates this with simulated predictions. The densely packed lines for the underrepresented group show a positive slope, indicating higher burnout as documentation quality increases, in contrast to the negative slope for the majority group.

The Impact of Documentation Quality on Burnout

This finding is similar for documentation quality, generative culture, and team stability: as these attributes increase, burnout also increases for people who identify as underrepresented. For documentation, what’s going on?

It takes work to create and maintain high-quality documentation. This is technical work, with significant impact on technical capabilities, team productivity, and organizational performance. It’s also work that might not be consistently recognized for the importance and impact that it has. Are people who identify as underrepresented doing a disproportionate amount of this work, and if so, does this work help explain the effect on burnout?

Could the reliance on using documentation be problematic? With increased documentation quality, does knowledge sharing not increase for some respondents? Or, if it does increase, is it not enough to counteract other aspects that lead to burnout for this group?

It’s possible that something else entirely is at play that drives quality documentation but also creates or maintains burnout for respondents who identify as underrepresented. More research is needed.

It seems that who you are on the team matters. Aspects of the workplace like quality documentation have significant benefits to the team and to the overall organization. But they might also be tied to negative outcomes for some individuals. We explore this more in Chapter 8 - How, when, and why who you are matters.

Resources to Get Started

See the 2021 report for practices that drive quality documentation. This year, we also found that work distribution, including formal processes to distribute documentation work, significantly increases the quality of documentation.

Lots of resources and training exist for technical writing. You can learn more from these resources:

  • Society for Technical Communication (stc.org)
  • Technical Writing Courses for Engineers (developers.google.com/tech-writing)
  • Write the docs (writethedocs.org)


Summary

This page discusses the relationship between documentation quality and burnout, particularly for underrepresented groups in tech. As documentation quality increases, burnout also increases for these groups. The authors suggest that underrepresented individuals may be doing a disproportionate amount of documentation work, which is often undervalued. They also propose that reliance on documentation may not be enough to counteract other factors leading to burnout. The page emphasizes the importance of documentation and provides resources for improving technical writing skills. It concludes by stating that while quality documentation benefits the team and organization, it may have negative impacts on certain individuals.

Reliability unlocks performance

Takeaways

Strong reliability practices predict better operational performance, team performance, and organizational performance. The data shows that the effects of improving these practices follow a nonlinear path—that is, there might be times when performance improvements seem to stall as organizations build stronger capabilities. However, over time, staying committed to these practices still predicts good outcomes.

Introduction

Reliability is a widely used term in the IT operations space. We define reliability as the extent to which a service meets its stated goals for measures like availability, performance, and correctness. A common approach to achieving reliability outcomes is SRE, which originated at Google (https://sre.google) and is now practiced in many organizations. SRE prioritizes empirical learning, cross-functional collaboration, extensive reliance on automation, and the use of measurement techniques, including service level objectives (SLOs). Many organizations use reliability practices without referring to them as SRE; alternative terms include Production Engineering, Platform Teams, Infrastructure Teams, TechOps, and others. To assess the extent of these practices as objectively as possible, our survey uses neutral, descriptive language.
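To make the SLO idea concrete, here is a minimal error-budget calculation; the target and downtime figures are hypothetical:

```python
# An availability SLO of 99.9% over a 30-day window leaves an "error
# budget" of 0.1% of the window.
slo = 0.999
period_minutes = 30 * 24 * 60

error_budget = (1 - slo) * period_minutes  # about 43.2 minutes
downtime_minutes = 25                      # hypothetical incident total

remaining = error_budget - downtime_minutes
print(f"Error budget: {error_budget:.1f} min, remaining: {remaining:.1f} min")
if remaining < 0:
    print("Budget exhausted: prioritize reliability work over new features.")
```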

We also collect data on the outcomes of reliability engineering—the extent to which teams are able to achieve their reliability targets. Both reliability practices and reliability outcomes (which we refer to as operational performance) are reflected in our predictive model alongside other capabilities.

Reliability practices

We asked respondents to think about reliability by having them think through three essential aspects of their operations:

  1. Do they have mitigation plans for their dependencies?
  2. Do they regularly test their disaster recovery plans through either simulated disruptions, practical failovers, or table-top exercises?
  3. When they miss their reliability targets, do they perform improvement work or otherwise reprioritize and adjust their work?

Summary

This page discusses the importance of reliability practices in IT operations. Reliability refers to how well a service meets its goals for availability, performance, and correctness. Site Reliability Engineering (SRE) is a common approach that originated at Google and focuses on empirical learning, collaboration, automation, and measurement. The survey assessed reliability practices by asking if teams have mitigation plans, regularly test disaster recovery, and make improvements when missing targets. Strong reliability practices are shown to predict better operational, team, and organizational performance, though improvements may follow a nonlinear path over time.

We think these measures encapsulate the spirit of a team that follows established SRE principles such as “embracing risk” and “measuring user happiness.” Such a team sets a reasonable goal that aligns with user happiness. They then perform tests to ensure they’re able to meet that goal, but they change plans if they’re having trouble. We use this as a proxy for a team that’s successfully “doing SRE” without tying assessments of teams to particular SRE implementations.

Results

Confirming the J-curve of reliability practices

Since 2018, DORA has theorized that there’s a nonlinear relationship (Figure 1) between operational performance and practices like automation. As we’ve deepened our explorations into reliability practices, we’ve seen evidence of this pattern in the survey data.

In 2022 we measured this directly. We surveyed teams and observed that the relationship between reliability practices and reliability outcomes did indeed follow this type of non-linear curve (Figure 2). This suggested that teams saw significant reliability gains only after they adopted many reliability practices. But seeing the data in this way didn’t feel like we were seeing the full picture. The 2022 curve made it feel like SRE is only for experts or otherwise not worth investing in, which conflicts with the experience of many SRE teams. We needed more data.

Figure 1: 2018 hypothetical J-curve

| Stage | Description |
|---|---|
| 1 | Teams begin transformation and identify quick wins. |
| 2 | Automation helps low performers progress to medium performers. |
| 3 | Automation increases test requirements, which are dealt with manually. A mountain of technical debt blocks progress. |
| 4 | Technical debt and increased complexity cause additional manual controls and layers of process around changes, slowing work. |
| 5 | Relentless improvement work leads to excellence and high performance! High and elite performers leverage expertise and learn from their environments to see a jump in productivity. |

Summary: This page discusses how DORA has theorized and observed a non-linear relationship between reliability practices and outcomes, which they call the “J-curve”. In 2022, they directly measured this and found that teams only saw significant reliability gains after adopting many practices. However, this made it seem like SRE is only for experts, which conflicts with many teams’ experiences. The hypothetical J-curve shows the stages teams go through, from beginning transformation to becoming high performers through relentless improvement. More data is needed to get the full picture.

Figure 2: 2022 curve

In 2023, we were able to ask more questions, which helped better define a curve that more closely matches our lived experiences. The new curve is closer to the hypothetical J-curve of transformation described in the 2018 report (see the Methodology section for more on how we perform our analysis). This suggests that there are indeed early wins in adopting reliability practices, followed by a lull as complexity introduces new challenges, and then finally another uptick in operational performance. The results reinforce what we’ve seen with many teams.
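One way to picture such a curve: operational performance as a function of practice adoption that climbs quickly at first, nearly stalls mid-journey, then climbs again. The functional form and coefficients below are illustrative assumptions, not a fit to the survey data:

```python
import numpy as np
import matplotlib.pyplot as plt

adoption = np.linspace(0, 1, 200)  # extent of reliability-practice adoption

# Illustrative shape: quick early wins, a mid-journey lull, and a final
# uptick once many practices are in place.
performance = adoption + 0.15 * np.sin(2 * np.pi * adoption)

plt.plot(adoption, performance)
plt.xlabel("Adoption of reliability practices")
plt.ylabel("Operational performance (arbitrary units)")
plt.title("Illustrative J-curve")
plt.savefig("j_curve.png")
```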

This curve matters for a few reasons:

  • It helps companies rationalize and fund initial SRE adoption, even if they’re not looking for extreme levels of reliability or don’t expect to significantly invest in SRE. Adopting even small levels of reliability practices can result in operational performance improvements, which has further beneficial effects on team performance and organizational performance.
  • It prepares companies who are looking to heavily invest in reliability to stick it out through the lull. It can be tempting to expect linear positive results from long-term investment in SRE, but the data tells us that isn’t the case. When teams know about the nonlinearity of this curve ahead of time, they can make a decision about whether to make this investment, and they can plan ahead to ensure they don’t abandon it before realizing the full benefits.
  • Changes like this might require cultural transformation. We have found success comes from a combination of bottom-up and top-down change. Teams can adopt reliability practices and reap the immediate rewards, then those benefits can be shown off to other teams, reinforced and incentivized by leadership. These incentives and structured programs can be designed with the J-curve in mind.

Figure 3: 2023 curve

Summary: This page discusses how adopting Site Reliability Engineering (SRE) practices can improve a company’s operational performance and reliability. A graph shows that there are early benefits to adopting SRE, followed by a temporary dip as complexity increases, and then further improvements in the long run. The key takeaways are that even small investments in SRE can yield benefits, companies need to be prepared to stick with SRE through the temporary dip to realize the full long-term gains, and successfully implementing SRE often requires both bottom-up adoption by teams and top-down support from leadership to transform the company culture.

Reliability practices and well-being

Traditional operations practices are highly reactive and often more concerned with the health of the technical system than with the happiness of its users. On-call alerts for things that don’t impact users’ experiences, repetitive manual tasks, fear of making mistakes, and similar experiences lead to burnout and poor well-being for individuals on the team.

We see the opposite in teams that leverage reliability practices. Teams report higher productivity and job satisfaction and lower levels of burnout than their counterparts who are not using these practices. We suspect that these improvements in well-being are driven by some published SRE practices:

  • Reducing toil
  • Blameless postmortems
  • Team autonomy
  • Sublinear scaling of teams

Operational performance

We also asked respondents to describe the operational performance of their service. First, we asked how frequently they hear directly from their users about dissatisfaction in the reliability of their service.

Next, we asked them how often their service is unavailable, slow, or otherwise operating incorrectly.

Reliability practices amplify team and organizational performance, through operational performance

By adopting reliability practices, teams improve their operational performance. If an organization is able to operate its production fleet effectively, we found that this amplifies other outcomes. If the outcomes are high, reliability practices will make them higher. If outcomes are low, reliability practices will not help; outcomes will just stay that way.

Reliable systems still need to have the right software capabilities for your customers, delivered effectively. This makes sense because SRE was never intended to operate in a vacuum. Meeting reliability targets is a key metric of success for SRE teams, and this is reflected in operational performance. Although there are likely other benefits to the use of reliability practices, the data suggests that the most critical one is the impact on operational performance. Furthermore, increased operational performance has benefits beyond service health; in fact, we see evidence that the use of reliability practices predicts greater well-being for practitioners.

Operational performance affects well-being

A common industry perception is that highly reliable services have a negative impact on the well-being of the practitioners who operate them.


Summary

This page discusses how adopting reliability practices can improve the well-being of teams and amplify their operational performance. Traditional operations practices often prioritize system health over user happiness, leading to burnout. In contrast, teams using reliability practices like reducing toil, blameless postmortems, team autonomy, and sublinear scaling report higher productivity, job satisfaction, and lower burnout.

The authors surveyed respondents about their service’s operational performance, asking how often users express dissatisfaction and how frequently the service is unavailable or operating incorrectly. They found that adopting reliability practices improves operational performance, which in turn amplifies other positive outcomes. However, reliable systems still need the right software capabilities delivered effectively.

The data suggests that the most critical benefit of reliability practices is their impact on operational performance, which has additional benefits like greater well-being for practitioners. This challenges the common perception that highly reliable services negatively affect well-being.

We found that high operational performance actually results in lower burnout, better productivity, and higher job satisfaction. This aligns with the SRE principle of reducing toil; automating the manual parts of operations is satisfying for individuals and also results in reduced ongoing burden for the team.

Organizational performance and team performance amplify operational performance

We found that operational performance has a substantial positive impact on both team performance and on organizational performance. This shouldn’t be a surprise to followers of the DevOps movement. Being able to operate the machine effectively allows teams to achieve more, which allows organizations to thrive.

Operational performance amplifies software delivery performance

While software delivery performance can improve both team performance and organizational performance, they are both significantly enhanced by operational performance. Moreover, high-performing software delivery teams won’t achieve very high team performance and organizational performance without also achieving high operational performance. Both are needed. In fact, teams that improve their software delivery performance without corresponding levels of operational performance end up having worse organizational outcomes. So, if you can quickly write stunning software but it fails to run in production in a way that meets its audience’s expectations, there won’t be any reward from the marketplace.

What’s missing, what’s next?

We believe there are more measurements that can help us understand these interactions. For example, a common question this year has been how cost management plays into these capabilities and outcomes. Some organizations are more cost-sensitive than others, and this has implications for how the organization makes plans and decisions. Similarly, we theorize that reliability practices might emerge from highly collaborative cultures, even without being explicitly sought after or planned for. We want to get a better understanding of how teams evolve their existing IT operations practices, and how that evolution affects system reliability, team performance, and well-being.

Mostly, we want to hear from you. Come join us and other practitioners at DORA.community. SRE is still a new field. Its impact is different in every organization that adopts reliability practices, or in organizations that realize that they have been doing SRE this whole time. These changes are slow, and we want to make consistent measurements to show progress over time. As a community, we can share what works, elevating each other along the way.

Summary

This page discusses how high operational performance leads to better outcomes for teams and organizations, such as lower burnout, higher productivity, and greater job satisfaction. Operational performance amplifies the benefits of software delivery performance. The authors believe there are additional factors to measure, such as cost management and how reliability practices emerge from collaborative cultures. They invite readers to join the DORA community to share insights as SRE practices continue to evolve and impact organizations in different ways over time.

Google’s Site Reliability Engineering (SRE) has evolved over two decades to enable the company’s rapid growth while maintaining high reliability. SRE was developed in an engineering-led environment that valued building over buying, and SREs were entrusted with scaling systems in novel ways.

As Google grew, the number of SREs could not scale linearly with the number of users, servers, clusters, or services. To address this, a management structure was developed where SREs aligned with product development teams and worked together to decide how best to utilize the available SREs. This approach allowed Google to maintain a constrained growth model for SRE while still supporting the company’s expanding product offerings and customer base.

Scaling SRE Teams at Google

Google’s SRE teams scaled in various ways to support the growing infrastructure and services:

  1. Dev teams could fund new SREs directly, even if they were not customer-facing teams.
  2. Shared infrastructure teams (e.g., Bigtable, Borg, Colossus) allowed customer-facing teams to scale without dedicated SRE teams.
  3. SRE teams maintained a consistent hiring and promotion process by staying in their own organization.
  4. SRE teams developed their own internal products with internal product managers to improve production operations.
  5. New on-call SRE teams had a minimum size of 12 (two sites of 6 members) to ensure cross-timezone coverage and work-life balance.

SRE teams continue to adapt and evolve, but they stick to their core principles:

  • Embracing risk
  • Measuring service levels
  • Eliminating toil
  • Embracing automation
  • Striving for simplicity

Summary

Google’s SRE teams scaled by allowing dev teams to fund new SREs directly, leveraging shared infrastructure teams, maintaining consistent hiring and promotion processes, developing internal products, and ensuring team health and sustainability. Despite adaptations and changes, SRE teams continue to adhere to their core principles of embracing risk, measuring service levels, eliminating toil, embracing automation, and striving for simplicity.

Flexible Infrastructure is Key to Success

Takeaways

  • Flexible infrastructure is a predictor of team performance, organizational performance, operational performance, and software delivery performance.
  • Cloud computing is a core enabler of flexible infrastructure, but the benefit isn’t automatically realized: how you use the cloud is the important part.

Introduction

Throughout its research, DORA has asked practitioners about their infrastructure by focusing on the essential characteristics of cloud computing, as defined by the National Institute of Standards and Technology (NIST):

  1. On-demand self-service
  2. Broad network access
  3. Resource pooling
  4. Rapid elasticity
  5. Measured service

DORA has consistently seen that these five characteristics predict improved organizational performance and improved software delivery performance. This year, the researchers wanted to see if using cloud computing predicted more flexible infrastructure.
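Of the five characteristics, rapid elasticity and on-demand self-service are the ones most visible in code: capacity is requested and released programmatically as load changes. Below is a toy threshold-based autoscaling loop; `get_cpu_utilization` and `set_instance_count` are hypothetical stand-ins for whatever metric and scaling APIs your platform actually provides:

```python
import random
import time

def get_cpu_utilization() -> float:
    """Hypothetical metric source; a real system would query its
    monitoring API. Here we fake a fluctuating load."""
    return random.uniform(0.1, 0.95)

def set_instance_count(n: int) -> None:
    """Hypothetical control call; a real system would call its cloud
    provider's scaling API."""
    print(f"scaling to {n} instances")

def autoscale(instances: int, low: float = 0.3, high: float = 0.7) -> int:
    """Rapid elasticity in miniature: add capacity under pressure and
    release it when idle, never dropping below one instance."""
    cpu = get_cpu_utilization()
    if cpu > high:
        instances += 1
    elif cpu < low and instances > 1:
        instances -= 1
    set_instance_count(instances)
    return instances

instances = 2
for _ in range(5):  # one evaluation per control-loop tick
    instances = autoscale(instances)
    time.sleep(0.1)
```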

Flexible infrastructures predict 30% higher organizational performance than inflexible infrastructures.

Summary

This page discusses the importance of flexible infrastructure in predicting various aspects of organizational performance, including team performance, operational performance, and software delivery performance. Cloud computing is identified as a key enabler of flexible infrastructure, but the benefits are not automatic - it depends on how the cloud is used.

The page introduces the five essential characteristics of cloud computing as defined by NIST, and notes that DORA’s research has consistently shown these characteristics to predict improved organizational and software delivery performance. This year, DORA specifically looked at whether using cloud computing predicted more flexible infrastructure.

The key statistic presented is that flexible infrastructures predict 30% higher organizational performance compared to inflexible infrastructures.

Computing Environments and Their Impact on Performance

The study confirmed previous findings that how a team uses the cloud is a stronger predictor of performance than simply using the cloud. While cloud computing can be a powerful enabler, it doesn’t automatically produce benefits. In fact, there are strong indicators that public cloud leads to decreased software and operational performance unless teams make use of flexible infrastructure. This finding suggests that simply “lifting and shifting” workloads from a data center to the cloud is not beneficial and can even be detrimental.

The use of cloud computing is associated with substantial benefits, including:

  • Decreased burnout
  • Increased job satisfaction
  • Increased productivity

The table below shows where respondents said their primary application or service is running:

| Computing environment | Percentage |
|---|---|
| Multi-cloud | 19.6% |
| Public cloud | 51.5% |
| Hybrid cloud | 33.6% |
| On premises | 19.8% |
| Under the desk | 3.2% |
| Other | 2.5% |

Note: Respondents were able to select multiple answers.

The key takeaway is that flexible infrastructure is crucial to success when using cloud computing.

Summary: This page discusses the impact of different computing environments on team performance and well-being. The study found that how a team uses the cloud is more important than simply using the cloud. Flexible infrastructure is essential for realizing the benefits of cloud computing, such as decreased burnout and increased job satisfaction and productivity. The table shows the percentage of respondents using various computing environments, with public cloud being the most common.


Cloud type and Organizational performance

| Cloud type | Software delivery performance | Operational performance |
|---|---|---|
| Private | No sign of impact | Substantial increases associated with using cloud computing |
| Public | Very substantial increases associated with using cloud computing | Substantial decreases associated with using cloud computing |
| Hybrid | No sign of impact | No sign of impact |
| Multi | No sign of impact | Substantial decreases associated with using cloud computing |

Simply “using cloud” provides mixed results. As the results table shows, simply “using cloud” has either neutral or negative impacts on software delivery and operational performance. This neutral-to-negative impact is likely the result of practitioners who have taken their first step on their cloud journey, and are now faced with working in a new environment, working with new tools, and doing some things differently. Often, companies use cloud in the same way they did in their own data centers, only with the added complexities and cognitive burden of a new environment. Failing to adapt to this new environment doesn’t improve software delivery or operational performance, but instead hurts them.

The one exception to this finding is that of operational performance within the context of private cloud.

What does improve software delivery and operational performance is flexible infrastructure, which we discuss shortly.

Summary

This page discusses the impact of different cloud computing models on software delivery and operational performance in organizations. Simply using cloud computing does not necessarily lead to improvements. Private clouds can substantially increase operational performance, but have no impact on software delivery. Public clouds can greatly improve software delivery performance, but decrease operational performance. Hybrid and multi-cloud show no impact or decreases.

The key takeaway is that just adopting cloud is not enough - companies need to adapt their practices to the new cloud environment to see benefits. What really drives improvement is having a flexible cloud infrastructure, a topic that will be covered next.

Cloud infrastructure enables flexibility

Using a public cloud leads to a 22% increase in infrastructure flexibility compared to not using the cloud. Using multiple clouds also leads to an increase, but less than using a single public cloud. The data suggests that flexible infrastructure, often enabled by cloud computing, is more impactful than just using a cloud platform. Mastering cloud platforms takes time, and each platform is different, so increasing the number of cloud platforms increases the cognitive burden required to operate each platform well.

| Type of cloud computing | Change in infrastructure flexibility relative to non-cloud users |
|---|---|
| Public | 1.0 |
| Private | 0.5 |
| Multi | 0.5 |
| Hybrid | 0.5 |

Summary: This page discusses how using cloud infrastructure, particularly public clouds, can significantly increase an organization’s infrastructure flexibility compared to not using the cloud. However, using multiple cloud platforms can be less beneficial than using a single public cloud due to the increased complexity and learning curve associated with managing different platforms. The key takeaway is that flexible infrastructure, which is often enabled by cloud computing, is more important for success than simply using a cloud platform.

Flexible infrastructures predict higher performance on key outcomes

Cloud computing has a positive impact on key outcomes through flexible infrastructure. It is important to recognize that flexible infrastructure drives success in organizational performance, team performance, software delivery performance, and operational performance. Many organizations choose to lift and shift infrastructure to the cloud, and this can be a great first step, but it is just the beginning of the journey. If you do decide to lift and shift a portion of your workloads, your next step is to modernize them by refactoring to make use of flexible infrastructure.


| Cloud type with flexible infrastructure | Organizational performance | Team performance | Software delivery performance | Operational performance |
|---|---|---|---|---|
| Private | Partially mediated by flexible infrastructure | Partially mediated by flexible infrastructure | Partially mediated by flexible infrastructure | Partially mediated by flexible infrastructure |
| Public | Fully mediated by flexible infrastructure | Fully mediated by flexible infrastructure | Partially mediated by flexible infrastructure | Partially mediated by flexible infrastructure |
| Hybrid | Partially mediated by flexible infrastructure | Fully mediated by flexible infrastructure | Partially mediated by flexible infrastructure | Fully mediated by flexible infrastructure |
| Multi | Partially mediated by flexible infrastructure | Partially mediated by flexible infrastructure | Partially mediated by flexible infrastructure | Partially mediated by flexible infrastructure |


Summary

This page discusses how having a flexible cloud infrastructure positively impacts key performance metrics for organizations. Moving to the cloud alone is not enough - the infrastructure needs to be modernized and made flexible to see substantial improvements in organizational, team, software delivery, and operational performance.

The type of cloud (private, public, hybrid, multi) and the extent to which the infrastructure is flexible determines how much the cloud migration improves each performance area. In general, more flexible infrastructures lead to better outcomes. Organizations should view moving to the cloud as the first step, with modernizing the infrastructure as the crucial next phase.

Cloud Computing and Flexible Infrastructure

Cloud computing platforms can have a positive impact on software delivery and operational performance when used in a way that maximizes their flexible infrastructure characteristics. However, simply shifting workloads from a data center to the cloud does not guarantee success. The key is to take advantage of the flexible infrastructure that cloud enables.

To maximize the potential benefits of cloud computing, organizations must rethink how they build, test, deploy, and monitor their applications. This involves taking advantage of the five key characteristics of cloud computing:

  1. On-demand self-service
  2. Broad network access
  3. Resource pooling
  4. Rapid elasticity
  5. Measured service

The impact of cloud computing on outcomes depends on how it is implemented:

| Infrastructure type | Outcomes |
|---|---|
| Cloud coupled with flexible infrastructure | Positive impact on software delivery and operational performance |
| Cloud without flexibility | Less significant impact on outcomes |

[Illustration: "Cloud with flexibility" (smiling) versus "Cloud without flexibility" (frowning).]

Summary

Cloud computing platforms can significantly improve software delivery and operational performance when used in a way that leverages their flexible infrastructure characteristics. However, simply moving to the cloud is not enough; organizations must rethink their application development and management processes to take full advantage of the five key characteristics of cloud computing. The impact of cloud computing on outcomes depends on whether it is implemented with flexibility or not.

Cloud computing improves well-being

The data shows that cloud computing is largely beneficial to employee well-being. We see a substantial increase in both job satisfaction and productivity, and a neutral or positive impact on burnout. Said another way, cloud doesn’t have a detrimental effect on well-being even though cloud computing comes with additional cognitive burden, learning new tools, and new ways of working.


| Cloud type | Burnout* | Job satisfaction | Productivity |
|---|---|---|---|
| Private | No sign of impact | Substantial increases associated with using cloud computing | Substantial increases associated with using cloud computing |
| Public | Very substantial decreases associated with using cloud computing | Substantial increases associated with using cloud computing | Substantial increases associated with using cloud computing |
| Hybrid | No sign of impact | Substantial increases associated with using cloud computing | Substantial increases associated with using cloud computing |
| Multi | No sign of impact | Substantial increases associated with using cloud computing | Substantial increases associated with using cloud computing |

*You might notice how the color scheme is flipped for burnout. This is because reducing burnout is a good thing!

As practitioners ourselves, we hypothesize a few reasons why we are seeing this. Engineers like learning and solving problems, and enjoy working in an environment with flexible computing characteristics. Learning new technologies is not only fun, but is a great form of career development. Engineers are happier when their organization is succeeding.

Summary

This page discusses how cloud computing positively impacts employee well-being in terms of job satisfaction, productivity, and burnout. The data shows that using different types of cloud computing infrastructure (private, public, hybrid, multi) is associated with substantial increases in job satisfaction and productivity. For public cloud specifically, it is also linked to very substantial decreases in burnout.

The authors hypothesize this is because engineers enjoy the challenge of learning new technologies and problem-solving in a flexible computing environment. Developing new skills is engaging and good for career growth. Overall, despite the cognitive demands of adopting cloud computing, it appears to make engineers happier, especially when it helps their organization succeed.

None of this works without investing in culture

Takeaways

  • Culture is a key driver of employees’ well-being and organizational performance.
  • A healthy culture can:
    • Reduce burnout
    • Increase productivity
    • Increase job satisfaction
  • A healthy culture leads to meaningful increases in:
    • Organizational performance
    • Software delivery and operational performance
    • Team performance
  • A healthy organizational culture can help teams be more successful at implementing technical capabilities associated with improved outcomes.

Summary

This page emphasizes the importance of investing in a healthy organizational culture. It highlights that culture is a crucial factor in promoting employee well-being and driving organizational performance. A positive culture can mitigate burnout, boost productivity, and enhance job satisfaction. Moreover, a healthy culture contributes to significant improvements in various aspects of an organization, including overall performance, software delivery, operational efficiency, and team effectiveness. The page also suggests that a supportive organizational culture enables teams to successfully implement technical capabilities that are linked to better outcomes.

Westrum’s Typology of Organizational Culture

| Aspect | Definition |
|---|---|
| Westrum’s organizational culture | How an organization tends to respond to problems and opportunities. There are three types of culture: generative, bureaucratic, and pathological. |
| Organization stability | How stable or unstable the environment is for employees. |
| Job security | How often employees worry about their job security. |
| Flexibility | How one works, where one works, and when one works. |
| Knowledge sharing | How ideas and information spread across an organization. Team members answer questions once, and the information is available to others. People don’t have to wait for answers. |
| User-centrism | A focus on the end user when developing software and a deep understanding of users’ needs and goals. User signals are used to make products and services better. |
| Work distribution | Formal processes that help teams distribute burdensome tasks equitably across their members. |

Summary: This page introduces the concept of organizational culture and its importance in predicting performance. The authors focus on Westrum’s typology, which categorizes culture into three types: generative, bureaucratic, and pathological. The page also lists several aspects that contribute to team and organizational culture, such as stability, job security, flexibility, knowledge sharing, user-centrism, and work distribution. The authors believe that culture and practices are intertwined and that investing in culture is crucial for success. Here is the converted text in markdown format with a summary at the end:

What did we find and what does it mean?

Healthy culture improves key outcomes

Overall, a healthy culture has a positive impact on all key outcomes. We replicate previous years’ findings that a generative culture drives organizational performance, software delivery performance, and operational performance. It also drives this year’s new performance metric: team performance.

We found that a user-centered approach to software development leads to meaningful increases in performance. This is worth highlighting. Organizations can experience a cascade of benefits when they put the user first. User feedback helps teams prioritize projects and helps them create products and services that meet user needs. This leads to a better user experience, increased user satisfaction, and increased revenue.

We also assessed the health of an organization’s culture by measuring work distribution across teams. We found that equitable work distribution benefits team and organizational performance. However, we found that equitable work distribution was associated with lower software delivery performance. Perhaps formal processes around work distribution slow the completion of burdensome tasks that are part of the software delivery pipeline. It’s also possible that formal processes impact who within the team should take on a given task.

Another seemingly incongruent finding is that organization stability shows a small but significant decrease in software delivery performance. A potential explanation is that more established (and likely larger) organizations don’t feel pressure to move as fast as newer, less established (and smaller) organizations. More established organizations might already have an established product, which gives them flexibility around the speed of their software delivery.

When information flows easily, things get done. We found that higher levels of information sharing were associated with increased software delivery performance and operational performance. When information is readily accessible and when there are few knowledge silos, people can spend time on tasks that matter instead of chasing information needed to perform those tasks.

Finally, flexible work arrangements, where employees can determine when, where, and how they work, have a beneficial impact across all performance metrics. This is particularly true for software delivery performance. Even as organizations tighten their remote-work policies, allowing employees to maintain some flexibility is likely to have a benefit.

Teams with generative cultures have 30% higher organizational performance than teams without


Summary

This page discusses the findings of a study on how a healthy organizational culture impacts key performance metrics. The study found that a generative culture, which is supportive and collaborative, leads to better organizational, software delivery, operational, and team performance.

A user-centered approach to software development and equitable work distribution among teams also improves performance, although equitable work distribution may slow down software delivery. More established organizations may prioritize speed less than newer ones.

Easy information sharing and flexible work arrangements also boost performance across all metrics. The key takeaway is that investing in a healthy culture is essential for organizations to succeed.

The Impact of Cultural Aspects on Performance

| Aspect of culture | Effect on team performance | Effect on organizational performance | Effect on software delivery performance | Effect on operational performance |
|---|---|---|---|---|
| Westrum’s organizational culture | Substantial increase | Substantial increase | Substantial increase | Substantial increase |
| Organization stability | Minor increase | Substantial increase | Minor decrease | No effect |
| Job security | Minor increase | No effect | Minor increase | Minor increase |
| Flexibility | Minor increase | Minor increase | Substantial increase | Minor increase |
| Knowledge sharing | Minor increase | Minor decrease | Substantial increase | Substantial increase |
| User-centrism | Substantial increase | Substantial increase | Minor increase | Substantial increase |
| Work distribution | Substantial increase | Substantial increase | Substantial decrease | No effect |

Summary

This page discusses the impact of various cultural aspects on different types of performance within an organization. Westrum’s organizational culture, which likely refers to a model of organizational culture developed by Ron Westrum, has a substantial positive effect across all performance categories. Other aspects, such as organization stability, job security, flexibility, knowledge sharing, user-centrism, and work distribution, have varying degrees of impact on team, organizational, software delivery, and operational performance. The key takeaway is that investing in culture is essential for these aspects to have a positive influence on performance.

Healthy culture improves technical capabilities

Our findings suggest that good culture helps improve the implementation of technical capabilities. We believe the relationship between culture and technical capabilities is reciprocal: culture emerges from practices, and practices emerge from culture.

Culture is broad and hard to define, while technical capabilities are usually scoped and well-defined. This has implications for how individuals within an organization can help drive change.

For example, leaders can create incentive structures that promote a generative culture. Both leaders and individual contributors can emphasize a user-centered approach to software development. Individual contributors can help drive the implementation of technical capabilities that improve performance—trunk-based development, continuous integration, reliability practices, and loosely coupled architecture. Implementing these technical capabilities is not easy, and successfully doing so requires people to work together, to have an open mind, and to lean on and learn from each other. These are all components of a healthy culture. These teams can become examples for others within the organization, who might feel more empowered to drive change using whatever levers they have within their grasp.

Long-lasting and meaningful changes to an organization’s culture come about through concurrent top-down and bottom-up efforts to enact change.

| Aspect of culture | Effect on trunk-based development | Effect on reliability practices | Effect on continuous integration | Effect on continuous delivery | Effect on loosely coupled architecture |
|---|---|---|---|---|---|
| Westrum’s organizational culture | Substantial increase | Substantial increase | Substantial increase | Substantial increase | Substantial increase |
| Organization stability | Minor increase | Substantial increase | No effect | No effect | No effect |
| Job security | Minor decrease | Minor decrease | No effect | No effect | No effect |
| Flexibility | No effect | Minor decrease | Substantial increase | Minor increase | Substantial increase |
| Knowledge sharing | No effect | No effect | No effect | Minor increase | Minor increase |
| User-centrism | Substantial increase | Substantial increase | Substantial increase | Substantial increase | Substantial increase |
| Work distribution | Substantial increase | Substantial increase | Substantial increase | Substantial increase | Substantial increase |

Summary

This page discusses how a healthy organizational culture can help improve the implementation of technical capabilities like trunk-based development, continuous integration, reliability practices, and loosely coupled architecture. The relationship between culture and technical capabilities is seen as reciprocal - culture emerges from practices and vice versa.

The page suggests that leaders can promote a generative culture through incentive structures, while both leaders and individual contributors should emphasize a user-centered approach. Implementing technical capabilities requires collaboration, open-mindedness, and learning from each other, which are all components of a healthy culture.

The table shows how different aspects of culture (like Westrum’s organizational culture, flexibility, user-centrism etc.) can have substantial or minor effects on the implementation of various technical capabilities. Meaningful cultural changes require both top-down and bottom-up efforts within the organization.

Healthy culture improves employee well-being

A healthy culture leads to high levels of employee well-being by reducing burnout, increasing job satisfaction, and increasing productivity. Employee well-being is not a nice-to-have: it’s foundational to an organization’s overall health and success.

What happens when organizations don’t invest in a better culture? The likelihood of burnout increases, and job satisfaction decreases. Employees become cynical and their productivity declines. Their physical and psychological health are also negatively impacted. Burnout is persistent; it’s not something people get over after taking some time off. Burnout also increases turnover—employees leave to look for healthier work environments. Therefore, alleviating burnout requires organizational changes that address its causes.

| Aspect of culture | Effect on burnout* | Effect on job satisfaction | Effect on productivity |
|---|---|---|---|
| Westrum’s organizational culture | Substantial decrease | Substantial increase | Substantial increase |
| Organization stability | Substantial decrease | Substantial increase | Minor increase |
| Job security | Substantial decrease | Minor increase | Minor increase |
| Flexibility | Minor decrease | Minor increase | Minor increase |
| Knowledge sharing | Substantial decrease | Minor increase | Minor increase |
| User-centrism | Minor decrease | Substantial increase | Substantial increase |
| Work distribution | No effect | Minor increase | Minor increase |

*You might notice how the color scheme is flipped for burnout. This is because reducing burnout is a good thing!

None of this works without investing in culture

Summary

This page discusses how a healthy organizational culture is crucial for employee well-being. When organizations don’t invest in improving their culture, employees are more likely to experience burnout, decreased job satisfaction, and reduced productivity. Burnout can have persistent negative effects on physical and mental health, and can lead to increased turnover as employees seek healthier work environments.

The table shows how different aspects of culture impact burnout, job satisfaction, and productivity. Notably, having a good organizational culture (as defined by Westrum) substantially decreases burnout while increasing job satisfaction and productivity. Other factors like organizational stability, job security, flexibility, and knowledge sharing also help reduce burnout and boost satisfaction and productivity to varying degrees.

The key takeaway is that organizations must actively invest in creating a healthy culture to support employee well-being and organizational success. Without this investment, the negative consequences of poor culture will persist.

How, when, and why who you are matters

Chapter 8 Takeaways

  • Who you are matters: we found that certain groups of respondents have different outcomes than other respondents, such as more burnout or less productivity.
  • We have also identified specific practices you can implement to mitigate some of these negative outcomes.

Introduction

A general phenomenon pervaded 2022’s analysis: the way that work is set up might be conducive to the well-being of some, but not all.

In 2022, we found that people who identified as being underrepresented reported higher levels of burnout.[^1]

In this chapter, we’ll see that this finding replicates, and start to address why underrepresented groups are more likely to experience burnout and what factors can help prevent this.

Further, the instability that has gripped many industries has led to questions around new hires. Organizations are concerned that it takes new employees a long time to become productive. They’re looking for ways to help new employees get up to speed more quickly. We will dig into this here, too.

What did we find and what does it mean?

Some people are more burnt out than others

Last year, we found that respondents who identified as women or self-described their gender, and respondents who identified as being underrepresented in any way, reported being more burnt out than respondents who identified as men and did not identify as underrepresented. These findings are consistent with a body of prior research that suggests people who are underrepresented experience a greater degree of burnout[^2] and work-related stress[^3] than their represented peers.

For these reasons, we were interested in examining whether disparities in burnout would be found in our data again this year, and they were. Respondents who identified as women or self-described their gender reported experiencing 6% higher levels of burnout than respondents who identified as men. Respondents who identified as being underrepresented in any way reported 24% higher levels of burnout than respondents who did not identify as being underrepresented.


Summary

This chapter discusses how certain groups, particularly those who are underrepresented, experience higher levels of burnout compared to others. The findings from the 2022 report are consistent with this year’s data, showing that women and underrepresented groups reported higher burnout levels than men and those not underrepresented. The chapter aims to explore why these disparities exist and identify practices that can help mitigate the negative outcomes. Additionally, the chapter will address concerns about new hires and their productivity, seeking ways to help them get up to speed more quickly.


Some types of work predict more burnout. Aspects of the workplace that might seem neutral or beneficial, like quality documentation or a stable team, don’t reduce burnout for all individuals. This might be because of tasks that benefit the organization but contribute to burnout for some individuals.

To understand the respondents’ experience of burnout, the study looked at:

  • Specific tasks, like coding, meetings, or supporting teammates
  • Characteristics of the work, like unplanned work, its visibility, or amount of toil

The same task might be experienced differently by different people or at different times. For example, some code reviews might be unplanned toil, while others might showcase leadership and technical expertise.

Respondents who identify as underrepresented reported doing 24% more repetitive work (toil) than those who do not identify as underrepresented. Respondents who identified as women or self-described their gender reported doing 40% more repetitive work than respondents who identified as men. These two groups also report doing more unplanned work that is less visible to peers and less aligned with their professional skill set. This partially explains the burnout reported by these groups.

Non-promotable tasks

Non-promotable tasks matter to the organization but do not help advance one’s career, such as through increased compensation or marketability. Evidence shows women do more of this type of work because they are more likely to be asked and more likely to say yes due to social costs of saying no.

The unequal distribution of non-promotable tasks can negatively impact women’s careers and earnings. Some women take on more hours to have adequate career-relevant work.

Summary

This page discusses how the type of work someone does can predict their level of burnout. Underrepresented groups and women report doing more repetitive, unplanned work that is less visible and less aligned with their skills, which contributes to their higher levels of burnout. The concept of “non-promotable tasks” is introduced - work that benefits the organization but doesn’t advance the individual’s career. Women tend to do more of these tasks, which can hurt their career progression and compensation. The unequal distribution of work is a key factor in the differing experiences of burnout across groups.


Formal processes of work distribution reduce burnout for some respondents

We asked respondents if they have formal processes to distribute work evenly. We call this work distribution, and we expected to see this mitigate the burnout experienced by some respondents.

We found that work distribution did reduce burnout for respondents who identified as men and for respondents who identified as women or self-described their gender. With a high level of work distribution, the difference in burnout across genders disappeared.

We were surprised to find that work distribution had no impact on the level of burnout experienced by respondents who identify as underrepresented. This finding raises more questions:

  • Do formal processes to distribute work evenly still result in unequal work distribution?
  • Does “equal work” take into account the characteristics of tasks, like interruptions or visibility?
  • And how do we mitigate other factors contributing to burnout, apart from work tasks, that might be more significant for this group?

Summary

This page discusses how formal processes to distribute work evenly (called “work distribution”) impact burnout levels for different groups. The study found that work distribution reduced burnout for men, women, and those who self-described their gender, to the point where burnout levels were similar across these groups when work distribution was high. However, work distribution did not impact burnout levels for underrepresented groups. This raises questions about whether formal work distribution processes truly result in equal work, if they account for task characteristics like interruptions or visibility, and how to mitigate other burnout factors that may be more significant for underrepresented groups.


A Key Finding and Some Context

A key finding of this report is that individuals who define themselves as underrepresented experience much higher burnout than their colleagues. The report explored some possible reasons for this. This section connects these findings to broader research on belongingness and associated organizational practice strategies.

Identifying as underrepresented in a group demonstrates a vulnerability to “belonging uncertainty”, a well-established psychological phenomenon. This uncertainty (for example, “Do I belong here?”, “Can people like me be successful here?”) is either reinforced or redefined through people’s continued experiences and interpretations of those experiences. These well-established processes related to belonging uncertainty may help contextualize the finding from this report that individuals who identify as underrepresented report higher levels of burnout.

What Can Organizations Do?

It is important to remember that diversity, inclusion, equity, and belonging mean different things, and achieving them requires different, interconnected, and sustained strategies. Achieving belongingness requires true and sustained commitments.

If individuals struggle in an organization, the first question shouldn’t be: “What is wrong with this individual?” The first questions should be: “Why would it make sense for someone to feel this way and what structural elements of our organization facilitate this feeling (for example, what elements keep this feeling in place or make it worse)?”

When problems are identified, changes should be made at the organizational level while also providing support at the individual level—a “both and” approach. Supporting individuals to impact the systems governing an organization allows changes to become built into the institution, so they outlast individual actors. This generative quality is what can allow organizations to strive towards belongingness. Strive is key here. Belongingness is built through sustained experience and action; it is never done, and that is why it is so fundamental to workplace health and productivity.

A number of tools exist to support organizations in this work. For example, the 2023 Surgeon General’s report on the topic of loneliness identifies that social connection and belongingness are key antidotes to loneliness and burnout.

Summary

This page discusses how individuals who identify as underrepresented in an organization often experience higher levels of burnout. This is connected to the concept of “belonging uncertainty” - the doubt of whether one truly belongs or can succeed in a group. Organizations need to take a systemic approach to fostering belonging, making changes at the organizational level while also supporting individuals. Belongingness requires ongoing effort and commitment, as it is fundamental to workplace well-being and productivity. The Surgeon General has identified social connection and belonging as important solutions to the problems of loneliness and burnout.

New hires do struggle with productivity

New hires (<1 year of experience on team) score 8% lower on productivity than experienced teammates (>1 year experience). Maybe this is to be expected. Starting on a new team is challenging and even if you are experienced in the role, the amount of team-specific knowledge required to get off the ground can be daunting. Further, being on a team is more than just a matter of skills and knowledge. Anecdotally, there is also a social component that is essential for productivity. Things like belonging, feeling like a contributing member, and psychological safety take time to develop.

Is there anything that might help new hires ramp up? We hypothesized that organizations could help new hires in three ways:

  • Providing high-quality documentation.
  • Incorporating artificial intelligence into workflows, which has been shown in other research to be more helpful for inexperienced workers than experienced workers.
  • Working together in person, which some have suggested could be particularly beneficial in the onboarding phase.

Summary

This page discusses how new hires, defined as those with less than 1 year of experience on a team, tend to have 8% lower productivity compared to more experienced team members. This is likely due to the challenges of starting on a new team, such as acquiring team-specific knowledge and developing a sense of belonging and psychological safety. The authors hypothesize that organizations can help new hires ramp up productivity by providing high-quality documentation, incorporating AI into workflows (which may be especially beneficial for inexperienced workers), and facilitating in-person collaboration during the onboarding phase.

Summary

This page discusses how certain practices affect the productivity of new hires in an organization. The key findings are:

  1. High-quality documentation leads to substantial improvements in productivity for everyone, including new hires. New hires in teams with well-written documentation are 130% as productive as those in teams with poorly written documentation.

  2. AI has minor benefits on an individual’s productivity.

  3. New hires don’t get any special benefits from these practices compared to everyone else.

  4. Flexibility in terms of how, where, and when employees work has a positive impact on productivity. However, it’s unclear if the physical location of work (office vs. remote) makes a difference.

  5. Organizations should not solely optimize for productivity but also consider factors like work-life balance and avoiding burnout.

The page suggests that providing high-quality documentation and offering flexibility to new hires is more likely to improve their productivity than forcing them to work from the office.

How will you put the research into practice?

Explore these findings in the context of your organization, teams, and services you are providing to your customers.

Share your experiences, learn from others, and get inspiration from other travelers on the continuous improvement journey by joining the DORA community at https://dora.community.

Final thoughts

Thank you for participating in this year’s research and reading this report. We are always looking for better ways to explore the connections between how teams work and the outcomes they are able to achieve.

The most important takeaway from our years-long research program is that teams who adopt a mindset and practice of continuous improvement are able to achieve the best outcomes.

The capabilities we’ve explored can be used as dials that drive outcomes. Some of those dials are within reach of individuals, while others are only accessible through coordinated effort across your entire organization. Identify which dials need adjusting for your organization, and then make investments in those adjustments.

Improvement work is never done but can create long-term success for individuals, teams, and organizations. Leaders and practitioners share responsibility for driving this improvement work.

Summary

This passage encourages readers to put the research findings into practice within their own organizations. It suggests exploring how the findings apply to your specific teams and services, and sharing experiences with others on the continuous improvement journey through the DORA community.

The key takeaway is that adopting a mindset of continuous improvement leads to the best outcomes. The capabilities discussed act as “dials” that can be adjusted to drive outcomes, some by individuals and some through coordinated organizational efforts. Ongoing improvement work, driven by both leaders and practitioners, is crucial for long-term success. The passage thanks readers for participating in the research.

Acknowledgments

Every year, this report enjoys the support of a large family of passionate contributors from all over the world. All steps of its production—survey question design, localization, analysis, writing, editing, and typesetting—are touched by colleagues who helped to realize this large effort. The authors would like to thank all of these people for their input, guidance, and camaraderie.

Contributors

Core team

  • James Brookbank
  • Kim Castillo
  • Derek DeBellis
  • Nathen Harvey
  • Michelle Irvine
  • Amanda Lewis
  • Eric Maxwell
  • Steve McGhee
  • Dave Stanke
  • Kevin Storer
  • Daniella Villalba
  • Brenna Washington

Editors

  • Mandy Grover
  • Jay Hauser
  • Stan McKenzie
  • Anna Eames Mikkawi
  • Mike Pope
  • Tabitha Smith
  • Olinda Turner

Survey localization

  • Daniel Amadei
  • Kuma Arakawa
  • William Bartlett
  • Antonio Guzmán
  • Shogo Hamada
  • Yuki Iwanari
  • Vincent Jobard
  • Gustavo Lapa
  • Mauricio Meléndez
  • Jeremie Patonnier
  • Miguel Reyes
  • Pedro Sousa
  • Laurent Tardif
  • Kimmy Wu
  • Vinicius Xavier
  • Yoshi Yamaguchi

Advisors and experts in the field

  • Jared Bhatti
  • Lisa Crispin
  • Rob Edwards
  • Dave Farley
  • Steve Fenton
  • Dr. Nicole Forsgren
  • Aaron Gillies
  • Denali Lumma
  • Emerson Murphy-Hill
  • Harini Sampath
  • Robin Savinar
  • Dustin Smith
  • Jess Tsimeris
  • Dr. Laurie Weingart
  • Betsalel (Saul) Williamson
  • Dr. Jeffrey Winer

Summary

This acknowledgments section recognizes the large group of contributors who helped create the Accelerate State of DevOps 2023 report. It thanks people involved in all aspects of producing the report, including the core team, editors, survey localization experts, and advisors/experts in the field. The section expresses gratitude for their input, guidance, and camaraderie in realizing this significant effort.

Authors

Derek DeBellis

  • Quantitative user experience researcher at Google
  • Lead investigator for DORA
  • Focuses on survey research, logs analysis, and measuring concepts that demonstrate a product or feature is delivering value to people
  • Published on various topics including human-AI interaction, impact of COVID-19 on smoking cessation, designing for NLP errors, role of UX in privacy discussions, team culture, and AI’s relationship to employee well-being and productivity
  • Current extracurricular research explores ways to simulate the propagation of beliefs and power

Amanda Lewis

  • DORA.community development lead and developer relations engineer on the DORA Advocacy team at Google Cloud
  • Built connections across developers, operators, product managers, project management, and leadership throughout her career
  • Worked on teams that developed ecommerce platforms, content management systems, observability tools, and supported developers
  • Brings experience and empathy to helping teams understand and implement software delivery and reliability practices

Daniella Villalba

  • User experience researcher at Google
  • Uses survey research to understand factors that make developers happy and productive
  • Studied benefits of meditation training and psycho-social factors affecting college students’ experiences before joining Google
  • Received PhD in Experimental Psychology from Florida International University

Summary

This page introduces the three authors of the report: Derek DeBellis, Amanda Lewis, and Daniella Villalba. Derek is a quantitative UX researcher at Google who leads DORA and has published on various topics related to human-AI interaction, team culture, and employee well-being. Amanda is the DORA.community development lead who brings her experience in building connections across different roles to help teams implement software delivery and reliability practices. Daniella is a UX researcher at Google who uses surveys to understand factors contributing to developer happiness and productivity, with a background in studying meditation and psycho-social factors in college students.

Dave Farley

Dave Farley is the managing director and founder of Continuous Delivery Ltd., author of Modern Software Engineering, and co-author of the best-selling Continuous Delivery book. He is one of the authors of the Reactive Manifesto and a winner of the Duke Award for the open source LMAX Disruptor project. Dave is a pioneer of continuous delivery, a thought leader, and an expert practitioner in CD, DevOps, test-driven development (TDD), and software design. He has a long track record in creating high-performance teams, shaping organizations for success, and creating outstanding software. Dave is committed to sharing his experience and techniques with software developers around the world, helping them to improve the design, quality, and reliability of their software. He shares his expertise through his consultancy, YouTube channel, and training courses.

Eric Maxwell

Eric Maxwell leads Google’s DevOps transformation practice, where he advises the world’s best companies on how to improve by delivering value faster. Eric spent the first half of his career as an engineer in the trenches, automating all the things and building empathy for other practitioners. Eric co-created Google’s Cloud Application Modernization Program (CAMP), and is a member of the DORA team. Before Google, Eric spent time whipping up awesome with other punny folks at Chef Software.

James Brookbank

James Brookbank is a cloud solutions architect at Google. Solutions architects help Google Cloud customers by solving complex technical problems and providing expert architectural guidance. Before joining Google, James worked at a number of large enterprises with a focus on IT infrastructure and financial services.

Dr. Jeffrey Winer

Jeffrey P. Winer, PhD is an attending psychologist, behavioral health systems consultant, and psychosocial treatment developer within the Boston Children’s Hospital Trauma and Community Resilience Center (TCRC) and an assistant professor at Harvard Medical School. With his colleagues at the TCRC, his work is primarily focused on building, testing, disseminating, and implementing culturally-responsive & trauma-informed psychosocial interventions for youth and families of refugee and immigrant backgrounds. He is co-author of the book, Mental Health Practice with Immigrant and Refugee Youth: A Socioecological Framework. He has consulted with programs across the United States and Canada. Psychosocial prevention and intervention tools he has helped develop or adapt are currently used around the world.

Summary

This page provides biographies of four authors:

  1. Dave Farley, an expert in continuous delivery, DevOps, and software design who shares his knowledge through consulting, YouTube, and training.

  2. Eric Maxwell, who leads Google’s DevOps transformation practice advising companies on delivering value faster.

  3. James Brookbank, a cloud solutions architect at Google who helps customers solve complex technical problems.

  4. Dr. Jeffrey Winer, a psychologist at Boston Children’s Hospital focused on developing culturally-responsive interventions for refugee and immigrant youth and families.

Authors of the Accelerate State of DevOps 2023 Report

Kevin M. Storer

  • User experience researcher at Google
  • Leads research on how software development teams interact with and through DevOps tools
  • Ph.D. in Informatics from the University of California, Irvine
  • Authored publications on human-centered programming, developer experience, information behavior, accessibility, and ubiquitous computing

Kim Castillo

  • User experience program manager at Google
  • Leads the cross-functional effort behind DORA, overseeing research operations and publication of the report
  • Works on UX research for Duet AI in Google Cloud
  • Previous experience in software delivery, technical program management, and agile coaching
  • Background in psycho-social research focusing on extrajudicial killings, urban poor development, and community trauma and resilience in the Philippines

Michelle Irvine

  • Technical writer at Google
  • Leads research on the impact and production of technical documentation
  • Previously worked in educational publishing and as a technical writer for physics simulation software
  • BSc in Physics and MA in Rhetoric and Communication Design from the University of Waterloo

Nathen Harvey

  • Leads the DORA Advocacy team as a developer relations engineering manager at Google Cloud
  • Helped teams apply the principles and practices of DevOps and SRE
  • Co-author of the Accelerate State of DevOps report for the past three years
  • Co-edited and contributed to “97 Things Every Cloud Engineer Should Know”

Steve McGhee

  • Reliability advocate, helping teams understand how to build and operate world-class, reliable services
  • 10+ years as a site reliability engineer at Google, learning how to scale global systems in Search, YouTube, Android, and Google Cloud
  • Managed multiple engineering teams in California, Japan, and the UK
  • Helped a California-based enterprise transition onto the cloud

Summary

This page introduces the authors of the Accelerate State of DevOps 2023 report. The authors have diverse backgrounds, including user experience research, program management, technical writing, developer relations, and site reliability engineering. They bring expertise from their work at Google and other organizations, focusing on topics such as DevOps, software delivery, and reliability. The authors have contributed to the report through their research, advocacy, and practical experience in the field.

Methodology

This chapter outlines the process of creating this report, from initial ideas to the final product. It aims to answer questions about the report’s generation and provide a blueprint for conducting similar research.

Step 1: Generate important outcomes for high-performing, technology-driven organizations

Determining the desired outcomes is crucial. The research program focuses on guiding people towards their desired ends. To identify these outcomes, a combination of qualitative research (asking people about their goals), surveys, community interaction, and workshops is used. Consistently identified outcomes include:

| Outcome | Description |
|---|---|
| Organizational performance | Producing revenue, value for customers, and benefits for the extended community |
| Team performance | Application or service teams’ ability to create value, innovate, and collaborate |
| Employee well-being | Strategies that benefit employees by reducing burnout, fostering job satisfaction, and increasing productivity |
| Software delivery performance | Teams’ ability to deploy software rapidly and successfully |
| Operational performance | Shipped software providing a reliable user experience |

Summary

This page discusses the first step in creating the report: generating a set of important outcomes for high-performing, technology-driven organizations. The researchers use various methods to identify these outcomes, which include organizational performance, team performance, employee well-being, software delivery performance, and operational performance. The goal is to guide people and organizations towards their desired ends.

Step 2. Hypothesize about how, when, and why these outcomes are achieved

In this step, the researchers aim to identify factors that reliably impact the outcomes identified in step 1. They want to establish causal relationships, such as “Holding everything equal, x has an effect on y.” This information can guide practitioners in making data-informed decisions about what changes to implement.

The researchers also explore the conditions under which these pathways have more or less impact, asking “when” and “for whom.” For example, while documentation quality generally reduces burnout, it has the opposite effect on underrepresented respondents, increasing their burnout. Understanding these nuances is crucial because teams and individuals are rarely average.

Furthermore, the researchers hypothesize about the mechanisms that explain why or how these effects occur. Based on previous results and existing literature, they hypothesized that underrepresented individuals experience more burnout. To answer the question of why this happens, they proposed potential mechanisms to test, such as underrepresented individuals taking on or being assigned more toilsome work.

Summary

In this step, researchers identify factors that reliably impact outcomes and explore the conditions under which these effects occur. They also hypothesize about the mechanisms behind these effects to understand why they happen. This information helps guide data-informed decision-making for practitioners, while acknowledging that the effects may vary depending on the specific team or individual.

Hypothetical Model for the Documentation Chapter

Diagram

[Diagram: a hypothetical causal model linking gender, underrepresentation, and work distribution to documentation quality; documentation quality to knowledge sharing and to technical capabilities (trunk-based development, loosely coupled architecture, code review speed, continuous integration, AI, continuous delivery); those capabilities to key outcomes (team, organizational, software delivery, and operational performance); and those outcomes to well-being (job satisfaction, burnout, productivity).]

Summary

This hypothetical model for the documentation chapter illustrates the relationships between various factors and their effects on key outcomes and well-being. Gender influences work distribution, which in turn affects documentation quality. Good documentation quality leads to better knowledge sharing, but underrepresentation can negatively impact this. Underrepresentation also hinders the development of technical capabilities and processes, such as trunk-based development, loosely coupled architecture, code review speed, continuous integration, AI, and continuous delivery. These technical capabilities and processes positively contribute to key outcomes like team performance, organizational performance, software delivery performance, and operational performance. Ultimately, these key outcomes have a positive effect on well-being, including job satisfaction, reduced burnout, and increased productivity.

Step 3. Hypothesize about potential confounds

If you’ve ever discussed data, you’ve probably run into a spurious correlation. A spurious correlation is when two variables appear to be related, but there is no causal connection between them.

A famous example of a spurious correlation is the relationship between per capita consumption of mozzarella cheese and the number of civil engineering doctorates awarded. The data shows a strong positive correlation of 95.86% between these two variables from 2000 to 2009.

However, it’s unlikely that there’s any causal connection between engineering doctorates and mozzarella cheese consumption. The confounding element lurking behind this relationship is time. If both variables trend positively in the same time period, they will likely have a positive correlation, even if there is no direct relationship between them.

Including time in a model, or detrending the data, would probably nullify the relationship. We can represent this in a GOAT diagram:

[Diagram: Time → Mozzarella cheese consumption; Time → Engineering doctorates]

In this diagram, Time is the confounding variable that influences both Mozzarella cheese consumption and Engineering doctorates, creating a spurious correlation between them.
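
To make the confounding concrete, here is a minimal R sketch (with invented, simulated data; not from the report) showing how two unrelated upward-trending series correlate strongly, and how detrending dissolves the relationship:

```r
set.seed(42)

# Two unrelated series that both trend upward from 2000 to 2009
year   <- 2000:2009
cheese <- 9 + 0.25 * (year - 2000) + rnorm(10, sd = 0.1)  # per capita consumption
phds   <- 480 + 15 * (year - 2000) + rnorm(10, sd = 5)    # doctorates awarded

# The naive correlation looks impressive despite no causal link
cor(cheese, phds)

# Detrend both series, then correlate the residuals:
# the "relationship" all but disappears
cor(resid(lm(cheese ~ year)), resid(lm(phds ~ year)))
```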

Summary

This page discusses the concept of spurious correlations, which are apparent relationships between variables that are not causally connected. The example given is a strong positive correlation between mozzarella cheese consumption and engineering doctorates awarded. However, this relationship is likely due to a confounding variable - time. Both variables trended positively over the same period, creating a correlation without causation. The key takeaway is to be cautious of correlations and consider potential confounding factors that may be influencing the relationship.

Accounting for Confounding Variables in Causal Models

When analyzing relationships between variables, it’s important to account for potential confounding factors that may influence the observed association. Failing to do so can lead to spurious relationships, where two variables appear to be related but are actually influenced by a third, unmeasured variable.

For example, if we don’t account for time as a confounding variable, the data might show a spurious relationship between mozzarella cheese consumption and engineering doctorates. To help researchers properly estimate the effect of one variable on another, tools like Dagitty (https://dagitty.net/dags.html) can be used to specify causal models and identify the implications of the model, such as what needs to be accounted for and what should be ignored.
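
As a hedged illustration of what such a causal model looks like, the mozzarella example can be written in Dagitty's graph syntax, here via the dagitty R package:

```r
library(dagitty)

# Time is a common cause of both variables
g <- dagitty("dag {
  Time -> MozzarellaConsumption
  Time -> EngineeringDoctorates
}")

# Which variables must be adjusted for to estimate the effect of
# cheese consumption on doctorates? For this model: { Time }
adjustmentSets(g,
               exposure = "MozzarellaConsumption",
               outcome  = "EngineeringDoctorates")
```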

These tools serve as a reminder that while correlation might not imply causation, a stated causal model does make explicit how someone is thinking about causation. It’s impossible to capture every element that biases researchers’ estimates, but accounting for known biasing pathways helps provide accurate estimates of the effects of various activities, technologies, and structures on key outcomes.

Practitioners rely on these models to understand what factors will impact their desired outcomes. Models that fail to account for biases will fail to provide practitioners with the guidance they need, potentially leading to incorrect conclusions and misguided decisions.

Summary

This page discusses the importance of accounting for confounding variables when analyzing causal relationships between variables. Failing to do so can lead to spurious relationships and incorrect conclusions. Tools like Dagitty can help researchers specify causal models and identify biasing pathways, allowing them to provide more accurate estimates of the effects of various factors on key outcomes. Practitioners rely on these models for guidance, so it’s crucial to account for biases to avoid misguided decisions.

Step 4. Develop the survey

There are three aspects to developing the survey: operationalization, experience, and localization.

Operationalization

We want measures that adequately capture the concepts we’re interested in, and that do so reliably. Translating an abstract concept into something measurable is the art of operationalization. These measures are the ingredients at the base of all the analysis. If our measures are not giving us clear signals, how can we trust the rest of the analysis? How do we measure a concept as elusive as, for example, productivity? What about burnout or operational performance?

First, we look to the literature to see if there are successful measures that already exist. If we can use previously validated measures in our survey, we gain a bridge from the survey to all the literature that has amassed around that question. Our ongoing use of Westrum’s Typology of Organizational Cultures is an example of us reusing previously validated measures.

However, many concepts haven’t previously been validated for the space we do research in. In that case, we’re doing qualitative research to untangle how people understand the concept and we’re looking through the more philosophical literature on the intricacies of the concept.

Survey experience

We want the survey to be comprehensible, easy, no longer than necessary, and broadly accessible. These are difficult goals, given all the questions we want to ask, given the technical understanding required in order to answer these questions, and given the variation in nomenclature for certain practices. We do remote, unmoderated evaluations to make sure the survey is performing above certain thresholds. This requires doing multiple iterations.

Localization

People around the world have responded to our survey every year. This year we worked to make the survey more accessible to a larger audience by localizing the survey into English, Español, Français, Português, and 日本語. This was a grassroots effort, led by some incredible members of the DORA community. Googlers all over the world contributed to this effort, as well as a partner in the field—many thanks to Zenika (https://www.zenika.com) for our French localization. We hope to expand these efforts and make the survey something that is truly cross-cultural.

Summary

This page discusses the three key aspects in developing a survey: operationalization, experience, and localization.

Operationalization involves translating abstract concepts into measurable items. This is done by looking at existing validated measures in literature or conducting qualitative research to understand how people perceive the concept.

Survey experience focuses on making the survey easy to understand, accessible, and concise. This is achieved through remote unmoderated evaluations and multiple iterations.

Localization aims to make the survey accessible to a global audience by translating it into multiple languages. This grassroots effort involved Googlers worldwide and external partners to create a truly cross-cultural survey.

Step 5. Collect survey responses

The survey responses are collected through two main channels:

  1. Organic approach

    • Blog posts, email campaigns, and social media posts are used to inform people about the survey
    • Snowball sampling is employed by asking community members to share the survey
  2. Panel approach

    • Used to supplement the organic channel and ensure adequate representation
    • Targets underrepresented groups in the technical community
    • Aims to get sufficient responses from specific industries and organization types
    • Provides control over recruitment, which is not possible with the organic approach

Step 6. Analyze the data

The data analysis involves three key steps:

  1. Data cleaning

    • Aims to increase the signal-to-noise ratio by removing noisy responses
    • Noisy responses include those from distracted or speeding respondents, or those not answering in good faith
    • Care is taken to avoid removing signal or excluding data in a biased manner that validates hypotheses (a minimal sketch of this kind of filtering follows this list)
  2. Measurement validation

  3. Model evaluation
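
The report doesn’t publish its exact cleaning rules, so the following is only a minimal sketch of this kind of filtering; the data frame, column names, and thresholds are all invented for illustration:

```r
library(dplyr)

# Hypothetical raw responses: one row per respondent, with a completion
# time and Likert-scale items q1..q20
clean <- raw_responses %>%
  # Drop "speeders" who finished too quickly to have read the questions
  filter(duration_seconds >= 180) %>%
  # Drop "straight-liners" who gave the same answer to every item
  rowwise() %>%
  filter(sd(c_across(starts_with("q"))) > 0) %>%
  ungroup()
```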

Summary

The survey responses are gathered through organic and panel approaches. The organic approach uses various online channels to promote the survey, while the panel approach targets underrepresented groups and specific industries to ensure adequate representation. The data analysis begins with data cleaning to remove noisy responses while being cautious not to introduce bias. Measurement validation and model evaluation are the next steps in the analysis process.

Measurement Validation

In this report, we discuss the concepts we try to measure, which can be called variables. These variables are the elements included in our research models. There are two main ways to analyze the validity of these measures: internally and externally.

Internal Validity:

  • Looks at what indicates the presence of a concept
  • Example: Quality documentation might be indicated by people using their documentation to solve problems
  • Many variables consist of multiple indicators because the constructs are multifaceted
  • To understand the multifaceted nature of a variable, we test how well the items representing that construct gel together
  • If they share a high level of communal variance, we assume an underlying concept of interest
  • Example: Happiness is multifaceted, with expected feelings, thoughts, and actions emerging together when happiness is present
  • Confirmatory factor analysis is used to test whether indicators show up together, using the lavaan R package (see the sketch after this list)
  • If indicators don’t gel, the concept might need revision or be dropped
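
As an illustration, a minimal lavaan CFA might look like this; the construct and item names are placeholders rather than the report’s actual survey items:

```r
library(lavaan)

# Hypothetical: three survey items assumed to indicate a single
# underlying burnout construct
model <- "
  burnout =~ item1 + item2 + item3
"

# survey_data is a placeholder data frame with columns item1..item3
fit <- cfa(model, data = survey_data)

# Factor loadings and fit indices (CFI, RMSEA, ...) indicate whether
# the items share enough communal variance to "gel"
summary(fit, fit.measures = TRUE, standardized = TRUE)
```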

External Validity:

  • Looks at how the construct fits into the world
  • Expects certain relationships between constructs
  • Example: Happiness and sadness should have a negative relationship
  • If a happiness measure is positively correlated with sadness, the measure or theory might be questioned
  • Constructs with expected positive relationships, like productivity and job satisfaction, should not have too high of a correlation, or they might be measuring the same thing
  • If the correlation is too high, measures may not be calibrated enough to pick up differences between concepts, or the hypothesized difference may not exist (see the correlation check sketched after this list)
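
A sketch of such a sanity check, with simulated scores standing in for the real survey constructs:

```r
set.seed(7)

# Simulated factor scores standing in for survey constructs
happiness        <- rnorm(300)
sadness          <- -0.6 * happiness + rnorm(300, sd = 0.8)
productivity     <- rnorm(300)
job_satisfaction <- 0.5 * productivity + rnorm(300, sd = 0.9)

# Expect a clearly negative value; a positive one would call the
# happiness measure (or the theory) into question
cor(happiness, sadness)

# Expect positive but not near 1; an extremely high correlation would
# suggest the two measures capture the same concept
cor(productivity, job_satisfaction)
```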

Summary

This page discusses the importance of validating the measures used in research, both internally and externally. Internal validity looks at how well the indicators of a concept gel together, while external validity examines how the construct relates to other constructs in the world. Confirmatory factor analysis is used to test internal validity, and relationships between constructs are analyzed for external validity. If measures don’t meet these validity criteria, they may need to be revised or dropped from the research.

Model Evaluation

In this section, the authors describe their approach to evaluating the hypothetical models built in steps 2 and 3 using the clean data obtained in step 6. They have adopted a Bayesian approach to understand the plausibility of various hypotheses given the data, rather than focusing on the likelihood of the data given the null hypothesis (i.e., no effect present). The main tools used in R for this purpose are blavaan and rstanarm.

The authors aim for parsimony when evaluating models, starting with a simplistic model and gradually adding complexity until it is no longer justified. They provide an example where organizational performance is predicted to be the product of the interaction between software delivery performance and operational performance.

Two models are presented:

  1. A simplistic model that does not include the interaction: Organizational performance ~ Software delivery performance + Operational performance

  2. A second model that adds the interaction: Organizational performance ~ Software delivery performance + Operational performance + Software delivery performance x Operational performance

To determine whether the additional complexity is necessary, the authors use leave-one-out cross-validation (LOOCV) and Watanabe–Akaike widely applicable information criterion, based on recommendations from “Regression and Other Stories” and “Statistical Rethinking”.
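
A hedged sketch of what that comparison might look like with rstanarm and the loo package (the data here are simulated; the report doesn’t publish its model code):

```r
library(rstanarm)
library(loo)

# Simulated team-level data with a real interaction effect
set.seed(1)
teams <- data.frame(sdp = rnorm(200), op = rnorm(200))
teams$org_perf <- 0.5 * teams$sdp + 0.4 * teams$op +
  0.2 * teams$sdp * teams$op + rnorm(200)

# Simpler model: main effects only
fit_main <- stan_glm(org_perf ~ sdp + op, data = teams, refresh = 0)

# More complex model: adds the interaction term
fit_interact <- stan_glm(org_perf ~ sdp * op, data = teams, refresh = 0)

# Leave-one-out cross-validation: does the extra complexity earn its keep?
loo_compare(loo(fit_main), loo(fit_interact))

# WAIC offers a second, closely related check
loo_compare(waic(fit_main), waic(fit_interact))
```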

Summary

The authors evaluate their hypothetical models using a Bayesian approach and the clean data obtained earlier. They start with a simple model and add complexity as needed, using LOOCV and information criteria to determine if the added complexity is justified. This allows them to understand the plausibility of different hypotheses given the data, rather than just the likelihood of the data under the null hypothesis.

Step 7. Report findings

We then reviewed these results as a team. This year, we spent a few days together in Boulder, Colorado, synthesizing the data with the experiences of subject-matter experts. We did this for every chapter of the report, hypothesis by hypothesis. Data interpretation always has the risks of spin, speculation, anecdotes, and leaps. These risks were mitigated by having multiple people with diverse backgrounds in a room that encouraged questioning, divergence, unique perspectives, and curiosity.

With the results in hand, the report authors retreated to their respective corners of the world and wrote. Throughout the writing process, editors and subject-matter experts were consulted. Having these perspectives was vital in helping us communicate our ideas. The person responsible for analyzing this data was responsible for making sure that nothing we said deviates from what the data says.

These chapters were bundled together into a cohesive design by our talented design partners, BrightCarbon.

Step 8. Synthesize findings with the community

We count on community engagement to come up with ways both to leverage and to interpret these findings. We try to be particular in our recommendations, but in the end, there are innumerable implementations a team could try based on the results we uncover. For example, loosely coupled architecture seems to be a beneficial practice based on the outcomes we measure. But there surely isn’t just a single way to establish a loosely coupled architecture. Generating and sharing approaches as a community is the only way to continually improve. Our map of the world is an interpretation and abstraction of the territory and context in which you, your team, and your organization operate.

To participate in DORA’s global community of practice, visit the DORA Community site.

Summary

This passage outlines the final steps in the DORA research process. After collecting and analyzing data, the team meets to review and synthesize the findings. They discuss each hypothesis and chapter of the report, bringing in diverse perspectives to mitigate potential biases in data interpretation. The authors then write up the chapters, consulting with editors and subject-matter experts to ensure accuracy and clarity. Finally, the report is designed and packaged by a professional design firm.

The authors emphasize the importance of community engagement in leveraging and interpreting the findings. While the report provides specific recommendations, there are many ways to implement them in practice. The authors encourage the community to generate and share approaches to continuously improve. They invite readers to participate in the global DORA community of practice to further this collaboration.

Demographics and Firmographics

This year, nearly 3,000 working professionals from various industries worldwide participated in the DORA research program’s survey. The survey aims to understand the factors that drive high-performing, technology-driven organizations. The demographic and firmographic questions used in this survey were based on research done by Stack Overflow, which had over 70,000 respondents in their 2022 Developer Survey.

Compared to the Stack Overflow Developer Survey, the DORA survey sample included a higher proportion of women, disabled participants, and participants working in larger organizations. The sample set was similar to Stack Overflow’s in terms of race and ethnicity.

The number of organic respondents in this year’s survey increased by 3.6 times compared to 2022.

Summary

The DORA research program conducted a survey with nearly 3,000 professionals from various industries to understand the factors driving high-performing, technology-driven organizations. The survey questions were based on Stack Overflow’s research, which had a larger sample size. The DORA survey had a higher representation of women, disabled participants, and those working in larger organizations compared to the Stack Overflow survey, but was similar in terms of race and ethnicity. The number of organic respondents increased significantly compared to the previous year.

Demographics

Gender:

| Gender | % of respondents |
|---|---|
| Man | 81% |
| Woman | 12% |
| Or, in your own words | 2% |
| Prefer not to say | 3% |

The proportion of women respondents decreased from 18% in 2022 to 12% in 2023.

Disability:

| Disability | % of respondents |
|---|---|
| None of the disabilities applied | 87% |
| Yes | 6% |
| Prefer not to say / did not respond | 7% |

Disability was identified along six dimensions following guidance from the Washington Group Short Set. The percentage of people with disabilities decreased from 11% in 2022 to 6% in 2023.

Underrepresented:

| Underrepresented | % of respondents |
|---|---|
| No | 77% |
| Yes | 15% |
| Prefer not to respond | 7% |

Identifying as a member of an underrepresented group can refer to race, gender, or another characteristic. The percentage of people who identify as underrepresented has decreased slightly from 19% in 2022 to 15% in 2023.

Summary

This page presents demographic data from the 2023 State of DevOps survey. It shows the breakdown of respondents by gender, disability status, and whether they identify as part of an underrepresented group. Compared to 2022, there was a decrease in the proportion of women respondents, people with disabilities, and those identifying as underrepresented. The majority of respondents were men, did not have a disability, and did not identify as part of an underrepresented group.

| Race/Ethnicity | % of respondents |
|---|---|
| White | 37.6% |
| European | 29.2% |
| Asian | 9.8% |
| North American | 8.2% |
| Indian | 8.0% |
| Prefer not to say | 4.9% |
| South American | 4.3% |
| Hispanic or Latino/a | 4.1% |
| South Asian | 2.9% |
| East Asian | 2.3% |
| Middle Eastern | 2.0% |
| Multiracial | 1.7% |
| Black | 1.6% |
| Or, in your own words | 1.6% |
| Southeast Asian | 1.2% |
| I don’t know | 0.7% |
| Indigenous | 0.6% |
| Caribbean | 0.6% |
| Biracial | 0.5% |
| North African | 0.4% |
| Central Asian | 0.3% |
| Pacific Islander | 0.3% |
| Ethnoreligious group | 0.3% |
| Central American | 0.6% |

We adopted this question from the 2022 Stack Overflow Developer Survey. As noted earlier, our sample set is similar, with one notable deviation: we have a lower proportion of Europeans.


Summary

This page presents data on the race and ethnicity of respondents to a survey, likely the Accelerate State of DevOps 2023 survey. The largest groups represented are White (37.6%), European (29.2%), Asian (9.8%), North American (8.2%) and Indian (8.0%). The question was adopted from the 2022 Stack Overflow Developer Survey, and the sample is noted to be similar to that survey, except with a lower proportion of Europeans. The data is presented as part of the demographics and firmographics section of the report.

| Years of experience | Percentage of respondents |
|---|---|
| <= 9 years | 25% |
| 10-14 years | 25% |
| 15-21 years | 25% |
| >= 22 years | 25% |
[Chart: distribution of years of experience. The middle 50% of respondents have between 10 and 21 years of experience; the median is 15 years.]

Summary

The data shows the distribution of work experience among the respondents. The respondents are divided into four equal groups (quartiles) based on their years of experience. The bottom 25% have 9 years or less, the middle 50% (interquartile range) have between 10 and 21 years, and the top 25% have 22 years or more. The median (50th percentile) is 15 years of experience. This indicates that the respondents are generally experienced practitioners, with half of them having worked for 15 years or more.

| Years of experience on team | % of respondents |
|---|---|
| <= 1.5 years (lower 25%) | 25% |
| 1.5 - 5 years (interquartile, middle 50%) | 50% |
| >= 5 years (upper 25%) | 25% |
[Chart: distribution of years of experience on the current team. The middle 50% of respondents have been on their team between 1.5 and 5 years.]

Summary

This page discusses the experience level of respondents on their current teams. Despite overall work experience being high, many respondents are relatively new to their current teams. Half of the respondents have been on their team for less than 3 years, and a quarter have been on their team for 1.5 years or less. Only 25% of respondents have been on their current team for 5 years or more. The author raises questions about whether this reflects a mentality of continuous improvement or instability in the economy.

Role

72% of respondents either work on development or engineering teams (30%), DevOps or SRE teams (18%), or IT ops or infrastructure teams (8%), or are managers (16%). In 2022, individuals in those roles made up 85% of respondents. The decrease in respondents from those four roles suggests that we were able to reach more individuals in different roles. The proportion of respondents on IT ops or infrastructure teams (8%) is back to 2021 levels (9%) after spiking in 2022 (19%).

| Role | % of respondents |
|---|---|
| Development or engineering | 30.5% |
| DevOps or SRE | 18.1% |
| Manager | 16.4% |
| IT operations or infrastructure | 7.6% |
| C-level executive | 6.2% |
| Platform engineer | 4.2% |
| Consultant, coach, or trainer | 3.0% |
| Other | 3.3% |
| Product management | 2.5% |
| No response | 2.5% |
| Quality engineering or assurance | 1.1% |
| Professional services | 1.0% |
| Information security | 1.0% |
| Student | 0.5% |
| User experience design or research | 0.4% |
| Sales or marketing | 0.4% |
| Sales engineering | 0.3% |
| Do not belong to any department | 0.3% |
| Release engineering | 0.3% |
| Prefer not to answer | 0.3% |
| Network operations | 0.3% |

Summary

This page presents a breakdown of the roles of respondents in the Accelerate State of DevOps 2023 survey. The majority (72%) work in development/engineering, DevOps/SRE, IT operations/infrastructure, or are managers. Compared to 2022, there was a decrease in these four roles, suggesting the survey reached a more diverse set of respondents this year. The data is presented in a table showing the percentage of respondents in each role.

Firmographics

Industry

| Industry | Percentage |
|---|---|
| Technology | 36.6% |
| Financial services | 13.7% |
| Consumer | 8.4% |
| Other | 6.6% |
| Manufacturing | 5.8% |
| Healthcare & pharmaceuticals | 5.7% |
| Telecommunications | 4.2% |
| Media & entertainment | 4.2% |
| Government | 3.9% |
| Education | 3.3% |
| Energy | 2.3% |
| N/A | 2.3% |
| Insurance | 2.2% |
| Nonprofit | 1.0% |

Number of employees

How many employees work at your organization?

| Number of employees | Percentage |
|---|---|
| 10,000 or more | 21.4% |
| 1,000 to 4,999 | 18.5% |
| 100 to 499 | 17.8% |
| 20 to 99 | 13.3% |
| 500 to 999 | 10.5% |
| 5,000 to 9,999 | 7.3% |
| 2 to 9 | 3.3% |
| 10 to 19 | 3.2% |
| 1 to 4 | 2.4% |
| N/A | 2.4% |

Summary

This page presents demographic and firmographic data from the Accelerate State of DevOps 2023 report. The data is broken down into two main categories: Industry and Number of Employees.

In the Industry category, the largest represented sector is Technology at 36.6%, followed by Financial Services at 13.7%. Other notable industries include Consumer (8.4%), Manufacturing (5.8%), and Healthcare/Pharmaceuticals (5.7%).

The Number of Employees section reveals that the majority of respondents work in larger organizations. 21.4% of respondents work in companies with 10,000 or more employees, while 18.5% work in organizations with 1,000 to 4,999 employees. Mid-sized companies (100 to 499 employees) make up 17.8% of the respondents.

Country

We are always thrilled to see people from all over the world participate in the survey. Thank you all!

Respondents participated from the following countries: USA, UK, India, Canada, Germany, Australia, Brazil, Netherlands, Japan, France, Spain, Sweden, Italy, New Zealand, Poland, Norway, Portugal, Denmark, Switzerland, Austria, Kenya, South Africa, Argentina, Czech Republic, Belgium, Colombia, Finland, Ireland, China, Romania, Singapore, Mexico, Turkey, Ukraine, Chile, Lithuania, Thailand, Hungary, Israel, Viet Nam, UAE, Bulgaria, Croatia, Ecuador, Indonesia, Philippines, Armenia, Georgia, Greece, Malaysia, Pakistan, Russian Federation, Serbia, Tunisia, Uruguay, Afghanistan, Algeria, Egypt, Estonia, Iceland, Iran, Nigeria, Peru, Slovakia, Slovenia, South Korea, Sri Lanka, Andorra, Angola, Antigua and Barbuda, Bahrain, Bangladesh, Dominican Republic, Ghana, Hong Kong (S.A.R.), Kazakhstan, Myanmar, Saudi Arabia, Somalia, Sudan, Uganda, Albania, Bahamas, Belarus, Bolivia, Cambodia, Costa Rica, Djibouti, El Salvador, Guatemala, Honduras, Latvia, Lebanon, Luxembourg, Maldives, Malta, Mauritius, Mongolia, Morocco, Nepal, Qatar, The former Yugoslav Republic of Macedonia, Trinidad and Tobago, United Republic of Tanzania, and Zimbabwe, plus some responses marked "Not applicable".


*(Chart: share of respondents by country. The United States of America leads at 28%, followed by Great Britain and Northern Ireland at 11% and India at 8%; the other countries shown each account for roughly 1-5%.)*


Summary

This page presents data on the countries represented in a survey, likely related to DevOps practices. The United States had the highest participation at 28%, followed by the United Kingdom at 11% and India at 8%. In total, over 70 countries were represented, spanning North America, South America, Europe, Africa, Asia, and Oceania. The wide global participation suggests the growing worldwide interest and adoption of DevOps methodologies.

Work arrangement

Employment status

88% of the respondents are full-time employees. 10% of the respondents are contractors. Some contractors report vastly different experiences than full-time employees.

| Employment status | % of respondents |
|---|---|
| Full-time employee | 88% |
| Full-time contractor | 8% |
| Part-time employee | 2% |
| Part-time contractor | 2% |

*For the primary application or service you work on, what best describes your employment status with the organization that owns the application or service?

The different experience might stem from how contractors fit into the team. Some contractors report being embedded in the team they work with, meaning they work closely with team members every day and consider the difference between themselves and a full-time employee to be negligible. 70% of contractor respondents either strongly agree or agree with the statement that they are embedded on their team.

Location

The response pattern this year indicates that, despite return-to-office pushes, working from home is still a reality for many workers. Nearly 33% of respondents work almost exclusively from home (less than 5% of time in the office). 63% of respondents work from home more than they work from the office. For the remaining respondents, hybrid work might be the most common arrangement. This is suggested by 75% of respondents spending less than 70% of their time in the office. There are not many people with a strong attachment to the office. Only 9% of respondents are in the office more than 95% of the time.

*(Chart: distribution of the percentage of time respondents spend in the office, on a 0%-100% axis. The lower 25% spend 1% or less of their time in the office; the upper 25% spend 70% or more.)*

Summary

This page discusses the work arrangements and demographics of respondents in a DevOps survey. The majority (88%) are full-time employees, with 10% being contractors. Contractors often have different experiences than full-time staff, possibly due to how embedded they are in their teams.

In terms of location, working from home is still common despite return-to-office trends. A third of respondents work almost entirely remotely, and 63% work from home more than the office. Hybrid arrangements are likely for the rest, with 75% in the office less than 70% of the time. Only 9% are in the office over 95% of the time.

The data is visualized in a chart showing the distribution of the percentage of time spent in the office.

The Models

Introduction

Traditionally, we created one giant model. This year we decided to break it down into multiple models for the following reasons:

| Reason | Explanation |
|---|---|
| Unwieldy models | Huge models can become unwieldy, fast. Every added variable changes the way the model functions. This can lead to inaccurate estimates and makes it difficult to locate the reason for a change. |
| Section-by-section hypotheses | We created our hypotheses section-by-section this year. Thus, it makes sense to just create a model for each section. |
| Unclear benefit of a giant model | It isn’t obvious what the benefit of a giant model is in estimating the effect of X on Y. To understand the impact of X on Y, we used directed acyclic graphs to help understand which covariates we should and shouldn’t include in the model. |
| Difficult for the reader to understand | The number of hypotheses we addressed this year would make it very difficult for the reader to make sense of a giant model. Imagine combining all the visualizations below into one visualization. |

Summary

This page discusses the reasons for breaking down the traditional giant model into multiple smaller models this year. The main reasons are:

  1. Giant models can quickly become hard to manage, with each added variable changing how the model works in potentially inaccurate ways.
  2. Hypotheses were created section-by-section, so it makes sense to have a model per section.
  3. The benefit of a giant model for estimating specific effects is unclear. Directed acyclic graphs were used to determine appropriate covariates.
  4. Having addressed many hypotheses, a giant model would be very difficult for readers to understand and visualize.

How do I read these diagrams?

Once you learn how to read these diagrams, you’ll find them to be efficient tools for conveying a lot of information.

Variable Category Diagram

  • Variable is a concept that we tried to measure (for example documentation quality).
  • A variable category simply shows that we think of these variables as a group; it has nothing to do with the analysis. That is, we did not statistically evaluate whether this is a higher-order construct.
*(Diagram: a box labeled “Variable Category” containing several variables.)*

Effect Diagram Key

  • A hypothesized effect that the data did not substantiate.
  • Part of the mediation pathway that we explicitly analyzed.
  • A negative effect, which simply means decreases, not that it is bad.
  • A positive effect, which simply means increases, not that it is good.

Warning: the models are general summations!

We categorize some variables together for ease of reading. Because of this, an arrow going to a variable category, from a variable category, or both, represents the general pattern of results, which might not hold for every variable in the category. For example, knowledge sharing has a positive impact on most key outcomes. Therefore, we would draw an arrow with a plus symbol (+) on it from knowledge sharing to the key outcomes variable category. Knowledge sharing, however, doesn’t have a positive impact on software delivery performance. To get into the details, please visit the relevant chapters.

Moderation example

Moderation is a tough concept to grasp in statistics, but in the real world, moderation amounts to saying, “it depends.” Let’s do a quick example to clarify the concept of moderation in the context of this report.

In season 3 of Curb Your Enthusiasm, Larry David says, “I don’t like talking to people I know, but strangers I have no problem with.” This is something that provides us with a quick diagram to discuss:


Summary

This page explains how to read the diagrams used in the report to convey statistical relationships between variables and variable categories. Key points:

  • Variables are concepts that were measured, variable categories group related variables
  • Arrows show hypothesized effects, with + and - indicating positive and negative effects
  • Moderation means the effect of one variable depends on the level of another variable
  • The diagrams summarize general patterns, individual variable relationships may vary

The diagrams efficiently summarize complex statistical models, but checking the detailed chapters is advised to understand specific variable relationships. An example using a quote about social interaction preferences illustrates the concept of moderation.

The models"I don’t like talking to people I know, but strangers I have no problem with." Larry David

*(Diagram: Conversation → Displeasure (+), with Stranger mitigating the effect.)*

This diagram shows that, for Larry, conversation has a positive impact on displeasure. Positive here simply means increase, not that it is necessarily a good thing. This is demonstrated by the solid black line between conversation and displeasure with the arrow pointing to displeasure. This arrow suggests that we believe the causal flow is from conversation to displeasure. From what we can tell, conversations tend to cause Larry displeasure.

The second thing to note is that stranger (here to represent the boolean stranger yes / no) doesn’t point to another variable. Instead, it points to an effect, an arrow. This means we think that stranger modifies not a variable, but an effect. That is why we draw the arrow from stranger to another arrow, not to another variable. We’re saying that whether or not Larry is talking to a stranger impacts the effect of conversation on displeasure. Put differently, we’re saying the effect of conversation on displeasure depends on whether the person Larry is conversing with is a stranger. When the person is a stranger, the effect of conversation is something Larry “has no problem with.” We might say that strangers mitigate the displeasure Larry feels while conversing.
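Moderation also has a direct statistical translation: an interaction term in a regression. The sketch below re-creates the Larry David example in Python with invented data and coefficients; statsmodels and the variable names are my own illustration, not part of the report’s analysis.

```python
# Moderation as a regression interaction term, using the Larry David
# example. All data here is simulated purely for illustration.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 400
conversation = rng.uniform(0, 10, n)   # how much conversation occurs
stranger = rng.integers(0, 2, n)       # 1 = talking to a stranger

# Conversation raises displeasure, but strangers dampen that effect.
displeasure = (1 + 0.8 * conversation
               - 0.6 * conversation * stranger
               + rng.normal(0, 1, n))

# The interaction column (conversation * stranger) carries the moderation.
X = sm.add_constant(np.column_stack(
    [conversation, stranger, conversation * stranger]))
fit = sm.OLS(displeasure, X).fit()

# Coefficients: [intercept, conversation, stranger, interaction].
# A negative interaction coefficient is the "mitigates" pattern: being
# a stranger makes the positive effect of conversation less positive.
print(fit.params)
```

The sign and size of that fitted interaction term is exactly the “it depends” logic the moderation vocabulary below describes.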

Summary: This page explains a diagram modeling how conversation affects Larry David’s displeasure. Conversation increases displeasure for Larry, but this effect is mitigated if he is talking to a stranger rather than someone he knows. The diagram uses arrows to show the causal relationships between conversation, displeasure, and talking to strangers.

| Moderation type | Effect |
|---|---|
| Amplifies | Makes positive effects more positive and negative effects more negative |
| Attenuates | Weakens the effect |
| Mitigates | Makes positive effects less positive and negative effects less negative |
| Reverses | Makes positive effects negative and negative effects positive |
| Modifies | The effect changes in a way that can’t be summed up nicely in one word, often with categorical variables as causes |

Mediation is about understanding why or how an effect occurs. We can test for mediation in statistics to determine if the effect of X on Y is explained or partially explained by M.

Mediation Example:

*(Diagram: Sun → Photosynthesis: + → Plant growth: +)*

In this example, the effect of the sun on a plant’s height is explained by photosynthesis.
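One way to see mediation in code is the classic two-regression check: fit the outcome on X alone, then add the mediator M and watch the X coefficient shrink. The sketch below uses the sun/photosynthesis example with simulated numbers; it illustrates the idea only, and is not the machinery the report actually uses.

```python
# A toy mediation check for sun -> photosynthesis -> plant growth.
# Data is simulated; this is not the report's analysis.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 400
sun = rng.uniform(0, 12, n)                          # hours of sunlight
photosynthesis = 0.9 * sun + rng.normal(0, 1, n)     # mediator depends on sun
growth = 0.7 * photosynthesis + rng.normal(0, 1, n)  # growth depends on mediator

# Total effect of sun on growth (mediator omitted).
total = sm.OLS(growth, sm.add_constant(sun)).fit()

# Direct effect of sun once the mediator is in the model.
direct = sm.OLS(growth, sm.add_constant(
    np.column_stack([sun, photosynthesis]))).fit()

print("total effect of sun: ", total.params[1])
# If this second coefficient shrinks toward zero, the effect of sun on
# growth is explained (partially or fully) by photosynthesis.
print("direct effect of sun:", direct.params[1])
```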

Summary: This page discusses two important concepts in statistics: moderation and mediation. Moderation refers to how a third variable can change the relationship between two other variables in various ways, such as amplifying, attenuating, mitigating, reversing, or modifying the effect. Mediation, on the other hand, is about understanding why or how an effect occurs. By testing for mediation, we can determine if the effect of one variable on another is explained or partially explained by a third variable. The example given shows how photosynthesis mediates the effect of the sun on plant growth.

Technical capabilities and processes

  • Trunk-based development
  • Loosely coupled architecture
  • Code review speed
  • Continuous integration
  • AI

Key outcomes

  • Team performance
  • Organizational performance
  • Software delivery performance
  • Operational performance

*(Diagram: Technical capabilities and processes → Continuous delivery → Key outcomes and Well-being.)*

Summary: This model shows how technical capabilities and processes, such as trunk-based development, loosely coupled architecture, code review speed, continuous integration, and AI, lead to continuous delivery. Continuous delivery then positively impacts both well-being (job satisfaction, less burnout, more productivity) and key outcomes (team performance, organizational performance, software delivery performance, operational performance). In essence, strong technical practices enable continuous delivery, which in turn drives better employee well-being and organizational outcomes.

The Impact of Documentation Quality on Team Performance and Well-being

The diagram illustrates the relationships between documentation quality, technical capabilities, knowledge sharing, well-being, and key outcomes in software development teams.


Chapter 5’s model: Reliability unlocks performance

This chapter explores the central role of operational performance in improving well-being, key outcomes, and amplifying the effect of software delivery performance. The relationship between reliability practices and operational performance is nonlinear.

*(Diagram: Reliability practices → Operational performance (nonlinear); operational performance → Well-being and Key outcomes, and amplifies the effect of software delivery performance.)*

Summary: This chapter highlights the importance of reliability practices in improving operational performance, which in turn leads to better well-being, software delivery performance, and key outcomes for teams and organizations. The relationship between reliability practices and operational performance is nonlinear, suggesting that investing in reliability can have a significant impact on overall performance. The chapter recommends consulting the details provided to better understand these relationships and their implications for DevOps practices.

Chapter 6’s model

Flexible infrastructure is key to success

Cloud computing has impacts on key outcomes because it provides a more flexible infrastructure. Cloud computing also leads to better well-being.

*(Diagram: Cloud computing (public, private, hybrid, multi-cloud) → Flexible infrastructure → Key outcomes (team, organizational, software delivery, and operational performance); cloud computing also → Well-being (more job satisfaction, less burnout, productivity).)*


Summary

This page discusses a model from Chapter 6 which shows that cloud computing, whether private, public, hybrid or multi-cloud, leads to a more flexible infrastructure. This flexible infrastructure in turn improves key outcomes like team performance, organizational performance, software delivery performance and operational performance. Cloud computing also directly improves employee well-being, increasing job satisfaction and productivity while reducing burnout. The model is from the Accelerate State of DevOps 2023 report.

Chapter 7’s model

None of this works without investing in culture

We can see that culture is at the center of so much in this diagram. We find that culture has a positive relationship with technical capabilities, key outcomes, and well-being.

*(Diagram: Culture at the center, with positive arrows to Key outcomes, Technical capabilities and processes, and Well-being.)*

Key outcomes

  • Team performance
  • Organizational performance
  • Software delivery performance
  • Operational performance

Culture

  • Westrum generative culture
  • Organizational stability
    • Job security
    • Flexibility
  • Knowledge sharing
  • User-centrism
  • Work distribution

Technical capabilities and processes

  • Trunk-based development
  • Loosely coupled architecture
  • Reliability practices
  • Continuous integration
  • Continuous delivery

Well-being

  • More job satisfaction
  • Less burnout
  • Productivity

Summary

This diagram illustrates how culture is at the heart of an organization’s success. A positive, generative culture directly influences key outcomes, technical capabilities and processes, and employee well-being.

Key outcomes include various aspects of performance, while well-being encompasses job satisfaction, reduced burnout, and productivity. Technical capabilities and processes involve modern development practices like trunk-based development, loose coupling, reliability, and continuous integration/delivery.

The model suggests that investing in a healthy, knowledge-sharing, user-centric culture with job security and flexibility can lead to improved outcomes across the board.

Chapter 8’s models

How, when, and why who you are matters

There are two models in this section. One explores why and when people who identify as underrepresented, and people who do not identify as men, experience higher levels of burnout. The other model explores whether documentation quality, work location, or AI can help new hires be more productive.

*(Diagram 1: Identifying as underrepresented, or not identifying as a man → unfair work distribution and more toil → higher burnout.)*

*(Diagram 2: For new hires, documentation quality: + and AI: + → productivity; work location shows no effect.)*

Summary

This page discusses two models related to diversity, equity and inclusion in the workplace. The first model suggests that people from underrepresented groups and those who do not identify as men tend to experience higher levels of burnout. This is partly due to unfair work distribution and higher levels of toil (tedious, manual work).

The second model looks at factors that can help new hires be more productive. It finds that good documentation and the use of AI tools can boost productivity for new employees. Work location (remote vs in-office) does not seem to impact productivity for new hires.

In simple terms, these models highlight how a person’s identity and background can affect their experience at work, and identifies some ways companies can help set up new hires for success, regardless of their background. Here is the text converted to markdown format with a summary at the end:

Further reading

Join the DORA Community to discuss, learn, and collaborate on improving software delivery and operations performance. DORA.community

Take the DORA DevOps Quick Check https://dora.dev/quickcheck

Explore the technical, process, and cultural capabilities that drive higher software delivery and organizational performance. https://dora.dev/devops-capabilities/

Find resources on SRE https://sre.google https://goo.gle/enterprise-roadmap-sre

Read the book: Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations. IT Revolution. https://itrevolution.com/product/accelerate/

Discover an appropriate constellation of metrics for your team using the SPACE Framework. “The SPACE of Developer Productivity: There’s more to it than you think.” https://queue.acm.org/detail.cfm?id=3454124

There have been several research studies on modern code reviews that are worth exploring.

Read the book: The No Club: Putting a Stop to Women’s Dead-End Work. Simon & Schuster. https://www.simonandschuster.com/books/The-No-Club/Linda-Babcock/9781982152338

Publications from DORA’s research program, including prior Accelerate State of DevOps Reports. https://dora.dev/publications/

Frequently asked questions about the research and the reports. http://dora.dev/faq

Errata - Read and submit changes, corrections, and clarifications to this report at https://dora.dev/publications/errata


Summary

This page provides further reading and resources related to DevOps, software delivery performance, and the DORA research program. It includes links to join the DORA community, take a DevOps quick check, explore DevOps capabilities, and find resources on Site Reliability Engineering (SRE).

The page also recommends books like “Accelerate”, which covers the science behind DevOps.

Appendix: Refining how we measure software delivery performance

This year, changes were made to the way change failures and recovering from failure are assessed:

  1. Change failure rate reporting:

    • Previously, respondents selected from six options (0-15%, 16-30%, etc.).
    • This year, a slider was used, allowing respondents to select any value between 0% and 100%.
    • Reasons for the change:
      • Change failure rate behaved differently than the other three measures of software delivery performance, possibly due to the size of the buckets. More precision in the answer might yield better statistical performance, which was validated.
      • Hypothesis: Teams have a better understanding of their change failure rate today compared to when the research began almost a decade ago (not validated).
  2. Recovering from failures:

    • Previous question: “For the primary application or service you work on, how long does it generally take to restore service when a service incident or a defect that impacts users occurs (for example, unplanned outage, service impairment)?”
    • This year, qualifiers were added (in bold): “For the primary application or service you work on, how long does it generally take to restore service **after a change to production or release to users** results in degraded service (for example, lead to service impairment or service outage) and **subsequently require remediation** (for example, require a hotfix, rollback, fix forward, or patch)?”

Summary: The way software delivery performance is measured was refined this year, focusing on change failure rate and recovering from failures. Change failure rate reporting was made more precise using a slider, and the question about recovering from failures was updated with additional qualifiers. These changes aim to improve the statistical performance of the measures and better understand how teams handle failures in their software delivery process.


The previous way of asking about recovery times did not allow for a distinction between a failure initiated by a software change and a failure initiated by something like an earthquake interrupting service at a data center. We had a hypothesis that the more precise language would allow us to compare similar failure types to one another, and that the language would be more statistically aligned with the other three measures of software delivery performance.

We are now using the term “Failed deployment recovery time” to distinguish our measure from the more generic “time-to-restore” that we’ve used in the past and sometimes abbreviated to “MTTR.” MTTR has caused some confusion in the community: is that “M” for mean or median? Additionally, practitioners seeking to learn more from failures, such as those in the resilience engineering space, are moving past MTTR as a reliable measure for guiding learning and improvement.

The newly added question and a new metric, failed deployment recovery time, are more in line with the spirit of measuring software delivery performance.

Summary

This appendix explains a change in terminology from the generic “time-to-restore” or “MTTR” to the more specific “failed deployment recovery time”. This new term allows distinguishing between failures caused by software changes versus external factors like natural disasters. It aims to provide a clearer, more precise metric that aligns better with other measures of software delivery performance. The appendix also notes that MTTR has limitations and resilience engineering practitioners are moving beyond it for guiding learning and improvement after failures.

The math behind the comparisons

This section explains the statistical methodology used throughout the report to compare the impact of different variables on various outcomes. The process involves creating a regression model, selecting high and low values for the variable of interest, calculating the mean for covariates, and then using the model to predict outcomes for the high and low values while holding other variables constant. The ratio of these predicted outcomes indicates the relative impact of the variable of interest.

The example provided demonstrates this process; a brief code sketch after the list walks through the same steps:

  1. A regression model is created to predict happiness based on sunshine and temperature.
  2. High and low values for sunshine are selected as one standard deviation above and below the mean.
  3. The mean value for the covariate (temperature) is calculated.
  4. The regression model is used to predict happiness for the high and low sunshine values, keeping temperature constant at its mean value.
  5. The ratio of the predicted happiness values for high and low sunshine is calculated.
  6. This ratio suggests that high levels of sunshine lead to 10% higher levels of happiness relative to low levels of sunshine, all else being equal.
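Here is a minimal sketch of those six steps in Python. The sunshine/temperature data is invented and statsmodels is my choice of tooling, so the printed ratio will not reproduce the report’s numbers; it only demonstrates the mechanics.

```python
# The six-step comparison: fit, pick high/low values, hold covariates
# at their means, predict, and take the ratio. Data is simulated.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
sunshine = rng.normal(6, 2, n)      # hours/day (made up)
temperature = rng.normal(20, 5, n)  # degrees C (made up)
happiness = 2 + 0.3 * sunshine + 0.1 * temperature + rng.normal(0, 1, n)

# 1. Regression model predicting happiness from sunshine and temperature.
X = sm.add_constant(np.column_stack([sunshine, temperature]))
model = sm.OLS(happiness, X).fit()

# 2. High/low sunshine = one standard deviation above/below the mean.
hi = sunshine.mean() + sunshine.std()
lo = sunshine.mean() - sunshine.std()

# 3. The covariate (temperature) is held at its mean.
t_mean = temperature.mean()

# 4. Predict happiness at high and low sunshine, temperature fixed.
pred_hi = model.predict([[1, hi, t_mean]])[0]
pred_lo = model.predict([[1, lo, t_mean]])[0]

# 5-6. The ratio is the relative impact of sunshine, all else equal.
print(f"high vs. low sunshine happiness ratio: {pred_hi / pred_lo:.2f}")
```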

Summary: This appendix explains the statistical methodology used to compare the impact of different variables on various outcomes throughout the report. The process involves creating a regression model, selecting high and low values for the variable of interest, and using the model to predict outcomes while holding other variables constant. The ratio of these predicted outcomes indicates the relative impact of the variable of interest, allowing for clear comparisons that isolate the effect of a single variable.

What is a “simulation”?

It isn’t that we made up the data. We use Bayesian statistics to calculate a posterior, which tries to capture “the expected frequency that different parameter values will appear.” The “simulation” part is drawing from this posterior more than 1,000 times to explore the values that are most credible for a parameter (mean, beta weight, sigma, intercept, etc.) given our data.

“Imagine the posterior is a bucket full of parameter values, numbers such as 0.1, 0.7, 0.5, 1, etc. Within the bucket, each value exists in proportion to its posterior probability, such that values near the peak are much more common than those in the tails.”

This all amounts to our using simulations to explore possible interpretations of the data and get a sense of how much uncertainty there is. You can think of each simulation as a little AI that knows nothing besides our data and a few rules trying to fill in a blank (parameter) with an informed guess. You do this 4,000 times and you get the guesses of 4,000 little AIs for a given parameter. You can learn a lot from these guesses. You can learn what the average guess is, between which values do 89% of these guesses fall, how many guesses are above a certain level, how much variation is there in these guesses, etc.

You can even do fun things like combine guesses (simulations) across many models. When we show a graph with a bunch of lines or a distribution of potential values, we are trying to show you what is most plausible given our data and how much uncertainty there is.
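To make the bucket metaphor concrete, here is a toy posterior simulation in Python. It uses a Beta-Binomial model with invented data rather than the report’s actual models, but the mechanics are the same: draw thousands of times from the posterior, then summarize the guesses.

```python
# Toy posterior "simulation": 4,000 draws from a Beta posterior.
# The data (70 successes in 100 trials) is invented for illustration.
import numpy as np

rng = np.random.default_rng(42)
successes, trials = 70, 100

# With a flat Beta(1, 1) prior, the posterior for the underlying rate
# is Beta(1 + successes, 1 + failures). Each draw is one "little AI's"
# informed guess at the parameter.
guesses = rng.beta(1 + successes, 1 + trials - successes, size=4000)

print("average guess:           ", guesses.mean())
print("89% of guesses fall in:  ", np.percentile(guesses, [5.5, 94.5]))
print("guesses above 0.65:      ", (guesses > 0.65).mean())
print("variation in the guesses:", guesses.std())
```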

Summary

A simulation in Bayesian statistics refers to repeatedly drawing samples from the posterior distribution, which represents the plausible parameter values given the observed data. Each simulation is like a guess from a little “AI” that only knows the data and some basic rules. By running thousands of simulations, we can explore the range of credible values for parameters like means, weights, and variances. This helps quantify the uncertainty in the estimates. Simulations allow us to interpret the data in light of this uncertainty and make probabilistic statements. Graphs of simulation results visualize the most plausible values and their distribution.