Flow Metrics Guide

“Standardized work is a framework for continuous improvement. People are not doing their job if they leave standardized work unchanged for a month.” — Taiichi Ohno

Flow metrics help software teams visualize and optimize the flow of value through their delivery system. By measuring a single piece of value’s journey from idea to customer, teams can identify bottlenecks, improve predictability, and deliver more consistently.

This guide provides a practical approach to implementing flow metrics, starting with essential measurements and gradually building sophistication as your team matures.

Why Flow Metrics Matter

Software development methodologies typically focus on product features and the coordination of teams to deliver them in a timely fashion. They measure the final outcomes generated by the teams’ efforts, so only results are visible. The inner workings of each team are opaque and framed in the language of its particular specializations. To the outside organization, the team is a “black box”: it can see the behavior, not the structure.

Within the teams, there is a need to understand the flow of work through their development process so they can accurately represent the delivery capacity of the system outward, and potentially make changes internally to improve it. There are several desirable properties of development:

  1. Value Orientation: Ensure activity and outputs align with customer outcomes and business impact.
  2. Capacity: Accurately know how much work can be done in a given timeframe.
  3. Predictability: Understand both the duration and the variability of delivering typical work.
  4. Continuous Improvement: Processes change over time to maintain or improve the first three items.

Flow metrics are output metrics; they measure raw outputs of a system, without consideration of the value delivered. As such, they are ideal for the last three properties of that list, as they can quickly classify the health of delivery in a way that is easily communicated to others. It is up to the team to maintain the alignment with outcomes by controlling the work that flows through the delivery system.

Implementation Practices

“The fact that we can set numeric objectives, and track them, is powerful, but in fact it is not the main point. The main purpose of quantification is to force us to think deeply, and debate exactly, what we mean so that others, later, cannot fail to understand us.” — Tom Gilb

Metrics are a tool. They become useful when they are part of a closed-loop feedback system that measures a process and makes incremental, empirical changes. This section outlines baseline expectations for using measurements to drive improvement experiments.

For Leaders: Virtual Gemba Walks

A Gemba walk brings leaders to where work actually happens, allowing them to observe, understand, and improve. In software, this means visiting the digital spaces where code is written, reviewed, and deployed. We consider this an essential practice for managers (typically EM and Director-level).

The Five Steps:

1. Clarify Your Purpose. Choose one focus per walk: flow efficiency, developer experience, release readiness, or customer feedback loops. Focused observation builds trust; scattered observation feels like surveillance.

2. Understand the Workflow. Before observing, understand the intended process: review the team’s workflow (backlog, develop, review, test, deploy), understand working agreements (Definition of Done, WIP limits), and know the tools (Jira, GitHub, Jenkins, etc.). You can’t identify gaps without knowing what good looks like.

3. Go to the Place (the “digital gemba”). Visit where work actually happens:

  • Team board: How does work flow? Where does it stick?
  • Pull requests: Review cycle time, comment quality
  • CI/CD dashboard: Build success rate, deployment frequency
  • Monitoring tools: Error rates, performance metrics
  • Support queue: Customer-reported issues

Focus on work in progress, not just completed items.

4. Observe and Compare. Look for patterns, not exceptions. PRs consistently waiting 2+ days for review? Stories aging in “In Progress”? Builds failing more frequently? Ask open questions: “I noticed deployments dropped last week; what happened?” Curiosity unlocks insights; judgment creates defensiveness.

5. Collaborate on Improvements. Frame opportunities as team-owned. Don’t say “Why aren’t reviews happening faster?” Do say “How might we reduce review wait time?” Start small: one experiment at a time, measure impact before adding more changes, celebrate learning not just success.

For Teams: Prerequisites for Success

1. Define “Done” Clearly. Does your code have appropriate test coverage? Does it meet code quality standards? Is it deployed to production? Is there monitoring in place?

2. Size Work Appropriately. Maximum: 1 week of work. Target: 1-3 days. Split larger items. See Small Slices of Value.

3. Make Work Visible. Single source of truth (board). All work types represented. Clear state definitions.

Team Checklists

Weekly Metric Review (15 minutes, end of sprint):

  • Check flow load (WIP) against limits
  • Compare throughput to the rolling average
  • Note flow time outliers for later investigation

Retrospective Deep Dive (monthly, end of release):

  • Analyze flow time outliers
  • Review flow distribution
  • Adjust WIP limits if needed

Avoiding Common Traps

  • Measuring Without Action: Don’t collect metrics without using them for improvement.
  • Feedback, not Target: Remember Goodhart’s Law; when a metric becomes a target, it ceases to be a good measure.
  • Too Many Metrics: Don’t add complexity before mastering basics.
  • Perfect Data: Don’t wait for ideal measurement instead of starting with good enough.
  • Individual Metrics: Don’t use flow metrics to evaluate people rather than systems.
  • Comparison Without Context: Don’t compare teams without considering their different contexts.

Essential Metrics for Lean/Agile Teams

Use these core metrics initially to provide the most benefit without taxing the team’s focus. Stabilize these metrics before expanding into more complex measurements.

1. Flow Load (Work in Progress)

Definition: Count of work items in active or waiting states at any point in time.

Why it matters: Too much WIP increases context switching, delays delivery, and hides problems. This is Little’s Law in action.

Target: 1-2 items per person for development work.

Implementation:

  • Set WIP limits for your board columns (e.g., max 3 items “In Progress”)
  • Leave 20% slack in sprint commitments for unexpected work
  • Limit product backlog size to maintain focus

Example: A 5-person team might set a total WIP limit of 8 items, ensuring some capacity for collaboration and review.
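The example above can be sketched in code. This is a minimal illustration of checking board WIP against limits and applying Little’s Law; the column names, limits, and throughput figure are illustrative assumptions, not values from this guide:

```python
# Sketch: check board WIP against limits and estimate cycle time via
# Little's Law. Column names, limits, and counts are illustrative.
wip_limits = {"In Progress": 3, "Review": 3, "Test": 2}
board = {"In Progress": 4, "Review": 2, "Test": 1}  # current item counts

# Columns holding more items than their limit allows
breaches = {col: count for col, count in board.items()
            if count > wip_limits.get(col, float("inf"))}

total_wip = sum(board.values())
throughput_per_day = 1.4  # rolling average, items/day (assumed)

# Little's Law: average cycle time = average WIP / average throughput
expected_cycle_time_days = total_wip / throughput_per_day
```

Running the check weekly makes rising WIP visible before it shows up as longer delivery times.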

2. Flow Throughput (Velocity)

Definition: Count of items completed in a given time period (sprint, week, month).

Why it matters: Measures your actual delivery capacity, essential for planning and forecasting.

Target: Consistency is more important than volume. Look for steady or gradually improving trends.

Implementation:

  • Track items completed per sprint (velocity for Scrum teams)
  • Use rolling averages (last 3-4 sprints) for planning
  • Count items, not points, as you mature

Example: A team that consistently delivers 15-20 items per sprint is more predictable than one that oscillates between 10 and 30.
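A rolling average over recent sprints, as suggested above, can be computed directly from completion counts. The sprint numbers here are illustrative:

```python
# Sketch: rolling-average throughput for planning. Numbers are illustrative.
sprint_throughput = [14, 18, 16, 20, 15]  # items completed per sprint

window = 4  # use the last 3-4 sprints, per the guide
recent = sprint_throughput[-window:]
forecast = sum(recent) / len(recent)  # expected items next sprint

# Simple stability check: spread relative to the rolling average
variation = (max(recent) - min(recent)) / forecast
```

A team oscillating between 10 and 30 items would show a large `variation`, signaling that the average alone is not a safe planning number.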

3. Flow Time (Lead Time)

Definition: Days elapsed from engineering starting a work item until it is completed.

Why it matters: Directly impacts customer satisfaction and business agility.

Target:

  • Features: 1-2 weeks
  • Bugs: 1-3 days
  • Emergency fixes: < 1 day

Implementation:

  • Measure from “Ready for Work” to “Done”
  • Track percentiles (50th, 85th, 95th) not just averages
  • Investigate items exceeding the 85th percentile

Example: If your 85th percentile is 10 days, you can confidently tell stakeholders most work is completed within 2 weeks.
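Percentile tracking, as recommended above, needs only the list of completed flow times. This sketch uses a nearest-rank percentile and illustrative data:

```python
# Sketch: flow-time percentiles from completed items. Days are illustrative;
# real data would come from the tracker's "Ready for Work" and "Done" dates.
flow_times_days = [2, 3, 3, 4, 5, 5, 6, 7, 8, 10, 12, 21]

def percentile(data, p):
    """Nearest-rank percentile: smallest value covering p% of items."""
    ranked = sorted(data)
    index = max(0, -(-p * len(ranked) // 100) - 1)  # ceil(p*n/100) - 1
    return ranked[index]

p50 = percentile(flow_times_days, 50)
p85 = percentile(flow_times_days, 85)
outliers = [d for d in flow_times_days if d > p85]  # investigate these
```

With this data the 50th percentile is well below the 85th, which is typical: flow times are skewed, so averages alone understate the tail that stakeholders actually experience.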

4. Flow Distribution

Definition: Percentage breakdown of completed work by type (features, bugs, tech debt, security).

Why it matters: Ensures balanced investment across value delivery, quality, and sustainability.

Target: Varies by context, but a healthy balance might be:

  • Features: 60-70%
  • Bugs: 15-20%
  • Tech debt: 10-15%
  • Security/compliance: 5-10%

Implementation:

  • Tag all work items by type
  • Review distribution monthly
  • Adjust if drifting from the intended strategy

Examples:

  • A team spending 50% of its efforts on bugs likely has quality issues to address.
  • A team spending less than 10% of its efforts on improvement is unlikely to change its delivery capability.
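The monthly distribution review can be a few lines over tagged items. The item counts and the 30% bug threshold here are illustrative assumptions:

```python
# Sketch: monthly flow-distribution review from tagged work items.
# Counts and thresholds are illustrative.
from collections import Counter

completed = (["feature"] * 13 + ["bug"] * 4
             + ["tech_debt"] * 2 + ["security"] * 1)

counts = Counter(completed)
total = sum(counts.values())
distribution = {kind: round(100 * n / total) for kind, n in counts.items()}

# Flag drift from the intended strategy
quality_warning = distribution.get("bug", 0) > 30
```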

5. Sprint Completion Rate

Definition: Percentage of committed work completed within the sprint.

Why it matters: Indicates planning accuracy, focus, and a mature refinement workflow.

Target: 80-90%. Regularly achieving 100% might indicate over-conservative planning; never reaching 100% might indicate over-aggressive planning.

6. Average Item Size

Definition: Typical size/complexity of work items your team accepts.

Evolution path:

  1. Start with story points (1, 2, 3, 5, 8)
  2. Correlate points to actual flow time
  3. Evolve toward breaking work into roughly equal items (“right-sizing”)
  4. Eventually just accept items as “small and valuable enough” and count them

Target: 1-3 days per item for mature teams.
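Step 3 of the evolution path, right-sizing, can be checked against actual flow times. This sketch flags oversized items and measures how equal the remaining items are; the data and the 0.5 spread threshold are illustrative assumptions:

```python
# Sketch: a right-sizing check (step 3 of the evolution path). Items whose
# flow time exceeds the target should be split before being accepted.
# Numbers and thresholds are illustrative.
TARGET_DAYS = 3
item_flow_days = [1, 2, 2, 3, 2, 6, 1, 2]

needs_splitting = [d for d in item_flow_days if d > TARGET_DAYS]
average_size = sum(item_flow_days) / len(item_flow_days)

# Roughly-equal items => low spread relative to the average
n = len(item_flow_days)
spread = (sum((d - average_size) ** 2 for d in item_flow_days) / n) ** 0.5
right_sized = spread / average_size < 0.5 and not needs_splitting
```

Once items pass this kind of check consistently, counting them is as informative as pointing them.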

Key Visualizations

Cumulative Flow Diagrams

Cumulative flow diagrams (“CFDs”) visualize the flow of work through a process with multiple sequential steps. (In practice, these steps are often performed in parallel; do not confuse them with strictly ordered stages.) The count of items in each step is stacked vertically, with time as the horizontal axis. The heights, widths, and slopes of the bands indicate the relevant flow in each state or group of steps:

  • Lead Time — The horizontal distance from when an item enters a step until it leaves it. The total width across all steps is also referred to as “Flow Time.”
  • Cycle Time — The horizontal distance across all process steps for work that has been started.
  • Backlog — The height of planned work that has not yet been started. Depending on how the process steps are defined, the backlog may rise through multiple steps.
  • Work in Progress (WIP) — The height of a single step or set of steps that identify work that has entered the system but has not yet exited it.

From these raw measurements, several additional values can be derived:

  • Arrival Rate — The amount of work entering a step per time interval. This is the leading slope of any step. The slope of the top edge of the graph indicates the rate of work entering the whole system.
  • Completion Rate (Throughput) — The amount of work leaving a given step per time interval. This is the trailing slope of any step. The rate of work moving to “Done” represents the total throughput of the system.

The power of this diagram is that the absolute values for the axes are largely arbitrary; the shape of the curves reveals the nature of the system at a glance. Deeper inspection of the process steps based on the characteristics of the curves can reveal a variety of process adjustments, including communication improvements, rebalance of team roles, automation opportunities, or changing process steps altogether.
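The derived values above follow directly from the cumulative counts behind the diagram. This sketch uses a tiny illustrative data set, where each list holds the cumulative number of items that have entered that state by each day:

```python
# Sketch: derive flow values from the cumulative counts behind a CFD.
# Each list is the cumulative number of items that have entered that
# state by each day; numbers are illustrative.
backlog_entered = [10, 12, 14, 16, 18]  # top edge: work entering the system
in_progress     = [2, 4, 7, 9, 11]      # work that has been started
done            = [0, 1, 3, 5, 8]       # bottom band: completed work

days = len(done) - 1
arrival_rate = (backlog_entered[-1] - backlog_entered[0]) / days  # items/day
throughput   = (done[-1] - done[0]) / days                        # items/day

# WIP today: started but not finished (vertical distance between bands)
wip_today = in_progress[-1] - done[-1]
```

When `arrival_rate` exceeds `throughput` over a sustained period, the bands diverge: the system is taking on more work than it completes.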

Reading Cumulative Flow Patterns

Once flow measurements can be visualized, several characteristics of the diagrams indicate typical behaviors of the system:

  • Parallel bands indicate steady, predictable flow. Most systems rarely maintain this level of stability, but it represents the ideal.
  • A “bulge” appears when work arrives in a process step faster than it departs. This is a bottleneck, indicating a lack of sufficient effort, resources, or material for this step. This represents a high level of Work in Progress. Attaching WIP limits to steps can detect this problem earlier.
  • Narrowing bands show that work is departing a step faster than it arrives. Over time, this creates starvation for later steps. Typically this indicates the need to improve flow by shifting effort to or improving efficiency in the upstream steps.
  • Diverging bands show a system that is overwhelmed: throughput has slowed, as a step is ingesting more work than it is capable of handling. This is a system under pressure, and due to Little’s Law, the amount of work in progress will lead to both longer cycle and longer delivery times. WIP limits are badly needed here.
  • Large gap between top and bottom bands illustrates how pre-planned backlogs start a large amount of work up-front without completion. Consider adding a step that shows actual items that are in scope, or deferring planning until the last responsible moment. Maximize the work not performed.

Caveat: These diagrams all assume that no work is ever removed from the system or flows backward. If work processes allow this, some short-term slopes or progress can appear negative. If this is undesirable, a workaround is to track a terminal “Discarded” state and stack it under “Done.”

Control Charts

For processes with variable work sizes or durations (whether human efforts or automated systems), understanding and compensating for that variability can improve the throughput of a system. A control chart is a tool that assists in detecting trends in erratic data and placing them in context. Typically, values are plotted against a horizontal time axis, with three measurements calculated:

  1. Average — The arithmetic mean of all the values over time
  2. Upper data value — The maximum value ever measured
  3. Lower data value — The minimum value ever measured

Two Control Lines (Upper/Lower) are added as “smoke lines”: When values are observed outside of these limits, the team or organization makes adjustments to the system to bring the values back within the acceptable range.

The average, high/low values, and control limits can be drawn as single values spanning the whole time range, or recalculated historically to help identify larger trends.
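The measurements above can be computed from a plain list of observations. This sketch tracks the average and the high/low values as the guide describes; the control lines use the common mean ± 3 sigma convention, which is an assumption here rather than this guide’s prescription, and the data is illustrative:

```python
# Sketch: control-chart calculations over a series of observations
# (e.g. daily cycle times). Data is illustrative. Control lines use the
# common mean +/- 3 sigma convention, an assumption, not the guide's rule.
values = [4.0, 4.2, 4.1, 4.3, 3.9, 4.0, 4.2, 4.1,
          4.0, 4.3, 3.8, 4.1, 9.0, 4.0, 4.2]

n = len(values)
mean = sum(values) / n
sigma = (sum((v - mean) ** 2 for v in values) / n) ** 0.5

upper_control = mean + 3 * sigma
lower_control = mean - 3 * sigma
high, low = max(values), min(values)

# Points outside the control lines prompt an adjustment to the system
out_of_control = [v for v in values if not lower_control <= v <= upper_control]
```

Here the 9.0 observation falls outside the upper control line, so the team would investigate that item rather than treating it as normal variation.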

Burnup Charts

Development teams typically track remaining effort, since “How much work will be completed by the deadline?” is a common refrain from stakeholders outside the team. This is an output-focused view that fits the internal context, but it doesn’t describe the value delivered. Burndown charts (a line trending rightward and toward zero) are ideal for displaying remaining effort.

In the business context, convention holds that “line goes up” is a good result and “line goes down” is not. When reporting value delivered (either in terms of completed features or money generated), a burnup chart is more welcome.

Burnup charts may also show the total expected value as a horizontal line above the trend, representing the theoretical total value that might be delivered. This line can also be adjusted historically to show where goals were added or removed.
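A burnup is just two series: cumulative value delivered and total expected scope. This sketch builds both from per-sprint numbers; the figures, including the mid-release scope change, are illustrative:

```python
# Sketch: burnup chart data series. The scope line is the total expected
# value and may move as goals are added or removed; numbers are illustrative.
completed_per_sprint = [3, 5, 4, 6]
scope_per_sprint     = [20, 20, 24, 24]  # goal raised in sprint 3

burnup = []
total = 0
for done, scope in zip(completed_per_sprint, scope_per_sprint):
    total += done
    burnup.append((total, scope))  # (value delivered, total expected value)
```

Plotting both series together makes scope changes visible, which a burndown quietly absorbs into the remaining-effort line.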

Causal Feedback Loops

When considering what metrics to use, there are situations where:

  • No single measurement represents the goal
  • A metric might be used as a governing constraint to limit waste
  • Multiple applicable metrics might contradict each other
  • Precise metrics might be unavailable or expensive to collect

In these situations, it can be valuable to create a map of how different factors of the system influence each other. One of these visualization types is the causal loop diagram, which illustrates the relationships between measures, particularly how they interact with or against each other.

For example, a causal loop diagram showing how increasing total work in progress slows down a system illustrates Little’s Law visually. Increasing WIP increases cycle time, which increases time to deliver. Increased cycle time also increases technical debt, which further reduces product investment capacity, which feeds back into longer cycle times. Contrarily, limiting WIP and investing in maintaining product coherency in the codebase reduces overall delivery time.
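The loop described above can be sketched as a signed graph, which is all a causal loop diagram is structurally. The node names are illustrative, and the even/odd rule for negative links is the standard way to classify a loop as reinforcing or balancing:

```python
# Sketch: the WIP/tech-debt causal loop as a signed graph. "+" means the
# source pushes the target in the same direction; "-" the opposite.
# Node names are illustrative.
loop = {
    ("wip", "cycle_time"): "+",
    ("cycle_time", "delivery_time"): "+",
    ("cycle_time", "tech_debt"): "+",
    ("tech_debt", "investment_capacity"): "-",
    ("investment_capacity", "cycle_time"): "-",
}

# A cycle is reinforcing when it contains an even number of "-" links.
cycle = ["cycle_time", "tech_debt", "investment_capacity", "cycle_time"]
signs = [loop[(a, b)] for a, b in zip(cycle, cycle[1:])]
reinforcing = signs.count("-") % 2 == 0
```

The tech-debt loop comes out reinforcing: longer cycle times feed it, and it feeds longer cycle times, which is why limiting WIP pays off twice.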

With a causal relationship established, the following are now possible:

  • Lower-order leading (output) metrics might act as proxies for hard-to-collect or lagging indicators
  • Multiple metrics might be used as bounds on an improvement effort
  • More nuanced metrics may be calculated to more accurately represent the tradeoffs chosen for an experiment

Quick Reference Card

When sharing cumulative flow information outward, adding a “key” to the data helps align context. These values set expectations so that someone reading the diagram can answer “Is that expected/typical?” with minimal assistance.

| Metric | Formula | Healthy Range | Red Flags |
| --- | --- | --- | --- |
| Flow Load (WIP) | Count of active items | 1-2 per person | >3 per person |
| Throughput | Items completed/sprint | Steady or improving | >25% variation without attributable cause |
| Flow Time | Start date to Done date | 3-8 days | Increasing trend |
| Flow Distribution | % by type | 60%+ features | >30% bugs |
| Sprint Completion | Completed/Committed | 80-90% | <70% or 100% |

Note: The numerical values in this table are examples and must be adjusted to match the team’s own policies.

Automation

Most flow metrics are readily available through tracking tools (Jira, etc.). However, there are a couple of guiding principles for expanding team data collection:

  1. Data collection requires a small amount of rigor on the part of a team (for instance, annotating work item types for categorization). Where possible, make the annotations small, easy to add, and easy to change. Accept that some small amount of error will always be present.
  2. When metrics require additional manual tracking, analysis, or engineering to collect, use a proxy metric if possible while running initial experiments. Verify the correlation with the expected measurements as the experiment completes. Only undertake the effort of building the larger metric once the value of collecting the data is verified.
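Verifying a proxy against the real measurement, as described in the second principle, can be a simple correlation check once the experiment completes. The data here is illustrative, and the 0.8 threshold is a hypothetical team policy:

```python
# Sketch: verify a proxy metric against the true measurement once an
# experiment completes. Data and the keep/drop threshold are illustrative.
proxy  = [12, 15, 11, 18, 14, 20, 16]          # e.g. cheap-to-collect counts
actual = [5.0, 6.2, 4.8, 7.5, 5.9, 8.1, 6.6]   # e.g. measured flow time, days

n = len(proxy)
mean_p, mean_a = sum(proxy) / n, sum(actual) / n
cov   = sum((p - mean_p) * (a - mean_a) for p, a in zip(proxy, actual))
var_p = sum((p - mean_p) ** 2 for p in proxy)
var_a = sum((a - mean_a) ** 2 for a in actual)
r = cov / (var_p * var_a) ** 0.5  # Pearson correlation, -1..1

keep_proxy = r >= 0.8  # threshold is a team policy choice (assumed)
```

Only once the correlation holds up is it worth engineering the more expensive metric pipeline.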
