Every operations team has a version of this problem. Ours came from a mid-sized logistics company based in the north of England. Their operations team (five people) was starting every Monday with the same ritual: manually pulling shipment data from three different carrier portals, copying it into a master spreadsheet, cross-referencing it against their internal order management system, flagging exceptions by hand, and then building a status report that went out to clients every Tuesday morning.
By the time we spoke to them, this process was consuming over 40 hours per week across the team. That's one full-time person, every week, doing nothing but moving data from one system to another: data that was already digital, already structured, already sitting in systems that had perfectly functional APIs.
We built a Python automation pipeline that eliminated that 40 hours. This post is a full account of what we built, how it works, and what we'd do differently.
Understanding the Problem Before Writing Code
The first thing we always do before writing a single line of automation code is map the manual process in detail. This sounds obvious, but most automation projects fail because they automate the existing process without first questioning whether the process itself makes sense.
We spent two days shadowing the operations team. What we found was a process that had grown organically over several years as the business expanded from one carrier to three, and from one client type to four different reporting formats. Nobody had ever stepped back to look at it as a whole. The team had simply added steps as the business needed them.
The full manual process looked like this:
- Log into three separate carrier web portals and export shipment status data as CSV files (one per carrier, done daily)
- Open each CSV, reformat the columns to match the internal naming convention (each carrier used different field names for the same data)
- Copy the reformatted data into a master spreadsheet workbook
- Cross-reference against the order management system export to match carrier tracking references to internal order IDs
- Manually flag exceptions: delayed shipments, failed deliveries, items marked as delivered that hadn't been confirmed by the client
- Build four different client report formats from the master workbook; each client had slightly different requirements for what they wanted to see and how they wanted it presented
- Email the reports manually
Before we built anything, we made one recommendation that saved more time than any code we wrote: two of the four client report formats were nearly identical and could be standardised into one. The clients agreed immediately when it was put to them; nobody had ever asked.
That single conversation reduced the reporting workload by around 20% before we touched the codebase.
The Pipeline Architecture
The automation is a Python pipeline with five distinct stages. Each stage is independently runnable, testable and observable, a design principle we apply to every pipeline we build.
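One way to picture that design principle is a minimal stage runner. This is an illustrative sketch using our own names, not the production code: each stage is a named, independently callable step, so it can be run and tested in isolation or chained into the full pipeline.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch: a stage is just a name plus a callable that
# transforms a list of records, so each one can run on its own.
@dataclass
class Stage:
    name: str
    run: Callable[[list], list]

def run_pipeline(stages: list[Stage], records: list) -> list:
    for stage in stages:
        records = stage.run(records)  # each stage's output feeds the next
    return records

# Toy demonstration with two trivial stages
ingest = Stage("ingest", lambda recs: recs + [{"id": 2}])
normalise = Stage("normalise", lambda recs: [dict(r, ok=True) for r in recs])
result = run_pipeline([ingest, normalise], [{"id": 1}])
```

Keeping stages this decoupled is what makes them individually testable and observable: a failure can be attributed to one stage, and any stage can be re-run against stored inputs.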
Stage 1: Data Ingestion
The three carrier portals all had different interfaces. One had a well-documented REST API. One had an older SOAP-based API that required a fair amount of XML handling. The third had no API at all, just a web portal where a human had previously been logging in and clicking "Export".
For the first two, we built API client modules in Python that authenticate, request the relevant date range of shipment data, and return a normalised list of shipment records. For the third, we built a headless browser automation script that logs in, navigates to the export function, downloads the CSV to a known location, and parses it. The whole ingestion stage runs on a scheduler, once at 6am every day, well before the operations team starts work.
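For the portal-export carrier, the parsing half of that ingestion step looks roughly like this. The class and file-naming convention are our illustration, not the client's actual code; it assumes the headless-browser script has already downloaded the day's CSV to a known directory.

```python
import csv
from datetime import date
from pathlib import Path

class CsvExportCarrier:
    """Parses the CSV a headless-browser script has already downloaded."""

    def __init__(self, download_dir: Path):
        self.download_dir = download_dir

    def fetch(self, day: date) -> list[dict]:
        # Hypothetical naming convention for the downloaded export
        path = self.download_dir / f"export_{day.isoformat()}.csv"
        with path.open(newline="") as f:
            return list(csv.DictReader(f))  # one dict per shipment row
```

The API-based carriers expose the same `fetch(day)` shape, so the scheduler can treat all three sources identically.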
All raw data is stored immediately after ingestion, before any processing happens. This is non-negotiable in any pipeline we build: if something goes wrong downstream, you can always reprocess from the raw data without having to re-fetch from the source.
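A minimal version of that raw store can be sketched as follows. The layout (one directory per source, timestamped JSON files) is an assumption for illustration; the point is that the untouched payload hits disk before any normalisation runs.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

def store_raw(base_dir: Path, source: str, payload: list[dict]) -> Path:
    """Write the raw payload to disk, keyed by source and UTC timestamp."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S")
    path = base_dir / source / f"{stamp}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(payload))
    return path  # downstream stages can reprocess from this file
```

Because every run lands in its own timestamped file, a downstream bug means re-reading a file, not re-hitting a carrier portal.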
Stage 2: Normalisation
Three carriers, three different data schemas. Carrier A calls it tracking_reference. Carrier B calls it shipment_id. Carrier C calls it consignment_number. Delivery status is expressed as a numeric code by one, a string by another, and a nested object by the third.
The normalisation stage maps all three schemas to a single internal schema: a clean, consistent Python dataclass with validated field types. Every record that comes out of normalisation looks identical regardless of which carrier it came from. Downstream stages don't need to know anything about carrier-specific data formats β they just consume normalised records.
We also run data quality checks at this stage: are required fields present? Are date fields parseable? Are status codes within the expected set? Records that fail validation are quarantined and flagged for human review rather than silently dropped or silently passed through with bad data.
Stage 3: Enrichment and Matching
Carrier data contains carrier tracking references. The client's order management system uses internal order IDs. Matching them is the core of the pipeline: the step the team was previously doing with manual spreadsheet lookups.
We built a matching service that queries the order management system's API (it had one, though the team hadn't known about it; the system vendor's documentation was poor) and maintains a local cache of the reference-to-order-ID mapping. Most matches are resolved from the cache in milliseconds. New references that aren't in the cache trigger a fresh API lookup and update the cache for future runs.
Unmatched references (ones that exist in carrier data but have no corresponding order in the system) are flagged as exceptions. This is intentional: every exception gets human eyes, not automated guessing.
Stage 4: Exception Detection
This was the stage the team was most excited about. Previously, someone had to scan through hundreds of rows in the master spreadsheet looking for the ones that needed attention. It was slow, error-prone, and deeply tedious.
The exception detection stage applies a set of rules to the normalised, enriched data and automatically classifies each shipment:
- On track: status consistent with the delivery timeline, no action needed
- At risk: last scan was more than X hours ago for a shipment due today, or current location is inconsistent with an on-time delivery
- Exception (delayed): carrier has marked the shipment as delayed, or the delivery date has passed without a delivered scan
- Exception (failed delivery): carrier has recorded a failed delivery attempt
- Exception (status conflict): carrier shows delivered but the client system has no receipt confirmation
The rules are configurable (stored in a simple configuration file rather than hardcoded), so the operations team can adjust thresholds themselves without needing a developer.
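A simplified classifier covering the rules above might look like this. The rule logic, field names, and the `CONFIG` dict (standing in for the configuration file) are all illustrative; the production rule set was richer and lived outside the code.

```python
from datetime import datetime, timedelta

CONFIG = {"at_risk_hours": 12}  # stand-in for the editable config file

def classify(shipment: dict, now: datetime) -> str:
    """Classify one normalised, enriched shipment record."""
    due_today = shipment["due_date"].date() == now.date()
    stale = now - shipment["last_scan"] > timedelta(hours=CONFIG["at_risk_hours"])
    # Delayed: carrier says so, or the due date passed without a delivered scan
    if shipment["status"] == "delayed" or (
        shipment["due_date"] < now and shipment["status"] != "delivered"
    ):
        return "exception_delayed"
    if shipment["status"] == "failed":
        return "exception_failed_delivery"
    # Status conflict: carrier says delivered, client hasn't confirmed receipt
    if shipment["status"] == "delivered" and not shipment.get("client_confirmed", False):
        return "exception_status_conflict"
    if due_today and stale:
        return "at_risk"
    return "on_track"
```

Because the threshold comes from `CONFIG` rather than a literal in the code, tightening the "at risk" window is an edit to one file, not a redeploy.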
Stage 5: Report Generation and Distribution
The final stage generates the client reports and sends them. Each client has a report template: a Python object that defines which fields to include, what order to present them in, what the summary section should contain, and where the report should be sent.
Reports are generated as formatted HTML emails with an attached spreadsheet for clients who need one. The distribution list, send time, and format for each client are all configuration, not code. Adding a new client or changing a reporting format is a configuration change, not a development task.
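The shape of such a template, reduced to its essentials, might be the following. The field names and defaults are hypothetical; the point is that adding a client means adding one of these objects, not writing new code.

```python
from dataclasses import dataclass

@dataclass
class ReportTemplate:
    client: str
    fields: list[str]          # which columns, in presentation order
    recipients: list[str]      # distribution list
    send_time: str = "07:30"
    attach_spreadsheet: bool = False

def render_rows(template: ReportTemplate, shipments: list[dict]) -> list[list]:
    """Project each shipment onto the template's fields, in order."""
    return [[s.get(f, "") for f in template.fields] for s in shipments]
```

The renderer stays generic; everything client-specific lives in the template, which is exactly what turns "change a reporting format" into a configuration change.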
Reports go out automatically at 7:30am every day. The operations team receives a daily digest showing what ran, what succeeded, and what (if anything) needs their attention. Exceptions are highlighted and linked directly to the relevant records.
The Part Most Pipelines Skip: Observability
We spent almost as much time on the observability layer as on the pipeline itself. This surprises some clients, but it's the investment that makes everything else sustainable long-term.
Every pipeline run produces a structured log: how many records were ingested from each source, how many passed normalisation, how many were matched, how many were classified as exceptions, how many reports were generated and delivered. These logs feed a simple internal dashboard that shows pipeline health over time.
Alerts fire when:
- A pipeline stage fails: the team is notified immediately, not the next morning
- The record count from any source drops more than 30% compared to the previous 7-day average: a sign that something has changed at the source
- The exception rate spikes significantly, which could indicate a carrier problem, a data quality issue, or a genuine operational problem worth knowing about quickly
- A report fails to send
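The volume-drop check in particular is a one-liner worth showing, since it catches the quiet failures (a source silently returning less data) that a simple "did it crash?" alert misses. This is a sketch of the arithmetic, not the production alerting code.

```python
def volume_drop_alert(history: list[int], today: int, threshold: float = 0.30) -> bool:
    """Alert if today's record count from a source has dropped more than
    `threshold` (default 30%) below the trailing average.

    history: record counts for that source's previous 7 runs.
    """
    if not history:
        return False  # no baseline yet, nothing to compare against
    average = sum(history) / len(history)
    return today < average * (1 - threshold)
```

For example, with a 7-day average of 100 records, a run that ingests 60 trips the alert while a run of 95 does not.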
The pipeline has been running in production for over a year. The observability layer has caught three issues in that time: one carrier changed their API authentication method, one portal had a temporary outage, and one client's email domain changed. In each case, the team was notified before anyone noticed the problem manually. Two of the three were resolved the same day.
What It Delivered
The numbers, after 12 months of production operation:
- 40+ hours per week recovered: the operations team now spends approximately 2 hours per week on data-related work, down from 40+. Most of that 2 hours is reviewing exceptions flagged by the pipeline
- Reports delivered by 7:30am daily: previously, Tuesday was the only day clients received a status report. Now they receive one every morning
- Exception detection rate improved: the manual process was catching roughly 70% of exceptions (the team estimated, based on client complaints that had slipped through). The automated process catches 100% by definition, since every shipment is evaluated against every rule on every run
- One new carrier added in 4 hours: the client expanded to a fourth carrier six months into the engagement. Adding the new carrier to the pipeline (writing the ingestion module, mapping the schema to the normalised format) took one developer half a day. Under the manual process, adding a new carrier would have meant hiring additional headcount
What We'd Do Differently
Three things we'd approach differently on a similar project today.
1. Involve the operations team in the exception rules from day one. We designed the initial exception detection rules ourselves, based on what we observed during the shadowing sessions. Two of them needed significant revision after the first week in production: the team's definition of "at risk" was more nuanced than we'd captured. In future projects, we workshop the rules with the team before building them.
2. Build the configuration interface earlier. For the first two months, changing a rule or a report template required a developer to edit a file and redeploy. We built the configuration interface later in the project. It should have been built earlier; it's what gave the team genuine ownership of the system.
3. Negotiate API access at the start, not mid-project. Discovering the order management system had an API two weeks into the project (after we'd already designed a workaround involving CSV exports) cost us time. Now we spend the first conversation with any client asking specifically about API access to every system in scope before scoping the project.
When Does This Kind of Project Make Sense?
Not every manual process is worth automating. The investment in building and maintaining a pipeline like this needs to be justified by the time it recovers and the errors it prevents.
A rough framework we use to evaluate whether automation is worthwhile:
Strong signals that automation is worth it:
- The task is genuinely repetitive: the same steps, the same data sources, the same output format, run on a regular schedule
- The task consumes more than 5 hours per week of skilled team time
- The data involved is already digital and structured; you're not automating handwriting recognition or unstructured document processing
- Errors in the manual process have real consequences: wrong data in a client report, missed exceptions, delayed responses
- The process is likely to grow in volume (more carriers, more clients, more data sources), so the manual version will get progressively more expensive to run
Signals that automation may not be the right first step:
- The process is poorly defined or changes frequently: automating an unstable process means constant rework
- The data is unstructured or highly variable: free-text fields, inconsistent formats, documents that require human judgement to interpret
- The volume is low enough that the manual process takes under an hour a week: the maintenance overhead of an automated pipeline may exceed the time saved
The Takeaway
The average knowledge worker spends over a quarter of their working week on manual, repetitive tasks. In most cases, those tasks involve data that is already digital, already structured, and already moving between systems that could talk to each other, if someone built the connection.
The operations team in this case study didn't have a data problem. They had a connectivity problem: three carrier systems and one internal system that had never been joined up. The 40 hours a week wasn't the cost of running the business; it was the cost of not having built the plumbing.
If your team has a process that looks anything like this (regular, structured, repetitive, data-heavy), it's almost certainly automatable. The question is usually not whether it can be done, but whether someone has prioritised doing it.
If you'd like a second opinion on whether a process in your business is a good automation candidate, we're happy to take a look. We can usually give a meaningful assessment in a single conversation.
