Every year, the same questions reappear.
Key Questions
It’s not one question - it’s several, tightly connected. And behind them is a deeper one: can we turn patterns from the past into reliable guidance for the future?
That’s what this article is about.
We’re going to walk through the end-to-end process of building a seasonal prediction system - one that uses historical data, enriches it with weather context, extracts commercial patterns, and then projects forward.
The steps include:
- Capturing and cleaning order data
- Mapping each order to its weather environment
- Structuring the combined data into a daily record
- Engineering features for machine learning
- Training regression models on weather vs. performance
- Forecasting sales and orders using simulated or real weather
- Visualising predictions in a way that's clear, contextual, and actionable
You’ll see how each script, file, and dataset fits together - not in theory, but in real application.
By the end, you’ll have a complete blueprint for a predictive sales system - and the confidence to ask smarter questions about the future.
Let’s get started.
From Siloed Insight to Integrated Prediction
Many of the components involved in this process - sales data, analytics, operations, even logistics - typically live in isolation.
In larger organisations, they exist as distinct departments. Each has its own data, its own dashboards, and its own set of priorities.
But prediction doesn’t work that way.
To forecast with any real accuracy, you need to unify these streams.
Not in theory, but in practice. That means:
- Sales data that connects to customer logistics
- Traffic patterns that inform fulfilment planning
- External signals (like weather) tied to internal readiness
The goal isn’t just to model the past - it’s to create a system that reflects how your business actually functions.
That’s what we’re building: a prediction machine - one that:
- Uses past data
- Compares key commercial periods
- Identifies recurring patterns
- Connects those patterns to specific outcomes
- Forecasts future performance based on structured input
It’s not a dashboard. It’s not a report.
It’s a system.
And now that we’ve framed the challenge, let’s begin where every good prediction starts - with what we already know.
Step 1: Capture Past Sales Data
The first step is building a clean, reliable foundation.
We start by exporting every order placed between January 2020 and today - five full years of transaction history. This is done in quarterly batches from the CMS, filtered to include only successful orders. Once downloaded, we combine all files into a single CSV, resulting in a master dataset containing 48,184 unique records.
We retain the following columns:
- Order ID - to uniquely identify each transaction
- Date/Time - to anchor each order to a specific moment (critical for weather matching)
- Cart Details & Item Codes - to verify uniqueness and help classify order complexity
- Delivery Postcode - required to fetch weather data per order location
- Revenue (inc. VAT) - to calculate and compare total revenue over time
- Status - used to filter only completed, successful transactions
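The export-and-combine step might look like the following sketch. The filenames, column names, and status value are assumptions for illustration (here the quarterly batches are stubbed as in-memory CSV text); the real exports will differ.

```python
from io import StringIO

import pandas as pd

# Hypothetical quarterly exports; real filenames and columns will differ.
q1 = StringIO("order_id,date,postcode,revenue,status\n1001,2020-01-15 10:30,SW1A 1AA,59.99,complete\n")
q2 = StringIO("order_id,date,postcode,revenue,status\n1002,2020-04-02 14:05,M1 1AE,24.50,complete\n")

# Concatenate all quarterly batches into one master frame,
# keeping only successful orders and dropping duplicate order IDs.
frames = [pd.read_csv(f, parse_dates=["date"]) for f in (q1, q2)]
orders = pd.concat(frames, ignore_index=True)
orders = orders[orders["status"] == "complete"].drop_duplicates("order_id")
orders.to_csv("master_orders.csv", index=False)
```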
However, raw exports are rarely clean.
Warning: Always validate your data exports before processing. Raw CSV files often contain formatting inconsistencies, special characters, or encoding issues that can corrupt your analysis if not handled properly.
In this case, the CSVs had occasional data spillage - typically caused by special characters (commas, quotes, etc.) in the cart detail fields. These would throw off column alignment and push data into the wrong headers.
To fix this, we built a Python script that scans for misaligned rows using known formatting patterns. When it finds errors, it realigns the columns by searching for anchors: recurring item code formats, expected field lengths, or columns that should always have the same values.
This preserves the original data while restoring structural consistency.
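A minimal sketch of that realignment idea, assuming a six-column layout and a hypothetical two-letters-four-digits item-code format as the anchor (the real script's patterns will differ):

```python
import re

EXPECTED_COLS = 6   # order_id, date, cart details, item code, postcode, revenue (assumed layout)
ITEM_CODE = re.compile(r"^[A-Z]{2}\d{4}$")   # hypothetical item-code format used as an anchor

def realign(row):
    """Re-join fields that spilled when unquoted commas appeared in cart details."""
    if len(row) <= EXPECTED_COLS:
        return row
    for i, field in enumerate(row):
        if ITEM_CODE.match(field.strip()):
            # Everything between the cart column and the item-code anchor
            # belongs to the cart details; merge it back into one field.
            merged = ",".join(row[2:i])
            return row[:2] + [merged] + row[i:]
    return row  # no anchor found; leave for manual review

broken = ["1001", "2020-01-15", "Gift box", "red, large", "AB1234", "SW1A 1AA", "59.99"]
print(realign(broken))
```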
The result is a single, validated CSV of orders.
Structured, clean, and ready for pairing with external context.
Step 2: Compare Periods With Real Weather Data
With clean sales data in place, the next layer is context - environmental conditions at the time of each transaction.
This step allows us to match what customers bought with the real-world conditions they experienced. By doing so, we open the door to understanding seasonality, demand triggers, and the external factors shaping performance.
Here’s how the process works:
- Postcode Validation - First, every delivery postcode is checked against a UK postcode database to ensure it’s valid and mappable.
- Coordinate Lookup - Each postcode is converted to latitude and longitude to comply with the weather API’s format.
- Timestamp Conversion - Order times are translated into UNIX format, which most weather APIs require for querying historical records.
- API Caching - To save time and cost, API calls are cached. Identical timestamps and coordinates are only queried once.
- 12AM Averaging - Weather data is collapsed into daily averages at 00:00. This ensures consistent granularity across all entries and smooths out fluctuations within the same day.
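The timestamp-conversion and caching steps above can be sketched as follows. The API call itself is stubbed out here, since the real endpoint and response shape depend on the provider; the point is that `lru_cache` guarantees each (coordinate, day) pair is only queried once.

```python
from datetime import datetime, timezone
from functools import lru_cache

def to_unix(ts: str) -> int:
    """Convert an order timestamp to the UNIX epoch seconds most weather APIs expect."""
    dt = datetime.strptime(ts, "%Y-%m-%d %H:%M").replace(tzinfo=timezone.utc)
    return int(dt.timestamp())

@lru_cache(maxsize=None)
def fetch_weather(lat: float, lon: float, unix_day: int) -> dict:
    """Placeholder for the real API call; identical arguments hit the cache, not the API."""
    return {"temp_c": 18.2, "humidity": 64}   # stubbed response

# Collapse every order timestamp to midnight so all lookups share one cache key per day.
day = to_unix("2021-06-14 00:00")
w1 = fetch_weather(51.5, -0.12, day)
w2 = fetch_weather(51.5, -0.12, day)   # served from cache, no second request
print(fetch_weather.cache_info().hits)
```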
The result is a clean JSON dataset where each order is now enriched with its corresponding weather data - temperature, humidity, UV, wind speed, and more.
This gives us a fuller picture: not just what was sold, but when, where, and under what conditions.
Step 3: Find Patterns In The Data
Once weather and sales data are matched, we structure everything around a core unit: the day.
Rather than working at the transaction level, we aggregate into daily snapshots. Each date becomes a single JSON object that includes:
{
  "2021-06-14": {
    "total_orders": 317,
    "total_revenue": 18324.98,
    "weather": { ... },
    "orders": {
      "#12345": { "price": 59.99, "cart_items": [...] },
      ...
    }
  }
}
This format captures three critical layers:
- Commercial performance - total orders and revenue
- Environmental context - a daily weather snapshot
- Order-level detail - including prices and cart structure
It’s now easy to inspect trends, run comparisons, and feed the data into training pipelines.
By structuring everything around the day, we reduce noise, keep the model’s temporal focus tight, and create a consistent format that scales across years.
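The aggregation itself is straightforward. Here is a minimal sketch that rolls enriched order records up into the daily shape shown above; the record fields are illustrative, not the real schema.

```python
import json
from collections import defaultdict

# Hypothetical enriched order records (weather already attached per order).
orders = [
    {"id": "#12345", "date": "2021-06-14", "price": 59.99,
     "cart_items": ["AB1234"], "weather": {"temp_c": 18.2}},
    {"id": "#12346", "date": "2021-06-14", "price": 24.50,
     "cart_items": ["CD5678"], "weather": {"temp_c": 18.2}},
]

# Aggregate transactions into one snapshot per day, mirroring the daily JSON format.
days = defaultdict(lambda: {"total_orders": 0, "total_revenue": 0.0,
                            "weather": None, "orders": {}})
for o in orders:
    day = days[o["date"]]
    day["total_orders"] += 1
    day["total_revenue"] = round(day["total_revenue"] + o["price"], 2)
    day["weather"] = o["weather"]   # already a daily average, so identical per order
    day["orders"][o["id"]] = {"price": o["price"], "cart_items": o["cart_items"]}

print(json.dumps(days, indent=2))
```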
From here, we move toward regression - and real prediction.
Step 4: Connect Events Using Regression Analysis
With sales and weather data structured, we now shift from collection to modelling.
This step begins by eliminating any variables that would unfairly inflate the model’s accuracy - revenue and order count among them. These are the targets, not the inputs. Including them directly would cause data leakage and invalidate the predictions.
Next, we begin feature engineering.
This transforms raw data into signals the model can learn from:
- Day of the week and month help identify seasonality.
- Weekend flags expose behavioural shifts tied to leisure time.
- Rolling averages create a smoother version of the data - ideal for tracking trends.
- Lagging values (e.g., yesterday’s sales, last week's UV index) help the model understand inertia, momentum, and time-based dependencies.
Each of these is computed via dedicated functions within the pipeline - modular, reusable, and clearly scoped.
For example, generate_lag_features(df, lag_days=[1, 7, 14]) builds lag-based predictors for different time frames. Another function, add_time_features(df), injects time-based tags like month, weekday, and weekend status.
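Plausible implementations of those two functions might look like this. This is a sketch, not the pipeline's actual code: it assumes a DataFrame indexed by date and lags every numeric column uniformly.

```python
import pandas as pd

def generate_lag_features(df: pd.DataFrame, lag_days=(1, 7, 14)) -> pd.DataFrame:
    """Add lagged copies of each column so the model sees recent history."""
    out = df.copy()
    for lag in lag_days:
        for col in df.columns:
            out[f"{col}_lag{lag}"] = df[col].shift(lag)
    return out

def add_time_features(df: pd.DataFrame) -> pd.DataFrame:
    """Inject calendar signals: month, weekday, and a weekend flag."""
    out = df.copy()
    out["month"] = out.index.month
    out["weekday"] = out.index.weekday
    out["is_weekend"] = (out["weekday"] >= 5).astype(int)
    return out

# Toy daily frame: three weeks of a single weather signal.
idx = pd.date_range("2021-06-01", periods=21, freq="D")
daily = pd.DataFrame({"temp_c": range(21)}, index=idx)
features = add_time_features(generate_lag_features(daily, lag_days=(1, 7)))
```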
These scripts aren’t standalone - they operate as stages in a sequence. Once the master JSON is parsed into a structured DataFrame, we pass it through this transformation pipeline before splitting it into training and validation sets.
This step transforms our dataset from historical records into predictive engine fuel.
The model isn’t just learning what happened - it’s learning why.
Step 5: Forecasting the Future
With the training complete, we’re ready to look forward.
Forecasting isn’t just about filling future dates with guesses. It’s about generating synthetic-but-plausible inputs - then seeing how the model interprets them.
Here’s how it works:
We take the trailing two weeks of real weather data and build a forward-looking weather sequence. This becomes our proxy forecast. The script generate_synthetic_forecast.py creates this file - typically output as simulated_weather_forecast.csv - which acts as the new input for prediction.
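The core move in generate_synthetic_forecast.py can be sketched in a few lines: take the trailing observed window and project it onto future dates. The 14-day horizon and the weather columns here are assumptions; the real script may shape its proxy forecast differently.

```python
import pandas as pd

# Hypothetical trailing two weeks of observed weather.
history = pd.DataFrame(
    {"temp_c": [15 + i * 0.5 for i in range(14)],
     "humidity": [70 - i for i in range(14)]},
    index=pd.date_range("2025-05-18", periods=14, freq="D"),
)

horizon = 14  # forecast length in days (assumed)
future_idx = pd.date_range(history.index[-1] + pd.Timedelta(days=1),
                           periods=horizon, freq="D")
forecast = history.tail(horizon).set_axis(future_idx)   # shift the pattern forward
forecast.to_csv("simulated_weather_forecast.csv")
```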
We use two pre-trained models:
- model.joblib - predicts revenue
- orders_model.joblib - predicts order count
Both models load the forecasted weather features, run through their respective pipelines, and return predicted outcomes for each future day.
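In the real pipeline the two models are deserialised with joblib.load("model.joblib") and joblib.load("orders_model.joblib"); the sketch below fits tiny scikit-learn stand-ins instead so it runs end to end. The single temperature feature and the training values are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Stand-ins for the pre-trained models loaded from disk in the real pipeline.
X_train = np.array([[12.0], [16.0], [20.0], [24.0]])   # e.g. daily mean temperature
revenue_model = LinearRegression().fit(X_train, [900, 1100, 1300, 1500])
orders_model = LinearRegression().fit(X_train, [30, 36, 42, 48])

# Score the forecasted weather features with both models.
X_future = np.array([[18.0], [22.0]])
pred_revenue = revenue_model.predict(X_future)
pred_orders = orders_model.predict(X_future)
print(pred_revenue, pred_orders)
```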
These forecasts are then stored in a structured output - a DataFrame of daily predictions, often including confidence intervals or rolling windows to smooth volatility.
The logic behind this setup is modular:
- Forecast generator → produces plausible inputs
- Predictive models → consume and score those inputs
- Output renderer → stores structured results
The entire pipeline can be run via a single orchestration script (run_prediction_pipeline.py), which ensures reproducibility and sequence integrity.
It’s not magic. It’s math, structure, and intent - layered precisely.
Step 6: Visualising the Forecast
Predictions are valuable. But they become actionable when seen.
Instead of inspecting a spreadsheet full of numbers, we generate a time-series graph that overlays predicted revenue and predicted order volume across the forecast window.
This is handled by predict_and_plot_forecast.py, which uses Plotly to render:
- Daily revenue predictions
- Daily order volume
- A time axis running from the forecast start to year-end
It’s fully interactive, making it easy to inspect peaks, compare segments, or isolate anomalies.
This step brings the system full circle. You started with raw orders and weather data. You transformed them into a feature-rich model. Then you generated a forecast. And now, you can see the outcome - clearly, visually, and in context.
This isn’t a static report. It’s a living tool. Something you can re-run, adjust, and iterate as conditions change.
And that’s the power of engineering over guessing.
Conclusion
This project began with a question: When will I make more money? From that, we built a forecasting system grounded in real data — not intuition.
We extracted five years of order history, aligned it with weather data, cleaned and structured it into a usable format. From there, we engineered relevant features, trained regression models, and produced forward-looking forecasts based on simulated weather conditions.
The result isn’t a static report. It’s a modular, versioned system — one that can evolve as inputs change. Add paid media, stock levels, or new feature logic, and the model adjusts accordingly.
It’s not perfect. But it’s functional, testable, and repeatable.
And in many cases, that’s enough to make better decisions — earlier, and with more confidence.