How Rate Limiting from an ESP Caused Email Delivery Delays and the Priority Queue System I Built to Ensure Timely Notices

In the fast-paced world of automated systems and digital communication, reliability isn’t just a nicety—it’s a necessity. This is especially true for platforms that deal with critical event-driven notifications such as account alerts, system warnings, and transactional emails. This article explores how unexpected rate limiting from our Email Service Provider (ESP) disrupted our communication infrastructure, causing significant delays in email delivery. I will also unpack the priority queue system I designed to mitigate future disruptions and maintain guaranteed delivery of urgent notices.

Contents

TLDR
Understanding the Problem: When Reliability Breaks Down
Analyzing ESP Rate Limits
The Case for Prioritization
Core Principles of the New Queue System
Architecture of the Priority Queue
Deployment and Observability
Lessons Learned
Conclusion

TLDR:

We experienced noticeable delays in our email delivery system due to unforeseen rate limiting by our ESP. The rate limits weren’t well-documented or monitored, exposing a blind spot in our infrastructure. To address this, I built a scalable priority-based message queue that classifies emails according to criticality, ensuring that time-sensitive messages take precedence even under load. This system has not only improved reliability but also offered deeper insights into delivery performance.

Understanding the Problem: When Reliability Breaks Down

Everything was functioning normally until early one Monday morning, when we began receiving a cascade of internal notifications alerting us to failed or delayed email sends. Our monitoring scripts, designed to flag unusual behavior in transactional routines, had noticed that time-sensitive emails, such as password reset links and fraud alerts, were taking several minutes or even hours to reach recipients. On further investigation, logs revealed that our ESP was silently rate limiting our outbound messages, which triggered repeated retries with exponential backoff and growing message queue backlogs.

While rate limiting by an ESP is a known variable, the issue stemmed from two core failings:

Insufficient observability: The ESP gave no immediate or actionable feedback on delivery delays. The API responses returned 2xx statuses, making it appear as though messages were accepted successfully.
Lack of prioritization in our sending pipeline: All messages were treated equally, regardless of urgency or impact. A weekly newsletter would have the same queue weight as a fraud-alert email.

Without any mechanism to distinguish and fast-track critical communications, our system choked under load, creating a bottleneck that spiraled into delayed services and frustrated users.

Analyzing ESP Rate Limits

Most ESPs implement rate limits to protect their infrastructure and optimize throughput. In our case, the official documentation provided only vague information about maximum requests per second and failed to mention how specific message types might trigger internal throttling.

What complicated matters further was the non-deterministic behavior of the rate limiting. Messages were rejected inconsistently, causing exponential backoff algorithms to delay non-critical emails unnecessarily and fill our retry queue.

To diagnose this, we began aggregating API logs, summarizing per-minute request volumes, and cross-referencing SMTP-level data with our internal metrics. Armed with this data, it became clear that we needed a smarter message classification system.
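
The per-minute summary described above can be sketched roughly as follows. This is a minimal illustration, not the script we actually ran: the `SendLogEntry` shape and its field names are assumptions, and real ESP logs will carry more fields.

```typescript
// Hypothetical shape of one ESP API log record; the field names are assumptions.
interface SendLogEntry {
  timestampMs: number; // request time, epoch milliseconds
  httpStatus: number;  // status code the ESP API returned
}

interface MinuteSummary {
  total: number;
  byClass: Record<string, number>; // e.g. { "2xx": 118, "4xx": 2 }
}

// Bucket log entries by minute and count responses per status class.
function summarizePerMinute(entries: SendLogEntry[]): Record<number, MinuteSummary> {
  const buckets: Record<number, MinuteSummary> = {};
  for (const entry of entries) {
    const minuteStart = Math.floor(entry.timestampMs / 60000) * 60000;
    const statusClass = `${Math.floor(entry.httpStatus / 100)}xx`;
    const summary = buckets[minuteStart] ?? { total: 0, byClass: {} };
    summary.total += 1;
    summary.byClass[statusClass] = (summary.byClass[statusClass] ?? 0) + 1;
    buckets[minuteStart] = summary;
  }
  return buckets;
}
```

Plotting `total` per bucket against observed delivery latency is one way to spot the request volume at which throttling starts, even when the status codes themselves look healthy.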

The Case for Prioritization

Not all emails are created equal. Some must reach users instantly—think of two-factor authentication codes, password resets, and security alerts. Others, like promotional announcements or internal reports, can tolerate latency. This realization led to the fundamental design shift needed in our architecture.

I introduced a Priority Queue System to replace our single-threaded, first-in-first-out (FIFO) message pipeline. Inspired by job scheduling algorithms in operating systems, this system aimed to categorize and sequence email sends based on business-criticality.

Core Principles of the New Queue System

Priority Classification: Emails were grouped into three tiers—Critical, Important, and Informational. Each category had its own delivery SLAs.
Concurrency Control: Separate worker pools managed delivery tasks for each priority class, with more resources allocated to higher priority emails.
Rate Limit Awareness: We implemented a dynamic backpressure mechanism that adjusted message flow based on real-time feedback from the ESP API and SMTP responses.
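
One common way to build the kind of backpressure described in the last principle is an additive-increase, multiplicative-decrease (AIMD) controller. The sketch below is an assumption about how such a mechanism could look, not a description of our production code. The class name, the parameters, and the decision about what counts as a throttle signal (for example an HTTP 429 response or a transient SMTP 4yz reply) are all illustrative.

```typescript
// Additive-increase / multiplicative-decrease (AIMD) send-rate controller.
// All names and numeric values here are illustrative assumptions.
class SendRateController {
  private ratePerSecond: number;

  constructor(
    private readonly minRate: number,
    private readonly maxRate: number,
    private readonly step: number = 1,
  ) {
    this.ratePerSecond = maxRate;
  }

  // Call once per ESP response. The caller decides what counts as a
  // throttle signal (e.g. HTTP 429 or a transient SMTP 4yz reply).
  record(throttled: boolean): void {
    this.ratePerSecond = throttled
      ? Math.max(this.minRate, this.ratePerSecond / 2) // back off quickly
      : Math.min(this.maxRate, this.ratePerSecond + this.step); // recover slowly
  }

  currentRate(): number {
    return this.ratePerSecond;
  }
}
```

Workers would consult `currentRate()` before pulling the next batch. Halving on a throttle signal and recovering in small steps keeps the sender from oscillating around the provider's limit.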

Architecture of the Priority Queue

The new system was built using a Redis-backed job queue (we used BullMQ for Node.js) with separate queues for each priority tier. Workers pulled tasks from high-priority queues first. If those queues were empty, they would move to the next lower priority.
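
The "highest non-empty tier first" rule can be shown with a small in-memory sketch. It deliberately does not use BullMQ's API. `TieredQueue` and `TIER_ORDER` are hypothetical names, and plain arrays stand in for the Redis-backed queues.

```typescript
type PriorityTier = "critical" | "important" | "informational";

// Tiers in the order workers should drain them.
const TIER_ORDER: PriorityTier[] = ["critical", "important", "informational"];

class TieredQueue<T> {
  private readonly queues: Record<PriorityTier, T[]> = {
    critical: [],
    important: [],
    informational: [],
  };

  enqueue(tier: PriorityTier, task: T): void {
    this.queues[tier].push(task);
  }

  // Return the oldest task from the highest non-empty tier,
  // or undefined when every tier is empty.
  dequeue(): T | undefined {
    for (const tier of TIER_ORDER) {
      const task = this.queues[tier].shift();
      if (task !== undefined) return task;
    }
    return undefined;
  }
}
```

Strict ordering like this can starve the lowest tier during a long burst of critical mail, which is one reason to pair it with separate worker pools per tier.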

Each email send task included metadata such as:

Email type (transactional, promotional, alert, etc.)
User impact level
Failure retry policies
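
As a rough sketch, the metadata listed above could be typed and validated like this before a worker acts on it. The exact field names and the `high`/`medium`/`low` impact levels are assumptions made for illustration. Only the three kinds of field come from the list above.

```typescript
// Hypothetical job payload; field names and allowed values are assumptions.
type EmailType = "transactional" | "promotional" | "alert";
type ImpactLevel = "high" | "medium" | "low";

interface EmailTask {
  emailType: EmailType;
  userImpact: ImpactLevel;
  retryPolicy: { maxAttempts: number; backoffStrategy: "linear" | "exponential" };
}

const EMAIL_TYPES: string[] = ["transactional", "promotional", "alert"];
const IMPACT_LEVELS: string[] = ["high", "medium", "low"];
const BACKOFF_STRATEGIES: string[] = ["linear", "exponential"];

// Runtime check for payloads read back from the queue as untyped JSON.
function isEmailTask(value: unknown): value is EmailTask {
  if (typeof value !== "object" || value === null) return false;
  const task = value as Record<string, unknown>;
  const emailType = task["emailType"];
  const userImpact = task["userImpact"];
  const policy = task["retryPolicy"];
  if (typeof emailType !== "string" || EMAIL_TYPES.indexOf(emailType) === -1) return false;
  if (typeof userImpact !== "string" || IMPACT_LEVELS.indexOf(userImpact) === -1) return false;
  if (typeof policy !== "object" || policy === null) return false;
  const maxAttempts = (policy as Record<string, unknown>)["maxAttempts"];
  const strategy = (policy as Record<string, unknown>)["backoffStrategy"];
  return (
    typeof maxAttempts === "number" && maxAttempts >= 1 &&
    typeof strategy === "string" && BACKOFF_STRATEGIES.indexOf(strategy) !== -1
  );
}
```

Rejecting malformed payloads at the worker boundary keeps a bad producer from stalling a priority tier with tasks that can never be sent.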

We then established a simple YAML-based policy file that defined routing and behavior rules for each email category:

email_alerts:
  priority: critical
  retry_policy:
    max_attempts: 5
    backoff_strategy: linear
newsletters:
  priority: informational
  retry_policy:
    max_attempts: 2
    backoff_strategy: exponential

Critical tasks were allowed more retry attempts (five, with linear backoff) and faster escalation in case of ESP-side issues. Informational emails, by contrast, got only two attempts with exponential backoff and would be dropped from the queue if they failed repeatedly.
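
To make the two backoff strategies in the policy file concrete, here is a minimal sketch of how a worker might turn a policy into a retry delay. The `POLICIES` table mirrors the YAML above. The function name, the `attemptsMade` convention, and the one-second `baseDelayMs` default are assumptions for illustration.

```typescript
type BackoffStrategy = "linear" | "exponential";

interface RetryPolicy {
  maxAttempts: number;
  backoffStrategy: BackoffStrategy;
}

// Mirrors the two entries in the YAML policy file.
const POLICIES: Record<string, RetryPolicy> = {
  email_alerts: { maxAttempts: 5, backoffStrategy: "linear" },
  newsletters: { maxAttempts: 2, backoffStrategy: "exponential" },
};

// Given how many attempts have already been made (>= 1), return the delay
// before the next attempt, or null when the policy says to give up.
// baseDelayMs is an assumed parameter, not a value from the policy file.
function nextRetryDelayMs(
  policy: RetryPolicy,
  attemptsMade: number,
  baseDelayMs: number = 1000,
): number | null {
  if (attemptsMade >= policy.maxAttempts) return null;
  if (policy.backoffStrategy === "linear") return baseDelayMs * attemptsMade;
  return baseDelayMs * Math.pow(2, attemptsMade - 1);
}
```

With these assumptions, an alert that has failed three times waits three seconds before its fourth attempt, while a newsletter that has failed twice is dropped.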

Deployment and Observability

After development and in-house testing, we deployed the new priority queue system behind a feature flag. This allowed us to monitor performance without full migration and quickly revert if issues arose.

Within the first 24 hours, we recorded a significant drop in average delivery latency for high-priority emails—from a peak of 800 seconds to under 20 seconds—even under identical outbound traffic volumes.

To provide complete transparency into ongoing operations, we built a dedicated dashboard displaying:

Per-queue message counts and age distribution
Error rates per priority class
Real-time rates of ESP response codes (2xx, 4xx, 5xx)
Retransmission attempts and failure logs
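
The age distribution in the first item can be summarized with percentiles. The following is a small sketch using the nearest-rank method. The function name and the input format (an array of enqueue timestamps) are assumptions, not the dashboard's actual code.

```typescript
// Nearest-rank percentile of message age, in seconds, for one queue.
// enqueuedAtMs holds the enqueue time of every message still waiting.
function agePercentileSeconds(
  enqueuedAtMs: number[],
  nowMs: number,
  percentile: number,
): number {
  if (enqueuedAtMs.length === 0) return 0;
  const ages = enqueuedAtMs
    .map((enqueuedAt) => (nowMs - enqueuedAt) / 1000)
    .sort((a, b) => a - b);
  const rank = Math.ceil((percentile / 100) * ages.length);
  return ages[Math.max(0, rank - 1)];
}
```

Tracking a high percentile per tier, rather than the mean, shows when a handful of critical messages are stuck even though the queue as a whole looks healthy.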

This observability layer allowed our support team to proactively diagnose delivery issues and eventually led us to negotiate higher throughput tiers with our ESP based on concrete usage metrics.

Lessons Learned

The experience was not just a technical challenge but an operational wake-up call. It highlighted how fragile communication systems can become when built on assumptions about external vendor behavior. Here are some key lessons:

Never rely solely on HTTP/S API response codes for success verification. Always validate delivery downstream using SMTP logs or bounce tracking.
Design for failure before failure happens. Incorporate fallbacks and prioritization even if everything looks stable today.
Good observability changes everything. Real-time insights empower quicker and smarter decisions.
Rate limits aren’t evil—they’re signals. They tell you when it’s time to scale intelligently or negotiate better terms with providers.
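
The first lesson can be turned into a simple reconciliation check: compare what the API accepted against what was later confirmed as delivered. The sketch below assumes two hypothetical record types, `AcceptedSend` and `DeliveryEvent`, and a caller-chosen deadline. How delivery confirmations are collected (SMTP logs, bounce tracking, or something else) depends on the provider.

```typescript
// Hypothetical records: what our sender logged, and what was later confirmed.
interface AcceptedSend {
  messageId: string;
  acceptedAtMs: number; // when the ESP API returned 2xx
}

interface DeliveryEvent {
  messageId: string;
  deliveredAtMs: number; // taken from SMTP logs or delivery tracking
}

// Return IDs of messages that the API accepted but that were delivered late,
// or are still unconfirmed and already past the deadline.
function findUnconfirmed(
  accepted: AcceptedSend[],
  delivered: DeliveryEvent[],
  nowMs: number,
  deadlineMs: number,
): string[] {
  const deliveredAt: Record<string, number> = {};
  for (const delivery of delivered) {
    deliveredAt[delivery.messageId] = delivery.deliveredAtMs;
  }
  return accepted
    .filter((send) => {
      const confirmedAt: number | undefined = deliveredAt[send.messageId];
      const latencyMs = (confirmedAt ?? nowMs) - send.acceptedAtMs;
      return latencyMs > deadlineMs;
    })
    .map((send) => send.messageId);
}
```

Run on a schedule, a check like this would flag silent throttling even when every API call returns a 2xx status.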

Conclusion

What started as a frustrating email outage became a transformative moment for our infrastructure. By shifting away from a flat delivery pipeline and embracing prioritization, we safeguarded our most important communications and delivered better service to our users. More importantly, we turned invisible backend processes into measurable, improvable engineering systems.

In a world where digital systems grow more interconnected, letting a third-party bottleneck define your users' experience is a risk no serious engineering team can take lightly. Rate limits may be out of your hands, but how you handle them is always within your control.