Skip to content

fix(conn): handle overdue timers under Tokio budget exhaustion#2677

Open
syszery wants to merge 2 commits into
quinn-rs:mainfrom
syszery:fix/overdue-timers
Open

fix(conn): handle overdue timers under Tokio budget exhaustion#2677
syszery wants to merge 2 commits into
quinn-rs:mainfrom
syszery:fix/overdue-timers

Conversation

@syszery

@syszery syszery commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

This fixes a correctness issue where QUIC protocol timers could fail to fire within a scheduling round under load when Tokio’s cooperative task budget is exhausted.

drive_timer() previously relied on AsyncTimer::poll() to determine whether a deadline had elapsed. However, Sleep::poll() may return Poll::Pending for already-expired deadlines once the task has exhausted its cooperative budget. In that case, handle_timeout() would not be called until a later scheduling round, despite the deadline having already passed.

This addresses the timer starvation aspect of #753.

Fix

The timer logic is changed to:

  • Always check runtime.now() >= deadline before polling the async timer
  • Use the async timer only for future wakeups (waker registration)

@djc

djc commented Jun 10, 2026

Copy link
Copy Markdown
Member

(Not sure why this is draft? Will hold off on reviewing for now.)

@syszery

syszery commented Jun 11, 2026

Copy link
Copy Markdown
Contributor Author

Thanks. I just double checked something. The PR is ready for review now.

@syszery syszery marked this pull request as ready for review June 11, 2026 17:21
@djc

djc commented Jun 15, 2026

Copy link
Copy Markdown
Member

I'd like to have this split in two commits:

  • One to use let-else instead of match
  • Another to change the logic

A bunch of comments were removed -- were those removals motivated or just LLM collateral?

@syszery

syszery commented Jun 15, 2026

Copy link
Copy Markdown
Contributor Author

The PR is split into two commits now.

The comment changes were intentional:

  • The first two described the old tangled match structure, which I think is easier to follow after the let...else extraction.
  • The "store expiration time" comment was just restating the line below it.
  • The final two became slightly inaccurate after the fix.

syszery added 2 commits June 15, 2026 21:32
drive_timer() used AsyncTimer::poll() to determine whether a protocol
deadline had elapsed. Under Tokio's cooperative task budget,
Sleep::poll() may return Poll::Pending for an already-expired deadline
once the task's budget is exhausted, which can happen when
process_conn_events() drains a busy channel.

As a result, handle_timeout() is not called even though the deadline has
already elapsed. For QUIC, timers such as PTO, loss detection, and idle
timeouts are correctness-critical and should not be deferred to a later
scheduling round.

Fix this by checking runtime.now() >= deadline before consulting the
async timer. The clock is not subject to cooperative budgeting. The
timer remains responsible only for registering a wakeup when the
deadline lies in the future.
@syszery syszery force-pushed the fix/overdue-timers branch from 1821a05 to 881664d Compare June 15, 2026 19:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants