Semantic Loop Detection: Catching Stuck AI Agents
The Problem: Stuck Agents That Look Productive
Semantic loop detection exists to catch a failure that hash-based loop detection cannot see: an AI agent that keeps changing what it says while making zero real progress. Your agent is trying to fix a bug. It generates a patch, runs the tests, they fail. It reads the error, generates another patch - different variable names, different line numbers, same underlying logic. Tests fail again. Third attempt: it wraps the fix in a try/except. Still fails. Same root cause, untouched. By attempt seven, it has consumed 40k tokens, written and reverted the same file four times, and is no closer to a solution than it was on attempt one.
A naive loop detector - a seen_actions set that hashes the action string - reports no loop. Each action string was different. The agent is technically not repeating itself. But it is absolutely, structurally stuck.
This is the class of failure that hash-based detection cannot catch: semantic stalls, where syntactic novelty masks zero progress. The agent burns tokens and time, and depending on what it’s allowed to touch, can damage state while doing so.
Takeaways:
- Hash-based detection misses stuck agents because syntactic novelty masks zero progress.
- A four-dimension semantic fingerprint stays stable under rephrasing but moves on real progress.
- A two-step escalation ladder (escalate, then halt) bounds the tokens a stall can burn.
Why Hash-Based Detection Misses It
The hash changes when the text changes. That’s it. That’s the entire failure mode.
Consider three consecutive actions from a stuck agent. The first edits a config file to change a timeout from 30 to 60. The second edits the same line of the same file, rephrasing the change as setting the timeout to 60 seconds where it was previously 30. The third shifts to a different file and applies the same timeout value at the call site instead.
Three different SHA-256 hashes. Three different action strings. A seen_actions set sees three distinct entries and reports no loop. But look at what actually happened: all three actions are attempts to solve the same timeout misconfiguration. Actions 1 and 2 are literally the same edit with rephrased prose. Action 3 shifts the edit site but leaves the actual bug - say, a hardcoded default upstream - untouched. The error on the next test run will be identical.
The fundamental distinction is between syntactic novelty (the string changed) and semantic progress (the underlying situation changed). Loop detection needs to operate at the semantic level. That requires a richer representation of what the agent is actually doing, not just a hash of how it described it.
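To make the blind spot concrete, here is a minimal sketch of the naive detector run against the three timeout attempts described above. The action strings are hypothetical stand-ins for what an agent might emit; only the standard-library hashlib call is real.

```python
import hashlib


def action_hash(action):
    """SHA-256 of the raw action string, as a naive detector would compute it."""
    return hashlib.sha256(action.encode("utf-8")).hexdigest()


# Hypothetical action strings for the three timeout attempts.
actions = [
    "edit config.py: change timeout from 30 to 60",
    "edit config.py: set timeout to 60 seconds (previously 30)",
    "edit client.py: pass timeout=60 at the call site",
]

seen_actions = set()
loop_detected = False
for action in actions:
    digest = action_hash(action)
    if digest in seen_actions:
        loop_detected = True
    seen_actions.add(digest)

print(len(seen_actions), loop_detected)  # 3 False
```

Every string is new to the set, so the detector stays silent even though nothing about the underlying bug has changed.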
The Multi-Dimensional Fingerprint
KARIMO (github.com/opensesh/KARIMO) publishes a four-dimensional fingerprint approach: action_class, files_touched_hash, state_hash, and normalized_error_hash. Each dimension catches a different class of stall.
action_class is a coarse category - “file_edit”, “test_run”, “shell_exec”, “search” - not the full action string. This collapses syntactic variants of the same operation into a single token. Actions 1 and 2 from the example above both become ("file_edit", ...). The phrasing difference disappears.
files_touched_hash is a sorted, normalized hash of the file paths involved. It captures where the agent is working. An agent that keeps touching config.py and client.py in every attempt is structurally looping, regardless of what it claims to be doing.
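One way to sketch this dimension, assuming paths arrive as plain strings: normalize each path, de-duplicate, sort, and hash the joined result. The function name mirrors the dimension name, but the body is an illustration, not KARIMO's code, and the 16-character truncation is an arbitrary choice.

```python
import hashlib
import posixpath


def files_touched_hash(paths):
    """Hash a collection of file paths so order and duplicates do not matter."""
    # Normalize separators and redundant segments such as "./config.py".
    normalized = {posixpath.normpath(p.replace("\\", "/")) for p in paths}
    joined = "\n".join(sorted(normalized))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:16]


a = files_touched_hash(["client.py", "./config.py"])
b = files_touched_hash(["config.py", "client.py", "config.py"])
c = files_touched_hash(["config.py", "server.py"])
print(a == b, a == c)  # True False
```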
state_hash is passed in by the caller - a hash of whatever constitutes the agent’s current environment state (test results, file checksums, conversation context). If the state isn’t changing between actions, nothing is working. Identical state hashes across different action strings are a hard signal.
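Because the caller owns this dimension, what goes into it is a design decision. The sketch below assumes the state is a pair of plain dictionaries (test outcomes and file checksums); compute_state_hash and its arguments are hypothetical names chosen for illustration.

```python
import hashlib
import json


def compute_state_hash(test_results, file_checksums):
    """Hash a snapshot of the environment; both arguments are plain dicts."""
    snapshot = {"tests": test_results, "files": file_checksums}
    # sort_keys makes the serialization independent of dict insertion order.
    payload = json.dumps(snapshot, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


before = compute_state_hash({"test_timeout": "fail"}, {"config.py": "ab12"})
same = compute_state_hash({"test_timeout": "fail"}, {"config.py": "ab12"})
after = compute_state_hash({"test_timeout": "pass"}, {"config.py": "cd34"})
print(before == same, before == after)  # True False
```

Note the trade-off: including file checksums makes the hash move on any edit, even an unproductive one, so a caller who wants the hash to track progress only may prefer to hash test outcomes alone.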
normalized_error_hash is the key one for bug-fix stalls. You strip line numbers, file paths, and memory addresses from the error output, then hash what remains. The error type and message template stay. This means AssertionError at line 47 and AssertionError at line 52 (after a refactor shifted lines) hash to the same value. The agent is still hitting the same error, even if the traceback shifted.
Combine all four into a tuple. Hash that tuple. Now you have a fingerprint that is stable under syntactic variation but sensitive to genuine progress.
The Escalation Ladder
Detection without response is just logging. KARIMO’s pattern pairs the fingerprint with a count-based escalation ladder:
- count >= 3: The same fingerprint has appeared three times in a row. Escalate to a more capable model or a different prompt strategy. The current approach is not working, but a stronger model might break the stall.
- count >= 5: Halt and surface to a human. The stall has persisted through escalation.
The two-step design follows from an asymmetry in error costs. At count=3, a false positive costs one unnecessary escalation: slightly more expensive, but the agent keeps running. At count=5, a false positive interrupts a run that could have completed, while waiting any longer lets a genuinely stuck agent keep burning resources with no path to progress. The cheap response fires early and the disruptive one waits for stronger evidence, which makes 3 a cheap early warning and 5 a defensible halt point for most task types.
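The ladder itself reduces to a small mapping from a repeat count to a response. The sketch below assumes the two thresholds from the list above; the Response enum and the ladder function are illustrative names, not part of any published API.

```python
from enum import Enum


class Response(Enum):
    CONTINUE = "continue"
    ESCALATE = "escalate"
    HALT = "halt"


def ladder(count, escalate_at=3, halt_at=5):
    """Map a repeat count to a response; halt takes priority over escalate."""
    if count >= halt_at:
        return Response.HALT
    if count >= escalate_at:
        return Response.ESCALATE
    return Response.CONTINUE


print([ladder(n).value for n in range(1, 7)])
# ['continue', 'continue', 'escalate', 'escalate', 'halt', 'halt']
```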
The Design
The whole detector fits in a small module with no dependencies beyond the standard library, and it decomposes into three concerns: normalization, fingerprinting, and streak tracking.
Error normalization does the heavy lifting for the bug-fix stall case. Before an error message contributes to a fingerprint, you strip the tokens that change between attempts but carry no semantic weight: line numbers collapse to a placeholder, hex addresses collapse to a placeholder, absolute file paths reduce to bare filenames, and ISO timestamps collapse to a placeholder. What survives is the error type and message template - the part that signals “still the same failure.” Two tracebacks that differ only in line numbers after a refactor normalize to the same string.
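A minimal sketch of that normalization pass, using only the re and hashlib modules. The regular expressions and placeholder tokens (<TS>, <ADDR>, <N>) are assumptions made for illustration; real tracebacks vary, and the patterns would need tuning against the error formats an agent actually sees.

```python
import hashlib
import re

# Illustrative patterns only. Timestamps are handled first so their digits
# are gone before the later passes run.
_TIMESTAMP = re.compile(
    r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:?\d{2})?"
)
_HEX_ADDR = re.compile(r"0x[0-9a-fA-F]+")
_ABS_PATH = re.compile(r"(?<![\w.\-])(?:[A-Za-z]:)?(?:[\\/][\w.\-]+)+")
_LINE_NO = re.compile(r"\bline \d+", re.IGNORECASE)


def normalize_error(text):
    """Strip tokens that vary between attempts but carry no semantic weight."""
    text = _TIMESTAMP.sub("<TS>", text)
    text = _HEX_ADDR.sub("<ADDR>", text)
    # Reduce an absolute path to its final component (the bare filename).
    text = _ABS_PATH.sub(lambda m: re.split(r"[\\/]", m.group(0))[-1], text)
    text = _LINE_NO.sub("line <N>", text)
    return text.strip()


def normalized_error_hash(text):
    return hashlib.sha256(normalize_error(text).encode("utf-8")).hexdigest()[:16]


first = (
    '2024-05-01T12:00:03Z File "/home/ci/app/tests/test_client.py", line 47, in test_timeout\n'
    "AssertionError: <Client object at 0x7f3a9c01> timed out"
)
second = (
    '2024-05-01T12:04:41Z File "/tmp/build/tests/test_client.py", line 52, in test_timeout\n'
    "AssertionError: <Client object at 0x7f3b0012> timed out"
)
print(normalize_error(first) == normalize_error(second))  # True
print(normalized_error_hash(first) == normalized_error_hash("KeyError: 'timeout'"))  # False
```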
Fingerprinting is a pure function: it takes the coarse action class, the sorted-and-hashed set of files touched, the caller-supplied state hash, and the normalized-and-hashed error, joins those four components, and hashes the result down to a short, compact identifier. Same inputs always yield the same fingerprint; there are no side effects. This is the representation that stays stable under rephrasing but changes the moment the underlying situation genuinely changes.
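As a sketch of that pure function, assuming the error text has already been normalized and the state hash is supplied by the caller: the helper name _short_hash, the "|" separator, and the 16-character truncation are illustrative choices rather than anything prescribed by KARIMO.

```python
import hashlib


def _short_hash(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


def fingerprint(action_class, files_touched, state_hash, normalized_error):
    """Pure function: the same four inputs always produce the same identifier."""
    files_hash = _short_hash("\n".join(sorted(set(files_touched))))
    error_hash = _short_hash(normalized_error)
    # A fixed separator keeps the four fields from running together.
    joined = "|".join([action_class, files_hash, state_hash, error_hash])
    return _short_hash(joined)


error = "AssertionError: request timed out"
fp1 = fingerprint("file_edit", ["config.py", "client.py"], "state-0", error)
fp2 = fingerprint("file_edit", ["client.py", "config.py"], "state-0", error)
fp3 = fingerprint("file_edit", ["client.py", "config.py"], "state-1", error)
print(fp1 == fp2, fp1 == fp3)  # True False
```

The action description never enters the function, which is why rephrasing cannot move the fingerprint, while a change in any one of the four inputs does.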
Streak tracking is the stateful part. The detector keeps a small circular buffer of recent fingerprints (useful for debugging and replay, though not strictly required for the logic), the most recent fingerprint, and a running count of how many consecutive times that fingerprint has repeated. Each recorded action computes a fresh fingerprint; if it matches the previous one the streak increments, otherwise the streak resets to one. The detector returns a signal carrying the current count plus two booleans derived from it - escalate once the streak reaches three, halt once it reaches five - and a reset method clears the streak history after a genuine state change so intentional iteration doesn’t accumulate a false stall.
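The stateful part described above might look like the following sketch. It takes an already-computed fingerprint string as input; the class and field names (StallDetector, StallSignal) are hypothetical, and the default thresholds are the three and five from the escalation ladder.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class StallSignal:
    count: int
    escalate: bool
    halt: bool


class StallDetector:
    def __init__(self, escalate_at=3, halt_at=5, history_size=16):
        self.escalate_at = escalate_at
        self.halt_at = halt_at
        # Bounded buffer of recent fingerprints, kept for debugging and replay.
        self.history = deque(maxlen=history_size)
        self.last = None
        self.count = 0

    def record(self, fingerprint):
        """Record one action's fingerprint and return the current signal."""
        self.history.append(fingerprint)
        if fingerprint == self.last:
            self.count += 1
        else:
            self.last = fingerprint
            self.count = 1
        return StallSignal(
            count=self.count,
            escalate=self.count >= self.escalate_at,
            halt=self.count >= self.halt_at,
        )

    def reset(self):
        """Clear streak state, e.g. after a genuine state change."""
        self.history.clear()
        self.last = None
        self.count = 0


detector = StallDetector()
signals = [detector.record("fp-a") for _ in range(3)]
print(signals[-1])  # StallSignal(count=3, escalate=True, halt=False)
print(detector.record("fp-b").count)  # 1
```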
Demo Walkthrough
The agent is stuck because its edits never change the environment. The action strings differ - rephrased, annotated differently - but the four-dimensional fingerprint is identical each time because the action class, the files touched, the state hash, and the normalized error are all unchanged. This is exactly what naive string-hashing cannot catch.
Picture five attempts against the same config file, all hitting the same timeout assertion failure, with the environment state hash never moving. The agent’s descriptions vary on every pass - change the timeout from 30 to 60, set it to 60 seconds, increase the request timeout parameter, bump it in config, update the value because the previous fix was incomplete - but each attempt feeds the detector the same action class, the same file, the same state, and the same normalized error. The detector sees the streak climb: the first two attempts pass through as routine, the third trips the escalate threshold and signals “try a stronger model,” the fourth stays escalated, and the fifth reaches the halt threshold and surfaces to a human.
Five syntactically distinct action strings. One semantic fingerprint. The naive seen_actions set reports five unique actions and raises no alarm. The KARIMO fingerprint sees the same (action_class, files_touched_hash, state_hash, normalized_error_hash) tuple five consecutive times and halts at count=5.
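The walkthrough can be reproduced in a few lines. This is a compressed sketch, not a full detector: the five descriptions paraphrase the attempts above, and the observation tuple holds hypothetical values that stay constant because the environment never changes.

```python
import hashlib


def short_hash(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


descriptions = [
    "change the timeout from 30 to 60",
    "set it to 60 seconds",
    "increase the request timeout parameter",
    "bump it in config",
    "update the value because the previous fix was incomplete",
]

# Hypothetical observations, identical on every attempt:
# (action_class, files_touched_hash, state_hash, normalized_error_hash)
observation = (
    "file_edit",
    short_hash("config.py"),
    "state-0",
    short_hash("AssertionError: timeout"),
)

naive_seen = set()
last_fp, streak, outcomes = None, 0, []
for text in descriptions:
    naive_seen.add(short_hash(text))
    fp = short_hash("|".join(observation))  # the description is not an input
    streak = streak + 1 if fp == last_fp else 1
    last_fp = fp
    if streak >= 5:
        outcomes.append("halt")
    elif streak >= 3:
        outcomes.append("escalate")
    else:
        outcomes.append("continue")

print(len(naive_seen))  # 5
print(outcomes)  # ['continue', 'continue', 'escalate', 'escalate', 'halt']
```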
Where This Misses
Intentional retry patterns - exponential backoff on a flaky API, or a polling loop waiting for a job to complete - will trigger false positives if the error message and state are stable across retries. You need to either set a looser threshold, call detector.reset() on known retry patterns, or tag retries with their own action_class (“api_retry” rather than “file_edit”) so the caller can exempt that class or give it a separate threshold.
Planned iteration cycles (write test -> run -> fix -> run) will produce alternating fingerprints and won’t trigger the streak, which is correct behavior. But if the fix step stops making progress while the test/run steps keep looping, you may need a secondary detector scoped to just the fix-class actions.
Threshold tuning is task-specific. An agent doing exploratory code search might legitimately re-examine the same files three times from different angles. A threshold of 3 is too tight there; 7 or 8 might be more appropriate. Wire the thresholds through config rather than hardcoding, and tune per task type.
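One way to wire thresholds through config is a small immutable record per task type. Everything named here is an assumption for illustration: the task-type keys, the thresholds_for helper, and the halt value of 10 for code search (the text above only suggests 7 or 8 for the first threshold).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    escalate_at: int = 3
    halt_at: int = 5

    def __post_init__(self):
        # Escalation must not come after the halt point.
        if not 1 <= self.escalate_at <= self.halt_at:
            raise ValueError("need 1 <= escalate_at <= halt_at")


DEFAULT = Thresholds()

# Hypothetical task types; tune the numbers per workload.
THRESHOLDS_BY_TASK = {
    "bug_fix": Thresholds(escalate_at=3, halt_at=5),
    "code_search": Thresholds(escalate_at=7, halt_at=10),
}


def thresholds_for(task_type):
    """Look up thresholds for a task type, falling back to the defaults."""
    return THRESHOLDS_BY_TASK.get(task_type, DEFAULT)


print(thresholds_for("code_search"))  # Thresholds(escalate_at=7, halt_at=10)
print(thresholds_for("unknown_task"))  # Thresholds(escalate_at=3, halt_at=5)
```

In practice the dictionary would be loaded from whatever configuration format the project already uses rather than defined inline.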
Citations and Repo Structure
The four-dimensional fingerprint pattern - the combination of action class, a hash of the files touched, a state hash, and a normalized error hash - is published by KARIMO at github.com/opensesh/KARIMO. Verify the current license at the repo before reuse. The approach described here is an independent rendering of that pattern in minimal standard-library Python, not a port of KARIMO’s code.
A clean implementation breaks into three pieces: a compact detector module holding the streak logic and signal type, a short synthetic walkthrough that drives a stuck agent through the escalation ladder, and a test suite exercising streak counting, error normalization, the threshold edges, reset behavior, and false-positive isolation for legitimate retry and iteration patterns.