Prior-Art Ledger for Patent-Safe Coding

That scene has never happened to me. Nothing here is a war story I lived. It is the scene I was imagining the night I decided I could no longer read other people's code the way I had been reading it. I build things that might be patentable, and I live in open source, and for a long time I treated those two facts as if they had nothing to say to each other. The imagined deposition is what made me stop and build a workflow instead.

Because here is the thing I had wrong. I thought the danger was copying. It is not, or not mainly. The danger is that I could read a competitor's implementation honestly, internalize a pattern, build my own version six weeks later from scratch, file on it, and then, three years out, have no contemporaneous record of what I understood and concluded on the days in between. The merits might be entirely on my side. It would not matter much. I would be litigating my own mental state with nothing in hand but memory, and memory loses to a timestamp every time.

The two ways this goes wrong

There are two failure modes, and they pull in opposite directions, which is what makes this hard.

The first is ignorance. You build something genuinely interesting, you file, and during prosecution the examiner surfaces a repo from 2022 that implements your core mechanism under an MIT license. Now you are spending money arguing around prior art you could have accounted for, or your claims get rejected outright. Filing fee, attorney time, the whole prosecution arc: wasted, or worse, wasted and a little embarrassing.

The second is the deposition I opened with. Contamination. You read the competitor's code while researching, you build later, and the dates line up badly enough that someone can tell a story where you adopted a pattern from prior art and filed on it as if it were yours.

For a while I thought the lesson was "read less." Stay clean by staying ignorant. That instinct is exactly backwards, and it took me a while to see why.

Why reading the code is the protective move

Novelty in patent law is defined against the prior-art landscape. If you do not know that landscape, you do not know where your novelty actually lives. You will write claims that are broader than the art allows, not because you copied anyone, but because you never looked. Those claims get rejected, you narrow them under pressure, and you walk away with weaker protection than if you had done the homework first.

The deeper point is the one that flipped me. Reading prior art with discipline gives you contemporaneous documentation of your own novelty analysis. A dated line that says, in effect, on this day I read this approach and here is exactly how mine differs. That entry, stamped before your filing, is evidence. It is the difference between being ambushed by prior art during prosecution and walking in already holding your distinction, written down, dated, never reconstructed under time pressure.

Avoiding the art does not protect you. It just moves the collision somewhere more expensive and strips you of the one document that would have helped. The builders who run clean IP read everything relevant. They just record what they read and what they concluded.

The license gate goes first, always

Before I read any open-source code as part of IP-sensitive research, I check the license. Two minutes. Skipping it can manufacture a problem you cannot talk your way out of later.

The rule is blunt: GPL, AGPL, and LGPL carry patent-related risk that MIT and Apache-2.0 do not. The GPL family ships explicit patent-retaliation clauses, where bringing a patent claim related to the licensed software can terminate your license to use it, and some readings extend that to work merely interoperating with GPL code. AGPL adds network-use provisions that can spread copyleft obligations in ways that interact badly with a patent strategy. MIT and Apache-2.0 are the safe-to-read tier, and Apache-2.0 even carries an express patent-license grant that cuts in your favor when you are reading it as documented prior art.

So the first field in any ledger entry is the license. If it comes up GPL or AGPL, I stop and get explicit legal sign-off before reading. That is not a call I make on my own. The interaction between copyleft and patent strategy is fact-specific and needs professional advice. What I can do mechanically, without a lawyer, is recognize the gate and not walk through it without checking. It applies to small repos, to "just the README," to archived projects. The check is always first.

What a ledger entry actually is

It is not a notes file. It is a structured, dated record with one job: contemporaneous evidence of what I read, what I pulled from it, and where I drew my novelty boundary at that point in time.

Each entry pins a sequential id so I can cross-reference it from an invention disclosure later. It records the date read, which is the evidence anchor, the thing that proves the reading predates the filing. It names the source precisely, full URL, commit hash, DOI, patent number, because vague references are weak evidence. It logs the license and an explicit note that the license was checked, an affirmative record that I ran the gate. Then three substantive fields in my own words: a short description of what the system does, the abstract patterns I extracted, and the field that carries the whole weight, my novelty-boundary analysis. Something like: the individual mechanisms here are prior art as of this date; my contribution is the composition and the priority-resolution logic that binds them. A status field tracks where the entry sits, documented when written, disclosed once it has been formally handed to a patent attorney for an Information Disclosure Statement.

Store the file in version control. The commit timestamps are part of the evidence. Never backdate an entry. The moment you backdate one, the whole ledger stops being a record and becomes a liability.

What I extract, and what I refuse to

This is the line that keeps the ledger clean and the position defensible.

I extract ideas. Architectural patterns, algorithmic approaches, interface contracts, problem framings, the conceptual mechanism underneath a system's behavior. The abstract things a skilled engineer would recognize from reading. I do not extract implementation. No copy-paste, no line-by-line translation into another language, no reproducing the specific sequence of operations at the code level.

That maps onto how the law splits ideas from expression. An algorithm is an idea. A specific implementation is an expression of it. Prior-art analysis lives at the idea level. The real question is never "did you copy this code." It is "did you arrive at this approach independently, or did reading it contaminate your conception." So if I read a fingerprint-based loop-detection approach and later build my own from first principles, working from what the mechanism achieves rather than the code in front of me, that is independent development of a concept that happened to already exist. The ledger documents exactly that: read it, understood it, recorded it as prior art, noted it in the novelty boundary before writing a line. What I refuse to do is read code and then write code that walks the same structural sequence. That is the thing that looks bad in hindsight no matter what my intent was.

A worked example (fictional, on purpose)

To keep this concrete without exposing anything real, here is a fictional invention: a routing system using preference-weighted ensemble voting for multi-model task assignment, with dynamic weight decay based on outcome recency. It is not a real product. The point is the paper trail it leaves.

Before writing code, I run prior-art searches. First, multi-armed bandit LLM routing, which surfaces the ADWIN drift-detection work, window-based statistical change detection used in some routing systems (Bifet and Gavaldà, SIAM SDM 2007). Academic publication, no code-license issue. Entry PA-001 documents the mechanism and draws the boundary: ADWIN handles distribution shift via threshold-crossing window detection and reselection; my weight decay is continuous and recency-weighted and needs no threshold event. Same problem, different mechanism. Second search, ensemble voting across models, surfaces llm-council, an MIT-licensed library that routes queries to several models and aggregates via peer-review voting. Entry PA-002: llm-council addresses post-response aggregation; mine addresses pre-dispatch routing assignment. Different layer. Third search, preference learning for model selection, surfaces two RLHF-based preference-modeling papers. Entries PA-003 and PA-004.

Four dated entries before I write a line of implementation. The boundary is explicit: the sub-mechanisms exist in prior art; the composition and the specific weighting approach are the claim. Three months later, when the patent attorney asks what the prior art is, I hand over four dated entries with analysis. Claims get drafted to the composition. Prosecution goes faster. The novelty argument is pre-built instead of improvised.

Why this strengthens prosecution

An examiner's job is to find art that anticipates or renders your claims obvious. If you have already found the relevant art, dated your analysis, and disclosed it, you have done a real piece of that work and shown your claims were written to account for it. It flips your posture from reactive, the examiner found X and now we argue around it, to proactive, we disclosed X and here is the distinction. Examiners respond differently to those two situations.

There is also a duty-of-disclosure dimension. Applicants are required to disclose material prior art they are aware of, and a maintained ledger helps you meet that systematically rather than from memory months later. Discuss the specifics of your IDS obligations with your patent attorney. That part is not optional. But the engineering workflow that produces the documentation is entirely within your control.

The asymmetry, and the honest caveat

The value is lopsided in your favor. If you never file, the ledger cost you a couple of hours per significant review session and left you a useful research artifact. If you do file, the dated record of your novelty analysis is the kind of thing that changes prosecution outcomes and makes any future litigation posture dramatically cleaner. Start the ledger before you think you will need it. The date on the first entry is the evidence.

And then there is the deposition I opened with, the one that never happened to me. The difference between that being a bad afternoon and being a catastrophe is whether, when counsel lays your git history next to your browser history, you have a dated entry that already says what you read, when, and exactly why your work is distinct. I would much rather hand them that than try to reconstruct what I was thinking twenty-two days before a commit, under oath, from memory.

This post describes an engineering workflow, not legal advice. Patent prosecution, IDS obligations, license risk analysis, and inequitable conduct exposure are fact-specific legal questions that require professional counsel. Nothing here should be read as a substitute for working with a registered patent attorney on any specific filing or IP strategy decision.