
Ask an agent to fix a bug in your code and you’ll end up staring at the blinking prompt, watching it slowly hack away at the task. You can’t leave it, since it stops every couple minutes and asks for guidance, and you can’t do other things either; the constant interruptions fragment your focus. So, you have to sit and press “Yes” every couple minutes. Or do you?
In this article, I’ll show an alternative way. I don’t claim it’s original, nor is it likely to stay useful for very long, but it’s the best way I’ve found to make use of agents in my work at this time.
The first order of business is to reduce how often the agents interrupt your focus. Instead of asking you to run commands or check their work, they need to be able to run these checks on their own. The idea is simple and powerful in equal measure: any engineer can set it up so that program output ends up in a place the agent can read, and once it can read it, it can iterate much more independently.
How to set it up for a “pure” program (one that takes some input and produces some output, without significant side effects) is obvious. It’s only a little more work when physical or remote hardware is involved, but the idea is simple: let the agent program the hardware, write test data to it, and read the outputs. If tests require coordination between multiple devices, say an FPGA and an oscilloscope, then the test server coordinates them: again, nothing exotic. See my test_serv for a slightly more involved example.
Closing the loop in this way only helps if the agent is not impeded by misguided security restrictions. Read on ...
The security model of current AI coding tools appears to have been devised by lawyers: simply ask the user for confirmation of all potentially dangerous actions. You’ll automatically press “Yes” to anything, but the AI companies are safe; after all, you authorized the action so it’s your fault if things break or you lose all your data.
Luckily, the tools also offer a far more sensible alternative: grant the agents blanket permission to do whatever they want:
claude --dangerously-skip-permissions
codex --dangerously-bypass-approvals-and-sandbox
It should go without saying that agents off the leash like this need to be contained in a restricted environment:
No access to important or non-public files
No passwords or cryptographic keys
No access to corporate databases or shared drives
The easy way to sandbox is to create an unprivileged account and rely on the operating system to do its part. Fancier options include containers (which also rely on OS-level separation), virtual machines, or simply dedicating an old computer to running AI only.
In my case, I have a dedicated computer with an unprivileged user set up per agent team. To make best use of the setup, let’s next define agent teams.
Progress so far: an all-powerful agent has full access to the code and hardware and will happily work at a task for about half an hour. A big improvement over every-five-minutes interruptions, but not quite autonomous yet.
While I have no hard evidence for this theory, I’ve come to believe that the agents are designed to stop every 15–30 minutes in order to prevent them from getting stuck in infinite loops. However, engineering is an infinite loop: we iterate on our jobs forever, and so should our agents. Recent models (e.g. GPT 5.5) already offer a big step forward in terms of autonomy, but it’s still useful to be explicit about it when setting up the prompt.
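Being explicit can be as simple as a line like the following in every role prompt (the wording here is made up; adjust to taste):

Never stop to ask the user questions. You already have every permission and
all the hardware access you need. If you get blocked, write the blocker to
NOTES.md, pick the next task, and keep going.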
I’ve been experimenting with different team compositions, but the following set of agent roles seems to work best:
Orchestrator communicates with me, spawns all other agents, and passes messages between them.
Worker does all the hard work: diagnose what causes a bug, design and implement a new feature, do a quick test that it works.
Verifier checks that the Worker did not stop halfway, that its work did not break something else, that all other tests still work.
With a good “feedback network”, this pattern can run unsupervised for hours to days at a time. Even though the Orchestrator is long-running, its task is simple and does not consume much of its context window. The Worker, on the other hand, needs to read a lot of code and form and test various hypotheses, all of which eats context fast. Thus, the Worker needs to be respawned fresh every half an hour, or whenever it stops. The Orchestrator takes care of that so we don’t have to.
The Verifier can run the baseline set of regression tests, or the Orchestrator can. I have not found a big difference either way.
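For those who prefer code to org charts, the control flow is roughly the following. This is a deliberately naive Python sketch, not my actual setup: in practice the Orchestrator is itself an agent, and next_task and mark_done (hypothetical here) stand in for its reads and writes of the mission file. The claude -p headless mode is real; everything else is illustrative:

import subprocess

def spawn_agent(prompt, timeout_s=1800):
    # every invocation starts with a fresh context window
    try:
        return subprocess.run(
            ["claude", "-p", "--dangerously-skip-permissions", prompt],
            capture_output=True, text=True, timeout=timeout_s,
        ).stdout
    except subprocess.TimeoutExpired:
        return ""  # a stuck agent counts as a failed attempt

while True:
    task = next_task()  # hypothetical: next unfinished item in the mission file
    report = spawn_agent("You are the Worker. Task: " + task)
    verdict = spawn_agent("You are the Verifier. Review this report:\n" + report)
    if "PASS" in verdict:
        mark_done(task)  # hypothetical bookkeeping; otherwise retry with a fresh Worker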
With some tasks I have found that both Workers and Verifiers become lazy and dishonest, disabling or “fixing” tests rather than fixing the real root causes of issues. In that case, it may be useful to have a Police agent. Its prompt instructs it to do an adversarial review: check that the tests actually test what they claim to test, and that the other agents did not implement hidden “shortcuts”. My success with that is mixed; only on a few occasions did the Police agent uncover shady practices, so maybe it’s a waste of tokens.
Now that we have a team of agents dedicated to our work we need to ask: who assigns them the work? The human engineer?
With agentic power at our disposal, we need to give it something to do. A year back, I’d write a function call signature and outline and ask ChatGPT to fill in the details. A few months back, I’d give precise instructions describing program-level behavior to implement, and then steer the agent along the route of debugging. In almost every case, this resulted in a great deal of anger: “I told you to not do this, why are you doing that, fix that issue already!?” Lately I’ve been settling on a more peaceful approach: the mission file.
Open a new file and write down the key milestones that need to be accomplished. As a recent example, I have a microcontroller connected to an FPGA via their SPI interfaces, and I’d like to learn the fastest reliable data rate that can be transmitted between the two. The “mission statement” in the file could then be: demonstrate that the SPI connection can sustain arbitrary data patterns and sizes in excess of 100 Mbit/s.
The mission is relatively large in scope: program the microcontroller, program the FPGA, validate the FPGA code in simulations and formal verification, make it actually work on hardware. All of that most likely does not fit in the “brain” of a single Worker. They will happily accept the commission, and then hours later, hundreds of thousands of tokens wasted, nothing will be done. The task is simply too large.
Enter the Manager: an agent whose role is to study the next unfinished task in the “mission file”, break it down into smaller, testable steps, and pass the result to the Orchestrator, which spawns a fresh Worker with the narrow-scope task. The Manager writes the smaller tasks back to the mission file, and if the Worker fails at one of them, the Manager can re-evaluate and perhaps break the work down even further.
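For illustration, a hypothetical mission file for the SPI example might look like this after the Manager’s first pass (every milestone here is made up):

Mission: demonstrate that the SPI connection between the MCU and the FPGA
can sustain arbitrary data patterns and sizes in excess of 100 Mbit/s.

1. FPGA SPI loopback passes simulation and formal verification. [done]
2. MCU sends a fixed pattern at 10 Mbit/s; FPGA echoes it back. [in progress]
   2a. Bring up the MCU SPI peripheral in master mode. (added by Manager)
   2b. Add a CRC check on the echoed data. (added by Manager)
3. Raise the clock rate until errors appear; record the limit.
4. Soak test: random patterns and lengths for an hour with zero errors.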
The setup so far—agent teams attacking a big task—will work independently for hours at a time, churn out volumes of good code and rapidly demonstrate meaningful progress towards “mission accomplished”. Then, something happens that makes it seem as if the agents all got drunk: previously working code doesn’t work anymore, Verifier starts accepting bogus solutions as “verified”, the Manager is pursuing tasks quite orthogonal to the stated mission objectives. What to do!?
Much anger and ALL CAPS SHOUTING will ensue when previously docile agents stage an apparent mutiny. At first they did a month’s work in five minutes, and now they can’t move an image on a website an inch higher!? First they correctly implemented the JEDEC flash protocol on the FPGA and now they can’t get a “Hello, world!” to compile anymore!? The codebase is a mess, we’re 35 commits ahead of main and 25 behind, and who knows which version of the code works, if any?
The situation is one that must be avoided from the start rather than fixed after the fact. The key insight is to connect the “mission file” described in the previous section directly with the automated tests. Ideally there should be a simple script that mechanically follows a recipe and outputs either a big green “PASS” or a red “FAIL”. On every iteration through the Manager–Worker–Verifier loop, the script must be run to ensure the prior tests all still succeed. On every commit to the repository, the full test suite must pass. Thus, at any point in the history of the codebase, it should be clear exactly what works and what does not.
When agents get desperate to get stuff done, they will justify to themselves (often even fooling me!) that a certain test must be modified. If they have direct access to the test routines, they will simply remove the offending test and claim success. Thus, the tests must be locked in a way that prevents that, while still allowing the agents to add new test cases. One way to do it is to calculate a SHA256 hash of all passing tests, write these hashes to a file that agents cannot modify, and add a commit hook that checks that these “locked tests” are still present and still pass.
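As a sketch of one way to implement the lock: a script records the SHA256 of each locked test file, and a commit hook reruns the script. The file layout (tests/locked/, locked_tests.sha256) and the pytest runner are assumptions for the example:

import hashlib
import pathlib
import subprocess
import sys

LOCKFILE = pathlib.Path("locked_tests.sha256")  # writable by the human only

def check_locked_tests():
    for line in LOCKFILE.read_text().splitlines():
        digest, name = line.split(maxsplit=1)
        path = pathlib.Path(name)
        if not path.exists():
            sys.exit(f"FAIL: locked test {name} was removed")
        if hashlib.sha256(path.read_bytes()).hexdigest() != digest:
            sys.exit(f"FAIL: locked test {name} was modified")
    # the files are intact; they must also still pass
    if subprocess.run(["pytest", "tests/locked"]).returncode != 0:
        sys.exit("FAIL: a locked test no longer passes")
    print("PASS")

if __name__ == "__main__":
    check_locked_tests()

Run from a pre-commit hook, this refuses any commit that tampers with a locked test, while new test files can still be added freely.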
If the Manager has done a good job breaking down the work, the agents will be wonderfully productive. They work best with a fresh context and when handling small changes. The testing scheme described above helps to lock this in: each test demonstrates a small, narrowly scoped feature, and if a test case fails, a fresh agent can be easily dispatched to address it. If a regression cannot be fixed, then simply roll back the repository history to a previous snapshot where that test works, and try again.
Agents are chatty and argumentative and it’s very tempting to engage them in dialogue. They’ll complain that the hardware is broken when it’s not, they’ll say they’re missing some permissions they don’t need, and so on. Don’t take the bait.
Instead, take a step back and understand that the agents stopping for a question before their work is done represents a failure of the work pipeline. They are supposed to have all the resources they need to implement the task given. If they do stop, it means that the prompt may have been unclear, the roles ill-defined, or a supervisory agent either inadequate or missing altogether.
When there is a break, therefore, think about what the real underlying issue is. Do the role descriptions make it clear that stopping is not allowed? Is the testing infrastructure really broken? Fix that, clear the agents’ contexts, and restart the pipeline.
The goal is to have them work independently longer and longer each iteration. Don’t fix their immediate problems; rather, improve the process so the agents get empowered to fix them without disturbing you.
All the tips above combined allow one to make very good (and quick) work of the available tokens. But what to do with oneself? Is there still a role for humans in the creative process?
Plenty, in fact: read the code produced by the agents when they reach a natural stopping point, such as “mission accomplished” as per the mission files described above. I don’t like reading the code immediately as it gets produced, since there’s simply too much of it and it’ll get changed anyway. But once it starts to reach some kind of final form, it’s a good time to review it.
The second task for the human is to be the technician at the bench: connect new hardware devices to the tests, make connections between those devices, and create new hardware for the agents to play with.
Third, if there’s still “time and tokens” left over, write new mission files and get new agentic feedback loops started.
Fourth, inevitably the agents will try and stop for a “questions break”. Debug these breaks in such a way that next time it takes them longer to stop.
Fifth, once a significant milestone is achieved, manually inspect that the tests do what the agents claim they do. They are (currently) not to be trusted.
Sixth, write about your experiences and share them with the world!

Some problems should not be solved, because they cannot be solved, and therefore must not be solved. Such problems are instead to be let go of.
Examples include:
Changing other people when they have proven themselves, again and again, unwilling to change. Stop trying to change them and accept them as they are; or else let go of them entirely, or, less drastically, distance yourself as appropriate.
Making improvements in products and services that are not strictly necessary. These small improvements come at the cost of more important features or even whole new products. Stop pursuing diminishing returns; it’s an infinite amount of work for a finite and modest benefit. (See also: 80/20 rule.)
Thinking and worrying about things outside my control or knowledge. Either bring it within my control or knowledge, or let go of it entirely.
Letting go of insoluble problems is not laziness; it’s wisdom.

In the previous two articles, we compiled a “blink” test program and started it on the ADSP-21569 eval board. When we observed the blinking on the board, this signalled success. Now we can “close the loop” by letting the computer read program output from the DSP, enabling automated testing and agentic coding.
The test setup splits the coding and testing between two separate computers. The “test” machine obtains a program to test, loads the code onto the ADSP-21569 eval board, and returns the results to the “code” machine.
The REST API is delightfully simple in Python. The test machine server outline is as follows:
import os
import random
from http.server import HTTPServer, BaseHTTPRequestHandler
class Handler(BaseHTTPRequestHandler):
    ...  # do_GET and do_POST, shown below

HTTPServer(("127.0.0.1", 8080), Handler).serve_forever()

The class defines just two functions. The do_GET function selects one of the files at random from the inputs/ directory, moves it to done/, and sends it to the remote client. (It returns 204 if there are no more test files.)
    def do_GET(self):
        names = os.listdir("inputs")
        if not names:
            self.send_response(204)
            self.end_headers()
            return
        name = random.choice(names)
        src = os.path.join("inputs", name)
        data = open(src, "rb").read()
        os.rename(src, os.path.join("done", name))
        self.send_response(200)
        self.end_headers()
        self.wfile.write(data)

When the client returns the response, the server just writes it to a filename determined by the POST endpoint:
    def do_POST(self):
        digest = self.path.rsplit("/", 1)[-1]
        n = int(self.headers["Content-Length"])
        with open(os.path.join("outputs", f"{digest}.txt"), "wb") as f:
            f.write(self.rfile.read(n))
        self.send_response(200)
        self.end_headers()

On the client side, we likewise need two functions. One to get the next load stream to test:
import hashlib
import urllib.request

def get_job(port=8080):
    r = urllib.request.urlopen(f"http://localhost:{port}")
    if r.status == 204:
        return None
    return r.read()

And another to submit the response:
def post_resp(ldr, msg, port=8080):
    sha = hashlib.sha256(ldr).hexdigest()
    urllib.request.urlopen(urllib.request.Request(
        f"http://localhost:{port}/{sha}",
        data=msg.encode(),
        method="POST",
    )).read()

For continuous testing/development use, the client can repeatedly poll for new jobs. If a job exists, the client tests it and immediately requests a new one. If there are no more jobs, the client polls again after a few seconds or minutes.
Blink, like many other programs, runs forever, making it impossible to load any other program without a hard reset. No obvious “reset over USB” mechanism stands out to me in the EV-SOMCRR-EZLITE and EV-21569-SOM schematic diagrams.
However, the EZLITE board comes with three LEDs connected to the GPIO expander that are entirely decorative: DS6, DS7, DS8. We can repurpose one of these (I chose DS8) to reset the whole board by connecting R172 (expander side) to the S3 RESET pushbutton. Then blinking that LED results in a whole board reset, allowing us to “re-flash” with new firmware.
To close the loop entirely, the programs should produce some output. If they print to UART0, then the Python “poller” can capture the output and send it back via post_resp(). Simple, but powerful!
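For the capture itself, here is a sketch of the run_test stand-in used above, assuming the board’s UART0 is visible to the test machine as /dev/ttyUSB0 through a USB-serial adapter (pyserial), and that flash() is the hypothetical routine that resets the board and loads the program:

import serial  # pyserial

def run_test(ldr, port="/dev/ttyUSB0", baud=115200):
    flash(ldr)  # hypothetical: toggle DS8 to reset, then load the new program
    with serial.Serial(port, baud, timeout=5.0) as uart:
        out = uart.read(4096)  # collect whatever the program prints to UART0
    return out.decode(errors="replace")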
The point of this exercise is to allow coding agents to test their output on real hardware without compromising the computer the hardware is connected to. Thus a single computer could control multiple boards and pieces of equipment without risking interference between the agents associated with different hardware. Each set of agents can be separately boxed and run with all permissions granted, while remaining essentially powerless to escape its confinement.

There’s an idea floating around: that speaking too soon—or too publicly—can distort intention, dissipate effort, or corrupt meaning. Yet what silence protects differs: power, purity, clarity, or effectiveness. Here we take a ten-second look at a couple of interesting specimens.
Eliphas Levi insists that magical operations must be kept secret because publicity disperses will and invites opposition; the act of speaking converts a focused intention into a social object, weakening its efficacy as an operation of directed will.[1]
Aleister Crowley codifies silence as a discipline of the magician: one must avoid discussing one’s Work because speech leaks energy and entangles the will in ego and external reactions, thereby degrading the precision required for successful magical action.[2]
Franz Bardon treats silence as a technical requirement of mental training: revealing intentions or progress disrupts concentration and allows external influences to interfere with the equilibrium necessary for effective practice.[3]
Napoleon Hill advises keeping plans private until they are realized because external opinions introduce doubt and erode persistence; silence protects the fragile early stage where belief must be maintained without contradiction.[4]
Neville Goddard emphasizes inner conviction over external discussion; speaking about a desire before it is realized shifts attention from imagination to social validation, weakening the sustained assumption required to bring it about.[5]
Jesus explicitly commands that prayer, fasting, and charity be done in secret so that the act is not redirected toward human approval; speaking or displaying it replaces devotion with performance and nullifies its spiritual value.[6]
St. John of the Cross warns that mystical experiences should not be spoken of lightly because language distorts them and invites ego inflation; silence preserves the authenticity of the interior transformation.[7]
This text teaches that God cannot be approached through concepts or speech; silence is required because verbalization imposes false clarity on what must remain beyond understanding.[8]
In the early discourses, the Buddha avoids answering speculative metaphysical questions; silence prevents engagement with views that do not lead to liberation and keeps attention on practical insight.[9]
Dogen treats language as inherently secondary to realization; speaking about insight risks replacing direct experience with conceptualization, so silence helps prevent mistaking description for attainment.[10]
Gollwitzer shows that publicly stating goals can create a premature sense of completion through social recognition; silence preserves motivation by preventing this substitution of talk for action.[11]

The state of an embedded instrument can be represented as a set of parameters with names, values, and other attributes. These parameters change in various ways (user control, programmatically, sensor input) and need to be accessible on a front panel, remote interface, and in the internal subsystems. Some never change (serial number), some rarely (firmware version), some occasionally (user settings), and some need real-time updates (current measurement value). I’m looking for a way to manage these parameters that consumes the least processor time, i.e. adds the least overhead.
If the entire device runs a single program, then the program can use a simple in-memory array (list, hash map) of parameters and use them as appropriate. That’s a less attractive option for more complex instruments where there’s a GUI, several remote interfaces, a web server perhaps. Maybe better to divide the firmware into several tasks?
The different application concerns can be implemented as tasks on a real-time operating system. FreeRTOS, barely more than a “user-space” threading library, is an attractive option because it’s lightweight and easy to use. The tasks share memory and can access a global array of parameters, coordinating access with a mutex.
This solution works for as long as we are happy to keep the whole instrument firmware compiled into a single executable. But then the smallest change anywhere requires the whole program to be re-flashed and the instrument rebooted. When there are a lot of little adjustments, this quickly becomes slow and annoying. It’d be much nicer to have truly independent programs for different parts of the instrument.
If the instrument supports a full-featured operating system like Linux then we can divide the work into several programs. One could be in charge of drawing the graphical user interface, another would serve as a “SCPI shell”, fielding the remote commands received from USB or Ethernet interfaces. Now that the various programs no longer share the same memory space, the problem arises of sharing the parameters between the programs. If data is received by the sensor monitoring program, how can the GUI app access it to display on the LCD panel?
Two solutions come to mind immediately: POSIX shared memory, or a single central “param server” process that sends/receives data over Unix-domain sockets. The first comes closest to “zero overhead” and is comparable to the bare-metal and RTOS solutions; the second separates domains more cleanly but may come with a performance price. It would be “premature optimization” to discard a conceptually cleaner solution for fear of overhead, so let’s implement both and see how much they cost.
In the “shared memory” approach, the parameter list is defined as a header or library that is compiled into each program that needs access to the parameters. It defines the memory location, structure, and the synchronization primitives that make it possible for several programs to share the data.
For concreteness, let’s assume that a parameter has a name and a value:
struct param {
    const char *name;
    double val;
    // other attributes as needed
};

A header file declares an array of these parameters:
static struct param params[] = {
    {"parameter_name_1", 0.0},
    {"parameter_name_2", 0.0},
    // and so on, about 200 items
};

We have a choice of several synchronization methods:
No synchronization at all. If the parameters are simple small numbers, reads and writes are naturally atomic at the hardware level. However, for anything larger (such as the struct above), we’d get torn reads. Nonetheless, this gives the absolute floor for the overhead measurement.
Mutex. The straightforward approach: one pthread_mutex_t with PTHREAD_PROCESS_SHARED, per table or per parameter. Readers and writers both lock/unlock. Simple and correct, but a single global mutex becomes a bottleneck when many processes contend: everyone blocks on every access.
Read-write lock. To allow multiple concurrent readers, we can use pthread_rwlock_t. Writers take an exclusive lock, readers share. Helps when reads vastly outnumber writes, but still blocks readers while a writer holds the lock.
Sequence counter. The writer increments a sequence number before and after writing. Readers check the sequence before and after reading; if it changed or is odd, they retry. Writers never block on readers, and readers never block on each other. (The writer still needs a mutex if there are multiple writers.) Seqlocks are cheap on the read path (no syscalls, no atomics beyond loads), but require readers to spin while a write is in progress.
Next, declare the shared memory layout corresponding to the six ways of synchronizing (mutex, rwlock, seqlock; per-parameter or whole-table locking):
#include <pthread.h>
#include <stdatomic.h>

#define MAX_SLOTS 256 // shared memory capacity
#define SHM_NAME "/param_bench"

struct param_slot {
    double val;
#if defined(MUTEX_PARAM)
    pthread_mutex_t lock;
#elif defined(RWLOCK_PARAM)
    pthread_rwlock_t lock;
#elif defined(SEQLOCK_PARAM)
    _Atomic unsigned seq;
#endif
};

struct shared {
#if defined(MUTEX_TABLE)
    pthread_mutex_t lock;
#elif defined(RWLOCK_TABLE)
    pthread_rwlock_t lock;
#elif defined(SEQLOCK_TABLE)
    _Atomic unsigned seq;
#endif
    struct param_slot slots[MAX_SLOTS];
};

The actual locking/unlocking functions are very standard, so I’ll show only the per-table mutex case:
#if defined(MUTEX_TABLE)
static inline unsigned _rbegin(struct shared *s, int i) {
    pthread_mutex_lock(&s->lock);
    return 0;
}
static inline int _rend(struct shared *s, int i, unsigned q) {
    pthread_mutex_unlock(&s->lock);
    return 0;
}
static inline void _wlock(struct shared *s, int i) {
    pthread_mutex_lock(&s->lock);
}
static inline void _wunlock(struct shared *s, int i) {
    pthread_mutex_unlock(&s->lock);
}
#endif

Now we write two programs:
Randomizer takes any number of parameter names as command-line arguments and runs a loop at about 60 Hz, writing random values to those parameters
Displayer prints all parameters that change to standard output, also at 60 Hz.
We can investigate the potential bottlenecks by varying the synchronization method, the number of Randomizers and Displayers, and the number of parameters changed by each Randomizer.
On an STM32MP135 eval board, with two Displayers and two Randomizers (one randomizing all parameters, the other only two), the results are about the same for all three synchronization primitives, whether per-parameter or per-table:
The Randomizer, whether randomizing one parameter or all 200, takes about 0.0% CPU; i.e., too little to show up in top.
Displayer takes up about 12% if displaying all parameters, which is presumably mostly just the printing overhead rather than parameter access.
Load average varies in the 1.2 to 1.8 range when observed over a few minutes.
In other words, updating 200 parameters at 60 Hz is too light a load to matter! Any locking method is fine and should be chosen for programming convenience, though on that count they are all about the same as well.
We can modify the programs to not throttle the update rate and not do any printing, just count parameter accesses per second. Let’s set up the same configuration as before (two Displayers, two Randomizers: one randomizing all 200 params, the other just 2). Now we can directly report the number of parameter accesses in Mops/s (million operations per second) across all synchronization methods:
| Method | Granularity | Displayer (Mops/s) | 2-param Randomizer (Mops/s) | 200-param Randomizer (Mops/s) | Load Avg |
|---|---|---|---|---|---|
| mutex | param | 1.8 | 1.2 | 0.9 | 3.43 |
| mutex | table | 2.2 | 0.7 | 1.3 | 3.61 |
| rwlock | param | 2.0 | 0.7 | 0.3 | 2.81 |
| rwlock | table | 2.6 | 0.3 | 0.6 | 2.55 |
| seqlock | param | 0.0 to 1.5 | 0.8 | 1.4 | 4.03 |
| seqlock | table | 1.2 to 5.0 | 0.8 | 1.4 | 4.06 |
Now the differences show up. Mutexes appear to be the best behaved, with decent overall performance. Both mutexes and rwlocks seem to prioritize readers, with variable writer performance: if we lock per-param, it’s best to write fewer params; if we lock per-table, it’s best to write more params.
Seqlocks are the weirdest: the read performance is highly variable, sometimes choking to zero read accesses, sometimes outperforming the mutexes and rwlocks. Strangely, for seqlocks it doesn’t matter whether locks are per-table or per-param. There is a clear explanation: seqlocks have no fairness mechanism, so a flat-out writer can starve readers indefinitely. When the writer runs continuously, it increments the sequence counter on every iteration. The reader captures seq, reads the value, then checks; but by then the writer has already incremented again. The bursts up to 5.0 happen when the OS scheduler preempts the writer and the reader gets a few uncontested iterations in.
The unthrottled measurement consumes altogether 100% of the CPU and represents an upper bound for how many independent parameters the firmware could read and write. For streaming high-speed data, rather than adding more parameters, one would most likely consider a different architecture altogether. However, the firmware designs I have in mind have only tens to hundreds of parameters, leaving us free to consider a less efficient but perhaps cleaner architecture: a single parameter server process communicating to clients over sockets.
Let there be three kinds of programs:
Parameter Server is the keeper of the in-memory parameter list and controls read and write access by other programs via Unix domain sockets
Readers, any number of them, request the values and metadata for some or all of the parameters
Writers, any number of them, modify the value of some of the parameters
Of course, some programs will be both readers and writers. The GUI, for example, displays the latest measurements and allows the user to change the settings.
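To make the architecture concrete, here is a minimal sketch of the idea, in Python rather than the C used above; the line protocol (“get name” / “set name value”) and the socket path are made up for the example. Because a single process owns the dictionary and serves one connection at a time, no locks are needed anywhere:

import os
import socketserver

SOCK_PATH = "/tmp/param.sock"  # hypothetical socket path
params = {}                    # name -> value; only the server touches it

class Handler(socketserver.StreamRequestHandler):
    def handle(self):
        # one "get <name>" or "set <name> <value>" command per line
        for line in self.rfile:
            cmd = line.decode().split()
            if len(cmd) == 3 and cmd[0] == "set":
                params[cmd[1]] = float(cmd[2])
                self.wfile.write(b"ok\n")
            elif len(cmd) == 2 and cmd[0] == "get":
                self.wfile.write(f"{params.get(cmd[1], 0.0)}\n".encode())

if os.path.exists(SOCK_PATH):
    os.unlink(SOCK_PATH)
socketserver.UnixStreamServer(SOCK_PATH, Handler).serve_forever()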
For benchmarking, we can again consider the Randomizer/Displayer example. Unthrottled, each of the two Displayers handles about 0.4 Mops/s, taking up 12% of the CPU. The 200-param Randomizer also handles 0.4 Mops/s and takes up 16% CPU. The 2-param Randomizer handles 0.0036 Mops/s using 12% of the CPU: very inefficient! Note that 3.6 kops/s is equivalent to 60 parameters updated at 60 Hz. Adding or removing readers and writers slows down or speeds up the system as expected, but no matter what, the Parameter Server takes up about 45% of the CPU.
Throttled to 60 Hz, Displayers again take up 12% CPU, most of which is printing. Randomizers both oscillate between 0.7% and 1.3% CPU, and the Parameter Server takes up 2.0%. Closing one of the Displayers, the Parameter Server needs only 1.3%. Closing both Displayers, the Server needs between 0.0% and 0.7%. With just the 200-param Randomizer, the Server and Randomizer both need between 0.0% and 0.7% CPU.
For high-speed data served one number at a time, sockets would not work. We could of course try to increase the throughput by sending a lot of data in a single socket call. If pushing the limits of performance, zero-copy alternatives using shared memory are the way to go, with one of the locking mechanisms above. Most likely mutexes: least confusing (to me), well understood, simple.
For a GUI-throttled set of a few tens or hundreds of parameters, the client–server architecture using Unix-domain sockets makes for a very clean design: send all changed parameters in one request per frame, not one request per parameter. No need to worry about mutual exclusion; in effect, the Parameter Server is the synchronization mechanism.
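A matching client sketch, under the same assumed protocol as the server above: accumulate the changed parameters during the frame, then open one connection and send them all at once:

import socket

def send_frame(changed, sock_path="/tmp/param.sock"):
    # changed: dict of name -> new value accumulated during this frame
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(sock_path)
        msg = "".join(f"set {name} {val}\n" for name, val in changed.items())
        s.sendall(msg.encode())
        s.shutdown(socket.SHUT_WR)  # tell the server the frame is complete
        while s.recv(4096):         # drain the "ok" acknowledgements
            pass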