Async Execution for Spatial Workloads

Within the Local Spatial Processing Patterns framework, async execution is the scheduling discipline that keeps high-velocity geometry off the event loop so a constrained gateway never blocks its own ingestion. Field-deployed IoT nodes pull GPS traces, LiDAR returns, and sensor polygon updates at rates that saturate small ARM SoCs in seconds; processing those streams synchronously stalls the loop, starves the main thread past the hardware watchdog window, and lets telemetry back up until the device resets. The fix is structural: non-blocking acquisition runs on an asyncio loop, while CPU-bound transforms are deferred to isolated background workers that can fail, time out, or be killed without taking the loop down with them.

This pattern sits between the cheap screening done by on-device geometry filtering and the upstream handoff governed by message queue management at the edge. It assumes a Python runtime on a Linux gateway (Raspberry Pi class through Cortex-A industrial modules), 512 MB to 2 GB of RAM, and a CPU that shares cycles with a cellular modem.

Async ingestion feeds a process pool; chunks that exceed the timeout are dropped to protect the loop.

Constraint Mapping

Every knob in this pattern traces back to a hardware limit. The async loop solves an I/O-concurrency problem, but the workers are bounded by physical RAM, core count, and thermal headroom — the same envelope catalogued under device constraints and resource limits. Map the constraint to its lever before tuning anything:

Constraint	Where it bites	Symptom under load	Lever in this pattern
RAM ceiling (512 MB–2 GB)	Per-worker RSS scales linearly with pool size	OOM killer reaps a worker mid-chunk	Cap per-worker address space at `MAX_RSS_MB` via `RLIMIT_AS`; bound `MAX_WORKERS`
GIL serialization	CPU-bound geometry on the loop blocks all coroutines	Ingestion latency spikes, queue grows	Offload to `ProcessPoolExecutor`, never compute inline
Thermal throttle (ARM SoC)	Sustained load drops clock, lengthens per-chunk time	More timeouts, then watchdog reset	Shrink `CHUNK_SIZE` when `scaling_cur_freq` falls
Hardware watchdog (5–10 s)	Main thread starvation past the kick interval	Spontaneous reboot, lost in-flight data	Hard `WORKER_TIMEOUT`; drop, don’t await indefinitely
Queue / buffer depth	Producers outrun consumers	Heap growth, then memory exhaustion	Bounded `asyncio.Queue(maxsize=...)` backpressure
FFI context lifetime	Re-creating GEOS/PROJ handles per geometry	GC pauses, allocation churn	Cache `GEOSContext` per worker at init

The non-negotiable number is per-worker RSS: on a 1 GB gateway running four workers plus the modem stack and the OS, roughly 150 MB per worker is the ceiling before the kernel starts reaping processes. Everything else is tuned around keeping each worker under that cap.
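
The budget arithmetic can be written down explicitly. The following is a minimal sketch, assuming the sizing rule stated under Configuration & Tuning, (total_RAM − OS − modem) / MAX_WORKERS minus a margin. The function name and the reservation figures in the usage line are illustrative assumptions, not measurements from a real gateway.

```python
def per_worker_budget_mb(total_mb: int, os_mb: int, modem_mb: int,
                         workers: int, margin_mb: int = 0) -> int:
    """Return the per-worker memory budget in MB, floored at zero."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    available = total_mb - os_mb - modem_mb
    return max(0, available // workers - margin_mb)


# Hypothetical 1 GB gateway: 300 MB for the OS and 100 MB for the modem stack
# (both assumed figures) leave 624 MB, which is 156 MB per worker across four.
budget = per_worker_budget_mb(1024, os_mb=300, modem_mb=100, workers=4)
print(budget)
```

Measuring the OS and modem reservations on the target device, rather than estimating them, is what makes the result trustworthy.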

Execution Model: Async I/O Against a Process Pool

Spatial operations are CPU-bound and memory-intensive. Coordinate transformations, topology validation, and raster-vector intersections will stall a single-threaded runtime, exhausting cycles and missing real-time I/O deadlines. Python’s asyncio handles non-blocking network and serial I/O efficiently, but it cannot accelerate compute-bound tasks — running a tight geometry loop directly on the event loop blocks every other coroutine until it returns.

The production split routes telemetry ingestion through the async loop and dispatches geometry processing to a concurrent.futures.ProcessPoolExecutor. Processes (not threads) are the correct primitive here: they sidestep the CPython GIL and use multiple cores, at the cost of linear memory growth and serialization overhead at the call boundary. That trade is why chunking matters — batching 50–200 geometries per dispatch amortizes the inter-process copy and lets the OS reclaim a worker’s heap between cycles instead of fragmenting it. The coordinate work itself assumes inputs already normalized per coordinate reference systems at the edge, so workers never pay for CRS discovery at runtime.
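
To get a feel for the serialization cost at the call boundary, one option is to measure how many bytes a chunk pickles to, since ProcessPoolExecutor pickles call arguments before handing them to a worker. This is a rough sketch: make_geometry is a hypothetical helper that fabricates records shaped like the dictionaries used in this article, the CRS strings are placeholders, and the byte counts only approximate what actually crosses the pipe.

```python
import pickle
from typing import Any, Dict, List


def make_geometry(i: int, n_points: int = 32) -> Dict[str, Any]:
    """Hypothetical test record shaped like the dicts process_chunk() receives."""
    coords = [(float(i + k), float(i - k)) for k in range(n_points)]
    return {"id": i, "coords": coords, "src_crs": "EPSG:4326", "dst_crs": "EPSG:3857"}


def chunk_payload_bytes(chunk: List[Dict[str, Any]]) -> int:
    """Approximate bytes serialized for one dispatch (arguments only)."""
    return len(pickle.dumps(chunk, protocol=pickle.HIGHEST_PROTOCOL))


def chunked(items: List[Any], size: int) -> List[List[Any]]:
    """Split items into consecutive chunks of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


geoms = [make_geometry(i) for i in range(400)]
totals = {}
for size in (1, 50, 200):
    chunks = chunked(geoms, size)
    totals[size] = sum(chunk_payload_bytes(c) for c in chunks)
    print(f"chunk={size:>3}  dispatches={len(chunks):>3}  bytes={totals[size]}")
```

The per-dispatch count matters as much as the byte total: every dispatch also pays a fixed scheduling and wake-up cost that this sketch does not measure.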

The loop drains the queue, builds a chunk, and awaits the pool rather than computing inline. While the workers grind, the loop is free to keep accepting telemetry:

import asyncio
import concurrent.futures
import os
import resource
import logging
from typing import List, Dict, Any

# Edge constraints
MAX_WORKERS = min(os.cpu_count() or 2, 4)
CHUNK_SIZE = 128
QUEUE_LIMIT = 500
WORKER_TIMEOUT = 4.0  # seconds
MAX_RSS_MB = 180  # per-worker cap in MB, applied through RLIMIT_AS (address space, not RSS)

logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s")

def worker_init():
    """Initialize FFI contexts and enforce memory limits per worker process."""
    # RLIMIT_AS caps virtual address space, which bounds RSS from above.
    # Use RLIMIT_DATA instead for a heap-focused limit on Linux.
    _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    new_limit = MAX_RSS_MB * 1024 * 1024
    if hard != resource.RLIM_INFINITY:
        new_limit = min(new_limit, hard)  # the soft limit may not exceed the hard limit
    resource.setrlimit(resource.RLIMIT_AS, (new_limit, hard))
    # Pre-load GEOS/PROJ contexts here to avoid per-geometry initialization overhead

def process_chunk(geometries: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """CPU-bound worker: FFI-optimized geometry processing. Runs in isolated process."""
    results = []
    for geom in geometries:
        try:
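            # Direct coordinate transformation & topology validation via CFFI/ctypes.
            # apply_crs_transform() and validate_topology() stand in for the project's
            # own bindings; they are not defined in this listing.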
            processed = apply_crs_transform(geom["coords"], geom["src_crs"], geom["dst_crs"])
            if validate_topology(processed):
                results.append({"id": geom["id"], "status": "valid", "geom": processed})
        except Exception as e:
            results.append({"id": geom["id"], "status": "error", "msg": str(e)})
    return results

async def spatial_ingest_loop(ingestion_queue: asyncio.Queue, executor: concurrent.futures.Executor):
    """Async event loop: drains telemetry, chunks, and dispatches to workers."""
    while True:
        chunk = []
        for _ in range(CHUNK_SIZE):
            try:
                chunk.append(ingestion_queue.get_nowait())
            except asyncio.QueueEmpty:
                break

        if not chunk:
            await asyncio.sleep(0.05)
            continue

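        try:
            loop = asyncio.get_running_loop()
            results = await asyncio.wait_for(
                loop.run_in_executor(executor, process_chunk, chunk),
                timeout=WORKER_TIMEOUT
            )
            # Hand `results` to the outbound queue or local store here.
        except asyncio.TimeoutError:
            # The loop stops waiting, but the worker process keeps running the chunk
            # until process_chunk returns; the timeout frees the loop, not the core.
            logging.warning("Spatial worker timeout; dropping chunk to preserve event loop")
        except Exception as e:
            logging.error(f"Worker pool failure: {e}")
            await asyncio.sleep(1.0)  # Backoff to prevent thermal runaway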

# Deployment initialization
executor = concurrent.futures.ProcessPoolExecutor(
    max_workers=MAX_WORKERS, initializer=worker_init
)

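The initializer=worker_init argument is what makes the pool edge-safe: it runs once per worker process at spawn, capping address space with RLIMIT_AS and pre-loading native contexts before a single geometry is touched. The wait_for wrapper is the watchdog contract in code: when a chunk overruns WORKER_TIMEOUT the loop stops waiting, logs, and moves on, so the device keeps kicking its hardware watchdog. The timeout releases only the loop, though. The worker process keeps running process_chunk until it returns, so a chunk that routinely overruns still occupies a core, and the timeout rate is worth watching.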

FFI Context Isolation Inside Workers

Python wrappers around GEOS and PROJ allocate heavily when instantiating thousands of geometry objects in tight loops. For edge deployments, direct FFI via cffi or ctypes cuts Python-level churn and minimizes GC pauses. Pre-compiling CRS transformation matrices and caching GEOSContext handles at worker initialization eliminates repeated C-level setup — exactly the work worker_init exists to do.

Thread-safety is the trap. A GEOS context handle must not be used by more than one thread at a time, and sharing one GEOSContext across threads, or across tasks dispatched to a thread pool, can corrupt state or segfault under load. The process-pool boundary enforces isolation for free: each worker owns its own context, instantiated once and used only by that single process. Use the reentrant (_r) GEOS entry points and bind one context per worker:

# Runs inside the worker process only. One context per process, created at init,
# never shared across the async loop or other workers. ctypes binding to libgeos_c.
import ctypes

_geos = None  # module-global, but the module is loaded per-process by the pool

def init_geos_context():
    """Called from worker_init(): bind libgeos and create one reentrant context."""
    global _geos
    lib = ctypes.CDLL("libgeos_c.so.1")
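    # Declare pointer-typed signatures explicitly; ctypes otherwise assumes C int
    # and can truncate a 64-bit handle.
    lib.GEOS_init_r.restype = ctypes.c_void_p
    lib.GEOS_init_r.argtypes = []
    lib.GEOSContext_setErrorMessageHandler_r.argtypes = [ctypes.c_void_p, ctypes.c_void_p, ctypes.c_void_p]
    ctx = lib.GEOS_init_r()          # per-worker handle, lives for the worker's lifetime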
    _geos = (lib, ctx)
    return _geos

def prepared_contains(geom_a_wkb: bytes, geom_b_wkb: bytes) -> bool:
    """FFI predicate using the worker-local context. No allocation of Python geom objects."""
    lib, ctx = _geos
    # GEOSPreparedContains_r(ctx, prepared, geom) -> char; cache the prepared geom
    # for the reference layer rather than rebuilding it per call.
    ...

Filtering and joins share this FFI surface, so the C-level details — compile flags, memory alignment, ray-casting layout — are covered once in implementing polygon containment checks in C++ and reused here. For thread-safe context semantics, the GEOS C API documentation is the authoritative reference.
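
The prepared-predicate stub above can be fleshed out along the following lines. This is a sketch under assumptions, not the article's production binding: ReferenceLayer, load_geos, wkb_point, and wkb_polygon are hypothetical names, the library is located with ctypes.util.find_library rather than a fixed soname, and only the reentrant GEOS C API calls named in the code are used. One prepared geometry is built per reference polygon and reused for every test.

```python
import ctypes
import ctypes.util
import struct


def load_geos():
    """Bind the reentrant GEOS calls used below and create one context."""
    name = ctypes.util.find_library("geos_c")
    if name is None:
        raise OSError("libgeos_c not found on this system")
    lib = ctypes.CDLL(name)
    vp = ctypes.c_void_p
    # Declare every pointer-typed signature; ctypes otherwise assumes C int.
    lib.GEOS_init_r.restype, lib.GEOS_init_r.argtypes = vp, []
    lib.GEOS_finish_r.restype, lib.GEOS_finish_r.argtypes = None, [vp]
    lib.GEOSGeomFromWKB_buf_r.restype = vp
    lib.GEOSGeomFromWKB_buf_r.argtypes = [vp, ctypes.c_char_p, ctypes.c_size_t]
    lib.GEOSPrepare_r.restype, lib.GEOSPrepare_r.argtypes = vp, [vp, vp]
    lib.GEOSPreparedContains_r.restype = ctypes.c_ubyte
    lib.GEOSPreparedContains_r.argtypes = [vp, vp, vp]
    lib.GEOSPreparedGeom_destroy_r.restype = None
    lib.GEOSPreparedGeom_destroy_r.argtypes = [vp, vp]
    lib.GEOSGeom_destroy_r.restype = None
    lib.GEOSGeom_destroy_r.argtypes = [vp, vp]
    return lib, lib.GEOS_init_r()


class ReferenceLayer:
    """One prepared reference geometry, built once and reused for every test."""

    def __init__(self, lib, ctx, wkb: bytes):
        self._lib, self._ctx = lib, ctx
        self._geom = lib.GEOSGeomFromWKB_buf_r(ctx, wkb, len(wkb))
        if not self._geom:
            raise ValueError("reference WKB could not be parsed")
        # The prepared geometry borrows self._geom, so both stay alive together.
        self._prepared = lib.GEOSPrepare_r(ctx, self._geom)

    def contains(self, wkb: bytes) -> bool:
        geom = self._lib.GEOSGeomFromWKB_buf_r(self._ctx, wkb, len(wkb))
        if not geom:
            raise ValueError("candidate WKB could not be parsed")
        try:
            rc = self._lib.GEOSPreparedContains_r(self._ctx, self._prepared, geom)
        finally:
            self._lib.GEOSGeom_destroy_r(self._ctx, geom)
        if rc == 2:  # GEOS predicates return 2 when an exception occurred
            raise RuntimeError("GEOSPreparedContains_r failed")
        return rc == 1

    def close(self):
        self._lib.GEOSPreparedGeom_destroy_r(self._ctx, self._prepared)
        self._lib.GEOSGeom_destroy_r(self._ctx, self._geom)


def wkb_point(x: float, y: float) -> bytes:
    """Little-endian WKB for a 2D point."""
    return struct.pack("<BIdd", 1, 1, x, y)


def wkb_polygon(ring) -> bytes:
    """Little-endian WKB for a single-ring polygon; the ring must be closed."""
    head = struct.pack("<BIII", 1, 3, 1, len(ring))
    return head + b"".join(struct.pack("<dd", x, y) for x, y in ring)
```

In a worker, load_geos() would run once from the pool initializer, and each ReferenceLayer would be built there too, so the per-geometry path only parses the candidate and calls the predicate.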

Configuration & Tuning

The five constants at the top of the implementation are the whole tuning surface. Set them from the device’s measured envelope, not from defaults copied off a workstation:

MAX_WORKERS: start at min(os.cpu_count() or 2, 4), since os.cpu_count() can return None, then back off by one if the modem or logging stack needs a core. More workers means more RSS, not always more throughput.
CHUNK_SIZE: 50–200 geometries. Smaller chunks shorten worst-case latency and lower the timeout-drop rate under thermal throttle; larger chunks amortize the inter-process copy. Treat it as dynamic, not fixed (see Verification & Field Diagnostics).
QUEUE_LIMIT: bound the ingestion queue with asyncio.Queue(maxsize=QUEUE_LIMIT) so producers await put() and naturally throttle upstream sensors. An unbounded queue trades a clean stall for an OOM kill. Pair it with retry and backoff for unstable networks on the sync side so a full queue never becomes lost data.
WORKER_TIMEOUT: set strictly below the hardware watchdog interval, with margin (e.g. 4 s against a 10 s watchdog; a 5 s watchdog calls for a shorter timeout). This is a safety limit, not a performance target.
MAX_RSS_MB: the RLIMIT_AS cap, in megabytes. Size it as (total_RAM − OS − modem) / MAX_WORKERS, then subtract a safety margin. Because RLIMIT_AS limits virtual address space, which is always at least as large as RSS, validate the value against a worker's measured VmSize so that imports and native libraries still fit under it.
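
The backpressure behaviour of a bounded queue is easy to check in isolation. A minimal sketch follows, with a hypothetical producer and a deliberately slow consumer; the limit of 5 and the 1 ms delay are arbitrary demo values, not tuning advice.

```python
import asyncio


async def producer(queue: asyncio.Queue, n_items: int) -> None:
    for i in range(n_items):
        await queue.put({"id": i})  # suspends here while the queue is full


async def consumer(queue: asyncio.Queue, n_items: int, depths: list) -> None:
    for _ in range(n_items):
        depths.append(queue.qsize())  # sample depth before each read
        await queue.get()
        await asyncio.sleep(0.001)    # stand-in for slow downstream processing
        queue.task_done()


async def run_demo(limit: int = 5, n_items: int = 40):
    queue = asyncio.Queue(maxsize=limit)
    depths: list = []
    await asyncio.gather(producer(queue, n_items), consumer(queue, n_items, depths))
    return max(depths), queue.qsize()


max_depth, remaining = asyncio.run(run_demo())
print(f"max observed depth={max_depth}, remaining={remaining}")
```

The producer never holds more than the queue limit in memory; it simply waits, which is the clean stall the bounded queue is meant to provide.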

Pin the resources at the OS layer too. Run the gateway process as a systemd service with MemoryMax= (MemoryLimit= is the deprecated cgroup v1 spelling) and CPUQuota= so a runaway worker is killed inside the service's own cgroup rather than pushing the whole system into a global OOM kill, and read live clock state from /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq to drive adaptive chunking. If you build the FFI shared object yourself and it is C++ that never throws across the boundary, compiling the geometry hot path with -O2 -fno-exceptions typically yields a smaller binary.
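
Adaptive chunking driven by clock state might look like the sketch below. The scaling policy is an assumption chosen for illustration (scale the chunk by the current-to-maximum frequency ratio, halve it again above the 75 °C threshold mentioned under Verification & Field Diagnostics, and never go below the 50-geometry floor); the function names are hypothetical, and the cpu0 and thermal_zone0 paths will differ on some boards.

```python
from pathlib import Path

# Standard Linux sysfs locations; the zone and CPU index are assumptions.
CPUFREQ = Path("/sys/devices/system/cpu/cpu0/cpufreq")
THERMAL = Path("/sys/class/thermal/thermal_zone0/temp")


def read_int(path: Path, default: int) -> int:
    """Read an integer sysfs value, falling back when the file is absent."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return default


def adaptive_chunk_size(cur_khz: int, max_khz: int, temp_c: float,
                        base: int = 128, floor: int = 50,
                        hot_c: float = 75.0) -> int:
    """Shrink the chunk as the clock drops, and halve it again when hot."""
    scale = min(1.0, cur_khz / max_khz) if max_khz > 0 else 1.0
    if temp_c >= hot_c:
        scale *= 0.5
    return max(floor, int(base * scale))


def current_chunk_size(base: int = 128) -> int:
    """Sample sysfs and return the chunk size to use for the next dispatch."""
    max_khz = read_int(CPUFREQ / "scaling_max_freq", 0)
    cur_khz = read_int(CPUFREQ / "scaling_cur_freq", max_khz)
    temp_c = read_int(THERMAL, 0) / 1000.0  # sysfs reports millidegrees Celsius
    return adaptive_chunk_size(cur_khz, max_khz, temp_c, base=base)
```

Calling current_chunk_size() once per loop iteration, in place of the fixed CHUNK_SIZE, is one way to wire this in; on hosts without cpufreq or thermal sysfs entries it falls back to the base size.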

Verification & Field Diagnostics

Confirming this pattern works on a deployed device — not in a lab — means watching the loop’s responsiveness and the workers’ resource behaviour directly:

Loop liveness. Enable Python’s asyncio debug mode in staging to surface slow callbacks and unclosed transports. In the field, emit a heartbeat from the loop on a fixed cadence; a missed heartbeat means a coroutine blocked and the watchdog is about to fire.
Queue depth. Log ingestion_queue.qsize() against QUEUE_LIMIT each cycle. Sustained depth above ~80% is the early warning that processing is falling behind ingestion.
Thermal headroom. Poll scaling_cur_freq and the thermal zones; when a zone exceeds ~75 °C, reduce CHUNK_SIZE dynamically rather than letting per-chunk time creep past WORKER_TIMEOUT.
Per-worker RSS. Read VmRSS and VmSize from /proc/<pid>/status, or psutil.Process(pid).memory_info() (fields rss and vms), for each worker: VmRSS shows whether the worker stays inside its RAM budget, and VmSize is the figure the RLIMIT_AS cap actually constrains. Rising RSS across cycles points at heap fragmentation from repeated GeoJSON parsing or shapely instantiation; use tracemalloc in staging to find the hotspot, then migrate it to FFI-backed array ops.
Pre-filter hit rate. Before dispatching to workers, apply a lightweight bounding-box or coordinate-range check; on-device geometry filtering reduces worker payload by 40–70% in field conditions. Log the discard ratio — a falling ratio means more junk is reaching the expensive stage.
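
The loop-liveness check above can be approximated by measuring how late a periodic sleep wakes up. This is a sketch, not a watchdog driver: loop_lag_monitor and its parameters are hypothetical names, and the demo blocks the loop on purpose with time.sleep to show what a detection looks like.

```python
import asyncio
import time


async def loop_lag_monitor(interval: float, threshold: float, on_lag, stop: asyncio.Event):
    """Report whenever a sleep of `interval` seconds wakes more than `threshold` late."""
    loop = asyncio.get_running_loop()
    while not stop.is_set():
        started = loop.time()
        await asyncio.sleep(interval)
        lag = loop.time() - started - interval
        if lag > threshold:
            on_lag(lag)  # e.g. log it, or skip the next watchdog-safe heartbeat


async def demo():
    lags = []
    stop = asyncio.Event()
    task = asyncio.create_task(loop_lag_monitor(0.05, 0.1, lags.append, stop))
    await asyncio.sleep(0)     # let the monitor start its first sleep
    time.sleep(0.3)            # deliberately block the loop (the bug being detected)
    await asyncio.sleep(0.01)  # give the monitor a chance to wake and report
    stop.set()
    await task
    return lags


lags = asyncio.run(demo())
print(f"lag events: {len(lags)}")
```

In the field the same coroutine can double as the heartbeat emitter, and logging ingestion_queue.qsize() from the same place keeps both signals on one cadence.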

Reference layers that workers join against (administrative boundaries, road networks) should be loaded into read-only memory-mapped arrays so they are shared, not duplicated, across processes; the memory-efficient join strategy lives in spatial joins in constrained environments and its companion on reducing RAM usage for GeoJSON parsing on Raspberry Pi.
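
As a rough illustration of a shared read-only reference layer, the sketch below packs bounding boxes into a flat file of doubles and maps it read-only, so every worker that maps the same file shares the same page-cache pages. The file layout (four native-endian doubles per box) and the function names are assumptions made for this example; the join strategy itself is covered in the linked articles.

```python
import mmap
from array import array


def write_bbox_layer(path: str, boxes) -> None:
    """Write (minx, miny, maxx, maxy) tuples as a flat array of doubles."""
    flat = array("d", (v for box in boxes for v in box))
    with open(path, "wb") as f:
        flat.tofile(f)


def open_bbox_layer(path: str):
    """Map the layer read-only; returns the mmap and a float view over it."""
    with open(path, "rb") as f:
        # The mapping stays valid after the file object is closed.
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    return mm, memoryview(mm).cast("d")


def bbox_hit(view, index: int, x: float, y: float) -> bool:
    """Test a point against box `index` without copying the layer."""
    minx, miny, maxx, maxy = view[4 * index:4 * index + 4]
    return minx <= x <= maxx and miny <= y <= maxy
```

Each worker would call open_bbox_layer() from its initializer. Release the view before closing the mapping (view.release(), then mm.close()), because an mmap with exported buffers cannot be closed.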

Failure Modes & Recovery

Each failure in this pattern has a distinct signature and a safe fallback. The design goal is that no single failure cascades into a crash loop on a duty-cycled device.

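Worker OOM kill. Aggregate RAM runs out and the kernel OOM killer reaps a worker mid-chunk. (Hitting RLIMIT_AS is a different event: the allocation fails inside the worker and surfaces as a MemoryError or a native crash.) ProcessPoolExecutor does not respawn a worker that dies abruptly; it marks the whole pool broken, and pending and later submissions raise BrokenProcessPool. Detect: a BrokenProcessPool exception or a worker PID disappearing. Recover: shut the broken executor down and build a new one, which re-runs worker_init in each fresh worker; then shrink CHUNK_SIZE and MAX_WORKERS, and verify the per-worker RSS budget against the constraint table above.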
Timeout drops climbing. Throttling or an oversized chunk pushes processing past WORKER_TIMEOUT. Detect: a rising count of the “Spatial worker timeout” log line. Recover: the loop already drops and continues; the adaptive chunk-size reduction described under Thermal headroom in Verification & Field Diagnostics is the standing mitigation. Log dropped chunk IDs so upstream reconciliation knows what was lost.
Watchdog reset. A geometry loop ran inline, or an await was forgotten, and the main thread starved past the kick interval. Detect: spontaneous reboot with no clean shutdown. Recover: audit for any synchronous CPU work on the loop; everything heavy must go through run_in_executor. Add shutdown hooks that flush in-flight buffers and close memory-mapped regions so a reset never corrupts the local store.
Queue overflow. Producers outran consumers and the bounded queue is full. Detect: producers blocking on put(), or dropped sensor reads. Recover: this is backpressure working as designed — throttle the sensor sampling rate (drop to 1 Hz), and shed low-priority telemetry with threshold-based event mapping before it ever reaches the queue.
Thermal runaway. Sustained worker load keeps the SoC hot, which lengthens every chunk, which causes more timeouts while the overrunning workers keep burning cycles. Detect: clock pinned at minimum, with timeout drops and temperature rising together. Recover: the await asyncio.sleep(1.0) backoff on pool failure exists to break this loop, but it does not run on the timeout path, so add a comparable pause or a smaller chunk after repeated timeouts, and pair it with a duty-cycle ceiling on MAX_WORKERS.
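
Recovery from a broken pool can be sketched as a small supervisor that owns the executor and replaces it when the executor reports itself broken. PoolSupervisor and its factory argument are hypothetical names for this example; BrokenExecutor is the standard-library base class that BrokenProcessPool derives from. The chunk that was in flight is treated as lost, matching the drop-and-continue policy used for timeouts.

```python
import asyncio
import concurrent.futures
from concurrent.futures import BrokenExecutor


class PoolSupervisor:
    """Owns the executor and rebuilds it when it becomes unusable."""

    def __init__(self, factory):
        self._factory = factory      # zero-argument callable returning an Executor
        self.executor = factory()
        self.rebuilds = 0

    async def run(self, fn, *args, timeout: float):
        """Run fn(*args) in the pool; return None if the pool had to be rebuilt."""
        loop = asyncio.get_running_loop()
        try:
            return await asyncio.wait_for(
                loop.run_in_executor(self.executor, fn, *args), timeout
            )
        except BrokenExecutor:
            # A worker died abruptly; the old pool cannot accept more work.
            self.executor.shutdown(wait=False)
            self.executor = self._factory()
            self.rebuilds += 1
            return None


# Typical factory on the gateway (worker_init as defined in the main listing):
#   lambda: concurrent.futures.ProcessPoolExecutor(
#       max_workers=MAX_WORKERS, initializer=worker_init)
```

Timeouts still propagate to the caller as asyncio.TimeoutError, so the ingest loop's existing handler keeps working. Counting rebuilds gives a cheap field metric: a steadily rising count means the memory budget is wrong, not that the supervisor is doing its job.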

Related

On-Device Geometry Filtering: the bounding-box and containment pre-screen that shrinks worker payloads before async dispatch.
Spatial Joins in Constrained Environments: memory-mapped reference layers and grid indexes the workers join against.
Threshold-Based Event Mapping: shedding and prioritizing telemetry upstream of the ingestion queue.
Implementing Polygon Containment Checks in C++: the compiled FFI predicate that runs inside the process workers.
Message Queue Management at the Edge: how processed results leave the gateway over an unreliable link.
Local Spatial Processing Patterns: the on-device processing layer this scheduling discipline belongs to.
