SensorSynth FM:
Your Body as the Interface
The Idea
Most instruments have a closed loop: you touch a surface, it responds with sound. The interface is a boundary. SensorSynth FM starts from a different premise: what if the instrument were open to the environment? What if the way you hold the device, tilt it, lean toward it, move through space with it, were itself the performance?
SensorSynth FM is an iPad FM synthesizer that treats every available device sensor as a modulation source. Motion, environment, camera/spatial, and touch inputs all feed the FM engine simultaneously. The approach draws from tools like ZIG SIM Pro, which proved that treating a mobile device as a full sensor bundle (not just a screen with an accelerometer) opens up an entirely different design space. The more environmental data feeding the engine, the more each performance becomes a product of where you are, how you move, and what surrounds you. No two patches sound the same because no two moments are physically identical.
This is my MS UX capstone at Kent State, built with AudioKit and SwiftUI, and it's also a research artifact in its own right: a comparative study of embodied versus touch-grid interaction in live music performance.
The Design Thesis
Paul Dourish's work on embodied interaction draws a distinction that matters here: the difference between interfaces you learn to operate and interfaces that feel inevitable, where the mapping between action and outcome becomes transparent through use. Gesture instruments succeed when you stop thinking about what to do and start just doing it.
The design question I'm pursuing is: what does it feel like when your body is the instrument? Not when your body is pressing buttons that trigger sounds. When the physical fact of being in your body, moving through space, leaning in and pulling back, is itself the expressive act.
That question changes every UX decision downstream: how onboarding works, how gesture calibration is communicated, what visual feedback means in a performance context, how you design for a mapping that needs to feel discovered rather than taught.
The Architecture
FM Synthesis Engine
The synthesis core is a 4-operator FM engine with 8 algorithms, built on AudioKit 5. FM synthesis was the right choice for this project: it's computationally efficient on mobile, capable of enormous timbral range from a small set of parameters, and it responds to continuous modulation in ways that feel musically coherent. Those qualities matter when the modulation source is a human body in motion.
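The engine itself is 4-operator with 8 algorithms; the core FM relationship can be sketched in a minimal 2-operator form in pure Swift. All names here are illustrative, not the app's actual API: a modulator sine wave perturbs the carrier's phase, and the modulation index controls how much sideband energy (brightness) appears.

```swift
import Foundation

// Minimal 2-operator FM: y(t) = sin(2π·fc·t + index · sin(2π·fm·t))
// ratio = fm / fc sets the harmonic character;
// index sets sideband energy (timbral brightness).
func fmSample(time t: Double,
              carrierHz fc: Double,
              ratio: Double,
              modulationIndex index: Double) -> Double {
    let fm = fc * ratio
    let modulator = sin(2.0 * .pi * fm * t)
    return sin(2.0 * .pi * fc * t + index * modulator)
}

// Render a short buffer at 44.1 kHz.
let sampleRate = 44_100.0
let buffer = (0..<64).map { n in
    fmSample(time: Double(n) / sampleRate,
             carrierHz: 220.0, ratio: 2.0, modulationIndex: 3.0)
}
```

Because the index and ratio are plain continuous numbers, any smoothed sensor stream can drive them directly, which is what makes FM such a natural fit for body-in-motion modulation.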
Sensor Mapping Layer
The modulation system ingests every sensor the iPad exposes, organized into four input classes:
Motion: Accelerometer (tilt, orientation), gyroscope (rotational velocity), quaternion (full 3D orientation as a single smooth value), and gravity vector (device orientation separated from user movement).
Environment: Magnetometer (compass heading, magnetic field disturbances), barometer (altitude, atmospheric pressure), ambient light sensor (environmental brightness), microphone (ambient amplitude, spectral content), GPS (latitude, longitude, altitude), proximity sensor, and battery level.
Camera/Spatial: LiDAR depth mapping (Pro models), TrueDepth camera with ARKit face tracking (52 blend shapes: eyebrow raise, jaw open, smile, each individually mappable), ARKit body tracking (full skeleton pose data), and ARKit device tracking (world-space position and orientation).
Touch: Touch coordinate, touch radius, touch pressure, and Apple Pencil input (pressure, tilt, azimuth) for fine-grained stylus control on Pro models.
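Because these inputs arrive in wildly different units (degrees, hPa, lux, meters), a mapping layer like this one plausibly normalizes everything to a common 0...1 range before routing. This is a hypothetical sketch; the type and range values are illustrative, not the app's actual calibration:

```swift
import Foundation

// Hypothetical sketch: every raw sensor reading is clamped and scaled
// into 0...1, so a tilt angle, a barometer reading, and a lux value
// all share one modulation range before reaching the FM engine.
struct SensorRange {
    let min: Double
    let max: Double

    /// Clamp a raw reading to the range, then scale it into 0...1.
    func normalize(_ raw: Double) -> Double {
        let clamped = Swift.min(Swift.max(raw, min), max)
        return (clamped - min) / (max - min)
    }
}

// Illustrative ranges only; real calibration values differ per sensor.
let tiltRange = SensorRange(min: -90.0, max: 90.0)       // degrees
let pressureRange = SensorRange(min: 950.0, max: 1050.0) // hPa

let levelTilt = tiltRange.normalize(0.0)          // level device → 0.5
let clampedHPa = pressureRange.normalize(1100.0)  // out of range → 1.0
```

Clamping at the range edges matters for live use: a sensor spike can never throw a synthesis parameter outside its musical range.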
Each feeds FM parameters (carrier frequency, modulation index, operator ratios, amplitude) through a smoothing layer that prevents abrupt jumps. Sensor polling rate is user-configurable (1, 10, 30, or 60 Hz), which is a design lever, not just a technical setting: higher rates produce more responsive, jittery modulation while lower rates yield smoother, dreamier textures.
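The smoothing layer described above can be sketched as a one-pole lowpass (an exponential moving average). The names and the alpha value are illustrative assumptions, not the app's actual implementation; the point is how a single coefficient trades responsiveness against smoothness, mirroring the polling-rate lever:

```swift
import Foundation

// One-pole smoother: each new reading pulls the held value
// a fraction (alpha) of the way toward the raw input.
struct Smoother {
    private(set) var value: Double
    let alpha: Double // 0...1; lower = smoother/dreamier, higher = snappier

    init(initial: Double, alpha: Double) {
        self.value = initial
        self.alpha = alpha
    }

    mutating func feed(_ raw: Double) -> Double {
        value += alpha * (raw - value)
        return value
    }
}

// A sudden step from 0 to 1 is eased over several frames, not jumped.
var smoother = Smoother(initial: 0.0, alpha: 0.25)
let trace = (0..<8).map { _ in smoother.feed(1.0) }
// trace climbs toward 1.0: 0.25, 0.4375, 0.578125, …
```

At a 60 Hz polling rate the same alpha converges faster in wall-clock time than at 10 Hz, which is one way the polling-rate setting shapes the feel of the modulation.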
The key insight: the more environmental variables you pipe into the synthesis engine, the more semi-random and unrepeatable each patch becomes. Barometric pressure shifts as weather changes. Magnetometer readings fluctuate near metal structures. Ambient light varies with time of day. GPS drifts. These are not noise to be filtered out. They are the environmental fingerprint that makes every performance unique to the moment and place it happens. The instrument doesn't just respond to the performer. It responds to the world the performer is in.
The mapping design is not about precision. It's about feel. A small tilt should produce a subtle drift, not a pitch jump. The calibration parameters for each sensor are documented and tunable, because the right values for a seated studio session are different from the right values for a standing performance.
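One common way to get "a small tilt produces a subtle drift" is a dead zone plus a response curve. This is a sketch of that general technique under assumed parameter values, not the app's documented calibration:

```swift
import Foundation

// Hypothetical calibration curve: a dead zone swallows sensor jitter,
// and an exponent > 1 keeps small gestures subtle while preserving
// the full output range for large ones.
func calibrated(_ input: Double,       // normalized -1...1
                deadZone: Double,      // e.g. 0.05
                curve: Double) -> Double { // e.g. 2.0 = gentle near zero
    let magnitude = abs(input)
    guard magnitude > deadZone else { return 0.0 }
    // Rescale so the output still reaches 1.0 despite the dead zone.
    let scaled = (magnitude - deadZone) / (1.0 - deadZone)
    return (input < 0 ? -1.0 : 1.0) * pow(scaled, curve)
}

let resting = calibrated(0.03, deadZone: 0.05, curve: 2.0) // jitter → 0.0
let midTilt = calibrated(0.5, deadZone: 0.05, curve: 2.0)  // gentle drift
let fullTilt = calibrated(1.0, deadZone: 0.05, curve: 2.0) // full range → 1.0
```

Retuning for a standing performance versus a seated studio session then becomes a matter of adjusting two numbers per sensor rather than redesigning the mapping.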
Three-Screen Interface
The UI is organized into three views across a landscape iPad layout: a Performance View for live playing, an FM Engine View for dialing in the synthesis parameters, and a Sensor Modulation View for mapping physical gestures to audio parameters. The visual language is dark, minimal, and oriented toward live use: high contrast, readable at a glance, nothing you need to hunt for during a performance.
Where It Stands
The architecture is fully documented and the design decisions are made. The Xcode project compiles and runs. The SwiftUI screens are built and working, the native audio engine is confirmed to produce sound in the simulator, and AudioKit is integrated as a dependency. The next step is replacing the test sine wave with the actual FM oscillator.
The gap between documentation richness and running code is intentional. This is a documentation-first build, which means the hard decisions (4-operator structure, 8 algorithms, sensor smoothing parameters, threading model, audio safety rules) are already made and recorded. That makes implementation faster and more deliberate than the alternative.
No sensors are flowing to audio yet. No sequencer exists. MPE hasn't been implemented. The project is being honest about that on this page because being honest about process is the point. What's here is a foundation, not a finished product.
Building Without a Development Background
I don't have a software development background. This project is being built with Claude Code as a primary implementation collaborator: I provide the domain expertise, the design decisions, and the judgment about what's musically and experientially right. Claude writes the code.
The workflow documentation is unusually detailed because it has to be: the CLAUDE.md file in the repository contains audio thread safety rules, architectural constraints, and communication preferences that keep the AI collaborator aligned across sessions that don't share context. It's a form of design documentation that most solo developers never need to write. It turns out to be valuable precisely because it forces precision about what the system should and shouldn't do.
The question I'm exploring in practice: can deep domain expertise plus AI collaboration replace a development team for an instrument of this ambition? The honest answer so far is: it can, but the designer has to bring more rigor to the architectural decisions, not less. When you can't read the code to check it, you have to be precise about what you ask for.