Open Sound Control is a binary protocol for exchanging data between machines.
Open Sound Control is just about the worse possible name for a protocol since I'd thought for a long time that this protocol could only be used to control sounds. Nothing is further from the truth. OSC should be renamed OAC -- Open Anything Control - which would be a far better name since it can be used to control *anything*
OSC-over-UDP is just OSC packed data sent over a UDP connection. My first serious encounter with OSC-over-UDP was when I attended strangloop and talked about The Mess We're In
I bumped into Sam Aaron, the unstoppable force behind Sonic Pi and said that it would be really cool to control Sonic Pi from Erlang. ...
Sam told me that to control Sonic Pi all I had to do was send it OSC encoded messages over UDP.
I was first rather skeptical to using OSC-over-UDP but have rapidly came to believe that this is a really-really good way to proceed. My initial dislike of OSC-over-UDP was colored by my JSON-over-TCP experience. TCP is session based which make life easy (for some of us) and JSON is popular. On the other hand UDP can suffer from packet loss and OSC is an obscure protocol - digging deeper I found myself being more and more attracted to OSC-over-UDP.
To start with OSC is beautiful and what I call “simple by design” - more on this later. A complete OSC encoder is tiny and can be written in any decent language in a few pages of code. My (incomplete) implementation osc.erl is tiny. It's trivial to implement and extremely efficient. JSON on the other hand is not so easy to implement.
I've always been attracted to things that are simple and powerful. Simplicity means they can be implemented in few lines of code. Fewer lines of code mean less possibility of errors and less to maintain.
Session based TCP programs are trivial to write in Erlang but a bugger in any sequential language. TCP uses the idea a session and the only rational way to program a session is as a process or (horrors) as a thread.
In Erlang session management is easier than falling off a log - one session equals one controlling process and as many sub-processes as you feel like.
In most other languages session management involves mutable state concurrency. A program that is a dozen lines of Erlang escalates into a mess of locks and mutexes or callbacks which in most languages is a thin layer over a pthreads implementation.
For those of you who haven't written a multi-threaded TCP socket server in C using pthreads I can only say “don't go there, it's not a pleasant experience” I've been there done that, and have the grey hairs to prove it.
UDP is a lot simpler than TCP and is far easier to use in sequential languages. There are no sessions in UDP only messages. Servers don't have to be multi-threaded to support sessions but can be interrupt driven and run in a single thread. This is wonderful news for systems like Node.js whose concurrency model is non-existent and whose idea of having a fun time is throwing promises into the future and hoping that nobody breaks them.
Well as anybody who has watched Casablanca knows promises can be broken.
For many years I've been saying that it would be nice to collaborate by sending message to each other and not by grocking around in each other's messes. I basically don't want to know how you've implemented your pile of shit, I just want to send it message to tell it what to do. I'd naively assumed that something like Anything-over-TCP would be fine and easy to implement, so I made a system called UBF which is a layered on top of TCP and hoped that everybody would use it.
I guess I'd underestimated the difficulty of implementing Anything-over-TCP in a sequential language. Just because it's really really easy in Erlang doesn't mean to say it's easy in sequential languages.
TCP sessions map 1:1 onto Erlang programs and the programs are trivial. But in a sequential language they map N:1 onto a single process, which is a pain in the butt.
For easy of implementation nothing I know of beats OSC-over-UDP - it's really easy.
My first experience with this was at the Strangeloop conference in St Louis in 2014 - I bumped into Sam Aaron and we got talking. I asked him if I could remote control the Sonic Pi - and he said "sure just send it some OSC messages" we sat down, I Googled a bit and found an OSC library for Erlang and ten minutes later we were collaborating.
I could collaborate with him by sending him OSC messages over UDP and didn't have to understand one iota of how his application was built and structured. It didn't matter all I had to do was understand the messaging protocol.
Time passed and I'd almost forgot ot this - but the other day I started wondering how to make sounds - I'd done this in Fun with Swift and Midi and wondered idly how the Sonic Pi did this.
I though to myself “I know what I'll do, I'll build the Sonic Pi from the sources and in doing so get to understand how the sound generating works.” Well as we all know building from the sources is not easy. I tried and failed and tweeted
I'm now at an impasse - I'm mailed some guys on the Sonic Pi list who have what we think to be an identical setup - only the build works for them and not for me. Does that sound familiar? You bet your sweet lives it does. And why does it work for them and not for me? Answer: Because the initial state of our machines is different and we have know way of describing what the initial state is.
A little more digging in the Sonic Pi code revealed that it was really the SuperCollider scsynth program that was making the noises and that this talked to the Sonic Pi by sending OSC-over-UDP messages - I wonder where Sam Arron got this idea from.
Sam helped me trace the OSC messages to the SuperCollider and after a few hours hacking I could send messages to sysnth with OSC-over-UDP messaging. Now I can build my project without having to build nor understand that internal structure of the scsynth or even the Sonic Pi.
Sam wants to add a few features to the Sonic Pi and we'll try and do this with an OSC-over-UDP component written in Erlang. If this works it will show that we can collaborate without messing with each others code.
All of this made me realize that the conventional way of collaboration is to mess with each others code, simply because it's technically rather complicated to build session based servers using Anything-over-TCP semantics so the way we collaborate is an unintentional consequence of a bad (or nonexistent) concurrency model.
Now I'm quite excited - the SuperCollider, Sonic Pi and Pure Data are all insanely great projects - if we can get them all talking together through communication channels when we can make a new way of interworking not based on the silly idea of performing open brain surgery on other peoples code.
We can send messages to things and ask them to do things.
Sending message to things to get them do do things is the *central* idea in OO programming - As Alan Kay wrote
Pity nobody does this properly for purely local applications.
I've always thought that people should be allowed to program in their favorite programming language - if they like Badtran-7 they they should program in Badtran-7 but If want to collaborate with them I should not be forced to program in Badtran-7.
I like to write my code in Erlang so to collaborate I'll write my code in Erlang you write your code in Badtran-7 and we'll communicate in Anything-over-Whatnot. For ease of implementation OSC-over-UDP looks really good.
To explain why I like OSC I'll first back off and talk about Tag-Length-Value encodings.
Tag-Length-Value (TLV) encodings are used to describe data structures in packets that can be send “on the wire”.
TLV data structures are simple and look like this:
Tag says what the type of the data which follows is, Length is the size of the data and Value the data itself.
One slight problem with TLV encodings is alignment. If we're sending 4 byte integers or 8 byte IEEE floats we'd want the items to be aligned on 4 byte boundaries.
For languages that don't care about word alignment (like Erlang, Smalltalk, and a few others) byte aligned TLVs are efficient and extremely easy to implement.
For word aligned languages, we want to align on word boundaries.
The OSC protocol (Open Sound Control) protocol takes a different approach. It it's Verb-Tag*-Value* encoded.
First comes a Verb which is a zero terminated string padded to a four byte boundary. Then comments a sequence of tags (which is also encoded as a string) then a sequence of values - each value corresponds to a single tag.
The tags are i for an integer d for a double s for a string and so on.
So the tag string iisif means that the values in the packet are int32 int32 string int32 float in that order. Both the encoder and decoder know how these data types are encoded so no additional information is necessary.
The tag string also suffices as a type descriptor that accurately describes the type of the data in the message - *yes OSC is strongly typed*.
The interesting thing about OSC encoding is that:
The last point is interesting - it means that we'll have to restrict our messages to flat data structure built from simple things like integers and strings.
To my mind this is a good thing - this is simplicity by design. Most applications that I have seen do not require deeply nested complex data structures in the communication protocols - and if they use such data structures they've probably been designed by a committee (and yes 3GPP I'm looking at you :-).
Let's compare this to JSON - JSON is flexible, untyped, tricky to parse and represent and wasteful of space “on the wire“. In other words JSON has everything that a wire line protocol should not have.
2014 was the tipping point, where more people access the Internet though mobile terminals (phones) than wire-line terminals (fixed computers). For mobile data, every bit counts. The radio spectrum is a limited resource. Within a given mobile cell the total bandwidth available is a finite and fixed amount, and this must be divided by the number of device in the cell that are simultaneously communicating. This is why everything slows down in peak periods when everybody is connected up at the same time.
It is therefore essential not to waste bandwidth - I think it is totally crazy to send JSON or XML “over the air” since this will degrade the performance of the applications giving a bad user experience and higher bills - since ultimately we pay for every bit of data.
Even in fiber nets we pay one way or another - here the costs are in terms of energy - it uses more energy to encode/decode verbose data structures than well designed ones.
In the Telcomms Industry there's been a great deal of effort to minimize the overheads in communication protocols - ASN.1 sweats blood to save bits - which are then wasted by programmers sending JSON down the wire.
Not only does JSON/XML on the wire waste energy, and costs more - the user experience in a congested net is degraded - applications that minimize net bandwidth will there be more attractive in a congested net than applications that waste bandwidth.
When writing a distributed application where the components send messages to each other, you'd better know well in advance exactly what messages you're going to send and receive and what their types are.
OSC messages with type signatures seems to be the perfect balance between power and expressiveness.
They are expressive - but not too expressive (limiting the types to flat sequences of atomic types) seems a good idea - it certainly gets the job done and are “good enough” for most purposes.
If a protocol cannot be expressed in sequences of OSC messages it probably should not be used.
Finding an appropriate level for encoding messages is difficult.
At a low level of abstractions we could just send integers over the wire but this would be too low level. At a higher level we could use some form of S-expression (like XML or JSON, which are just verbose S-expressions) but this is *too* expressive.
OSC seems to strike the right balance.
OSC has an additional advantage - the internal representation of an OSC message in the programming language of your choice is easy - why is this? Precisely because OSC does not have deeply nested recursive data structures.
If you parse XML or JSON you need to map the parse tree onto some object structure in your language, and since the parse trees in complex, the object in your programming language will be complex.
The intrinsically flat structure of OSC is attractive, since not only the protocols are simple, but the code to handle then will have a simple flat structure - again simplicity by design rather than accident.
We could also stick OSC messages in files, which would be easy to parse and again have the balance of expressiveness contra simplicity.
One measure of how good a protocol is is the size of the implementation and the time it took to write it. As I said implementing OSC is really easy, thanks mainly to the simplicity of the design.
I've written a number of XML parsers in my time, and it is not easy and there are some unpleasant edge cases. JSON parsers are also cumbersome beasts.
To see just exactly how easy this is I've made a lttle GitHub procject where to test these ideas. The (incomplete) OSC encoder is in osc.erl.
The library is being used to connect to three diffent programs and is described in Controlling Sound With OSC Messages.