Elixir for Pythonistas pt. 1
Tagged: python elixir
In this post I want to describe what Elixir is and give two big reasons why it's interesting for Python developers. In a later post I want to show side-by-side comparisons of solving certain problems in Python and Elixir.
Elixir first appeared around 2012, so it's a relatively young language. It compiles to bytecode for BEAM, the Erlang virtual machine, so it leverages everything Erlang has to offer: very easy scaling, reliability, and speed. Its syntax is heavily inspired by Ruby, which makes it very easy to pick up for someone who's used to scripting languages.
Why do I think it's interesting for Python developers specifically? The first reason is how Elixir approaches parallelism and the efficient use of limited real-world resources, compared to Python. The second reason is that Elixir is a functional language.
These two reasons, I feel, translate best into the two greatest pastimes of the Python community - creating web applications and working with data.
The Elixir community offers an industrial-strength solution for creating web applications: Phoenix. Because of how Elixir handles things like network IO and parallelism, it is possible to build applications that easily handle everything the modern web throws at you: tens of thousands of connections, efficient websockets, and easy scaling across multiple machines. Thanks to the script-like syntax, developer productivity is on par with Python.
All of the above is also possible with Python, but I've found that it requires extra work and adds an extra layer of complexity. There are the Twisted and Tornado libraries, as well as Python 3's asyncio module. They boost performance quite well, but at the cost of learning to think in callbacks or deferreds, and you're still tied to a single thread with no easy way to distribute the load. Another solution is to run your application on multiple servers behind a load balancer. This works great, but again, it comes at the cost of additional infrastructure.
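To make the single-thread point concrete, here is a minimal asyncio sketch (assuming Python 3.7+ for `asyncio.run`). The `fake_request` coroutine is a hypothetical stand-in for network IO, simulated with `asyncio.sleep`, and it records which OS thread ran it. A thousand of them overlap their waiting, yet they all run on one thread:

```python
import asyncio
import threading


async def fake_request():
    # Hypothetical stand-in for a network call: sleep instead of real IO.
    await asyncio.sleep(0.01)
    # Record which OS thread executed this coroutine.
    return threading.get_ident()


async def main(n=1000):
    # Schedule n coroutines concurrently on the same event loop.
    return await asyncio.gather(*(fake_request() for _ in range(n)))


thread_ids = asyncio.run(main())
print(len(thread_ids), "requests on", len(set(thread_ids)), "thread(s)")
```

Using more than one CPU core would still mean layering processes or extra machines on top of this.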
As much as I love Python for this area of programming, I think Elixir and Phoenix offer too many advantages to ignore.
As for working with data there are two sides to it.
The bad side is that Elixir lacks the rich ecosystem of machine learning and data-crunching tools that Python has built up over the years. Forget about things like NumPy, pandas, SciPy, IPython, and the rest of the gang. They're simply absent.
On the good side, Elixir can bring the full strength of the functional programming paradigm to bear on this problem domain. This makes Elixir great at scraping data, cleaning it, translating it into other formats, and all the other low-level operations.
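For Python readers, here is a rough sketch of that clean-and-translate style. It is written in Python rather than Elixir: small pure functions composed left to right over a list of records, each returning new values instead of mutating shared state. The record layout and all function names (`compose`, `strip_fields`, `drop_empty`, `parse_age`) are hypothetical examples, not from any library:

```python
import json
from functools import reduce


def compose(*funcs):
    """Left-to-right composition: compose(f, g)(x) == g(f(x))."""
    return lambda x: reduce(lambda acc, f: f(acc), funcs, x)


def strip_fields(row):
    # Build a new dict instead of mutating the input row.
    return {k: v.strip() for k, v in row.items()}


def drop_empty(rows):
    # Keep only rows where every field is non-empty.
    return [r for r in rows if all(r.values())]


def parse_age(row):
    # Copy the row with "age" converted from string to int.
    return {**row, "age": int(row["age"])}


raw = [
    {"name": " Alice ", "age": " 30 "},
    {"name": "Bob", "age": ""},
    {"name": "Carol", "age": "41"},
]

pipeline = compose(
    lambda rows: [strip_fields(r) for r in rows],
    drop_empty,
    lambda rows: [parse_age(r) for r in rows],
    json.dumps,  # translate the cleaned records into another format
)

print(pipeline(raw))
```

The original `raw` list is left untouched. In Elixir you get that guarantee for free, because all data is immutable.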
As I mentioned above, Python's approach to IO resource usage is pretty good, though awkward. Tools like thread pools and process pools go a long way toward making this line of work easier, but not without costs, such as juggling state shared among threads or having to use primitives like semaphores and mutexes. It takes a while to understand the mental models behind these tools, and while they provide a decent performance boost, it's very easy to get caught on edge cases and have your script blow up.
Elixir goes a step further and provides the developer with tools for working with parallelism (check this Stack Overflow topic to understand the difference between parallelism and concurrency). Creating and managing processes is very easy in Elixir, and the Erlang VM does a great job of managing them efficiently for you. Elixir processes are not OS processes; they are extremely lightweight units managed by the VM, and running hundreds of thousands of them is no problem. Thanks to immutable data structures, we avoid a lot of painful debugging and mysterious thread or process deaths. And since there are no classes and objects to deal with, you get to work with data the way it usually appears in the wild: lists of things that need processing.
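Here is a small sketch of the kind of bookkeeping described above, using `concurrent.futures.ThreadPoolExecutor` from the standard library. The `clean` function and the shared `cleaned_count` counter are hypothetical. The point is that shared mutable state needs an explicit `threading.Lock`:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Shared mutable state: a running count of cleaned records.
cleaned_count = 0
count_lock = threading.Lock()


def clean(record):
    """Hypothetical cleaning step: strip whitespace and lowercase."""
    global cleaned_count
    result = record.strip().lower()
    # Without the lock, concurrent read-modify-write updates could be lost.
    with count_lock:
        cleaned_count += 1
    return result


records = ["  Alice ", "BOB", " Carol  "] * 1000

with ThreadPoolExecutor(max_workers=8) as pool:
    # map() returns results in the same order as the inputs.
    cleaned = list(pool.map(clean, records))

print(cleaned[:3], cleaned_count)
```

Forgetting the lock would not raise an error. It could quietly produce a wrong count, which is exactly the kind of edge case that is easy to miss.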
Thanks to these traits, Elixir has an edge over Python when it comes to processing data. You still get a syntax that makes building and exploring easy, and you also get the tools to make the process fast and production-ready.
Taking into account the downside of missing important data science libraries, I think Elixir could fit at the front of the data pipeline: the part that receives, cleans, and preprocesses data before handing it off to Python's mature ecosystem of machine learning tools.
I hope I've piqued your interest. Elixir is a newcomer that promises to make several problem domains much easier to work with; having toyed around with it for the past few weeks, I'm betting on it. It won't replace Python, but I think the two complement each other well.
If you're interested in a more down-and-dirty comparison, I'm working on a post that compares and discusses some code snippets. Stay tuned!