python

Further adventures in parse-land

I’ve tried a few more parser tools in pursuit of what I think is a fairly simple parser, and it feels like we’re in some kind of before time or something. Performance, ease of use, and end-user-friendliness: pick half of any one of these.

Tree-sitter is a really great choice if your project is in, or adjacent to, JavaScript. Rust-sitter looks interesting, but it has some severe drawbacks, and I dislike how it surfaces the use of recursion to implement repetition the way it does.
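To make the recursion-versus-repetition complaint concrete, here is a rough sketch in plain Python. It is not rust-sitter's API or output format; the function names and the tuple/None encoding are made up purely to show the shape of the problem: a rule written as "item, then the rest" hands you a nested structure, while a rule written as "zero or more items" hands you a flat list.

```python
def parse_items_recursive(tokens):
    # Grammar shape: items := item items | <empty>
    # (hypothetical encoding: a pair of (head, rest), with None for empty)
    if not tokens:
        return None
    return (tokens[0], parse_items_recursive(tokens[1:]))


def parse_items_repeated(tokens):
    # Grammar shape: items := item*
    # The repetition stays an implementation detail; the caller sees a list.
    return list(tokens)


def flatten(nested):
    # The extra step every consumer of the recursive shape ends up writing.
    out = []
    while nested is not None:
        head, nested = nested
        out.append(head)
    return out


tokens = ["a", "b", "c"]
nested = parse_items_recursive(tokens)
flat = parse_items_repeated(tokens)
```

Both carry the same information, but with the recursive shape the grammar's implementation trick leaks into every piece of code that walks the tree.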

I think what I’m going to do is create a small collective project for trying out different parse tools in different languages (C++, Rust, Go, possibly Python or Ruby for contrast), and maybe I’ll throw in tree-sitter and ANTLR, although the threat of a Java runtime always makes me step away from actually using ANTLR.

Pro-tip: Write Python like Python

My last post accused Python of being The Slow of the Internet, not because Python is bad but because bad Python is awful.

In many cases, Python is really not slow for the reasons you think it is

Python is a great glue language, a terrific scripting language, because it provides fantastic facilities for manipulating bulky amounts of data. The terrible language that makes our day-to-day lives slower and more miserable is actually anti-Python.

There are two sides to the Python problem: non-engineers using it to write runtime descriptions of data manipulations performed by non-Python backends, and engineers writing it as an exposé of their non-Python backends.

Between the two groups, nobody is really here for Python.

Python, the slow of the internet.

Unpopular Opinion: CPython is stupidly slow. CPython is the Python you’re using if you don’t know which Python you use.

Before Go, Python had taken a firm hold of systems admin coding, and a huge amount of Linux tooling is written in Python.

During the Great Python 3 Migration of 2019, Python libraries bloated with people introducing bidirectional compatibility, generally by just grabbing some 3rd-party libraries to minimize the footprint of change.

I’m not going to rant about people not knowing that the standard ‘dis’ module exists, or not knowing about timeit/%timeit… It’s not really an “optimization” issue tho.

Today’s Linux admin activities are agonizingly slow because so many Python developers hear adages about not optimizing Python code and conclude that you never need to worry about it, so they have no idea how expensive some very common practices are.

Sadly, CPython makes no-need-for-performance-thinking untrue in one really unfortunate detail, one detail that has been agonizingly inflated by the bloat of compatibility code:

Function call overhead :(
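A minimal sketch of how you might measure this yourself with the standard timeit module. The function names and the loop size are arbitrary choices for the illustration, not code from the notebook, and the actual numbers will vary by machine and CPython version.

```python
import timeit


def add_one(x):
    # Trivial helper: the work is tiny, so the call itself dominates.
    return x + 1


def total_with_calls(n):
    total = 0
    for i in range(n):
        total += add_one(i)  # one Python-level function call per iteration
    return total


def total_inline(n):
    total = 0
    for i in range(n):
        total += i + 1  # same arithmetic, no call
    return total


N = 100_000  # arbitrary size picked for this sketch

# Both versions compute the same answer; only the call differs.
same_result = total_with_calls(N) == total_inline(N)

t_calls = timeit.timeit(lambda: total_with_calls(N), number=10)
t_inline = timeit.timeit(lambda: total_inline(N), number=10)

print(f"with calls: {t_calls:.3f}s  inline: {t_inline:.3f}s")
```

The interesting part is the gap between the two timings: the arithmetic is identical, so whatever difference you see is the price of making the call.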

The code from this post is in a Jupyter notebook in my github, here.

If you want to interact with it (run it for yourself), you can either use an online notebook viewer (e.g. https://nbviewer.jupyter.org/), or Visual Studio Code has really nice support for notebooks, now.

The golang example is here.

Erlang

A little while ago I bought Seven Languages in Seven Weeks because I’m a language geek. Being pressed for time lately, I’ve not really had a chance to more than dabble with it.

I dipped my toe into Prolog a bit, and finally got my head around it – and realized it’s of no practical use to me.

I’ve struggled with the book, though, because the author comes across as one of the most annoying types of Java programmer, a believer: even as he notes that Erlang is about robustness, reliability and fault tolerance, he notes that it is “not […] on the [Java Virtual Machine]” (page 207, Integration) and that “the JVM does come with baggage, such as a process and threading model that’s inadequate for Erlang’s needs. But being on the JVM has a set of advantages, too, including the wealth of Java libraries and the hundreds of thousands of potential deployment servers.” (page 207, Integration)

Taken on its own, he could just be looking for cons to list in his pros/cons wrap-up for Erlang. But the Java refrain lasts throughout the book. It just seems inappropriate for a book that otherwise appeals to me as a well-thought-out exploration of some of the more interesting current languages.

The investigations of each language are fairly short, but where this book pays off is in the sort of shared exploration of those languages. If you can take the languages in order, there’s also a plan to the madness. The venture into Prolog turns out not to be worthless; rather it helps provide a foundation for the ventures into Erlang and Haskell etc.

Python: Not the way to write Python, it seems ;)

(This is not a hate-on-Python by a Python-hater, this is a smirk-at-Python by a recent Python convert)

I’d mentioned yesterday that Python comes with a hefty cost for function calls.
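One rough way to see part of that cost is to disassemble a function that does the work directly next to one that routes the same work through a helper, using the standard dis module. The three functions here are hypothetical stand-ins, and opcode names differ between CPython versions, so the sketch only looks for anything call-like rather than a specific opcode.

```python
import dis


def helper(x):
    return x + 1


def direct(x):
    return x + 1


def wrapped(x):
    return helper(x)  # same result, but routed through another function


def opnames(func):
    # Names of the bytecode instructions CPython compiled for func.
    return [ins.opname for ins in dis.get_instructions(func)]


direct_ops = opnames(direct)
wrapped_ops = opnames(wrapped)

# Opcode names vary by version (CALL_FUNCTION, PRECALL, CALL, ...),
# so match on the substring instead of an exact name.
call_ops = [name for name in wrapped_ops if "CALL" in name]

print("direct: ", direct_ops)
print("wrapped:", wrapped_ops)
print("call-like opcodes in wrapped:", call_ops)
```

The bytecode only shows the visible tip: the global lookup and the call instruction. Setting up and tearing down the callee's frame happens inside the interpreter, and helper's own instructions still have to run on top of that.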

In my foot-wetting with Python, it has seemed that “Pure Python” modules are often highly prized. But glancing at the Python 3.1 What’s New notes, it seems that writing Python in Python isn’t the best way:

The new I/O library […] was mostly written in Python and quickly proved to be a problematic bottleneck […] the I/O library has been entirely rewritten in C and is 2 to 20 times faster depending on the task at hand.

Note: This post is tongue-in-cheek; I’m well aware the real reason Pure Python modules are valued is an extra level of flexibility they deliver through the propagation of various Python language facilities.

Take that, Python.

Yes, I’ve started writing code in Python. The big breakthru for me was realizing that I can overcome one of my worst issues with Python, the lack of visible scope annotation, with comments.

def myfunction(somearg):
#{
    print("Yep this is my function")
#}

How come I’m programming in Python at all?

Sorting sucks.

Sorting and parallelization… Yeuch.

You can get significant performance gains by parallelizing various sort algorithms cleverly, but ultimately there’s some data on the left that needs to be on the right, and vice versa.
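A small sketch of that shape, assuming a simple split-sort-merge scheme (the chunked_sort name and the worker count are invented for the example). Threads are used only to keep it self-contained; under CPython's GIL they will not actually speed up sorting, so read this as the structure of the problem rather than a fast implementation.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def chunked_sort(data, workers=4):
    # Split the input into roughly equal chunks.
    size = max(1, -(-len(data) // workers))  # ceiling division
    chunks = [data[i:i + size] for i in range(0, len(data), size)]

    # Phase 1: each chunk is sorted independently. This is the part
    # that parallelizes nicely, because no chunk needs another's data.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sorted_chunks = list(pool.map(sorted, chunks))

    # Phase 2: the merge has to compare across all chunks, which is
    # where data from the "left" finally moves to the "right".
    return list(heapq.merge(*sorted_chunks))


data = [9, 2, 7, 4, 1, 8, 3, 6, 5, 0]
result = chunked_sort(data)
```

However cleverly phase 1 is spread out, phase 2 still has to touch every element and shuffle it across chunk boundaries, which is the part that refuses to go away.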