Ryan Tomayko’s I like Unicorn because it’s Unix should be required reading for anyone doing anything involving networks or unixes these days. Like Ryan, I share a deep appreciation for the dark art of Unix system calls, and like Ryan I’m a bit dismayed to see them relegated to the dusty corners of our shiny dynamic languages.
So I read I like Unicorn because it’s Unix with glee; it’s perhaps the cleanest, clearest explanation of how preforking socket servers work, and I enjoyed seeing Ruby’s twist on the old standard.
Since I’m a Python hacker, though, and since I had a couple minutes, I thought it’d be an interesting exercise to port Ryan’s code to Python. So, with no further ado, here we go:
I’ve tried to keep the control flow the same as the original, even going so far as to copy some of Ryan’s comments verbatim so you can see what matches up, and what doesn’t.
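For readers following along without the listing, here is a minimal sketch of a preforking echo server in Python 3. It is a stand-in under stated assumptions, not the original code: the helper names (`make_listener`, `worker_loop`, `prefork`), the worker count, and the `echo>` reply prefix are all made up for illustration. The shape is the part that matters: open the listening socket once in the parent, fork, and let every child block in `accept()` on the same socket.

```python
import os
import socket


def make_listener(host="127.0.0.1", port=0, backlog=5):
    """Create the listening socket in the parent, before any fork."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(backlog)
    return sock


def worker_loop(listener):
    """Runs in a child: accept connections forever, echoing one line each."""
    while True:
        conn, _addr = listener.accept()
        # Leaving the with-block closes the file object, then the socket.
        with conn, conn.makefile("rwb") as flo:
            line = flo.readline()
            flo.write(b"child %d echo> " % os.getpid() + line)
            flo.flush()


def prefork(listener, workers=3):
    """Fork `workers` children sharing the listener; return their PIDs."""
    pids = []
    for _ in range(workers):
        pid = os.fork()
        if pid == 0:
            # Child: os.fork() returned 0. Serve until killed, never return.
            try:
                worker_loop(listener)
            finally:
                os._exit(0)
        pids.append(pid)  # Parent: os.fork() returned the child's PID.
    return pids
```

The master would then typically wait on the returned PIDs, for example with `os.waitpid(pid, 0)` for each one.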
Update:
As suggested in the comments, I’ve written a more Pythonic version that uses Python’s built-in SocketServer library. I still prefer the above for didactic purposes: it does a much better job showing off the low-level concepts and system calls.
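A rough idea of what a library-based version can look like, sketched for Python 3, where the module is named `socketserver`. This is not the version referred to above; the class names `EchoHandler` and `ForkingEchoServer` and the `echo>` prefix are assumptions for illustration. Note one real difference: `ForkingMixIn` forks a child per connection rather than preforking a fixed pool.

```python
import socketserver


class EchoHandler(socketserver.StreamRequestHandler):
    """Echo one line back; rfile and wfile wrap the client socket."""

    def handle(self):
        line = self.rfile.readline()
        self.wfile.write(b"echo> " + line)


class ForkingEchoServer(socketserver.ForkingMixIn, socketserver.TCPServer):
    """TCP server that handles each accepted connection in a forked child."""

    allow_reuse_address = True
```

To run it, something like `ForkingEchoServer(("127.0.0.1", 4242), EchoHandler).serve_forever()` should do, with the port number being an arbitrary choice.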
There are some interesting differences (Ruby’s use of a block to differentiate the forked child is beautiful), but the important takeaway is that this stuff is easy in any modern programming language. If you do stuff on the ‘net, you should know this POSIX stuff.
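To make the difference concrete: Ruby runs the block passed to `fork` in the child, while Python's `os.fork()` returns twice and you branch on the return value. A small sketch of a block-like wrapper, where `run_in_child` is a hypothetical helper name and not part of any library:

```python
import os


def run_in_child(func):
    """Fork, run func() in the child, and return the child's PID to the parent."""
    pid = os.fork()
    if pid == 0:
        # Child branch: roughly the body of Ruby's `fork do ... end` block.
        status = 1
        try:
            func()
            status = 0
        finally:
            os._exit(status)  # never fall through into the parent's code
    return pid  # Parent branch: pid is the child's process ID.
```

The `os._exit()` call matters: without it the child would return from the function and carry on executing the parent's code.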
Dunno about you, but I’m going to spend my evening reading Unicorn. Looks like there’s lots of nifty stuff to learn there.
Comments:
You should try and pick up some Unix books from before ~1997.
e.g. The Unix Programming Environment by Brian Kernighan and Rob Pike, circa 1984.
Another good one would be Advanced Programming in the Unix Environment by W. Richard Stevens, circa 1992.
I did similar stuff in plain C and Java back in the college days, and sure... it's beautiful how the idioms translate into a higher level language.
The Python version is, really, like a C one, but without the nonsense or plain boring parts. I like how some included batteries (os, socket) expose low-level functionality while keeping the simplicity.
I'm not sure what I'm doing wrong, but your echo server isn't behaving quite like the Ruby example you based it on.
When I run yours, the client hangs till I terminate it, and then the server prints what the client sent, but doesn't echo it back to the client.
I tried this with both Python 2.5 and 2.6 on Ubuntu 9.04. Am I doing something wrong?
If I read Java's documentation correctly (http://java.sun.com/j2se/1....)), you do need to explicitly close the socket on exit, because you just might be running on Jython or some other non-CPython implementation and garbage collection might not happen on exit, so close(2) might never get invoked on the socket.
What sort of box did you write/run this on? No flush(), no attempt to turn off buffering, and read() instead of readline()?
See Dave K's recommendations -- the Unix APIs are powerful, but a little care is needed to use them correctly.
The trick is that a lot of these bits are hard to get right in terms of platform neutrality. My favorite system call that Python exposes is mmap(2) simply because of its versatility.
If you are able to target a common environment and you don't mind getting your hands dirty, the ctypes library allows you to do some really interesting stuff without having to dust off your gcc skills.
I would slightly disagree with Erik's comment about platform neutrality. With few exceptions, most of Python's exposed system calls are part of the POSIX standard. Thus, if you stick to those, code is likely to work on virtually all Unix-based systems.
Also, you definitely want to get a hold of books by W. Richard Stevens for advanced usage of these features. I'm pretty sure that preforking servers are covered in detail in his "Unix Network Programming, Vol 1" book.
Joe: thanks for the suggestions. I did indeed miss a flush() call, and you're right that using readline() is more Pythonic. That might explain what Jon's seeing.
These are certainly tricky APIs to get right, and frankly I didn't try very hard. This isn't meant to be a full-featured server, just a simple demo of how this stuff fits together. Ryan's article -- especially the parts after the code -- goes into the next steps in making something that actually, you know, works.
Jacob, nice work with the port. After reading Ryan's article I was interested in porting it to python as well, but you beat me to it :). I was also experiencing the hanging problem that Joe was talking about. Turns out you were missing a close() call to flo. I updated the gist here:
http://gist.github.com/204099
Thanks, Jeff. Yeah, it was just the missing close() call. Your updated gist works fine.
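Pulling the fixes from this thread together (readline() instead of read(), an explicit flush(), and the missing close()), a per-connection handler might look roughly like this in Python 3. It is a sketch, not the contents of the gist: `echo_once` and the `echo>` prefix are assumed names for illustration.

```python
import socket


def echo_once(conn):
    """Read one line from conn, echo it back, then release the connection."""
    flo = conn.makefile("rwb")
    try:
        line = flo.readline()   # returns at the newline; read() waits for EOF
        flo.write(b"echo> " + line)
        flo.flush()             # push the buffered reply onto the socket
    finally:
        flo.close()             # close the file object...
        conn.close()            # ...and the socket, so the client sees EOF
    return line
```

Closing both objects is deliberate: per the Python documentation, the underlying descriptor is only released once the socket and every file object created by makefile() have been closed.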
Definitely a good port, and a nice working example of socket APIs. Some people recommended some earlier books that dig into some of these hardcore details. I'd highly recommend any book by W. Richard Stevens. Unix Network Programming used to be my Unix, socket, threading bible.
Just for your information, waitpid won't actually wait for all children. It'll wait for one child, then return that PID and its exit status (which you can then use to extrapolate the process's return code with os.WEXITSTATUS among other things).
What you should do is make a set of all child PIDs, then make a loop, for each returned child, remove from the set, and when empty, exit.
Alternatively, you could set up a signal handler for SIGCHLD which would then reap the children in a similar way. This would allow the master process to, say, watch for changes in code and restart its children upon such an event.
Also, if I'm not mistaken, makefile() will increment the reference count of the underlying file descriptor. If so, your code would leak file descriptors.
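The set-of-PIDs reaping loop described in this comment could be sketched as follows. `reap_all` is a hypothetical helper name, and encoding a signal death as a negative number is just a convention chosen here.

```python
import os


def reap_all(pids):
    """Wait until every PID in `pids` has exited; return {pid: exit code}."""
    remaining = set(pids)
    codes = {}
    while remaining:
        pid, status = os.waitpid(-1, 0)  # blocks until any one child exits
        if pid not in remaining:
            continue  # some other child of this process; not ours to track
        remaining.discard(pid)
        if os.WIFEXITED(status):
            codes[pid] = os.WEXITSTATUS(status)
        else:
            codes[pid] = -os.WTERMSIG(status)  # terminated by a signal
    return codes
```

For the SIGCHLD variant, the same bookkeeping would move into a signal handler that calls `os.waitpid(-1, os.WNOHANG)` in a loop until no more exited children are reported.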
Just to close the conceptual loop: Unix is C. http://gist.github.com/204301
Maybe you kids should just have read C10K ten years ago (that's not an exaggeration) and you wouldn't be marveling at network programming basics? :) http://www.kegel.com/c10k.html
One thing that isn't documented in "I like Unicorn because it's Unix" or in this code is the parameter given to the listen() call. That's the backlog parameter described in the listen(2) man page.
The backlog parameter specifies the depth of the queue of incoming connections on the socket.
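A small sketch of where that parameter goes in Python. `listen_with_backlog` is a made-up helper name, and the default host and port are arbitrary; the kernel treats the backlog as a hint and may cap or adjust it.

```python
import socket


def listen_with_backlog(backlog, host="127.0.0.1", port=0):
    """Bind a TCP socket and start listening with the given backlog hint."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, port))
    sock.listen(backlog)  # connections queue here until accept() takes them
    return sock
```

A client can complete `connect()` while the server has not yet called `accept()`; the connection simply sits in this queue until a worker picks it up.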
As long as we're tossing about recommended reading, Understanding Unix/Linux Programming, by Bruce Molay, is up there with any other text mentioned.
A simple question: what is the idea behind the exit call after the loop block in the ruby version?
PHP version of the script - http://gist.github.com/240095