Saturday, January 27, 2018

Kicking into gear from a distance

The last few weeks I have reworked the way Evennia's startup procedure works. This is now finished in the develop branch so I thought I'd mention a little what's going on.

Evennia, being a server for creating and running text-games (MU*s), consists of two main processes:
  • The Portal - this is what players connect to with their clients.
  • The Server - this is the actual game, with the database etc. This can be shutdown and started again without anyone connected to the Portal getting kicked from the game. This allows for hot-adding new Python code into the running Server without any downtime. 
Since Evennia should be easy to set up and also run easily on Windows as well as on Linux/Mac, we have foregone using the linux process management services but instead offered our own solution. This is how the reload mechanism currently looks in master branch:

Here I've excluded connections irrelevant to reloading, such as the Twisted AMP connection between Portal and Server. Dashed lines suggest a more "temporary" connection than a solid line.

The Launcher is the evennia program one uses to interact with the Server in the terminal/console. You give it commands like evennia start/stop/reload.

  • When starting, the Launcher spawns a new program, the Runner, and then exits. The Runner stays up and starts the Portal and Server. When it starts the Server, it does so in a blocking way and sits waiting in a stalled loop for the Server process to end. As the Server and Portal start they record their current process-ids in .pid files
  • When reloading, the Launcher writes a flag in a little .restart file. The Launcher then looks up the Server's .pid file and sends a SIGINT signal to that process to tell it to gracefully shut down. As the Server process dies, the Runner next looks at the Server's .restart file. If that indicates a reload is desired, The Runner steps in its loop and starts up a new Server process. 
  • When stopping, everything happens like when reloading, except the .restart file tells the Runner that it should just exit the loop and let the Server stay down. The Launcher also looks at the Portal's .pid file and sends a SIGINT signal to kill it. Internally the processes catch the SIGINT and close gracefully.
The original reason for this Server-Portal-Runner setup is that the Portal is also reloadable in the same way (it's not shown above). But over time I've found that having the Portal reloadable is not very useful - since players get disconnected when the Portal reloads one can just as well stop and start both processes. There are also a few issues with the setup, such as the .pid files going stale if the server is killed in some catastrophic way and various issues with reliably sending signals under Windows. Also, the interactive mode works a little strangely since closing the terminal will actually kill the Runner, not the Server/Portal - so they will keep on running except they can no longer reload ...
It overall feels a little ... fiddly.

In develop branch, this is now the new process management setup:


The Portal is now a Twisted AMP server, while the Evennia Server and Launcher are AMP clients. The Runner is no more.

  • When starting, the Launcher spawns the Portal and tries to connect to it as an AMP client as soon as it can. The Portal in turn spawns the Server. When the Server AMP client connects back to the Portal, the Portal reports back to the Launcher over the AMP connection. The Launcher then prints to the user and disconnects. 
  • When reloading, the Launcher connects to the Portal and gives it a reload-command. The Portal then tells the Server (over their AMP connection) to shutdown. Once the Portal sees that the Server has disconnected, it spawns a new Server. Since the Portal itself knows if a reload or shutdown is desired no external .restart (or .pid) files are needed. It reports the status back to the Launcher that can then disconnect.
  • When stopping, the Launcher sends the "Stop Server" command to the Portal. The Portal tells the Server to shut down and when it has done so it reports back to the Launcher that the Server has stopped. The Launcher then sends the "Stop Portal" command to also stop the Portal.  The Launcher waits until the Portal's AMP port dies, at which point it reports the shutdown to the user and stops itself.
So far I really like how this new setup works and while there were some initial issues on Windows (spawning new processes does not quite work they way you expect on that platform) I think this should conceptually be more OS-agnostic than sending kill-signals. 

This solution gives much more control over the processes. It's easy to start/stop the Server behind the portal at will. The Portal knows the Server state and stores the executable-string needed to start the Server. Thus the Server can also itself request to be reloaded by just mimicking the Launcher's instructions.
 The launcher is now only a client connecting to a port, so one difference with this setup is that there is no more 'interactive' mode - that is the Server/Portal will always run as daemons rather than giving log messages directly in the terminal/console. For that reason the Launcher instead has an in-built log-tailing mechanism now. With this the launcher will combine the server/portal logs and print them in real time to easily see errors etc during development.

The merger of the develop branch is still a good bit off, but anyone may try it out already here: https://github.com/evennia/evennia/tree/develop . Report problems to the issue tracker as usual.

Friday, January 5, 2018

New year, new stuff

Happy 2018 everyone! Here's a little summary of the past Evennia year and what is brewing.

(Evennia is a Python server- and toolbox for creating text-based multiplayer games (MU*)).

The biggest challenge for me last year Evennia-wise was the release of Evennia 0.7. Especially designing the migration process for arbitrary users migrating the Django auth-user took a lot of thought to figure out as described in my blog post here. But now 0.7 is released and a few initial minor adjustments could be made after feedback from daring pilot testers. The final process of migrating from 0.6 to 0.7 is, while involved, a step-by-step copy&paste list that has worked fine for most to follow. I've gotten far fewer questions and complains about it than could be expected so that's a good sign.

Working away on the boring but important behind-the-scenes stuff made me less able to keep up with more "mundane" issues and bugs popping up, or with adding new "fun" features to existing code. Luckily the Evennia community has really been thriving this year; It feels like new users pop up in the support channel all the time now. The number of pull requests both fixing issues and offering new features and contribs have really picked up. A bigger part of my time has been spent reviewing Pull Requests this year than any other I think. I would like to take the opportunity to thank everyone contributing, it's really awesome to see others donating their time and energy adding to Evennia. The Hacktoberfest participation was also surprisingly effective in getting people to create PRs - I have a feeling some were just happy to have an "excuse" for getting started to contribute. We should attend that next year too.

One thing we added with 0.7 was a more formal branching structure: Evennia now uses fixed master and develop branches, where master is for bug-fixes and develop is for new features (things that will eventually become evennia 0.8). This is simple but enough for our needs; it also makes it easier to track new from old now that we are actually doing releases.

Now that Twisted is at a point where this is possible for us to do, we also now have a sort-of plan for finally moving Evennia to Python 3. I won't personally be actively working on it until after 0.8 is out though. I don't expect both Evennia 0.8 and 0.9 (which will be pure py3) to get released this year, but we'll see - so far contributors have done all the work on the conversion.

At any rate, this coming year will probably be dominated by catching up on issues and edge cases that are lining our Issue tracker. One side effect of more newcomers is more eyes on the code and finding the creaky-bits. At least for me, most of my Evennia-time will be spent resolving bugs and issues. The fun thing is that unlike previous years this is not only up to me anymore - hopefully others will keep helping to resolve issues/bugs to broaden our bandwidth when it comes to keeping Evennia stable. The faster we can handle the backlog of issues the faster we can focus on new shiny features after all.

Finally, a continued great thank you to those of you contributing to the Patreon. Even small donations have a great encouraging value when working on something as niche as a Python MU* game server in 2018 - thanks a lot!