FLOSS Project Planets

ERPAL: Drupal security: How to deliver Drupal updates continuously

Planet Drupal - Wed, 2015-05-20 06:30

While developing a system to automate Drupal updates and using that technology to fulfill our Drupal support contracts, we ran into many issues and questions about the workflows that integrate the update process into our overall development and deployment cycles. In this blog post, I’ll outline the best practices for handling different update types with different deployment processes – as well as the results thereof.

The general deployment workflow

Most professional Drupal developers work in a dev-stage-live environment. Using feature branches has become a valuable best practice for deploying new features and hotfixes separately from the other features developed in the dev branch. Feature branches foster continuous delivery, although they require additional infrastructure to test each feature branch in a separate instance. Let me sum up the development activity of the different branches.


Dev branch

This is where new features are developed and where the development team commits its code (either directly or in a derived feature branch). When using feature branches, the dev branch is considered stable; features can be deployed forward separately. Nevertheless, the dev branch is there to test the integration of your locally developed changes with the code contributions of other developers, even if the current code of the dev branch hasn’t passed quality assurance. Before going live, the dev branch will be merged into the stage branch to be ready for quality assurance.


Stage branch

The stage branch is where code that’s about to be released (merged to the master branch and deployed to the live site) is thoroughly tested; it’s where quality assurance happens. If the stage branch is bug-free, it will be merged into the master branch, which is the code base for the live site. The stage branch is also where customer acceptance happens.


Master branch

The master branch contains the code base that serves the live site. No active changes happen here except hotfixes.

Hotfix branches

Hotfixes are changes applied to different environments without passing through the whole dev-stage-live development cycle. Hotfixes are handled in the same way as feature branches, with one difference: whereas feature branches start from the HEAD of the dev branch, a hotfix branch starts from the branch of the environment that requires the hotfix. In terms of security, a highly critical security update simply comes too late if it needs to go through the complete development cycle from dev to live. The same applies if there’s a bug on the live server that needs to be fixed immediately. Hotfix branches need to be merged back into the branch from which they were derived and into all previous branches. For example, if the hotfix branch was created from the master branch, it needs to be merged back into master to bring all commits to the live site, and then into the stage and dev branches as well, so that all code changes are available to the development team.
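The merge-back rule can be written down as a tiny function. This is only a sketch: it assumes exactly the three long-lived branches described above (dev, stage, master, in promotion order). It only computes the order of merges and does not run Git.

```python
# Assumed promotion order of the long-lived branches (dev -> stage -> master).
PIPELINE = ['dev', 'stage', 'master']


def merge_back_targets(source_branch):
    """Return the branches a hotfix started from `source_branch` must be merged into, in order."""
    index = PIPELINE.index(source_branch)
    # First the branch the hotfix was derived from, then every earlier branch.
    return [PIPELINE[i] for i in range(index, -1, -1)]


if __name__ == '__main__':
    print(merge_back_targets('master'))
```

For a hotfix taken from master, this yields master first (to reach the live site), then stage and dev.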

Where to commit Drupal updates in the development workflow?

To answer this question, we need to consider different types of updates: security updates (including their criticality) and non-security updates (bug fixes and new features).

If we group them by priority, we can derive the branches to which they need to be committed and also the duration of a deployment cycle. If you work in a continuous delivery environment, where you ship code continuously, the best way is to use feature branches derived from the dev branch.



Low (<=1 month):
- Bug fix updates
- Feature updates

These updates should be committed by the development team and analysed for side effects. It’s still important to process these low-priority updates, because high-priority updates assume that all code changes from earlier updates are already in place. Otherwise, a high-priority update to a module that hasn’t been updated for a long time pulls in many earlier changes at once, and some important quality assurance can be missed.

Medium (<5 days):
- Security updates that are neither critical nor highly critical

These updates should be applied in due time, as they’re related to the site’s security. Since they’re not highly critical, we might decide to commit them on the stage branch and send a notification to the project lead, the quality assurance team or directly to your customer (depending on your SLA). Then, as soon as they’ve confirmed that the site works correctly, these updates will be merged to the master branch and back to stage and dev.

High (<4 hours):
- Critical and highly critical security updates

For critical and highly critical security updates we follow a "security first" strategy: all critical security updates are applied immediately to keep the site secure. If there are bugs, we’ll fix them later! This strategy instructs us to apply updates directly to the master branch. Once the live site has been updated with the code from the master branch, we merge the updates back to the stage and dev branches. This is how we protected all our sites from Drupalgeddon in less than two hours!
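The priority table above reduces to a small decision rule. The sketch below is a hypothetical illustration and not part of Drop Guard. The criticality labels are assumed strings, and the deadlines are copied from the table as plain text.

```python
def update_policy(is_security, criticality=None):
    """Return (target branch, deadline) for an update, following the priority table.

    `criticality` is a hypothetical label: None, 'moderate', 'critical' or 'highly critical'.
    """
    if not is_security:
        # Low priority: bug fix and feature updates go through the normal cycle.
        return 'dev (via a feature branch)', '<= 1 month'
    if criticality in ('critical', 'highly critical'):
        # High priority: "security first", applied directly to master.
        return 'master', '< 4 hours'
    # Medium priority: security updates that are neither critical nor highly critical.
    return 'stage', '< 5 days'


if __name__ == '__main__':
    print(update_policy(True, 'highly critical'))
```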

Requirements for automation

If you want to automate your Drupal security updates with the Drop Guard service, all you need is the following:

  • Code deployment with Git
  • Trigger the update of an instance by URL using, e.g., Travis CI, Jenkins CI, DeployHQ or other services that manage your deployment, or alternatively execute SSH commands from the Drop Guard server.
Also to keep in mind:
  • Know what patches you’ve applied and don't forget to re-apply them during the update process (Drop Guard helps with its automated patch detection feature)
  • Automated tests reduce the time you spend on quality assurance

Where to commit an update depends on its priority and on the speed with which it needs to be deployed to the live site. Update continuously to ensure the ongoing quality and security of your project and to keep it future-proof. Feature and bug fix updates are less critical but also important to apply in due time.

For those of you interested in Drop Guard to automate the process as described in this blog post, please sign up for the free trial period so you can test all its features – for free – and get a personal on-boarding.


Ruslan Spivak: Let’s Build A Web Server. Part 3.

Planet Python - Wed, 2015-05-20 06:00

“We learn most when we have to invent” —Piaget

In Part 2 you created a minimalistic WSGI server that could handle basic HTTP GET requests. And I asked you a question, “How can you make your server handle more than one request at a time?” In this article you will find the answer. So, buckle up and shift into high gear. You’re about to have a really fast ride. Have your Linux, Mac OS X (or any *nix system) and Python ready. All source code from the article is available on GitHub.

First let’s remember what a very basic Web server looks like and what the server needs to do to service client requests. The server you created in Part 1 and Part 2 is an iterative server that handles one client request at a time. It cannot accept a new connection until after it has finished processing a current client request. Some clients might be unhappy with it because they will have to wait in line, and for busy servers the line might be too long.

Here is the code of the iterative server webserver3a.py:

```python
#####################################################################
# Iterative server - webserver3a.py                                 #
#                                                                   #
# Tested with Python 2.7.9 & Python 3.4 on Ubuntu 14.04 & Mac OS X  #
#####################################################################
import socket

SERVER_ADDRESS = (HOST, PORT) = '', 8888
REQUEST_QUEUE_SIZE = 5


def handle_request(client_connection):
    request = client_connection.recv(1024)
    print(request.decode())
    http_response = b"""\
HTTP/1.1 200 OK

Hello, World!
"""
    client_connection.sendall(http_response)


def serve_forever():
    listen_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listen_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listen_socket.bind(SERVER_ADDRESS)
    listen_socket.listen(REQUEST_QUEUE_SIZE)
    print('Serving HTTP on port {port} ...'.format(port=PORT))

    while True:
        client_connection, client_address = listen_socket.accept()
        handle_request(client_connection)
        client_connection.close()

if __name__ == '__main__':
    serve_forever()
```

To observe your server handling only one client request at a time, modify the server a little and add a 60-second delay after sending a response to a client. The change boils down to one line that tells the server process to sleep for 60 seconds (plus the import of the time module).

And here is the code of the sleeping server webserver3b.py:

```python
#########################################################################
# Iterative server - webserver3b.py                                     #
#                                                                       #
# Tested with Python 2.7.9 & Python 3.4 on Ubuntu 14.04 & Mac OS X      #
#                                                                       #
# - Server sleeps for 60 seconds after sending a response to a client   #
#########################################################################
import socket
import time

SERVER_ADDRESS = (HOST, PORT) = '', 8888
REQUEST_QUEUE_SIZE = 5


def handle_request(client_connection):
    request = client_connection.recv(1024)
    print(request.decode())
    http_response = b"""\
HTTP/1.1 200 OK

Hello, World!
"""
    client_connection.sendall(http_response)
    time.sleep(60)  # sleep and block the process for 60 seconds


def serve_forever():
    listen_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listen_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listen_socket.bind(SERVER_ADDRESS)
    listen_socket.listen(REQUEST_QUEUE_SIZE)
    print('Serving HTTP on port {port} ...'.format(port=PORT))

    while True:
        client_connection, client_address = listen_socket.accept()
        handle_request(client_connection)
        client_connection.close()

if __name__ == '__main__':
    serve_forever()
```

Start the server with:

$ python webserver3b.py

Now open up a new terminal window and run the curl command. You should instantly see the “Hello, World!” string printed on the screen:

```
$ curl http://localhost:8888/hello
Hello, World!
```

And without delay open up a second terminal window and run the same curl command:

$ curl http://localhost:8888/hello

If you’ve done that within 60 seconds, the second curl should not produce any output right away and should just hang there. The server shouldn’t print a new request body on its standard output either. Here is what it looks like on my Mac (the window at the bottom right corner highlighted in yellow shows the second curl command hanging, waiting for the connection to be accepted by the server):

After you’ve waited long enough (more than 60 seconds) you should see the first curl terminate and the second curl print “Hello, World!” on the screen, then hang for 60 seconds, and then terminate:

The way it works is that the server finishes servicing the first curl client request and then it starts handling the second request only after it sleeps for 60 seconds. It all happens sequentially, or iteratively, one step, or in our case one client request, at a time.

Let’s talk about the communication between clients and servers for a bit. In order for two programs to communicate with each other over a network, they have to use sockets. And you saw sockets both in Part 1 and Part 2. But what is a socket?

A socket is an abstraction of a communication endpoint and it allows your program to communicate with another program using file descriptors. In this article I’ll be talking specifically about TCP/IP sockets on Linux/Mac OS X. An important notion to understand is the TCP socket pair.

The socket pair for a TCP connection is a 4-tuple that identifies two endpoints of the TCP connection: the local IP address, local port, foreign IP address, and foreign port. A socket pair uniquely identifies every TCP connection on a network. The two values that identify each endpoint, an IP address and a port number, are often called a socket.[1]

So, a tuple of the form {client IP:client port, server IP:8888} is a socket pair that uniquely identifies two endpoints of the TCP connection on the client, and the tuple {server IP:8888, client IP:client port} is a socket pair that uniquely identifies the same two endpoints of the TCP connection on the server. The two values that identify the server endpoint of the TCP connection, the server’s IP address and the port 8888, are referred to as a socket in this case (the same applies to the client endpoint).
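You can observe both views of a socket pair with the standard socket module. The minimal sketch below connects a client to a throwaway listener on 127.0.0.1. It assumes a kernel-chosen port (port 0) rather than 8888, so it doesn't clash with a running server. It then reads each side’s local and foreign endpoints with getsockname() and getpeername().

```python
import socket


def socket_pairs():
    """Return (client view, server view), each as (local endpoint, foreign endpoint)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(('127.0.0.1', 0))  # port 0: the kernel picks a free port
    server.listen(1)

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(server.getsockname())
    conn, _ = server.accept()  # the connected socket on the server side

    client_view = (client.getsockname(), client.getpeername())
    server_view = (conn.getsockname(), conn.getpeername())

    for sock in (conn, client, server):
        sock.close()
    return client_view, server_view


if __name__ == '__main__':
    client_view, server_view = socket_pairs()
    print('client sees:', client_view)
    print('server sees:', server_view)
```

The two views contain the same four values with local and foreign swapped. That is exactly why the same connection looks like {client, server} from one end and {server, client} from the other.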

The standard sequence a server usually goes through to create a socket and start accepting client connections is the following:

  1. The server creates a TCP/IP socket. This is done with the following statement in Python:

    listen_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  2. The server might set some socket options (this is optional, but you can see that the server code above does just that to be able to re-use the same address over and over again if you decide to kill and re-start the server right away).

    listen_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
  3. Then, the server binds the address. The bind function assigns a local protocol address to the socket. With TCP, calling bind lets you specify a port number, an IP address, both, or neither.[1]

    listen_socket.bind(SERVER_ADDRESS)
  4. Then, the server makes the socket a listening socket:

    listen_socket.listen(REQUEST_QUEUE_SIZE)

The listen method is only called by servers. It tells the kernel that it should accept incoming connection requests for this socket.

After that’s done, the server starts accepting client connections one connection at a time in a loop. When there is a connection available the accept call returns the connected client socket. Then, the server reads the request data from the connected client socket, prints the data on its standard output and sends a message back to the client. Then, the server closes the client connection and it is ready again to accept a new client connection.
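Step 3 of the sequence says bind can take a port, an IP address, both, or neither. As a small illustration (not part of the article’s server), passing port 0 asks the kernel to choose the port, and getsockname() reveals what it picked. The 127.0.0.1 address is an assumption made for the example.

```python
import socket


def bind_with_kernel_chosen_port(ip='127.0.0.1'):
    """Bind to a specific IP address but port 0, so the kernel picks the port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((ip, 0))
        return sock.getsockname()  # (ip, port chosen by the kernel)
    finally:
        sock.close()


if __name__ == '__main__':
    print(bind_with_kernel_chosen_port())
```

Binding to '' instead of a specific IP address leaves the address unspecified as well, which is what webserver3a.py does with SERVER_ADDRESS.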

Here is what a client needs to do to communicate with the server over TCP/IP:

Here is the sample code for a client to connect to your server, send a request and print the response:

```python
import socket

# create a socket and connect to a server
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(('localhost', 8888))

# send and receive some data
sock.sendall(b'test')
data = sock.recv(1024)
print(data.decode())
```

After creating the socket, the client needs to connect to the server. This is done with the connect call:

sock.connect(('localhost', 8888))

The client only needs to provide the remote IP address or host name and the remote port number of a server to connect to.

You’ve probably noticed that the client doesn’t call bind and accept. The client doesn’t need to call bind because the client doesn’t care about the local IP address and the local port number. The TCP/IP stack within the kernel automatically assigns the local IP address and the local port when the client calls connect. The local port is called an ephemeral port, i.e. a short-lived port.

A port on a server that identifies a well-known service that a client connects to is called a well-known port (for example, 80 for HTTP and 22 for SSH). Fire up your Python shell and make a client connection to the server you run on localhost and see what ephemeral port the kernel assigns to the socket you’ve created (start the server webserver3a.py or webserver3b.py before trying the following example):

```
>>> import socket
>>> sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
>>> sock.connect(('localhost', 8888))
>>> host, port = sock.getsockname()[:2]
>>> host, port
('', 60589)
```

In the case above the kernel assigned the ephemeral port 60589 to the socket.

There are some other important concepts that I need to cover quickly before I can answer the question from Part 2. You will see shortly why they are important. The two concepts are that of a process and a file descriptor.

What is a process? A process is just an instance of an executing program. When the server code is executed, for example, it’s loaded into memory and an instance of that executing program is called a process. The kernel records a bunch of information about the process - its process ID would be one example - to keep track of it. When you run your iterative server webserver3a.py or webserver3b.py you run just one process.

Start the server webserver3b.py in a terminal window:

$ python webserver3b.py

And in a different terminal window use the ps command to get the information about that process:

```
$ ps | grep webserver3b | grep -v grep
7182 ttys003    0:00.04 python webserver3b.py
```

The ps command shows you that you have indeed run just one Python process webserver3b. When a process gets created, the kernel assigns it a process ID (PID). In UNIX, every user process also has a parent that, in turn, has its own process ID called the parent process ID, or PPID for short. I assume that you run a BASH shell by default, and when you start the server, a new process gets created with a PID and its parent PID is set to the PID of the BASH shell.
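You can also check the parent-child relationship from Python itself. The minimal sketch below starts a second Python interpreter with subprocess and asks it to print os.getppid(). The reported value should be the PID of the process that started it.

```python
import os
import subprocess
import sys


def child_reported_ppid():
    """Start a new Python process and return the parent PID it reports."""
    out = subprocess.check_output(
        [sys.executable, '-c', 'import os; print(os.getppid())']
    )
    return int(out.decode().strip())


if __name__ == '__main__':
    print('my PID:              ', os.getpid())
    print('child reports PPID:  ', child_reported_ppid())
```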

Try it out and see for yourself how it all works. Fire up your Python shell again, which will create a new process. Then get the PID of the Python shell process and the parent PID (the PID of your BASH shell) using the os.getpid() and os.getppid() system calls. Then, in another terminal window, run the ps command and grep for the PPID (parent process ID, which in my case is 3148). In the screenshot below you can see an example of a parent-child relationship between my child Python shell process and the parent BASH shell process on my Mac OS X:

Another important concept to know is that of a file descriptor. So what is a file descriptor? A file descriptor is a non-negative integer that the kernel returns to a process when it opens an existing file, creates a new file or when it creates a new socket. You’ve probably heard that in UNIX everything is a file. The kernel refers to the open files of a process by a file descriptor. When you need to read or write a file you identify it with the file descriptor. Python gives you high-level objects to deal with files (and sockets) and you don’t have to use file descriptors directly to identify a file but, under the hood, that’s how files and sockets are identified in UNIX: by their integer file descriptors.

By default, UNIX shells assign file descriptor 0 to the standard input of a process, file descriptor 1 to the standard output of the process and file descriptor 2 to the standard error.

As I mentioned before, even though Python gives you a high-level file or file-like object to work with, you can always use the fileno() method on the object to get the file descriptor associated with the file. Back to your Python shell to see how you can do that:

```
>>> import sys
>>> sys.stdin
<open file '<stdin>', mode 'r' at 0x102beb0c0>
>>> sys.stdin.fileno()
0
>>> sys.stdout.fileno()
1
>>> sys.stderr.fileno()
2
```

And while working with files and sockets in Python, you’ll usually be using a high-level file/socket object, but there may be times when you need to use a file descriptor directly. Here is an example of how you can write a string to the standard output using a write system call that takes a file descriptor integer as a parameter:

```
>>> import sys
>>> import os
>>> res = os.write(sys.stdout.fileno(), b'hello\n')
hello
```

And here is an interesting part - which should not be surprising to you anymore because you already know that everything is a file in Unix - your socket also has a file descriptor associated with it. Again, when you create a socket in Python you get back an object and not a non-negative integer, but you can always get direct access to the integer file descriptor of the socket with the fileno() method that I mentioned earlier.

```
>>> import socket
>>> sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
>>> sock.fileno()
3
```
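Because a socket is just another file descriptor, the plain os.write()/os.read() calls work on it too. The sketch below uses socket.socketpair() (two already-connected sockets) so no server is needed. It bypasses the socket objects’ own send/recv methods.

```python
import os
import socket


def write_through_fd():
    """Send bytes between two connected sockets using only integer file descriptors."""
    left, right = socket.socketpair()  # two connected sockets in this process
    try:
        # os.write/os.read know nothing about socket objects; they only need the fd.
        os.write(left.fileno(), b'ping')
        return os.read(right.fileno(), 4)
    finally:
        left.close()
        right.close()


if __name__ == '__main__':
    print(write_through_fd())
```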

One more thing I wanted to mention: have you noticed that in the second example of the iterative server webserver3b.py, when the server process was sleeping for 60 seconds, you could still connect to the server with the second curl command? Sure, the curl didn’t output anything right away and it was just hanging out there. But how come the server was not accepting a connection at the time, and the client was not rejected right away but instead was able to connect to the server? The answer to that is the listen method of a socket object and its BACKLOG argument, which I called REQUEST_QUEUE_SIZE in the code. The BACKLOG argument determines the size of a queue within the kernel for incoming connection requests. When the server webserver3b.py was sleeping, the second curl command that you ran was able to connect to the server because the kernel had enough space available in the incoming connection request queue for the server socket.

While increasing the BACKLOG argument does not magically turn your server into a server that can handle multiple client requests at a time, it is important to have a fairly large backlog parameter for busy servers. That way, the accept call does not have to wait for a new connection to be established but can grab the new connection off the queue right away and start processing a client request without delay.
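The backlog queue can be seen at work without curl. In the sketch below, clients connect to a listening socket that has not yet called accept(), and connect() still returns because the kernel completes the handshake and queues the connection. The sketch deliberately stays below the backlog, since what happens when the queue is full varies by operating system.

```python
import socket


def connect_without_accept(n_clients=3, backlog=5):
    """Connect n_clients to a listener before it calls accept(); return how many it accepts."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(('127.0.0.1', 0))  # port 0: let the kernel pick a free port
    server.listen(backlog)
    server.settimeout(5)  # never block forever if something goes wrong
    address = server.getsockname()

    clients = []
    for _ in range(n_clients):
        client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        client.settimeout(5)
        # The kernel finishes the TCP handshake and queues the connection,
        # so connect() returns even though nobody has called accept() yet.
        client.connect(address)
        clients.append(client)

    # Only now does the server take the queued connections off the queue.
    accepted = [server.accept()[0] for _ in range(n_clients)]
    count = len(accepted)

    for sock in accepted + clients + [server]:
        sock.close()
    return count


if __name__ == '__main__':
    print(connect_without_accept())
```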

Whoo-hoo! You’ve covered a lot of ground. Let’s quickly recap what you’ve learned (or refreshed if it’s all basics to you) so far.

  • Iterative server
  • Server socket creation sequence (socket, bind, listen, accept)
  • Client connection creation sequence (socket, connect)
  • Socket pair
  • Socket
  • Ephemeral port and well-known port
  • Process
  • Process ID (PID), parent process ID (PPID), and the parent-child relationship.
  • File descriptors
  • The meaning of the BACKLOG argument of the listen socket method

Now I am ready to answer the question from Part 2: “How can you make your server handle more than one request at a time?” Or put another way, “How do you write a concurrent server?”

The simplest way to write a concurrent server under Unix is to use a fork() system call.

Here is the code of your new shiny concurrent server webserver3c.py that can handle multiple client requests at the same time (as in our iterative server example webserver3b.py, every child process sleeps for 60 secs):

```python
###########################################################################
# Concurrent server - webserver3c.py                                      #
#                                                                         #
# Tested with Python 2.7.9 & Python 3.4 on Ubuntu 14.04 & Mac OS X        #
#                                                                         #
# - Child process sleeps for 60 seconds after handling a client's request #
# - Parent and child processes close duplicate descriptors                #
#                                                                         #
###########################################################################
import os
import socket
import time

SERVER_ADDRESS = (HOST, PORT) = '', 8888
REQUEST_QUEUE_SIZE = 5


def handle_request(client_connection):
    request = client_connection.recv(1024)
    print(
        'Child PID: {pid}. Parent PID {ppid}'.format(
            pid=os.getpid(),
            ppid=os.getppid(),
        )
    )
    print(request.decode())
    http_response = b"""\
HTTP/1.1 200 OK

Hello, World!
"""
    client_connection.sendall(http_response)
    time.sleep(60)


def serve_forever():
    listen_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listen_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listen_socket.bind(SERVER_ADDRESS)
    listen_socket.listen(REQUEST_QUEUE_SIZE)
    print('Serving HTTP on port {port} ...'.format(port=PORT))
    print('Parent PID (PPID): {pid}\n'.format(pid=os.getpid()))

    while True:
        client_connection, client_address = listen_socket.accept()
        pid = os.fork()
        if pid == 0:  # child
            listen_socket.close()  # close child copy
            handle_request(client_connection)
            client_connection.close()
            os._exit(0)  # child exits here
        else:  # parent
            client_connection.close()  # close parent copy and loop over

if __name__ == '__main__':
    serve_forever()
```

Before diving in and discussing how fork works, try it, and see for yourself that the server can indeed handle multiple client requests at the same time, unlike its iterative counterparts webserver3a.py and webserver3b.py. Start the server on the command line with:

$ python webserver3c.py

And try the same two curl commands you’ve tried before with the iterative server and see for yourself that, now, even though the server child process sleeps for 60 seconds after serving a client request, it doesn’t affect other clients because they are served by different and completely independent processes. You should see your curl commands output “Hello, World!” instantly and then hang for 60 secs. You can keep on running as many curl commands as you want (well, almost as many as you want :) and all of them will output the server’s response “Hello, World” immediately and without any noticeable delay. Try it.

The most important point to understand about fork() is that you call fork once but it returns twice: once in the parent process and once in the child process. In the child process, fork returns 0. In the parent process, it returns the child’s PID.
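A minimal, Unix-only sketch of “call once, return twice” follows. The child learns that it is the child because fork() returned 0, and it sends its real PID to the parent through a pipe. The parent compares that PID with the value fork() returned to it and then reaps the child with os.waitpid().

```python
import os


def fork_once():
    """Return (value fork() gave the parent, PID the child reports about itself)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: fork() returned 0 here; report our real PID to the parent.
        os.close(read_fd)
        os.write(write_fd, str(os.getpid()).encode())
        os.close(write_fd)
        os._exit(0)
    # Parent: fork() returned the child's PID.
    os.close(write_fd)
    reported = int(os.read(read_fd, 32).decode())
    os.close(read_fd)
    os.waitpid(pid, 0)  # collect the child's exit status so it doesn't linger
    return pid, reported


if __name__ == '__main__':
    print(fork_once())
```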

I still remember how fascinated I was by fork when I first read about it and tried it. It looked like magic to me. Here I was reading sequential code and then “boom!”: the code cloned itself and now there were two instances of the same code running concurrently. I thought it was nothing short of magic, seriously.

When a parent forks a new child, the child process gets a copy of the parent’s file descriptors:

You’ve probably noticed that the parent process in the code above closed the client connection:

```python
else:  # parent
    client_connection.close()  # close parent copy and loop over
```

So how come a child process is still able to read the data from a client socket if its parent closed the very same socket? The answer is in the picture above. The kernel uses descriptor reference counts to decide whether to close a socket or not. It closes the socket only when its descriptor reference count becomes 0. When your server creates a child process, the child gets a copy of the parent’s file descriptors and the kernel increments the reference counts for those descriptors. In the case of one parent and one child, the descriptor reference count would be 2 for the client socket. When the parent process in the code above closes the client connection socket, it merely decrements the reference count, which becomes 1: not small enough to cause the kernel to close the socket. The child process also closes the duplicate copy of the parent’s listen_socket because the child doesn’t care about accepting new client connections; it cares only about processing requests from the established client connection:

listen_socket.close() # close child copy

I’ll talk about what happens if you do not close duplicate descriptors later in the article.
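The reference-count rule can be reproduced inside a single process. In the sketch below, os.dup() stands in for the extra descriptor a child gets after fork(); both leave two descriptors referring to the same socket. Closing one of them does not close the socket. Only closing the last one makes the peer see end-of-file.

```python
import os
import socket


def close_semantics():
    """Show that a socket stays open until its last descriptor is closed."""
    a, b = socket.socketpair()
    dup_fd = os.dup(a.fileno())  # second descriptor for the same socket

    a.close()  # drops one reference; dup_fd still keeps the socket open
    os.write(dup_fd, b'still open')
    received = b.recv(1024)

    os.close(dup_fd)  # last reference gone: the kernel really closes the socket
    eof = b.recv(1024)  # the peer now reads end-of-file (b'')
    b.close()
    return received, eof


if __name__ == '__main__':
    print(close_semantics())
```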

As you can see from the source code of your concurrent server, the sole role of the server parent process now is to accept a new client connection, fork a new child process to handle that client request, and loop over to accept another client connection, and nothing more. The server parent process does not process client requests - its children do.

A little aside. What does it mean when we say that two events are concurrent?

When we say that two events are concurrent we usually mean that they happen at the same time. As a shorthand that definition is fine, but you should remember the strict definition:

Two events are concurrent if you cannot tell by looking at the program which will happen first.[2]

Again, it’s time to recap the main ideas and concepts you’ve covered so far.

  • The simplest way to write a concurrent server in Unix is to use the fork() system call
  • When a process forks a new process it becomes a parent process to that newly forked child process.
  • Parent and child share the same file descriptors after the call to fork.
  • The kernel uses descriptor reference counts to decide whether to close the file/socket or not
  • The role of a server parent process: all it does now is accept a new connection from a client, fork a child to handle the client request, and loop over to accept a new client connection.

Let’s see what is going to happen if you don’t close duplicate socket descriptors in the parent and child processes. Here is a modified version of the concurrent server where the server does not close duplicate descriptors, webserver3d.py:

```python
###########################################################################
# Concurrent server - webserver3d.py                                      #
#                                                                         #
# Tested with Python 2.7.9 & Python 3.4 on Ubuntu 14.04 & Mac OS X        #
###########################################################################
import os
import socket

SERVER_ADDRESS = (HOST, PORT) = '', 8888
REQUEST_QUEUE_SIZE = 5


def handle_request(client_connection):
    request = client_connection.recv(1024)
    http_response = b"""\
HTTP/1.1 200 OK

Hello, World!
"""
    client_connection.sendall(http_response)


def serve_forever():
    listen_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listen_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listen_socket.bind(SERVER_ADDRESS)
    listen_socket.listen(REQUEST_QUEUE_SIZE)
    print('Serving HTTP on port {port} ...'.format(port=PORT))

    clients = []
    while True:
        client_connection, client_address = listen_socket.accept()
        # store the reference otherwise it's garbage collected
        # on the next loop run
        clients.append(client_connection)
        pid = os.fork()
        if pid == 0:  # child
            listen_socket.close()  # close child copy
            handle_request(client_connection)
            client_connection.close()
            os._exit(0)  # child exits here
        else:  # parent
            # client_connection.close()
            print(len(clients))

if __name__ == '__main__':
    serve_forever()
```

Start the server with:

$ python webserver3d.py

Use curl to connect to the server:

```
$ curl http://localhost:8888/hello
Hello, World!
```

Okay, the curl printed the response from the concurrent server but it did not terminate and kept hanging. What is happening here? The server no longer sleeps for 60 seconds: its child process actively handles a client request, closes the client connection and exits, but the client curl still does not terminate.

So why does the curl not terminate? The reason is the duplicate file descriptors. When the child process closed the client connection, the kernel decremented the reference count of that client socket and the count became 1. The server child process exited, but the client socket was not closed by the kernel because the reference count for that socket descriptor was not 0, and, as a result, the termination packet (called FIN in TCP/IP parlance) was not sent to the client and the client stayed on the line, so to speak. There is also another problem. If your long-running server doesn’t close duplicate file descriptors, it will eventually run out of available file descriptors:

Stop your server webserver3d.py with Control-C and check out the default resources available to your server process set up by your shell with the shell built-in command ulimit:

```
$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 3842
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 3842
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
```

As you can see above, the maximum number of open file descriptors (open files) available to the server process on my Ubuntu box is 1024.
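The same “open files” limit is available from inside Python through the Unix-only resource module. The sketch below temporarily lowers the soft limit, using an arbitrary value of 128 picked for illustration. It then opens sockets until the kernel refuses with EMFILE (“Too many open files”), the same error an overloaded server eventually hits. Afterwards it closes everything and restores the original limit.

```python
import errno
import resource
import socket


def sockets_until_emfile(limit=128):
    """Lower RLIMIT_NOFILE, open sockets until EMFILE, then restore everything."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    new_soft = limit if hard == resource.RLIM_INFINITY else min(limit, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    opened = []
    try:
        while True:
            try:
                opened.append(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
            except OSError as exc:
                if exc.errno == errno.EMFILE:  # "Too many open files"
                    return len(opened)
                raise
    finally:
        for sock in opened:
            sock.close()
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))


if __name__ == '__main__':
    print('open files (soft, hard):', resource.getrlimit(resource.RLIMIT_NOFILE))
    print('sockets opened before EMFILE:', sockets_until_emfile())
```

The count comes out a little below the limit because the process already holds descriptors such as 0, 1 and 2.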

Now let’s see how your server can run out of available file descriptors if it doesn’t close duplicate descriptors. In an existing or new terminal window, set the maximum number of open file descriptors for your server to be 256:

$ ulimit -n 256

Start the server webserver3d.py in the same terminal where you’ve just run the $ ulimit -n 256 command:

$ python webserver3d.py

and use the following client client3.py to test the server.

```python
#####################################################################
# Test client - client3.py                                          #
#                                                                   #
# Tested with Python 2.7.9 & Python 3.4 on Ubuntu 14.04 & Mac OS X  #
#####################################################################
import argparse
import errno
import os
import socket


SERVER_ADDRESS = 'localhost', 8888
REQUEST = b"""\
GET /hello HTTP/1.1
Host: localhost:8888

"""


def main(max_clients, max_conns):
    socks = []
    for client_num in range(max_clients):
        pid = os.fork()
        if pid == 0:
            for connection_num in range(max_conns):
                sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
                sock.connect(SERVER_ADDRESS)
                sock.sendall(REQUEST)
                socks.append(sock)
                print(connection_num)
                # as written, each forked client exits after its first connection
                os._exit(0)


if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description='Test client for LSBAWS.',
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    parser.add_argument(
        '--max-conns',
        type=int,
        default=1024,
        help='Maximum number of connections per client.'
    )
    parser.add_argument(
        '--max-clients',
        type=int,
        default=1,
        help='Maximum number of clients.'
    )
    args = parser.parse_args()
    main(args.max_clients, args.max_conns)
```

In a new terminal window, start the client3.py and tell it to create 300 simultaneous connections to the server:

$ python client3.py --max-clients=300

Soon enough your server will explode. Here is a screenshot of the exception on my box:

The lesson is clear - your server should close duplicate descriptors. But even if you close duplicate descriptors, you are not out of the woods yet because there is another problem with your server, and that problem is zombies!
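The effect of duplicate descriptors can be seen in a minimal standalone sketch that uses a pipe instead of sockets. The helper name read_from_child is made up for illustration. After fork() both processes hold a copy of each pipe end. The reader sees end-of-file only once every copy of the write end is closed, including the parent's own copy. A client waiting for the server to close its connection is in the same position.

```python
import os


def read_from_child(message):
    """Fork a child that writes `message` into a pipe; read it back until EOF."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:  # child
        os.close(read_fd)             # child only writes: close its duplicate
        os.write(write_fd, message)
        os.close(write_fd)
        os._exit(0)

    # Parent: close its duplicate of the write end. Without this line the
    # loop below would block forever, because a writer (the parent itself)
    # would still hold the pipe open and EOF would never arrive.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:                 # b'' means EOF: all write ends are closed
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)                # collect the child's status (no zombie)
    return b''.join(chunks)


if __name__ == '__main__':
    print(read_from_child(b'Hello, World!'))
```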

Yes, your server code actually creates zombies. Let’s see how. Start up your server again:

$ python webserver3d.py

Run the following curl command in another terminal window:

$ curl http://localhost:8888/hello

And now run the ps command to show running Python processes. This is an example of ps output on my Ubuntu box:

$ ps auxw | grep -i python | grep -v grep
vagrant   9099  0.0  1.2  31804  6256 pts/0    S+   16:33   0:00 python webserver3d.py
vagrant   9102  0.0  0.0      0     0 pts/0    Z+   16:33   0:00 [python] <defunct>

Do you see the second line above where it says the status of the process with PID 9102 is Z+ and the name of the process is <defunct>? That’s our zombie there. The problem with zombies is that you can’t kill them.

Even if you try to kill zombies with $ kill -9, they will survive. Try it and see for yourself.

What is a zombie anyway and why does our server create them? A zombie is a process that has terminated, but its parent has not waited for it and has not received its termination status yet. When a child process exits before its parent, the kernel turns the child process into a zombie and stores some information about the process for its parent process to retrieve later. The information stored is usually the process ID, the process termination status, and the resource usage by the process. Okay, so zombies serve a purpose, but if your server doesn’t take care of these zombies, your system will get clogged up. Let’s see how that happens. First stop your running server and, in a new terminal window, use the ulimit command to set the max user processes to 400 (make sure to set open files to a high number too, let’s say 500):
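The following minimal sketch (Unix only; the function name zombie_demo is illustrative) shows the termination status the kernel keeps for a zombie. The child exits with code 3. On Linux, the child's state in /proc/<pid>/stat is Z until the parent calls os.waitpid(). That call hands the status, including the exit code, to the parent and removes the zombie. The /proc check is skipped on systems that don't have /proc.

```python
import os
import time


def zombie_demo(exit_code=3):
    """Fork a child that exits, peek at its state, then reap it."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)               # child terminates immediately

    time.sleep(0.5)                       # give the child time to exit; no wait yet
    state = None
    stat_path = '/proc/{}/stat'.format(pid)
    if os.path.exists(stat_path):         # Linux only
        with open(stat_path) as f:
            # Format is "pid (comm) state ...": take the field after ')'.
            state = f.read().rsplit(')', 1)[1].split()[0]   # 'Z' means zombie

    reaped_pid, status = os.waitpid(pid, 0)   # collect the termination status
    exit_status = os.WEXITSTATUS(status) if os.WIFEXITED(status) else None
    return pid, state, reaped_pid, exit_status


if __name__ == '__main__':
    pid, state, reaped_pid, exit_status = zombie_demo()
    print('child {}: state before wait={}, reaped={}, exit status={}'.format(
        pid, state, reaped_pid, exit_status))
```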

$ ulimit -u 400
$ ulimit -n 500

Start the server webserver3d.py in the same terminal where you’ve just run the $ ulimit -u 400 command:

$ python webserver3d.py

In a new terminal window, start the client3.py and tell it to create 500 simultaneous connections to the server:

$ python client3.py --max-clients=500

And, again, soon enough your server will blow up with an OSError: Resource temporarily unavailable exception. It tries to create a new child process but can’t, because it has reached the limit on the number of child processes it’s allowed to create. Here is a screenshot of the exception on my box:

As you can see, zombies create problems for your long-running server if it doesn’t take care of them. I will discuss shortly how the server should deal with that zombie problem.

Let’s recap the main points you’ve covered so far:

  • If you don’t close duplicate descriptors, the clients won’t terminate because the client connections won’t get closed.
  • If you don’t close duplicate descriptors, your long-running server will eventually run out of available file descriptors (max open files).
  • When you fork a child process and it exits and the parent process doesn’t wait for it and doesn’t collect its termination status, it becomes a zombie.
  • Zombies need to eat something and, in our case, it’s memory. Your server will eventually run out of available processes (max user processes) if it doesn’t take care of zombies.
  • You can’t kill a zombie; you need to wait for it.

So what do you need to do to take care of zombies? You need to modify your server code to wait for zombies to get their termination status. You can do that by modifying your server to call a wait system call. Unfortunately, that’s far from ideal. If you call wait and there is no terminated child process, the call to wait will block your server, effectively preventing it from handling new client connection requests. Are there any other options? Yes, there are, and one of them is the combination of a signal handler with the wait system call.

Here is how it works. When a child process exits, the kernel sends a SIGCHLD signal to its parent. The parent process can set up a signal handler to be asynchronously notified of that SIGCHLD event. The handler can then wait for the child and collect its termination status, which prevents a zombie process from being left around.

By the way, an asynchronous event means that the parent process doesn’t know ahead of time that the event is going to happen.
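Here is a minimal standalone sketch of that mechanism (Unix only; names such as on_sigchld and sigchld_demo are illustrative). The parent installs a SIGCHLD handler, forks a child that exits with code 7, and carries on with other work. Its main flow never calls wait(): the handler runs when the signal arrives and collects the status. Python only runs signal handlers in the main thread, so call this from there.

```python
import os
import signal
import time

events = []


def on_sigchld(signum, frame):
    # Invoked asynchronously when a child terminates. WNOHANG keeps the
    # handler from ever blocking if no child is ready to be reaped.
    pid, status = os.waitpid(-1, os.WNOHANG)
    if pid != 0:
        events.append((pid, os.WEXITSTATUS(status)))


def sigchld_demo(timeout=2.0):
    previous = signal.signal(signal.SIGCHLD, on_sigchld)
    try:
        pid = os.fork()
        if pid == 0:
            os._exit(7)                 # child exits right away
        # "Other work": the main flow only polls the list the handler fills.
        deadline = time.time() + timeout
        while not events and time.time() < deadline:
            time.sleep(0.05)
        return pid
    finally:
        signal.signal(signal.SIGCHLD, previous)   # restore the old handler
```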

Modify your server code to set up a SIGCHLD event handler and wait for a terminated child in the event handler. The code is available in webserver3e.py file:

###########################################################################
# Concurrent server - webserver3e.py                                      #
#                                                                         #
# Tested with Python 2.7.9 & Python 3.4 on Ubuntu 14.04 & Mac OS X        #
###########################################################################
import os
import signal
import socket
import time

SERVER_ADDRESS = (HOST, PORT) = '', 8888
REQUEST_QUEUE_SIZE = 5


def grim_reaper(signum, frame):
    pid, status = os.wait()
    print(
        'Child {pid} terminated with status {status}'
        '\n'.format(pid=pid, status=status)
    )


def handle_request(client_connection):
    request = client_connection.recv(1024)
    print(request.decode())
    http_response = b"""\
HTTP/1.1 200 OK

Hello, World!
"""
    client_connection.sendall(http_response)
    # sleep to allow the parent to loop over to 'accept' and block there
    time.sleep(3)


def serve_forever():
    listen_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listen_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listen_socket.bind(SERVER_ADDRESS)
    listen_socket.listen(REQUEST_QUEUE_SIZE)
    print('Serving HTTP on port {port} ...'.format(port=PORT))

    signal.signal(signal.SIGCHLD, grim_reaper)

    while True:
        client_connection, client_address = listen_socket.accept()
        pid = os.fork()
        if pid == 0:  # child
            listen_socket.close()  # close child copy
            handle_request(client_connection)
            client_connection.close()
            os._exit(0)
        else:  # parent
            client_connection.close()

if __name__ == '__main__':
    serve_forever()

Start the server:

$ python webserver3e.py

Use your old friend curl to send a request to the modified concurrent server:

$ curl http://localhost:8888/hello

Look at the server:

What just happened? The call to accept failed with the error EINTR.

The parent process was blocked in the accept call when the child process exited. The child’s exit caused a SIGCHLD event, which in turn activated the signal handler, and when the signal handler finished, the accept system call got interrupted:

Don’t worry, though; it’s a pretty simple problem to solve. All you need to do is restart the accept system call. Here is the modified version of the server, webserver3f.py, that handles that problem:

###########################################################################
# Concurrent server - webserver3f.py                                      #
#                                                                         #
# Tested with Python 2.7.9 & Python 3.4 on Ubuntu 14.04 & Mac OS X        #
###########################################################################
import errno
import os
import signal
import socket

SERVER_ADDRESS = (HOST, PORT) = '', 8888
REQUEST_QUEUE_SIZE = 1024


def grim_reaper(signum, frame):
    pid, status = os.wait()


def handle_request(client_connection):
    request = client_connection.recv(1024)
    print(request.decode())
    http_response = b"""\
HTTP/1.1 200 OK

Hello, World!
"""
    client_connection.sendall(http_response)


def serve_forever():
    listen_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listen_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listen_socket.bind(SERVER_ADDRESS)
    listen_socket.listen(REQUEST_QUEUE_SIZE)
    print('Serving HTTP on port {port} ...'.format(port=PORT))

    signal.signal(signal.SIGCHLD, grim_reaper)

    while True:
        try:
            client_connection, client_address = listen_socket.accept()
        except IOError as e:
            code, msg = e.args
            # restart 'accept' if it was interrupted
            if code == errno.EINTR:
                continue
            else:
                raise

        pid = os.fork()
        if pid == 0:  # child
            listen_socket.close()  # close child copy
            handle_request(client_connection)
            client_connection.close()
            os._exit(0)
        else:  # parent
            client_connection.close()  # close parent copy and loop over

if __name__ == '__main__':
    serve_forever()

Start the updated server webserver3f.py:

$ python webserver3f.py

Use curl to send a request to the modified concurrent server:

$ curl http://localhost:8888/hello

See? No EINTR exceptions any more. Now, verify that there are no more zombies either and that your SIGCHLD event handler with wait call took care of terminated children. To do that, just run the ps command and see for yourself that there are no more Python processes with Z+ status (no more <defunct> processes). Great! It feels safe without zombies running around.

  • If you fork a child and don’t wait for it, it becomes a zombie.
  • Use the SIGCHLD event handler to asynchronously wait for a terminated child to get its termination status
  • When using an event handler you need to keep in mind that system calls might get interrupted and you need to be prepared for that scenario
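The try/except around accept() can be packaged as a small reusable helper. This is a sketch, and the name retry_on_eintr is made up. On Python 3.5 and later, PEP 475 already makes most system calls, accept() included, retry automatically after a signal handler returns. The explicit loop therefore mainly matters for older versions such as the Python 2.7 the server was tested with. On Python 3 an interrupted call raises InterruptedError, a subclass of OSError whose errno is EINTR.

```python
import errno


def retry_on_eintr(func, *args, **kwargs):
    """Call func(*args, **kwargs), restarting it if a signal interrupts it."""
    while True:
        try:
            return func(*args, **kwargs)
        except (IOError, OSError) as e:   # the same class on Python 3
            if e.errno != errno.EINTR:
                raise                     # a real error: let the caller see it
            # EINTR: a signal handler interrupted the call, so try again
```

With it, the accept() call in webserver3f.py could be written as client_connection, client_address = retry_on_eintr(listen_socket.accept).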

Okay, so far so good. No problems, right? Well, almost. Try your webserver3f.py again, but instead of making one request with curl use client3.py to create 128 simultaneous connections:

$ python client3.py --max-clients 128

Now run the ps command again

$ ps auxw | grep -i python | grep -v grep

and see that, oh boy, zombies are back again!

What went wrong this time? When you ran 128 simultaneous clients and established 128 connections, the child processes on the server handled the requests and exited almost at the same time. That caused a flood of SIGCHLD signals to be sent to the parent process. The problem is that standard signals are not queued. While one SIGCHLD is still pending, further SIGCHLD signals are discarded, so your server process missed several signals. That left several zombies running around unattended:

The solution to the problem is to set up a SIGCHLD event handler but instead of wait use a waitpid system call with a WNOHANG option in a loop to make sure that all terminated child processes are taken care of. Here is the modified server code, webserver3g.py:

###########################################################################
# Concurrent server - webserver3g.py                                      #
#                                                                         #
# Tested with Python 2.7.9 & Python 3.4 on Ubuntu 14.04 & Mac OS X        #
###########################################################################
import errno
import os
import signal
import socket

SERVER_ADDRESS = (HOST, PORT) = '', 8888
REQUEST_QUEUE_SIZE = 1024


def grim_reaper(signum, frame):
    while True:
        try:
            pid, status = os.waitpid(
                -1,          # Wait for any child process
                 os.WNOHANG  # Do not block; return (0, 0) if no child has exited yet
            )
        except OSError:
            return

        if pid == 0:  # no more zombies
            return


def handle_request(client_connection):
    request = client_connection.recv(1024)
    print(request.decode())
    http_response = b"""\
HTTP/1.1 200 OK

Hello, World!
"""
    client_connection.sendall(http_response)


def serve_forever():
    listen_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listen_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listen_socket.bind(SERVER_ADDRESS)
    listen_socket.listen(REQUEST_QUEUE_SIZE)
    print('Serving HTTP on port {port} ...'.format(port=PORT))

    signal.signal(signal.SIGCHLD, grim_reaper)

    while True:
        try:
            client_connection, client_address = listen_socket.accept()
        except IOError as e:
            code, msg = e.args
            # restart 'accept' if it was interrupted
            if code == errno.EINTR:
                continue
            else:
                raise

        pid = os.fork()
        if pid == 0:  # child
            listen_socket.close()  # close child copy
            handle_request(client_connection)
            client_connection.close()
            os._exit(0)
        else:  # parent
            client_connection.close()  # close parent copy and loop over

if __name__ == '__main__':
    serve_forever()
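To see why the loop matters, here is a standalone sketch for Python 3; the helper names are illustrative. It forks a burst of children that all exit at about the same time, then reaps them with the same waitpid(-1, os.WNOHANG) loop that grim_reaper uses. One pass of the loop can collect several zombies, so it doesn't matter how many SIGCHLD signals were lost. With WNOHANG, waitpid returns (0, 0) instead of blocking when children exist but none has exited yet. When there are no children at all it raises ChildProcessError (ECHILD), which grim_reaper catches as OSError.

```python
import os
import time


def reap_terminated_children():
    """Reap every child that has already exited, without ever blocking."""
    reaped = []
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:     # ECHILD: no child processes at all
            break
        if pid == 0:                  # children exist, but none has exited yet
            break
        reaped.append(pid)
    return reaped


def burst_demo(n=50, timeout=5.0):
    children = set()
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            os._exit(0)               # every child exits right away
        children.add(pid)

    reaped = set()
    deadline = time.time() + timeout
    while not children <= reaped and time.time() < deadline:
        time.sleep(0.05)              # stand-in for "a SIGCHLD just arrived"
        reaped.update(reap_terminated_children())
    return children, reaped
```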

Start the server:

$ python webserver3g.py

Use the test client client3.py:

$ python client3.py --max-clients 128

And now verify that there are no more zombies. Yay! Life is good without zombies :)

Congratulations! It’s been a pretty long journey but I hope you liked it. Now you have your own simple concurrent server and the code can serve as a foundation for your further work towards a production grade Web server.

I’ll leave it as an exercise for you to update the WSGI server from Part 2 and make it concurrent. You can find the modified version here. But look at my code only after you’ve implemented your own version. You have all the necessary information to do that. So go and just do it :)

What’s next? As Josh Billings said,

“Be like a postage stamp — stick to one thing until you get there.”

Start mastering the basics. Question what you already know. And always dig deeper.

“If you learn only methods, you’ll be tied to your methods. But if you learn principles, you can devise your own methods.” —Ralph Waldo Emerson

Below is a list of books that I’ve drawn on for most of the material in this article. They will help you broaden and deepen your knowledge about the topics I’ve covered. I highly recommend that you get these books somehow: borrow them from your friends, check them out from your local library, or just buy them on Amazon. They are the keepers:

  1. Unix Network Programming, Volume 1: The Sockets Networking API (3rd Edition)

  2. Advanced Programming in the UNIX Environment, 3rd Edition

  3. The Linux Programming Interface: A Linux and UNIX System Programming Handbook

  4. TCP/IP Illustrated, Volume 1: The Protocols (2nd Edition) (Addison-Wesley Professional Computing Series)

  5. The Little Book of SEMAPHORES (2nd Edition): The Ins and Outs of Concurrency Control and Common Mistakes. Also available for free on the author’s site here.

BTW, I’m writing a book “Let’s Build A Web Server: First Steps” that explains how to write a basic web server from scratch and goes into more detail on the topics I just covered. Subscribe to the mailing list to get the latest updates about the book and the release date.


Categories: FLOSS Project Planets

Martin-Éric Racine: xf86-video-geode 2.11.17

Planet Debian - Wed, 2015-05-20 05:46

This morning, I pushed out version 2.11.17 of the Geode X.Org driver. This is the driver used by the OLPC XO-1 and by a plethora of low-power desktops, micro notebooks and thin clients. This is a minor release. It merges conditional support for the OpenBSD MSR device (Marc Ballmer, Matthieu Herrb), fixes a condition that prevents compiling on some embedded platforms (Brian A. Lloyd) and upgrades the code for X server 1.17 compatibility (Maarten Lankhorst).

Pending issues:

  • toggle COM2 into DDC probing mode during driver initialization
  • reset the DAC chip when exiting X and returning to vcons
  • fix a rendering corner case with Libre Office
Categories: FLOSS Project Planets

Enrico Zini: love-thy-neighbor

Planet Debian - Wed, 2015-05-20 05:35
Love thy neighbor as thyself

‘Love thy neighbor as thyself’, words which astoundingly occur already in the Old Testament.

One can love one’s neighbor less than one loves oneself; one is then the egoist, the racketeer, the capitalist, the bourgeois, and although one may accumulate money and power one does not of necessity have a joyful heart, and the best and most attractive pleasures of the soul are blocked.

Or one can love one’s neighbor more than oneself—then one is a poor devil, full of inferiority complexes, with a longing to love everything and still full of hate and torment towards oneself, living in a hell of which one lays the fire every day anew.

But the equilibrium of love, the capacity to love without being indebted to anyone, is the love of oneself which is not taken away from any other, this love of one’s neighbor which does no harm to the self.

(From Hermann Hesse, "My Belief")

I always have a hard time finding this quote on the Internet. Let's fix that.

Categories: FLOSS Project Planets

Rhonda D'Vine: Berge

Planet Debian - Wed, 2015-05-20 05:21

I wrote well over one year ago about Earthlings. It really did have some impact on my life. Nowadays I try to avoid animal products where possible, especially for my food. And in the context of vegan information that I tracked I stumbled upon a great band from Germany: Berge. They recently started a deal with their record label which says that if they receive one million clicks within the next two weeks on their song 10.000 Tränen their record label is going to donate 10.000,- euros to a German animal rights organization. Reason enough for me to share this band with you! :)
(For those who are puzzled by the original upload date of the video: don't let yourself get confused, the call for it is from this Monday.)

  • 10.000 Tränen: This is the song that needs the views. It's a nice tune with great lyrics to think about. Even though it's in German, it has English subtitles. :)
  • Schauen was passiert: In the light of 10.000 Tränen it was hard for me to select other songs, but this one sounds nice. "Let's see what happens". :)
  • Meer aus Farben: I love colors. And I hate the fact that most conference shirts are black only. Or that it seems to be impossible to find colorful clothes and shoes for tall women.

Like always, enjoy!


Categories: FLOSS Project Planets

Jim Birch: Using Drupal's Environment Indicator to help visually manage Dev, Stage, and Production Servers

Planet Drupal - Wed, 2015-05-20 05:00

There are days that I work on half a dozen different websites. I'm sure some of you are in the same boat. We handle client edits and change requests with rapid efficiency. We work locally, push to staging, test and review, then push to the live server and repeat. I'd be lying if I said I had never accidentally made a change on the live or staging site.

The Drupal Environment Indicator module allows you to name, color, and configure a multitude of visual cues for each of your different servers, or for other variables, like Git branch or path. It is very easy to install, and it can integrate with Toolbar, Admin Menu, and Mobile Friendly Navigation Toolbar so that it takes up no additional screen space.

Once it is installed, grant the permission to see the indicator to the roles that should see it. You can adjust the general settings at /admin/config/development/environment-indicator/settings

While you can create different indicators inside the admin UI, I prefer to set these in the settings.php files on the various servers so they are not overridden when we move databases from Production back to Staging and Dev.


Categories: FLOSS Project Planets

بايثون العربي (Arabic Python): How to Use the Random Module in Python

Planet Python - Wed, 2015-05-20 04:54
Computer programs, and games in particular, are more fun when they include some randomness. Unfortunately, we have no reliable way to generate truly random numbers. However, most programming languages, Python included, provide functions that generate pseudo-random numbers: they carry out a series of steps that produce numbers which appear random.
It is hard for computers to generate truly random numbers. That requires special hardware, which is complex and expensive, so we make do with the pseudo-random generation that programming languages offer.
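One way to see that these numbers are only pseudo-random: a generator started from the same seed always produces the same sequence. Here is a minimal sketch using a random.Random instance. The seed value 42 and the helper name first_numbers are arbitrary choices for illustration.

```python
import random


def first_numbers(seed, count=5):
    """Return the first `count` integers from a generator seeded with `seed`."""
    rng = random.Random(seed)        # an independent generator with its own state
    return [rng.randint(1, 100) for _ in range(count)]


print(first_numbers(42))
print(first_numbers(42))             # same seed, so exactly the same list
```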

Random numbers in programs let us play games whose future events we can't predict, because the stages are random.
I briefly covered the randrange function from the Random module before, but today we'll look at a number of other functions in the Random module.
The Random module gives us access to a large set of functions, the most important of which let us generate random numbers.
When to use the Random module
We need the Random module when we want the computer to pick a number within a given range. It is not limited to numbers: we can also pick random elements from lists and other sequences, and much more.
Functions of the Random module
As I said, this module contains many functions that help us in our work; here is a useful selection of them.
  • randint
If you want to generate random integers, use the randint function. It takes two values:
the lower bound and the upper bound, and both are included in the range of the random selection. To make this clearer, let's take an example:

To generate a random number from 1 to 5, we write the following code:

import random
print(random.randint(1, 5))
The result will be one of the following numbers: 1, 2, 3, 4, 5
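  • random
random() returns a floating-point number from 0.0 up to, but not including, 1.0. If you want larger numbers, multiply the result.
The following example displays a random number from 0 up to (but not including) 100:

import random
print(random.random() * 100)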

  • choice
If you want to pick a random value from a list, use the choice function.
The following program may display a different result each time it is run:
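Here is a minimal sketch of such a program. The list of color names is example data chosen for illustration.

```python
import random

colors = ['red', 'green', 'blue', 'yellow']   # example data (assumed)
picked = random.choice(colors)                # one element, chosen at random
print(picked)
```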


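  • shuffle

This function rearranges the elements of a list randomly, in place (it returns None). Let's take an example to make this clearer:
import random

numbers = [20, 16, 10, 5]
print(numbers)
random.shuffle(numbers)
print(numbers)
The second print shows the same four numbers in a random order.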

  • randrange
This function returns a random value from a range of values specified in advance. It takes the same arguments as range():

randrange(start, stop, step)
start: the number where the random selection begins; this number can itself be one of the random results.
stop: the number where the random selection ends; this number is never one of the random results.
step: the spacing between candidate values; the result is always start plus a multiple of step.
Let's take an example:

import random

# pick a random even number from 100 to 998
random.randrange(100, 1000, 2)

# pick a random number from 100, 103, 106, ..., 997
random.randrange(100, 1000, 3)
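You can check the ranges described above for yourself. The following sketch draws many values from a seeded generator and verifies the bounds. randint includes both ends. random() * 100 stays below 100. randrange never returns the stop value and only returns start plus a multiple of step.

```python
import random

rng = random.Random(0)               # seeded so the run is repeatable
N = 10000

ints = {rng.randint(1, 5) for _ in range(N)}
floats = [rng.random() * 100 for _ in range(N)]
evens = {rng.randrange(100, 1000, 2) for _ in range(N)}
threes = {rng.randrange(100, 1000, 3) for _ in range(N)}

assert ints == {1, 2, 3, 4, 5}                         # both bounds included
assert all(0 <= x < 100 for x in floats)               # 100 is never returned
assert all(100 <= v <= 998 and v % 2 == 0 for v in evens)
assert all(100 <= v <= 997 and (v - 100) % 3 == 0 for v in threes)
print('all bounds hold')
```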

Categories: FLOSS Project Planets

Modules Unraveled: 135 Writing the Book Drupal 8 Configuration Management with Anja Schirwinski and Stefan Borchert - Modules Unraveled Podcast

Planet Drupal - Wed, 2015-05-20 00:40
Published: Tue, 05/19/15
Writing a Book for D8
  • What’s it like writing a book for a piece of software that isn’t even officially released yet?
  • How long did the writing process take?
    • Packt publishing sent us a proposal to write this book in December of 2013. We got started quickly, sending them an outline of the chapters and an estimated page count in the same month. The original estimated page count was 150; it turned out to be around 120. We received a pretty strict timeline, having to finish a chapter every two weeks, starting in December of 2013.
    • We managed to finish most chapters in two weeks, but some of the longer ones took a little longer since we also started one of our biggest projects we had had until then, also in January. That was pretty tough because that project took up way more than a regular full time job, so we ended up having to write all of the chapters late at night and on the weekends. In May, all of our chapters then went to the editors and we didn’t hear back from the publisher for a really long time.
    • We also told them that we would have to rewrite a lot of the chapters, since there was so much work in progress with the Configuration Management Initiative and they were changing a lot about how it worked, like going from the file-based default to the database default. I think it was in January of 2015 when chapters came back with some feedback and we started rewriting every chapter, which was pretty painful at the time. We were able to update some of the documentation on drupal.org with the changes we found. It felt good to contribute at least a small part, since between our project and the book we had no time left to contribute code to Drupal 8 like we usually do.
    • We spent around 40 days on the book between the two of us.
    • In December, Packt asked the first publisher to review the book. We had recommended them one of our team members at undpaul, Tom, who has a similar amount of Drupal knowledge as Stefan. We really wanted to have someone from CMI to review the book, like Greg Dunlap. They had turned down reviewing the book after the first chapters were written, because too much would still change. Then after the changes went in we kept recommending Greg but I never heard anything back, maybe he was busy or they didn’t bother to ask. At the beginning of this year they told us the book was planned to be published by March. We recommended waiting because we didn’t expect a release candidate before the European Drupalcon and we would have rather had someone like Greg take the time to review, but Packt had another opinion :) Since most of CMI changes were finished, we didn’t feel too uncomfortable about the time of publishing, and it was also kind of nice to finally be done with this thing :) So it took a little over a year from start to finish. It was published on March 24th.
  • Do you expect to need to rewrite anything between now and when 8.0 is released?
The Book: Drupal 8 Configuration Management
  • What do you cover in the book?
    • We start off with a basic introduction to what Configuration Management in Drupal means. Configuration management is a term in software development in general that usually refers to something other than what it means in Drupal, where it basically just means that configuration is saved in files, which makes deployment easier. In the first chapters, we make sure the reader understands what Configuration Management means and why version control is so important. We mention some best practices and then show how to use it for non-coders as well, since there’s a nice backend non-technical folks can use, even if you don’t use version control (which of course we don’t recommend). We also have a part that describes how managing configuration works in Drupal 7 (Features!) and then dive into code examples, explaining schema files, showing how to add configuration to a custom module and how to upgrade Drupal 7 variables to the new system, and covering configuration management for multilingual sites.
  • Who is the target audience of the book?
  • Why did you decide to write about Configuration Management?
    • We have used Features to deploy configuration changes for a very long time, I don’t recall not using it since we started the company 5 years ago. We have talked about it at several DrupalCamps and Drupal User Groups and always tried to convince everyone to use it. We were really excited about the Configuration Management Initiative and thought it was a very good fit for us.
  • Before we started recording, you mentioned that there is a companion website to the book. Can you talk about what content we’ll find there, and what purpose that serves?
  • Are you building any sites in D8 at Undpaul?
Episode Links: Anja on drupal.org, Anja on Twitter, Stefan on drupal.org, Stefan on Twitter, Where to buy the book, The website for the book, undpaul on Twitter, undpaul Instagram, undpaul website
Tags: Drupal 8, Book, planet-drupal
Categories: FLOSS Project Planets

A. Jesse Jiryu Davis: Server Discovery And Monitoring In PyMongo, Perl, And C

Planet Python - Tue, 2015-05-19 23:09

(Cross-posted from the MongoDB Blog.)

How does a MongoDB driver discover and monitor a single server, a set of mongos servers, or a replica set? How does it determine what types of servers they are? How does it keep this information up to date? How does it discover an entire replica set given an initial host list, and how does it respond to stepdowns, elections, reconfigurations, network error, or the loss of a server?

In the past each MongoDB driver answered these questions a little differently, and mongos differed a little from the drivers. We couldn't answer questions like, "Once I add a secondary to my replica set, how long does it take for the driver to start using it?" Or, "How does a driver detect when the primary steps down, and how does it react?"

To standardize our drivers, I wrote the Server Discovery And Monitoring Spec, with David Golden, Craig Wilson, Jeff Yemin, and Bernie Hackett. Beginning with this spring's next-generation driver releases, all our drivers conform to the spec and answer these questions the same way. Or, where there's a legitimate reason for them to differ, there are as few differences as possible and each is clearly explained in the spec. Even in cases where several answers seem equally good, drivers agree on one way to do it.

The spec describes how a driver monitors a topology:

Topology: The state of your deployment. What type of deployment it is, which servers are available, and what type of servers (mongos, primary, secondary, ...) they are.

The spec covers all MongoDB topologies, but replica sets are the most interesting. So I'll explain the spec's algorithm for replica sets by telling the story of your application as it passes through life stages: it starts up, discovers a replica set, and reaches a steady state. Then there is a crisis—I spill coffee on your primary server's motherboard—and a resolution—the replica set elects a new primary and the driver discovers it.

At each stage we'll observe a typical multi-threaded driver, PyMongo 3.0, a typical single-threaded driver, the Perl Driver 1.0, and a hybrid, the C Driver 1.2. (I implemented PyMongo's server discovery and monitoring. David Golden wrote the Perl version, and Samantha Ritter and Jason Carey wrote the one in C.)

To conclude, I'll tell you our strategy for verifying spec compliance in ten programming languages, and I'll share links for further reading.


When your application initializes, it creates a MongoClient. In Python:

from pymongo import MongoClient

client = MongoClient('mongodb://hostA,hostB/?replicaSet=my_rs')

In Perl:

my $client = MongoDB::MongoClient->new({ host => "mongodb://hostA,hostB/?replicaSet=my_rs" });

In C, you can either create a client directly:

mongoc_client_t *client = mongoc_client_new ( "mongodb://hostA,hostB/?replicaSet=my_rs");

Or create a client pool:

mongoc_uri_t *uri = mongoc_uri_new ("mongodb://hostA,hostB/?replicaSet=my_rs");
mongoc_client_pool_t *pool = mongoc_client_pool_new (uri);
mongoc_client_t *client = mongoc_client_pool_pop (pool);

A crucial improvement of the next gen drivers is that the constructor no longer blocks while it makes the initial connection. Instead, the constructor does no network I/O. PyMongo launches a background thread per server (two threads in this example) to initiate discovery, and returns control to your application without blocking. Perl does nothing until you attempt an operation; then it connects on demand.

In the C Driver, if you create a client directly it behaves like the Perl Driver: it connects on demand, on the main thread. But the C Driver's client pool launches one background thread to discover and monitor all servers.

The spec's "no I/O in constructors" rule is a big win for web applications that use our next gen drivers: In a crisis, your app servers might be restarted while your MongoDB servers are unreachable. Your application should not throw an error at startup, when it constructs the client object. It starts up disconnected and tries to reach your servers until it succeeds.
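Here is a toy sketch of the "no I/O in constructors" idea in plain Python. It is not PyMongo's code: the class ToyClient, its attributes, and the injected check function are all hypothetical. The constructor only records the seed list and starts one daemon monitor thread per seed. It therefore returns immediately even when no server is reachable, while the monitors keep retrying in the background.

```python
import threading
import time


class ToyClient(object):
    """Hypothetical client whose constructor performs no network I/O itself."""

    def __init__(self, seeds, check, interval=0.5):
        # `check(address)` is an injected stand-in for the real handshake:
        # it returns a reply dict, or raises OSError if the host is unreachable.
        self.descriptions = {}                # address -> last reply or None
        self._lock = threading.Lock()
        self._check = check
        self._interval = interval
        for address in seeds:
            t = threading.Thread(target=self._monitor, args=(address,))
            t.daemon = True                   # don't keep the app alive
            t.start()                         # returns at once: no blocking

    def _monitor(self, address):
        while True:
            try:
                reply = self._check(address)
            except OSError:
                reply = None                  # unreachable: keep trying later
            with self._lock:
                self.descriptions[address] = reply
            time.sleep(self._interval)
```

With an unreachable deployment the constructor still returns at once, which is the property that lets an app server start up while MongoDB is down.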


The initial host list you provide is called the "seed list":

Seed list: The initial list of server addresses provided to the MongoClient.

The seed list is the stepping-off point for the driver's journey of discovery. As long as one seed is actually an available replica set member, the driver will discover the whole set and stay connected to it indefinitely, as described below. Even if every member of the set is replaced with a new host, like the Ship of Theseus, it is still the same replica set and the driver remains connected to it.

I tend to think of a driver as a tiny economy of information about your topology. Monitoring supplies information, and your application's operations demand information. Their demands are defined in David Golden's Server Selection Spec, while the method of supplying information is defined here, in the Server Discovery And Monitoring Spec. In the beginning, there is no information, and the monitors rush to supply some. I'll talk more about the demand side later, in the "Crisis" section.


Let's start with PyMongo. In PyMongo, like other multi-threaded drivers, the MongoClient constructor starts one monitor thread each for "hostA" and "hostB".

Monitor: A thread or async task that occasionally checks the state of one server.

Each monitor connects to its assigned server and executes the "ismaster" command. Ignore the command's archaic name, which dates from the days of master-slave replication, long superseded by replica sets. The ismaster command is the client-server handshake. Let's say the driver receives hostB's response first:

ismaster = { "setName": "my_rs", "ismaster": false, "secondary": true, "hosts": [ "hostA:27017", "hostB:27017", "hostC:27017"]}

hostB confirms it belongs to your replica set, informs you that it is a secondary, and lists the members in the replica set config. PyMongo sees a host it didn't know about, hostC, so it launches a new thread to connect to it.
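Below is a rough Python sketch of how a monitor might apply such a reply. It assumes the topology is kept as a plain dict mapping each address to a server type. The type names (`RSPrimary`, `RSSecondary`, `RSOther`, `Unknown`) are borrowed from the spec, but the logic is heavily simplified. `start_monitor` is a hypothetical callback, and none of this is PyMongo's implementation.

```python
def classify(reply):
    """Simplified server type from an ismaster reply (names as in the spec)."""
    if reply.get("ismaster"):
        return "RSPrimary"
    if reply.get("secondary"):
        return "RSSecondary"
    return "RSOther"


def on_ismaster(topology, address, reply, set_name, start_monitor):
    """Apply one ismaster reply to `topology` (address -> server type).

    start_monitor is a hypothetical callback that launches a monitor for a
    newly discovered host, the way PyMongo starts a thread for hostC.
    """
    if reply.get("setName") != set_name:
        topology.pop(address, None)  # not a member of our set: forget it
        return
    topology[address] = classify(reply)
    for host in reply.get("hosts", []):
        if host not in topology:
            topology[host] = "Unknown"  # known address, not yet checked
            start_monitor(host)


topology = {"hostA:27017": "Unknown", "hostB:27017": "Unknown"}
started = []
reply_from_b = {"setName": "my_rs", "ismaster": False, "secondary": True,
                "hosts": ["hostA:27017", "hostB:27017", "hostC:27017"]}
on_ismaster(topology, "hostB:27017", reply_from_b, "my_rs", started.append)
print(topology)  # hostB is now RSSecondary; hostC was added as Unknown
print(started)   # ['hostC:27017']
```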

If your application threads are waiting to do any operations with the MongoClient, they block while awaiting discovery. But since PyMongo now knows of a secondary, if your application is waiting to do a secondary read, it can now proceed:

from pymongo import ReadPreference

db = client.get_database(
    "dbname", read_preference=ReadPreference.SECONDARY)
# Unblocks when a secondary is found.
db.collection.find_one()
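Blocking until a suitable server appears can be modeled with a condition variable. In this hypothetical sketch, monitor threads call `update()` after each check. Application threads call `select()`, which waits, up to a timeout, until a server of the wanted type is known. The class and method names are illustrative only and are not a driver API.

```python
import threading


class TopologyView:
    """Hypothetical shared topology view: monitors publish, app threads wait."""

    def __init__(self):
        self._cond = threading.Condition()
        self._types = {}  # address -> server type string

    def update(self, address, server_type):
        # Called by a monitor thread after each check.
        with self._cond:
            self._types[address] = server_type
            self._cond.notify_all()  # wake any operation waiting for a server

    def select(self, wanted_type, timeout):
        # Called by an application thread; blocks until a match or timeout.
        def find():
            for address, kind in self._types.items():
                if kind == wanted_type:
                    return address
            return None

        with self._cond:
            address = self._cond.wait_for(find, timeout=timeout)
        if address is None:
            raise TimeoutError(f"no {wanted_type} found within {timeout}s")
        return address


view = TopologyView()
# Simulate a monitor thread discovering hostB as a secondary 0.1s from now.
threading.Timer(0.1, view.update, args=("hostB:27017", "RSSecondary")).start()
print(view.select("RSSecondary", timeout=5.0))  # blocks briefly, then prints hostB:27017
```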

Meanwhile, discovery continues. PyMongo waits for ismaster responses from hostA and hostC. Let's say hostC responds next, and its response includes "ismaster": true:

ismaster = { "setName": "my_rs", "ismaster": true, "secondary": false, "hosts": [ "hostA:27017", "hostB:27017", "hostC:27017"]}

Now PyMongo knows the primary, so all reads and writes are unblocked. PyMongo is still waiting to hear back from hostA; once it does, it can use hostA for secondary reads as well.


Multithreaded Perl code is problematic, so the Perl Driver doesn't launch a thread per host. How, then, does it discover your set? When you construct a MongoClient, it does no I/O. It waits for you to begin an operation before it connects. Once you do, it scans the hosts serially, initially in random order.

Scan: A single-threaded driver's process of checking the state of all servers.

Let's say the driver begins with hostB, a secondary. Here's a detail I didn't show you earlier: replica set members tell you who they think the primary is. HostB's reply includes "primary": "hostC:27017":

ismaster = { "setName": "my_rs", "ismaster": false, "secondary": true, "primary": "hostC:27017", "hosts": [ "hostA:27017", "hostB:27017", "hostC:27017"]}

The Perl Driver uses this hint to put hostC next in the scan order, because connecting to the primary is its top priority. It checks hostC and confirms that it's primary. Finally, it checks hostA to ensure it can connect, and discovers that hostA is another secondary. Scanning is now complete and the driver proceeds with your application's operation.
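The serial scan with a primary hint could look roughly like this in Python. It sketches the behavior described above and is not the Perl Driver's code. `check` is a hypothetical blocking ismaster call, and the replies are canned dicts.

```python
import random


def serial_scan(seeds, check, rng):
    """Check servers one at a time, as the Perl Driver is described doing.

    check(address) is a hypothetical blocking ismaster call that returns a
    reply dict, or None if the server is unreachable.
    """
    pending = list(seeds)
    rng.shuffle(pending)  # the initial order is random
    replies, order = {}, []
    while pending:
        address = pending.pop(0)
        reply = check(address)
        replies[address] = reply
        order.append(address)
        if not reply:
            continue
        for host in reply.get("hosts", []):
            if host not in replies and host not in pending:
                pending.append(host)
        hint = reply.get("primary")
        if hint and hint not in replies:
            if hint in pending:
                pending.remove(hint)
            pending.insert(0, hint)  # the hinted primary is checked next
    return order, replies


MEMBERS = ["hostA:27017", "hostB:27017", "hostC:27017"]
REPLIES = {
    "hostA:27017": {"setName": "my_rs", "secondary": True,
                    "primary": "hostC:27017", "hosts": MEMBERS},
    "hostB:27017": {"setName": "my_rs", "secondary": True,
                    "primary": "hostC:27017", "hosts": MEMBERS},
    "hostC:27017": {"setName": "my_rs", "ismaster": True,
                    "primary": "hostC:27017", "hosts": MEMBERS},
}
order, _ = serial_scan(["hostA:27017", "hostB:27017"], REPLIES.get, random.Random(1))
print(order)  # whichever seed comes first, hostC is always checked second
```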


The C Driver has two modes for server discovery and monitoring: single-threaded and pooled. Single-threaded mode is optimized for embedding the C Driver within languages like PHP: PHP applications deploy many single-threaded processes connected to MongoDB. Each process uses the same connections to scan the topology as it uses for application operations, so the total connection count from many processes is kept to a minimum.

Other applications should use pooled mode: as we shall see, in pooled mode a background thread monitors the topology, so the application need not block to scan it.

C Driver's single-threaded mode

The C Driver scans servers on the main thread if you construct a single client:

mongoc_client_t *client = mongoc_client_new ( "mongodb://hostA,hostB/?replicaSet=my_rs");

In single-threaded mode, the C Driver blocks to scan your topology periodically with the main thread, just like the Perl Driver. But unlike the Perl Driver's serial scan, the C Driver checks all servers in parallel. Using a non-blocking socket per member, it begins a check on each member concurrently, and uses the asynchronous "poll" function to receive events from the sockets, until all have responded or timed out. The driver updates its topology as ismaster calls complete. Finally it ends the scan and returns control to your application.

Whereas the Perl Driver's topology scan lasts for the sum of all server checks (including timeouts), the C Driver's topology scan lasts only the maximum of any one check's duration, or the connection timeout setting, whichever is shorter. Put another way, in single-threaded mode the C Driver fans out to begin all checks concurrently, then fans in once all checks have completed or timed out. This "fan out, fan in" topology scanning method gives the C Driver an advantage scanning very large replica sets, or sets with several high-latency members.
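The difference between the two scan strategies comes down to simple arithmetic. This sketch uses made-up round-trip times and a made-up connection timeout, all in milliseconds, to compare a serial scan with a fan-out, fan-in scan. It also includes one unreachable member whose check only the timeout can end.

```python
import math


def serial_scan_ms(check_ms, connect_timeout_ms):
    # One check after another; each is cut off at the connection timeout.
    return sum(min(t, connect_timeout_ms) for t in check_ms)


def fan_out_fan_in_ms(check_ms, connect_timeout_ms):
    # All checks start together; the scan ends when the slowest one
    # completes or the timeout fires, whichever comes first.
    return min(max(check_ms), connect_timeout_ms)


healthy = [5, 10, 250]              # hypothetical round trips in ms
one_down = [5, 10, 250, math.inf]   # the same set plus one unreachable member
timeout = 1000                      # hypothetical connection timeout in ms

print(serial_scan_ms(healthy, timeout), fan_out_fan_in_ms(healthy, timeout))    # 265 250
print(serial_scan_ms(one_down, timeout), fan_out_fan_in_ms(one_down, timeout))  # 1265 1000
```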

C Driver's pooled mode

To activate the C Driver's pooled mode, make a client pool:

mongoc_uri_t *uri = mongoc_uri_new ("mongodb://hostA,hostB/?replicaSet=my_rs");
mongoc_client_pool_t *pool = mongoc_client_pool_new (uri);
mongoc_client_t *client = mongoc_client_pool_pop (pool);

The pool launches one background thread for monitoring. When the thread begins, it fans out and connects to all servers in the seed list, using non-blocking sockets and a simple event loop. As it receives ismaster responses from the servers, it updates its view of your topology, the same as a multi-threaded driver like PyMongo does. When it discovers a new server it begins connecting to it, and adds the new socket to the list of non-blocking sockets in its event loop.
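The C Driver's pooled mode uses non-blocking sockets and poll. As a swapped-in stand-in, this Python sketch uses asyncio to show the same shape. One event loop checks every known server concurrently, and a host discovered in a reply is added to that loop rather than getting a thread of its own. `fake_ismaster` simulates the network with sleeps. The function performs a single discovery pass rather than monitoring forever.

```python
import asyncio


async def fake_ismaster(address, replies, latency):
    # Hypothetical stand-in for a non-blocking ismaster round trip.
    await asyncio.sleep(latency[address])
    return replies[address]


async def discover(seeds, replies, latency):
    """Fan out to every known server from one event loop; newly discovered
    hosts are added to the same loop instead of getting their own thread."""
    topology = {}
    known = set(seeds)
    tasks = {asyncio.create_task(fake_ismaster(a, replies, latency)): a
             for a in seeds}
    while tasks:
        done, _ = await asyncio.wait(set(tasks), return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            address = tasks.pop(task)
            reply = task.result()
            # Simplification: every non-primary member is treated as a secondary.
            topology[address] = "RSPrimary" if reply.get("ismaster") else "RSSecondary"
            for host in reply.get("hosts", []):
                if host not in known:
                    known.add(host)
                    tasks[asyncio.create_task(
                        fake_ismaster(host, replies, latency))] = host
    return topology


MEMBERS = ["hostA:27017", "hostB:27017", "hostC:27017"]
REPLIES = {
    "hostA:27017": {"ismaster": False, "hosts": MEMBERS},
    "hostB:27017": {"ismaster": False, "hosts": MEMBERS},
    "hostC:27017": {"ismaster": True, "hosts": MEMBERS},
}
LATENCY = {"hostA:27017": 0.03, "hostB:27017": 0.01, "hostC:27017": 0.02}

print(asyncio.run(discover(["hostA:27017", "hostB:27017"], REPLIES, LATENCY)))
# hostC, learned from a reply, is reported as RSPrimary
```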

As with PyMongo, when the C Driver is in background-thread mode, your application's operations are unblocked as soon as monitoring discovers a usable server. For example, if your C code is blocked waiting to insert into the primary, it is unblocked as soon as the primary is discovered, rather than waiting for all secondaries to be checked too.

Steady State

Once the driver has discovered your whole replica set, it periodically re-checks each server. The periodic check is necessary to keep track of your network latency to each server, and to detect when a new secondary joins the set. And in some cases periodic monitoring can head off errors, by proactively discovering when a server is offline.

By default, the monitor threads in PyMongo check their servers every ten seconds, as does the C Driver's monitor in background-thread mode. The Perl driver, and the C Driver in single-threaded mode, block your application to re-scan the replica set once per minute.

If you like my supply-and-demand model of a driver, the steady state is when your application's demand for topology information is satisfied. The driver occasionally refreshes its stock of information to make sure it's ready for future demands, but there is no urgency.


So I wander into your data center, swirling my cappuccino, and I stumble and spill it on hostC's motherboard. Now your replica set has no primary. What happens next?

When your application next writes to the primary, it gets a socket timeout. Now it knows the primary is gone. Its demand for information is no longer in balance with supply. The next attempt to write blocks until a primary is found.

To meet demand, the driver works overtime. How exactly it responds to the crisis depends on which type of monitoring it uses.

Multi-threaded: In drivers like PyMongo, the monitor threads wait only half a second between server checks, instead of ten seconds. They want to know as soon as possible if the primary has come back, or if one of the secondaries has been elected primary.

Single-threaded: Drivers like the Perl Driver sleep half a second between scans of the topology. The application's write operation remains blocked until the driver finds the primary.

C Driver Single-Threaded: In single-threaded mode, the C Driver sleeps half a second between scans, just like the Perl Driver. During the scan the driver launches non-blocking "ismaster" commands on all servers concurrently, as I described above.

C Driver Pooled Mode: Each time the driver's monitor thread receives an ismaster response, it schedules that server's next ismaster call on the event loop only a half-second in the future.
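The polling schedule described above boils down to a small rule. This sketch encodes the default intervals quoted in this post. The mode names and the `operation_waiting` flag, which stands in for "an operation is blocked waiting for a server", are simplifications and not real driver settings.

```python
STEADY_INTERVAL_S = {
    "multi_threaded": 10.0,   # PyMongo monitors, C Driver pooled mode
    "single_threaded": 60.0,  # Perl Driver, C Driver single-threaded mode
}
CRISIS_INTERVAL_S = 0.5       # while an operation waits for a suitable server


def next_check_delay(mode, operation_waiting):
    """Seconds until the next server check (multi-threaded) or full scan
    (single-threaded), using the defaults described in this post."""
    if operation_waiting:
        return CRISIS_INTERVAL_S
    return STEADY_INTERVAL_S[mode]


def checks_during(window_s, mode, operation_waiting):
    """How many checks or scans fit into a time window at that pace."""
    return int(window_s // next_check_delay(mode, operation_waiting))
```

For instance, during a two-second election a multi-threaded driver gets about four chances to notice the new primary. At the steady ten-second pace it would get none.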


Your secondaries, hostA and hostB, promptly detect my sabotage of hostC, and hold an election. In MongoDB 3.0, the election takes just a couple of seconds. Let's say hostA becomes primary.

A half second or less later, your driver rechecks hostA and sees that it is now the primary. It unblocks your application's writes and sends them to hostA. In PyMongo, the monitor threads relax, and return to their slow polling strategy: they sleep ten seconds between server checks. Same for the C Driver's monitor in background-thread mode. The Perl Driver, and the C Driver in single-threaded mode, do not rescan the topology for another minute. Demand and supply are once again in balance.

Compliance Testing

I am particularly excited about the unit tests that accompany the Server Discovery And Monitoring Spec. We have 38 tests that are specified formally in YAML files, with inputs and expected outcomes for a range of scenarios. For each driver we write a test runner that feeds the inputs to the driver and verifies the outcome. This ends confusion about what the spec means, or whether all drivers conform to it. You can track our progress toward full compliance in MongoDB's issue tracker.
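To give a flavor of such a runner, here is a small table-driven sketch in Python. The real tests live in YAML files with their own schema. The scenario format below is hypothetical and uses inline dicts to stay self-contained. It checks a toy topology model rather than a driver.

```python
def apply_responses(seeds, set_name, responses):
    """Feed ismaster replies to a toy topology model; return its outcome."""
    servers = {address: "Unknown" for address in seeds}
    for address, reply in responses:
        if reply.get("setName") != set_name:
            servers.pop(address, None)  # wrong or missing set name: drop it
            continue
        if reply.get("ismaster"):
            servers[address] = "RSPrimary"
        elif reply.get("secondary"):
            servers[address] = "RSSecondary"
        else:
            servers[address] = "RSOther"
        for host in reply.get("hosts", []):
            servers.setdefault(host, "Unknown")
    has_primary = "RSPrimary" in servers.values()
    return {"servers": servers,
            "topology_type": "ReplicaSetWithPrimary" if has_primary
            else "ReplicaSetNoPrimary"}


MEMBERS = ["hostA:27017", "hostB:27017", "hostC:27017"]
SECONDARY = {"setName": "my_rs", "secondary": True, "hosts": MEMBERS}
PRIMARY = {"setName": "my_rs", "ismaster": True, "hosts": MEMBERS}

# Hypothetical scenarios in the spirit of the spec's YAML files.
SCENARIOS = [
    {"description": "secondary reveals an unknown member",
     "seeds": ["hostA:27017", "hostB:27017"],
     "responses": [("hostB:27017", SECONDARY)],
     "outcome": {"servers": {"hostA:27017": "Unknown",
                             "hostB:27017": "RSSecondary",
                             "hostC:27017": "Unknown"},
                 "topology_type": "ReplicaSetNoPrimary"}},
    {"description": "primary found after a secondary",
     "seeds": ["hostA:27017", "hostB:27017"],
     "responses": [("hostB:27017", SECONDARY), ("hostC:27017", PRIMARY)],
     "outcome": {"servers": {"hostA:27017": "Unknown",
                             "hostB:27017": "RSSecondary",
                             "hostC:27017": "RSPrimary"},
                 "topology_type": "ReplicaSetWithPrimary"}},
    {"description": "member of another set is removed",
     "seeds": ["hostA:27017", "hostB:27017"],
     "responses": [("hostA:27017", {"setName": "other_rs", "secondary": True})],
     "outcome": {"servers": {"hostB:27017": "Unknown"},
                 "topology_type": "ReplicaSetNoPrimary"}},
]


def run_scenarios(scenarios):
    """Return the descriptions of scenarios whose outcome did not match."""
    failures = []
    for scenario in scenarios:
        actual = apply_responses(scenario["seeds"], "my_rs", scenario["responses"])
        if actual != scenario["outcome"]:
            failures.append(scenario["description"])
    return failures


print(run_scenarios(SCENARIOS))  # [] when every scenario passes
```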

Further Study

The spec is long but tractable. It explains the monitoring algorithm in very fine detail. You can read a summary, and the spec itself, here:

Its job is to describe the supply side of the driver's information economy. For the demand side, read my colleague David Golden's article on his Server Selection Spec.

Categories: FLOSS Project Planets

LaKademy 2015 – here we go!

Planet KDE - Tue, 2015-05-19 22:10

Hi there,

Everything is ready for the 3rd edition of LaKademy – The KDE Latin America Summit \o/. The meeting will take place from 03-06 June, 2015, in Salvador, north-eastern Brazil. Besides being the city I live in :), it was the venue of the 1st Akademy-BR in 2010, when we began our efforts to create and then expand the culture of KDE hacking sprints in Brazil and, later, in Latin America. Hence, we now have that cosy feeling of returning to grandma’s house for a portion of home-made cookies :). This year, we decided to have only hacking sessions and quick meetings, rather than talks and/or introductory short courses. We want to leverage contributions and get more things done during these four nice days of LaKademy 2015. We aren’t, by any means, closed to newcomers, though. The LaKademy 2015 Call for Participation has already been announced, and everyone interested in learning more about KDE contributions may join us at the hacking sessions, ask questions, get involved, and have fun.

For these four days, seven KDE contributors (and, hopefully, some visitors) will meet at the Information Technology Offices of the Federal University of Bahia. We are still settling the details of the program, but I would like to revisit some stuff I’ve done for KDevelop in the past, Filipe should keep working on Cantor enhancements, Lamarque on Plasma Network Manager, and Aracele on translation and promo stuff. As usual, we also have a promo meeting involving all participants, where we set the plans for conquering the world with KDE :).

Stay tuned for upcoming news about LaKademy 2015! See you …

Categories: FLOSS Project Planets

Eddy Petrișor: Linksys NSLU2 adventures into the NetBSD land passed through JTAG highlands - part 2 - RedBoot reverse engineering and APEX hacking

Planet Debian - Tue, 2015-05-19 21:12
(continuation of Linksys NSLU2 adventures into the NetBSD land passed through JTAG highlands - part 1; meanwhile, my article was mentioned briefly in BSDNow Episode 89 - Exclusive Disjunction around minute 36:25)

Choosing to call RedBoot from a hacked Apex
As I was saying in my previous post, in order to be able to automate the booting of the NetBSD image via TFTP I opted for using a 2nd stage bootloader (planning to flash it in the NSLU2 instead of a Linux kernel), and since Debian was already using Apex, I chose Apex, too.

The first problem I found was that the networking support in Apex relied on an old version of the Intel NPE library, which I couldn't find on Intel's site, and the new version was incompatible with (it would not build with) the old build wrapper in Apex, so I was faced with 3 options:
  1. Fight with the available Intel code and try to force it to compile in Apex
  2. Incorporate the NPE driver from NetBSD into a rump kernel to be included in Apex instead of the original Intel code, since the NetBSD driver only needed an easily compilable binary blob
  3. Hack together an Apex version that simulates typing the RedBoot commands necessary to load the netbsd image via TFTP and execute it.
After taking a look at the NPE driver build system, I concluded there were very few options less attractive than option 1, among them hammering nails through my forehead as an improvement measure against the severe brain damage I would likely suffer after dealing with the NPE "build system".

Option 2 looked like the best option I had, given the situation, but my NetBSD-fu was too close to 0 to even dream of embarking on such a task. In my evaluation, this still remains the technically superior solution to the problem, since it is a very portable and flexible way to ensure networking works in spite of the proprietary NPE code.

But, in practice, the best option I could implement at the time was option 3. I initially planned to pre-fill, from Apex, the RedBoot buffer that stored the keystrokes with my desired commands:

load -r -b 0x200000 -h netbsd-nfs.bin
g

Since this was the first time ever I was going to do less-than-trivial reverse engineering in order to find the addresses and signatures of interesting functions in the RedBoot code, it wasn't bad at all that I had a version of the RedBoot source code.

When stuck with reverse engineering, apply JTAG
The bad thing was that the code Linksys published as the source of the RedBoot running inside the NSLU2 was, in fact, different code, with some significant changes around the pieces I was most interested in. And that, in spite of the GPL terms.

But I thought that I could manage in spite of that. After all, how hard could it be to identify the 2-3 functions I was interested in and 1 buffer? Even if I only had the disassembled code from the slug, it shouldn't be that hard.

I struggled with this for about 2-3 weeks, on the few occasions I had during that time, but the excitement of learning something new kept me going. That lasted until I got stuck somewhere between the misalignment of the published RedBoot code with the disassembled code, the state of the system at the time I dumped the RAM contents (for purposes of disassembly), the assembly code generated by GCC for some specific C code I didn't have at all, and the particularities of ARM assembly.

What was most likely to unblock me was actually seeing the code in action, so I decided that attaching a JTAG dongle to the slug and doing a session of in-circuit debugging was in order.

Luckily, the pinout of the JTAG interface was already identified in the NSLU2 Linux project, so I only had to solder some wires to the specified places and a 2x20 header to be able to connect through JTAG to the board.

JTAG connections on Kinder (the NSLU2 targeting NetBSD)
After this was done, I immediately tried to see whether I could break the execution of the code on the system using a JTAG debugger. Sadly, the answer was no.

The chip was identified, but breaking the execution was not happening. I tried this in OpenOCD and in another proprietary debugger application I had access to, and the result was the same, breaking was not happening.
$ openocd -f interface/ftdi/olimex-arm-usb-ocd.cfg -f board/linksys_nslu2.cfg
Open On-Chip Debugger 0.8.0 (2015-04-14-09:12)
Licensed under GNU GPL v2
For bug reports, read
    http://openocd.sourceforge.net/doc/doxygen/bugs.html
Info : only one transport option; autoselect 'jtag'
adapter speed: 300 kHz
Info : ixp42x.cpu: hardware has 2 breakpoints and 2 watchpoints
0
Info : clock speed 300 kHz
Info : JTAG tap: ixp42x.cpu tap/device found: 0x29277013 (mfg: 0x009, part: 0x9277, ver: 0x2)
[..]
$ telnet localhost 4444
Trying ::1...
Trying to localhost.
Escape character is '^]'.
Open On-Chip Debugger
> halt
target was in unknown state when halt was requested
in procedure 'halt'
> poll
background polling: on
TAP: ixp42x.cpu (enabled)
target state: unknown

Looking into the documentation, I found a bit of information on the XScale processors[X] which suggested that XScale processors might necessarily need the (otherwise optional) SRST signal on the JTAG interface to be able to single step the chip.

This confused me a lot since I was sure other people had already used JTAG on the NSLU2.

The options I saw at the time were:
  1. my NSLU2 did not have a fully working JTAG interface (either due to the missing SRST signal on the interface or maybe due to a JTAG lock on later generation NSLU2s, as was my second slug)
  2. nobody ever single stepped the slug using OpenOCD or other JTAG debugger, they only reflashed, and I was on totally new ground
I even contacted Rod Whitby, the project leader of the NSLU2 project to try to confirm single stepping was done before. Rod told me he never did that and he only reflashed the device.

This confused me even further because, from what I had encountered on other platforms, in order to flash a device, the code responsible for programming the flash is loaded into the RAM of the target microcontroller and executed on the target after a RAM buffer is preloaded via JTAG with the data to be flashed; the operation is then repeated for all flash blocks to be reprogrammed.

I was aware it was possible to program a flash chip situated on the board, outside the chip, by only playing with the chip's pads, strictly via JTAG, but I was still hoping single stepping the execution of the code in RedBoot was possible.

Guided by that hope, and by the possibility that the newer versions of the device were locked, I decided to add a JTAG interface to my older NSLU2, too. But this time I decided I would also add the TRST and SRST signals to the JTAG interface, just in case that would make single stepping work.

This mod involved even more extensive changes than the ones done on the other NSLU2, but I was so frustrated at being stuck that I didn't mind poking a few holes through the case, nor the prospect of a connector always sticking out of the other NSLU2, which was doing some small, yet useful work in my home LAN.

It turns out NOBODY single stepped the NSLU2

After biting the bullet and soldering a JTAG interface with the TRST and SRST signals also connected, as the pinout page from the NSLU2 Linux wiki suggested, I was disappointed to observe that I was not able to single step the older one either, in spite of the presence of the extra signals.

I even tinkered with the reset configurations of OpenOCD, but had no success. After obtaining the same result with the proprietary debugger, and digging through a presentation made by Rod back in the heyday of the project and the conversations on the NSLU2 Linux Yahoo mailing list, I finally concluded:
Actually nobody single stepped the NSLU2, no matter the version of the NSLU2 or the connections available on the JTAG interface!

So I was back to square 1: I had to either struggle with disassembly, reevaluate my initial options, find another option, or drop the idea entirely. Since I was already committed to the project, dropping the idea entirely didn't seem like the reasonable thing to do.

Since I felt I was really close to finishing along the route I had chosen a while ago, I was not significantly more knowledgeable about the NetBSD code, and looking at the NPE code made me feel like washing my hands, the only option I saw was to go on.

Digging a lot more through the internet, I was finally able to find another version of the RedBoot source which was modified for Intel ixp42x systems. A few checks here and there revealed this newly found code was actually almost identical to the code I had disassembled from the slug I was aiming to run NetBSD on.

Long story short, a couple of days later I had a hacked Apex that could go through the RedBoot data structures, search for available commands in RedBoot and successfully call any of the built-in RedBoot commands!

Loading this modified Apex into RAM by hand via TFTP and then jumping into it to see if things worked as expected revealed a few small issues, which I corrected right away.

Flashing a modified RedBoot?! But why? Wasn't Apex supposed to avoid exactly that risky operation?
Since the tests when executing from RAM were successful, my custom second stage Apex bootloader for NetBSD net booting was ready to be flashed into the NSLU2.

I added two more targets to the Makefile on the dedicated netbsd branch of my Apex repository to generate the images ready for flashing into the NSLU2 flash (RedBoot needs to find a Sercomm header in flash, otherwise it will crash); the exact commands to be executed in RedBoot are also printed out after generation. This way, if the commands are copy-pasted, there is no risk of bricking the NSLU2 by mistake.

After some flashing and reflashing of the apex_nslu2.flash image into the NSLU2 flash, some manual testing, tweaking and modifying of the default built-in Apex commands, and checking that the sequence of commands 'move', 'go 0x01d00000' would jump into Apex, which, in turn, would call RedBoot to transfer the netbsd-nfs.bin image from a TFTP server to RAM and then execute it successfully, it was high time to check that NetBSD would boot automatically after the NSLU2 was powered on.

It didn't. Contrary to my previous tests, no call made from Apex to the RedBoot code would return back to Apex, not even a basic execution of the 'version' command.

It turns out the default commands hardcoded into RedBoot were 'boot; exec 0x01d00000', but I had tested 'boot; go 0x01d00000', which is not the same thing.

While 'go' does a plain jump to the specified address, the 'exec' command also makes some preparations that allow a jump into the Linux kernel, and those preparations break the environment the RedBoot commands expect.

So the easiest solution was to change RedBoot's built-in command and turn that 'exec' into a 'go'. But that meant this time I was actually risking bricking the NSLU2, unless I was able to reflash it via JTAG.

(to be continued - next, changing RedBoot and bisecting through the NetBSD history)

[X] Linksys NSLU2 has an XScale IXP420 processor which is compatible at ASM level with the ARMv5TEJ instruction set
Categories: FLOSS Project Planets

Gunnar Wolf: Feeling somewhat special

Planet Debian - Tue, 2015-05-19 19:36

Today I feel more special than I have ever felt.

Or... Well, or something like that.

Thing is, there is no clear adjective for this — But I successfully finished my Specialization degree! Yes, believe it or not, today I can formally say I am Specialist in Informatic Security and Information Technologies (Especialista en Seguridad Informática y Tecnologías de la Información), as awarded by the Higher School of Electric and Mechanic Engineering (Escuela Superior de Ingeniería Mecánica y Eléctrica) of the National Polytechnical Institute (Instituto Politécnico Nacional).

In Mexico and most Latin American countries, degrees are usually incorporated into your name as if they were a nobiliary title. Thus, when graduating from Engineering studies (undergraduate university level), I became "Ingeniero Gunnar Wolf". People graduating from further postgraduate programs get to introduce themselves as "Maestro Foobar Baz" or "Doctor Quux Noox". And yes, a Specialization is a small postgraduate program (I often say, the smallest possible postgraduate). And as a Specialist... What can I brag about? Can I say I am Specially Gunnar Wolf? Or Special Gunnar Wolf? Nope. The honorific title for a Specialization is a pointer to null, and when cast into a char* it might corrupt your honor-recognizing function. So I'm still Ingeniero Gunnar Wolf, for information security reasons.

So that's the reason I am now enrolled in the Masters program. I hope to write an addendum to this message soonish (where soonish ≥ 18 months) saying I'm finally a Maestro.

As a side note, many people asked me: Why did I take on the specialization, which is a degree too small for most kinds of real work recognition? Because it's been around twenty years since I last attended a long-term academic program as a student. And my plate is quite full with activities and responsibilities. I decided to take a short program, designed for 12 months (I graduated in 16, minus two months that the university was on strike... Quite good, I'd say ;-) ) to see how I fared on it, and only later jump into the full version.

Because, yes, to advance my career at the university, I finally recognized and understood that I do need postgraduate studies.

Oh, and what kind of work did I do for this? Besides the classes I took, I wrote a thesis on a model for evaluating covert channels for establishing secure communications.

Categories: FLOSS Project Planets

Graham Dumpleton: Returning a string as the iterable from a WSGI application.

Planet Python - Tue, 2015-05-19 17:07
The possible performance consequences of returning many separate data blocks from a WSGI application were covered in the previous post. In that post the WSGI application used as an example was one which returned the contents of a file as many small blocks of data. Part of the performance problems seen arose due to how the WSGI servers would flush each individual block of data out, writing it onto
Categories: FLOSS Project Planets

Gizra.com: Visual regression tests on every commit

Planet Drupal - Tue, 2015-05-19 17:00

As we dive deeper into visual regression testing in our development workflow we realize a sad truth: on average, we break our own CSS every week and a half.

Don't feel bad for us, as in fact I'd argue that it's pretty common across all web projects - they just don't know it. It seems we all need a system that will tell us when we break our CSS.

While we don't know of a single (good) system that does this, we were able to connect together a few (good) systems to get just that, with the help of: Travis-CI, webdriverCSS, Shoov.io, BrowserStack/Sauce Labs, and ngrok. Oh my!

Don't be alarmed by the long list. Each one of these does one thing very well, and combining them turned out to be neither too complicated nor too costly.

You can jump right into the .travis.yml file of the Gizra repo to see its configuration, or check the webdriverCSS test. Here's the high level overview of what we're doing:

Gizra.com is built on Jekyll, but visual regression tests could be executed against any site, regardless of the underlying technology. Travis is there to help us build a local installation. Travis also allows adding encrypted keys, so even though the repo is public, we were able to add our Shoov.io and ngrok access tokens in a secure way.

We want to use services such as BrowserStack or Sauce Labs to test our local installation on different browsers (e.g. latest Chrome and IE11). For that we need to have an external URL accessible by the outside world, which is where ngrok comes in: ngrok http -log=stdout -subdomain=$TRAVIS_COMMIT 9000 from the .travis.yml file exposes our Jekyll site inside the Travis box to a unique temporary URL based on the Git commit (e.g. https://someCommitHash.ngrok.io).

WebdriverCSS tests are responsible for capturing the screenshots and comparing them against the baseline images. If a regression is found, it will be automatically pushed to Shoov, and a link to the regression will be provided in the Travis log. This means that if a test breaks, we can immediately see where the regression is and figure out whether it is indeed a bug or, if not, replace the baseline image with the "regression" image.
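The comparison step can be illustrated with a tiny pixel-diff sketch. This is not webdriverCSS's algorithm. It simply counts differing pixels and flags a regression above a hypothetical tolerance. Screenshots are modeled as nested lists instead of decoded image files.

```python
def mismatch_percentage(baseline, current):
    """Percentage of pixels that differ between two same-sized screenshots.

    Screenshots are modeled as lists of rows of (r, g, b) tuples; a real
    tool would decode PNG files instead (an assumption of this sketch).
    """
    if len(baseline) != len(current) or any(
            len(a) != len(b) for a, b in zip(baseline, current)):
        raise ValueError("screenshots have different dimensions")
    total = sum(len(row) for row in baseline)
    differing = sum(1 for row_a, row_b in zip(baseline, current)
                    for px_a, px_b in zip(row_a, row_b) if px_a != px_b)
    return 100.0 * differing / total


def is_regression(baseline, current, tolerance_pct=0.05):
    # tolerance_pct is a hypothetical threshold: tiny mismatches are
    # accepted, anything above it is flagged for a human to review.
    return mismatch_percentage(baseline, current) > tolerance_pct


WHITE, RED = (255, 255, 255), (255, 0, 0)
baseline = [[WHITE] * 10 for _ in range(10)]
current = [row[:] for row in baseline]
current[3][4] = RED  # one pixel out of 100 changed
print(mismatch_percentage(baseline, current), is_regression(baseline, current))  # 1.0 True
```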

Visual regression found and uploaded to shoov.io


Categories: FLOSS Project Planets

Mediacurrent: Contrib Committee Status Review for April, 2015

Planet Drupal - Tue, 2015-05-19 16:47

The fourth month of the year brought reminders that Winter can show up at unexpected times, with snow flurries during the early parts of the month. It also reminded us that we can only juggle so much. With many of us involved in organizing regional events and preparing for Drupalcon, our code contributions waned for a second month, down to a rather low 20 hours.

Categories: FLOSS Project Planets

Drupalpress, Drupal in the Health Sciences Library at UVA: executing an r script with bash

Planet Drupal - Tue, 2015-05-19 16:43

Here’s a tangent:

Let’s say you need to randomly generate a series of practice exam questions. You have a bunch of homework assignments, lab questions and midterms, all of which are numbered in a standard way so that you can sample from them.

Here’s a simple R script to run those samples and generate a practice exam that consists of references to the assignments and their original numbers.

## exam prep script
## build hw data: every combination of 12 homework sets x 17 problems
hw <- expand.grid(problm_set = paste0("hw", 1:12), problm = 1:17,
                  stringsAsFactors = FALSE)
## build exam data: every combination of 8 exams x 22 problems
exam <- expand.grid(problm_set = paste0("exam", 1:8), problm = 1:22,
                    stringsAsFactors = FALSE)
## create practice exam: draw 22 problems without replacement
prctce <- rbind(exam, hw)
prctce_test <- prctce[sample(nrow(prctce), size = 22), ]
row.names(prctce_test) <- seq_len(nrow(prctce_test))
print(prctce_test)

As the last line indicates, the final step of the script is to output a prctce_test … that will be randomly generated each time the script is run, but may include duplicates over time.

Sure. Fine. Whatever.

There's probably a way to do this with Drupal … or with Excel … or with pencil and paper … so why use R?

Two reasons: 1) using R to learn R and 2) scripting this simulation lets you automate things a little more easily.

In particular, you can use something like BASH to execute the script n times.

for n in {1..10}; do Rscript examprep.R > "YOUR_PATH_HERE/practice${n}.txt"; done

That will give you 10 practice test txt files that are all named with a tokenized number, with just one command. And of course that could be written into a shell script that’s automated or processed on a scheduler.

Sure. Fine. Whatever.

OK. While this is indeed a fairly underwhelming example, the potential here is kind of interesting. Our next step is to investigate using Drupal Rules to initiate a BASH script that in turn executes an algorithm written in R. The plan is to also use Drupal as the UI for entering the data to be processed in the R script.

Will document that here if/when that project comes together.

Categories: FLOSS Project Planets

Ian Ozsvald: Data Science Deployed – Opening Keynote for PyConSE 2015

Planet Python - Tue, 2015-05-19 16:03

I’ve just had a fab couple of days at PyConSE in Stockholm. I really enjoyed giving the opening keynote (thanks!) and attending two days of interesting talks. The Saturday was packed with data science talks (see below); it felt like a mini PyData or EuroSciPy, most cool!

The goal of my talk was to show use-cases for why you should do data science, why it is valuable, how to do it successfully with Python and how to get the data products deployed. The whole shebang in 40 minutes. Tools mentioned include scikit-learn, statsmodels, textract, pandas, matplotlib, seaborn, bokeh, IPython and Notebooks, Spyder, PyCharm, Flask and Spyre.

My main points seemed to make it through, phew!

What I take from @ianozsvald talk:
“How can i turn our data into business value?”
“Log everything!”
Think + hypothesize + test @pythse

Exploiting your data is key to staying relevant in your business! Listening to @ianozsvald at #pyconse @scalior

Note – I’ll be updating this write-up a little over the next couple of days (it is the end of the conf and I’m rather shattered right now!).

The slides for my Data Science Deployed talk are below; I’ll link to the video once it is available:

I’d like to acknowledge Ferenc Huszár (Balderton) and Thomas Stone (Prediction.io) for feedback on early ideas for my talk – cheers gents!

I also plugged PyDataBerlin, our upcoming PyDataLondon (June 19–21, CfP open for just 1 more week) and EuroSciPy on stage; hopefully we’ll see a few more international visitors. I should have plugged PyConUK as well, as there’s now a Science Track there too!

The following talks from yesterday will interest you; I hope the videos come online soon:

  • Analyzing data with Pandas
  • Data processing and machine learning with Python (slides)
  • Deep Learning and Deep Data Science
  • Hacking Human Language
  • IPython: How a notebook is changing science
  • The Hitchhikers Guide to Python

Here are a couple of extra links that might be interesting:

Here’s Ilian Iliev’s review of the conference too.

I have a vague idea to write up these topics in more depth in the future; I’m calling this Building Data Science Products with Python. There’s a mailing list, and over the coming months I’ll occasionally email it with questions to figure out if and how I should write this.

Thanks everyone for a lovely conference!

Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight; sign up for Data Science tutorials in London. Historically, Ian ran Mor Consulting. He also founded the image and text annotation API Annotate.io, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.
Categories: FLOSS Project Planets

Lars Wirzenius: Software development estimation

Planet Debian - Tue, 2015-05-19 15:50

Acceptable estimations for software development:

  • Almost certainly doable in less than a day.
  • Probably doable in less than a day, almost certainly not going to take more than three days.
  • Probably doable in less than a week, but who knows?
  • Certainly going to take longer than a week, and nobody can say how long, but if you press me, the estimate is between two weeks and four months.

Reality prevents better accuracy.

Categories: FLOSS Project Planets

Timothy Potter: Integrating Storm and Solr

Planet Apache - Tue, 2015-05-19 14:13
In this post I introduce a new open source project provided by Lucidworks for integrating Solr and Storm. Specifically, I cover features such as micro-buffering, data mapping, and how to send custom JSON documents to Solr from Storm. I assume you have a basic understanding of how Storm works, but if you need a quick refresher, please review the Storm concepts documentation.

As you read through this post, it will help to have the project source code on your local machine. After cloning https://github.com/LucidWorks/storm-solr, simply do:

mvn clean package

This will create the unified storm-solr-1.0.jar in the target/ directory for the project.

The project discussed here started out as a simple bolt for indexing documents in Solr. My first pass at creating a Solr bolt was quite simple, but then a number of questions came up that made my simple bolt not quite so simple. For instance, how do I …
  • Separate my application business logic from Storm boilerplate code?
  • Unit test application logic in my bolts and spouts?
  • Run a topology locally while developing?
  • Configure my Solr bolt to specify environment-specific settings like the ZooKeeper connection string needed by SolrCloud?
  • Package my topology into something that can be deployed to a Storm cluster?
  • Measure the performance of my components at runtime?
  • Integrate with other services and databases when building a real-world topology?
  • Map Tuples in my topology to a format that Solr can process?
This is just a small sample of the types of questions that arise when building a high-performance streaming application with Storm. I quickly realized that I needed more than just a Solr bolt. Hence, the project evolved into a toolset that makes it easy to integrate Storm and Solr and that addresses all of the questions raised above. I’ll spare you the nitty-gritty details of the framework supporting Solr integration with Storm. If you’re interested, the README for the project contains more details about how the framework was designed.

Packaging and Running a Storm Topology

To begin, let’s understand how to run a topology in Storm. Effectively, there are two basic modes of running a Storm topology: local and cluster mode. Local mode is great for testing your topology locally before pushing it out to a remote Storm cluster, such as staging or production.

For starters, you need to compile and package your code and all of its dependencies into a unified JAR with a main class that runs your topology. For this project, I use the Maven Shade plugin to create the unified JAR with dependencies. The benefit of the Shade plugin is that it can relocate classes into different packages at the byte-code level to avoid dependency conflicts. This comes in quite handy if your application depends on 3rd party libraries that conflict with classes on the Storm classpath. You can look at the project pom.xml file for specific details about how I use the Shade plugin. For now, let it suffice to say that the project makes it very easy to build a Storm JAR for your application.

Once you have a unified JAR (storm-solr-1.0.jar), you’re ready to run your topology in Storm. The project includes a main class named com.lucidworks.storm.StreamingApp that allows you to run a topology locally or in a remote Storm cluster. Specifically, StreamingApp provides the following:
  • Separates the process of defining a Storm topology from the process of running a Storm topology in different environments. This lets you focus on defining a topology for your specific requirements.
  • Provides a clean mechanism for separating environment-specific configuration settings.
  • Minimizes duplicated boilerplate code when developing multiple topologies and gives you a common place to insert reusable logic needed for all of your topologies.
To use StreamingApp, you simply need to implement the StormTopologyFactory interface, which defines the spouts and bolts in your topology:

public interface StormTopologyFactory {
  String getName();
  StormTopology build(StreamingApp app) throws Exception;
}

Let’s look at a simple example of a StormTopologyFactory implementation that defines a topology for indexing tweets into Solr:

class TwitterToSolrTopology implements StormTopologyFactory {
  static final Fields spoutFields = new Fields("id", "tweet")

  String getName() { return "twitter-to-solr" }

  StormTopology build(StreamingApp app) throws Exception {
    // set up the spout and bolt for accessing Spring-managed POJOs at runtime
    SpringSpout twitterSpout = new SpringSpout("twitterDataProvider", spoutFields)
    SpringBolt solrBolt = new SpringBolt("solrBoltAction", app.tickRate("solrBolt"))

    // wire up the topology to read tweets and send to Solr
    TopologyBuilder builder = new TopologyBuilder()
    builder.setSpout("twitterSpout", twitterSpout, app.parallelism("twitterSpout"))
    builder.setBolt("solrBolt", solrBolt, app.parallelism("solrBolt"))
      .shuffleGrouping("twitterSpout")
    return builder.createTopology()
  }
}

A couple of things should stand out to you in this listing. First, there’s no command-line parsing, environment-specific configuration handling, or any code related to running this topology. All that you see here is code defining a StormTopology; StreamingApp handles all the boring stuff for you. Second, the code is quite easy to understand because it only does one thing. Lastly, this class is written in Groovy instead of Java, which helps keep things nice and tidy, and I find Groovy more enjoyable to write. Of course, if you don’t want to use Groovy, you can use Java, as the framework supports both seamlessly.

The following diagram depicts the TwitterToSolrTopology.
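The split between defining a topology and running it can be illustrated without Storm at all. The following is a minimal plain-Java sketch under stated assumptions, not storm-solr code: TopologyFactory, TwitterToSolr and define are hypothetical stand-ins, a "topology" is reduced to a map from component name to parallelism, and the "<component>.parallelism" configuration key and the fallback of 1 are invented for illustration. The factory only declares components; the runner decides where the numbers come from.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Plain-Java sketch of the "define vs. run" split. None of these types are
// Storm or storm-solr classes; they are hypothetical stand-ins.
class TopologyFactorySketch {

    /** Analogue of StormTopologyFactory: describes a topology, never runs it. */
    interface TopologyFactory {
        String getName();

        /** Returns component name -> parallelism, asking the runner for each value. */
        Map<String, Integer> build(Function<String, Integer> parallelism);
    }

    /** Analogue of TwitterToSolrTopology: one spout feeding one bolt. */
    static class TwitterToSolr implements TopologyFactory {
        @Override
        public String getName() {
            return "twitter-to-solr";
        }

        @Override
        public Map<String, Integer> build(Function<String, Integer> parallelism) {
            Map<String, Integer> components = new LinkedHashMap<>();
            components.put("twitterSpout", parallelism.apply("twitterSpout"));
            components.put("solrBolt", parallelism.apply("solrBolt"));
            return components;
        }
    }

    /**
     * Analogue of the runner: it owns the configuration and hands the factory
     * a lookup function. Missing keys fall back to 1 (an assumption).
     */
    static Map<String, Integer> define(TopologyFactory factory, Map<String, Integer> config) {
        return factory.build(component -> config.getOrDefault(component + ".parallelism", 1));
    }

    public static void main(String[] args) {
        Map<String, Integer> spec =
                define(new TwitterToSolr(), Map.of("solrBolt.parallelism", 4));
        System.out.println(spec); // {twitterSpout=1, solrBolt=4}
    }
}
```

Because the factory never reads configuration itself, the same definition can be reused unchanged whether the runner is pointed at a laptop or a cluster.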
A key aspect of the solution is the use of the Spring framework to manage beans that implement application-specific logic in your topology, leaving the Storm boilerplate work to reusable components: SpringSpout and SpringBolt. We’ll get into the specific details of the implementation shortly, but first, let’s see how to run the TwitterToSolrTopology using the StreamingApp framework. For local mode, you would do:

java -classpath "$STORM_HOME/lib/*:target/storm-solr-1.0.jar" com.lucidworks.storm.StreamingApp \
  example.twitter.TwitterToSolrTopology -localRunSecs 90

The command above will run the TwitterToSolrTopology for 90 seconds on your local workstation and then shut down. All the setup work is provided by the StreamingApp class. To submit to a remote cluster, you would do:

$STORM_HOME/bin/storm jar target/storm-solr-1.0.jar com.lucidworks.storm.StreamingApp \
  example.twitter.TwitterToSolrTopology -env staging

Notice that I’m using the -env flag to indicate I’m running in my staging environment. It’s common to need to run a Storm topology in different environments, such as test, staging, and production, so that’s built into the StreamingApp framework.

So far, I’ve shown you how to define a topology and how to run it. Now let’s get into the details of how to implement components in a topology. Specifically, let’s see how to build a bolt that indexes data into Solr, as this illustrates many of the key features of the framework.

SpringBolt

In Storm, a bolt performs some operation on a Tuple and optionally emits Tuples into the stream. In the example Twitter topology definition above, we see this code:

SpringBolt solrBolt = new SpringBolt("solrBoltAction", app.tickRate("solrBolt"));

This creates an instance of SpringBolt that delegates message processing to a Spring-managed bean with ID “solrBoltAction”. The main benefit of the SpringBolt is that it allows us to separate Storm-specific logic and boilerplate code from application logic.
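One way to picture a single entry point serving both run modes is a small argument parser that reads the topology class name followed by "-flag value" pairs. This is a hypothetical sketch, not StreamingApp's actual parsing code: LaunchArgsSketch and LaunchOptions are invented names, and treating a missing -env as "dev" is an assumption made only for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of how a launcher might interpret
// "<TopologyClass> -localRunSecs 90" or "<TopologyClass> -env staging".
// This is not StreamingApp's real implementation.
class LaunchArgsSketch {

    static final class LaunchOptions {
        final String topologyClass;
        final String env;        // which block of environment settings to apply
        final int localRunSecs;  // > 0 means: run locally for this many seconds

        LaunchOptions(String topologyClass, String env, int localRunSecs) {
            this.topologyClass = topologyClass;
            this.env = env;
            this.localRunSecs = localRunSecs;
        }

        boolean isLocal() {
            return localRunSecs > 0;
        }
    }

    static LaunchOptions parse(String... args) {
        if (args.length == 0) {
            throw new IllegalArgumentException("missing topology class name");
        }
        // Everything after the class name is read as "-flag value" pairs.
        Map<String, String> flags = new HashMap<>();
        for (int i = 1; i < args.length; i += 2) {
            if (!args[i].startsWith("-") || i + 1 >= args.length) {
                throw new IllegalArgumentException("expected '-flag value', got: " + args[i]);
            }
            flags.put(args[i].substring(1), args[i + 1]);
        }
        int secs = Integer.parseInt(flags.getOrDefault("localRunSecs", "0"));
        // Defaulting to "dev" is an assumption made for this sketch.
        return new LaunchOptions(args[0], flags.getOrDefault("env", "dev"), secs);
    }

    public static void main(String[] args) {
        LaunchOptions local = parse("example.twitter.TwitterToSolrTopology", "-localRunSecs", "90");
        LaunchOptions remote = parse("example.twitter.TwitterToSolrTopology", "-env", "staging");
        System.out.println(local.isLocal() + " " + local.localRunSecs); // true 90
        System.out.println(remote.isLocal() + " " + remote.env);        // false staging
    }
}
```

Keeping this parsing in one shared launcher, rather than in each topology, is what lets every topology class stay focused on its definition.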
The com.lucidworks.storm.spring.SpringBolt class allows you to implement your bolt logic as a simple Spring-managed POJO (Plain Old Java Object). To leverage SpringBolt, you simply need to implement the StreamingDataAction interface:

public interface StreamingDataAction {
  SpringBolt.ExecuteResult execute(Tuple input, OutputCollector collector);
}

At runtime, Storm will create one or more instances of SpringBolt per JVM. The number of instances created depends on the parallelism hint configured for the bolt. In the Twitter example, we simply pulled the number of tasks for the Solr bolt from our configuration:

// wire up the topology to read tweets and send to Solr
...
builder.setBolt("solrBolt", solrBolt, app.parallelism("solrBolt"))
...

The SpringBolt needs a reference to the solrBoltAction bean from the Spring ApplicationContext. The solrBoltAction bean is defined in resources/storm-solr-spring.xml as:

<bean id="solrBoltAction" class="com.lucidworks.storm.solr.SolrBoltAction" scope="prototype">
  <property name="solrInputDocumentMapper" ref="solrInputDocumentMapper"/>
  <property name="maxBufferSize" value="${maxBufferSize}"/>
  <property name="bufferTimeoutMs" value="${bufferTimeoutMs}"/>
</bean>

There are a couple of interesting aspects of this bean definition. First, the bean is defined with prototype scope, which means that Spring will create a new instance for each SpringBolt instance that Storm creates at runtime. This is important because it means your bean instance will only be accessed by one thread at a time, so you don’t need to worry about thread-safety issues. Also notice that the maxBufferSize and bufferTimeoutMs properties are set using Spring’s dynamic variable resolution syntax, e.g. ${maxBufferSize}. These properties will be resolved during bean construction from a configuration file called resources/Config.groovy.

When the SpringBolt needs a reference to the solrBoltAction bean, it first needs to get the Spring ApplicationContext.
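Why prototype scope matters for a stateful bean can be shown with a tiny plain-Java model. In this sketch a Supplier stands in for "ask the container for bean X", and BufferingAction and BoltTask are hypothetical classes, not Spring or storm-solr APIs. With prototype scope each bolt task gets its own buffer; with singleton scope every task shares one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Sketch of why prototype scope matters for a stateful bolt action.
// A Supplier stands in for "ask the Spring container for bean X";
// the class names are hypothetical, not Spring or storm-solr APIs.
class PrototypeScopeSketch {

    /** Stateful action with a private, non-thread-safe buffer. */
    static final class BufferingAction {
        private final List<String> buffer = new ArrayList<>();

        void execute(String doc) {
            buffer.add(doc);
        }

        int buffered() {
            return buffer.size();
        }
    }

    /** Stand-in for one bolt task: it looks up its action once, at creation. */
    static final class BoltTask {
        final BufferingAction action;

        BoltTask(Supplier<BufferingAction> beanLookup) {
            this.action = beanLookup.get();
        }
    }

    /** Prototype scope: every lookup returns a fresh instance. */
    static Supplier<BufferingAction> prototypeScope() {
        return BufferingAction::new;
    }

    /** Singleton scope: every lookup returns the same shared instance. */
    static Supplier<BufferingAction> singletonScope() {
        BufferingAction shared = new BufferingAction();
        return () -> shared;
    }

    public static void main(String[] args) {
        Supplier<BufferingAction> prototype = prototypeScope();
        BoltTask t1 = new BoltTask(prototype);
        BoltTask t2 = new BoltTask(prototype);
        t1.action.execute("tweet-1");
        // Each task has its own buffer, so t2 sees nothing from t1.
        System.out.println(t1.action.buffered() + " " + t2.action.buffered()); // 1 0

        Supplier<BufferingAction> singleton = singletonScope();
        BoltTask s1 = new BoltTask(singleton);
        BoltTask s2 = new BoltTask(singleton);
        s1.action.execute("tweet-1");
        // Both tasks share one buffer; concurrent tasks would then need locking.
        System.out.println(s1.action.buffered() + " " + s2.action.buffered()); // 1 1
    }
}
```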
The StreamingApp class is responsible for bootstrapping the Spring ApplicationContext using storm-solr-spring.xml. StreamingApp ensures there is only one Spring context initialized per JVM instance per topology, as multiple topologies may be running in the same JVM.

If you’re concerned about the Spring container being too heavyweight, rest assured there is only one container initialized per JVM per topology, and bolts and spouts are long-lived objects that only need to be initialized once by Storm per task. Put simply, the overhead of Spring is quite minimal, especially for long-running streaming applications.

The framework also provides a SpringSpout that allows you to implement a data provider as a simple Spring-managed POJO. I’ll refer you to the source code for more details about SpringSpout, but it basically follows the same design patterns as SpringBolt.

Environment-specific Configuration

I’ve implemented several production Storm topologies in the past couple of years, and one pattern that keeps emerging is the need to manage configuration settings for different environments. For instance, we’ll need to index into a different SolrCloud cluster for staging and production. To address this need, the Spring-driven framework allows you to keep all environment-specific configuration properties in the same configuration file (see resources/Config.groovy). Don’t worry if you don’t know Groovy; the syntax of the Config.groovy file is very easy to understand and allows you to cleanly separate properties for the following environments: test, dev, staging, and production. In short, this approach allows you to run the topology in multiple environments, using a simple command-line switch (-env) to specify which environment’s settings should be applied.
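The general idea of environment-specific settings (shared defaults, per-environment overrides, and ${name} placeholders filled in from the merged result) can be sketched in plain Java. This is not how Config.groovy or Spring actually resolve properties; EnvConfigSketch is a hypothetical class, and the property name zkHost and all values below are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of environment-specific settings: shared defaults, per-environment
// overrides, and ${name} placeholders resolved against the merged result.
// Property names and values are made up for illustration.
class EnvConfigSketch {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Start from the defaults, then let the chosen environment override them. */
    static Map<String, String> forEnv(Map<String, String> defaults,
                                      Map<String, Map<String, String>> perEnv,
                                      String env) {
        Map<String, String> overrides = perEnv.get(env);
        if (overrides == null) {
            throw new IllegalArgumentException("unknown environment: " + env);
        }
        Map<String, String> merged = new HashMap<>(defaults);
        merged.putAll(overrides);
        return merged;
    }

    /** Replace every ${name} in the template; fail loudly on unknown names. */
    static String resolve(String template, Map<String, String> props) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = props.get(m.group(1));
            if (value == null) {
                throw new IllegalStateException("unresolved placeholder: " + m.group());
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> defaults = Map.of("maxBufferSize", "100");
        Map<String, Map<String, String>> perEnv = Map.of(
                "staging", Map.of("zkHost", "zk-staging:2181"),
                "production", Map.of("zkHost", "zk-prod:2181", "maxBufferSize", "500"));

        Map<String, String> prod = forEnv(defaults, perEnv, "production");
        System.out.println(resolve("${zkHost} / ${maxBufferSize}", prod)); // zk-prod:2181 / 500
    }
}
```

Failing on an unknown environment or an unresolved placeholder is a deliberate choice here: a typo in the -env value should stop the launch rather than silently fall back to another cluster.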
Metrics

Storm provides high-level metrics for bolts and spouts, but if you need more visibility into the inner workings of your application-specific logic, then it’s common to use the Dropwizard Metrics library for Java, see: https://dropwizard.github.io/metrics/3.1.0/. Fortunately, there are open source options for integrating metrics with Spring, see: https://github.com/ryantenney/metrics-spring.

The Spring context configuration file resources/storm-solr-spring.xml comes pre-configured with all the infrastructure needed to inject metrics into your bean implementations. When implementing your StreamingDataAction (bolt) or StreamingDataProvider (spout), you can have Spring auto-wire metrics objects using the @Metric annotation when declaring metrics-related member variables. For instance, the SolrBoltAction class uses a Timer to track how long it takes to send batches to Solr:

@Metric
public Timer sendBatchToSolr;

The SolrBoltAction class provides several examples of how to use metrics in your bean implementations.

At this point you should have a basic understanding of the main features of the framework. Now let’s turn our attention to some Solr-specific features.

Micro-buffering and Ack’ing Input Tuples

It’s possible that thousands of documents per second will be flowing into each Solr bolt. To avoid sending too many requests into Solr and to avoid blocking too much in the topology, the bolt uses an internal buffer to send documents to Solr in small batches. This helps reduce the number of network round-trips between your bolt and Solr. The bolt supports a maximum buffer size setting to control when the buffer should be flushed, which defaults to 100.

Buffering poses two basic issues in a streaming topology. First, you’re likely using Storm to power a near real-time data processing application, so we don’t want to delay documents from getting into Solr for too long.
To address this, the bolt provides a buffer timeout setting that indicates when a buffer should be flushed to ensure documents flow into Solr in a timely manner. Consequently, the buffer will be flushed when either the size threshold or the time limit is reached. This has a subtle side effect: if upstream components pause in sending messages to the bolt, enforcing the time limit would normally require a background thread to flush the buffer. Fortunately, Storm provides a simple mechanism that allows your bolt to receive a special type of Tuple on a periodic schedule, known as a TickTuple. Whenever the SolrBoltAction bean receives a TickTuple, it checks to see if the buffer needs to be flushed, which avoids holding documents for too long and alleviates the need for a background thread to monitor the buffer.

Field Mapping

The SolrBoltAction bean takes care of sending documents to SolrCloud in an efficient manner, but it only works with SolrInputDocument objects from SolrJ. It’s unlikely that your Storm topology will be working with SolrInputDocument objects natively, so the SolrBoltAction bean delegates mapping of input Tuples to SolrInputDocument objects to a Spring-managed bean that implements the com.lucidworks.storm.solr.SolrInputDocumentMapper interface. This fits nicely with our design approach of separating concerns in our topology.

The default implementation provided in the project (DefaultSolrInputDocumentMapper) uses Java reflection to read data from a Java object to populate the fields of the SolrInputDocument. In the Twitter example, the default implementation uses Java reflection to read data from a Twitter4J Status object to populate dynamic fields on a SolrInputDocument instance. It should be clear, however, that you can inject your own SolrInputDocumentMapper implementation into the bolt bean using Spring if the default implementation does not meet your needs.
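Returning to the micro-buffering rules described above (flush on a size threshold or a time limit, and re-check the time limit whenever a TickTuple arrives), the logic can be sketched independently of Storm and SolrJ. This is a hypothetical illustration, not SolrBoltAction's code: MicroBufferSketch is an invented class, measuring the timeout from the oldest buffered document is an assumption, time is passed in explicitly so the behaviour is deterministic, and acking the input tuples after a successful send is not modelled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch of the flush rules described for the Solr bolt's micro-buffer:
// flush when the buffer reaches maxBufferSize OR when the oldest buffered
// document has waited bufferTimeoutMs (timing from the oldest document is an
// assumption). Acking input tuples after a successful send is not modelled.
class MicroBufferSketch<T> {
    private final int maxBufferSize;
    private final long bufferTimeoutMs;
    private final List<T> buffer = new ArrayList<>();
    private long oldestAddedAtMs;

    MicroBufferSketch(int maxBufferSize, long bufferTimeoutMs) {
        this.maxBufferSize = maxBufferSize;
        this.bufferTimeoutMs = bufferTimeoutMs;
    }

    /** Regular tuple: buffer it and return a batch if a threshold is hit. */
    List<T> add(T doc, long nowMs) {
        if (buffer.isEmpty()) {
            oldestAddedAtMs = nowMs;
        }
        buffer.add(doc);
        return shouldFlush(nowMs) ? drain() : Collections.emptyList();
    }

    /** Tick tuple: no new document, just re-check the time limit. */
    List<T> onTick(long nowMs) {
        return !buffer.isEmpty() && shouldFlush(nowMs) ? drain() : Collections.emptyList();
    }

    private boolean shouldFlush(long nowMs) {
        return buffer.size() >= maxBufferSize || nowMs - oldestAddedAtMs >= bufferTimeoutMs;
    }

    /** Hand the buffered documents to the caller (e.g. one request to Solr). */
    private List<T> drain() {
        List<T> batch = new ArrayList<>(buffer);
        buffer.clear();
        return batch;
    }

    public static void main(String[] args) {
        MicroBufferSketch<String> buf = new MicroBufferSketch<>(3, 1000);
        buf.add("a", 0);
        buf.add("b", 10);
        System.out.println(buf.add("c", 20));  // [a, b, c]  (size threshold)
        buf.add("d", 100);
        System.out.println(buf.onTick(500));   // []         (too early)
        System.out.println(buf.onTick(1100));  // [d]        (time limit)
    }
}
```

The tick check is what removes the need for a background thread: the bolt's own execute path is the only code that ever touches the buffer.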
JSON

As of Solr 5, you can send arbitrary JSON documents to Solr and have it parse out documents for indexing. For more information about this cool feature in Solr, please see: http://lucidworks.com/blog/indexing-custom-json-data/

If you want to send arbitrary JSON objects to Solr and have it index documents during JSON parsing, you need to use the solrJsonBoltAction bean instead of solrBoltAction. For our Twitter example, you could define the solrJsonBoltAction bean as:

<bean id="solrJsonBoltAction" class="com.lucidworks.storm.solr.SolrJsonBoltAction" scope="prototype">
  <property name="split" value="/"/>
  <property name="fieldMappings">
    <list>
      <value>$FQN:/**</value>
    </list>
  </property>
</bean>

Lucidworks Fusion

Lastly, if you’re using Lucidworks Fusion (and you should be), then instead of sending documents directly to Solr, you can send them to a Fusion indexing pipeline using the FusionBoltAction class. FusionBoltAction posts JSON documents to the Fusion proxy, which gives you security and the full power of Fusion pipelines for generating Solr documents.

The post Integrating Storm and Solr appeared first on Lucidworks.

Categories: FLOSS Project Planets