Planet Python
Mike C. Fletcher: Corner cases do crop up, don't they?
Luke Plant: Async Raven/Sentry client with Django/Python
Sentry really ought to use UDP, not TCP, because you don't want logging functionality to stall or even slow down your main application. At the moment, it doesn't support that, although there have been some promising commits.
For my usage (a web application), this means that you can really only use Sentry for logging exceptions, and not for anything less important.
However, there are some alternatives to UDP that make Sentry usable for more than exceptions. You could use a queue, for example Celery or RabbitMQ (apparently what they use at Disqus).
A more lightweight alternative is an asynchronous client that does its work in the background and so doesn't block your web server thread.
There is some hopeful looking code in raven.contrib.async, but unfortunately it currently has a critical bug (Raven 1.4.2).
However, using that code as a base, I cobbled together my own version, which subclasses DjangoClient (which is what I need):
```python
from raven.contrib.django import DjangoClient
from raven.contrib.async import AsyncWorker


class AsyncDjangoClient(DjangoClient):
    """
    This client uses a single background thread to dispatch errors.
    """
    def __init__(self, *args, **kwargs):
        self.worker = AsyncWorker()
        super(AsyncDjangoClient, self).__init__(*args, **kwargs)

    def send_sync(self, **kwargs):
        # Runs on the worker thread: the original, blocking send.
        super(AsyncDjangoClient, self).send(**kwargs)

    def send(self, **kwargs):
        # Called from the request thread: just queue the event and return.
        self.worker.queue.put_nowait((self.send_sync, kwargs))
```
Then you need to set SENTRY_CLIENT in your settings to point to this class.
(If you're not using Django, you should be able to do something similar.)
This is working fine for me: I can now enable the Sentry 404 middleware without seeing any slowdown in my app. The synchronous client, by contrast, was slowing down 404 responses massively, because my Sentry server is not on the same box as my main web app.
I should say that you use this at your own risk: the AsyncClient in Raven is undocumented as well as broken, so I don't know whether it is considered a sensible approach or not!
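For the non-Django case, here is a minimal sketch of the same pattern using only the standard library: a single daemon thread drains a queue of (callable, kwargs) jobs, so send() returns immediately. BackgroundWorker and AsyncSender are hypothetical names, not Raven's API, and like the client above, events still sitting in the queue are lost if the process exits first.

```python
import queue
import threading
import traceback


class BackgroundWorker:
    """Runs queued (func, kwargs) jobs on a single daemon thread."""

    def __init__(self):
        self.queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            func, kwargs = self.queue.get()
            try:
                func(**kwargs)
            except Exception:
                # A failed send must never kill the worker thread.
                traceback.print_exc()
            finally:
                self.queue.task_done()


class AsyncSender:
    """Wraps a blocking send function so callers never wait on the network."""

    def __init__(self, send_sync):
        self.send_sync = send_sync
        self.worker = BackgroundWorker()

    def send(self, **kwargs):
        # put_nowait() returns at once; the worker thread does the slow part.
        self.worker.queue.put_nowait((self.send_sync, kwargs))
```

In a test you can call `sender.worker.queue.join()` to wait until all queued events have been handed to the blocking send function.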
Manuel de la Pena Saenz: Python at codemotion.es
As the codemotion.es page states, codemotion is:
Codemotion is the event that will bring together technicians, developers and students from all communities and languages in Spain. It will be held in Spain for the first time, after 5 successful years in Italy.
Thanks to the Python Madrid group, Python is among the languages that will take part in the event. The group will go over the different talk proposals at its next meeting and will try to get you the best talks possible. So if you want to take part, you can do the following:
- Join the Python Madrid group.
- Go to codemotion.es and join the event.
- Spread the word.
Let's try to make Python a better-known language in Spain!
Mike C. Fletcher: Want to display MathML? MathJax
Fredrik Håård's Blaag: Using Python to get rid of .doc
I'll be appearing at Software Passion to speak about using Python for protocol specifications, instead of using an external document to write the specification and then trying to implement it from there (or, perhaps more commonly, implementing it and then trying to keep the document up-to-date).
A while ago at Visual Units, the situation was this: there was a protocol for transferring data over TCP from fleet-management black boxes running J2ME to a server running Python, which then stored that data so interesting things could be done with it. Accompanying the protocol was an ever-slightly-out-of-date protocol specification, and a client implementation in Python used for testing the server.
This means that we had four different implementations of the protocol: one in Java, two in Python, and one in English. If one of those was not updated when the others were, the system was no longer consistent, and might break in interesting ways.
Since this created a lot of work for me, I set out to change things. First, I searched for viable existing solutions, but the need to keep the protocol compact (telematics data transfer is expensive) and to support J2ME meant that I did not find anything I could use off the shelf.
Instead, I started to implement my own solution, with the vision of implementing the protocol once and using it everywhere: Java, Python, and English. In the end, with a couple of hundred lines of Python, we can now specify a protocol like this:
```python
message = string
timestamp = i64
timediff = i32
ping = ("A ping, with a time and message", timestamp, message)
pong = ("A pong, with message, timestamp and perceived lag", timestamp, timediff, message)
```
...and from this, we generate Java source code for the terminals, as well as documentation for the poor souls who would rather read English than Python, while the Python clients and servers use the specification directly when packing and parsing messages.
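The post does not show how the specification is turned into working code, so the following is only a hypothetical sketch of the Python side: the spec tuples are used directly for packing and parsing, and the same tuples generate a line of English documentation. Field, pack(), unpack(), describe() and the wire format (big-endian integers, strings as UTF-8 with a 2-byte length prefix) are all assumptions; there is no message-type tag and no Java generator here.

```python
import struct


class Field:
    """A primitive wire type with a name (for the docs) and pack/unpack."""

    def __init__(self, name, fmt=None):
        self.name = name
        self.fmt = fmt  # struct format for fixed-size types; None = string

    def pack(self, value):
        if self.fmt is None:
            # Assumption: strings are UTF-8 with a 2-byte length prefix.
            data = value.encode('utf-8')
            return struct.pack('>H', len(data)) + data
        return struct.pack(self.fmt, value)

    def unpack(self, buf, offset):
        """Return (value, new_offset) for the field starting at *offset*."""
        if self.fmt is None:
            (length,) = struct.unpack_from('>H', buf, offset)
            start = offset + 2
            return buf[start:start + length].decode('utf-8'), start + length
        (value,) = struct.unpack_from(self.fmt, buf, offset)
        return value, offset + struct.calcsize(self.fmt)


string = Field('string')
i64 = Field('i64', '>q')  # big-endian signed 64-bit
i32 = Field('i32', '>i')  # big-endian signed 32-bit

# The specification from the post, unchanged.
message = string
timestamp = i64
timediff = i32
ping = ("A ping, with a time and message", timestamp, message)
pong = ("A pong, with message, timestamp and perceived lag",
        timestamp, timediff, message)


def pack(spec, *values):
    """Serialize *values* according to a message spec tuple."""
    _doc, *fields = spec
    if len(values) != len(fields):
        raise ValueError('expected %d values' % len(fields))
    return b''.join(f.pack(v) for f, v in zip(fields, values))


def unpack(spec, buf):
    """Parse *buf* according to a message spec tuple."""
    _doc, *fields = spec
    values, offset = [], 0
    for f in fields:
        value, offset = f.unpack(buf, offset)
        values.append(value)
    return tuple(values)


def describe(name, spec):
    """Generate a line of English documentation from the same spec."""
    doc, *fields = spec
    return '%s: %s (%s)' % (name, doc, ', '.join(f.name for f in fields))
```

Because the spec is plain data, a code generator for another language could walk the same tuples that pack() and unpack() use.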
Want to know how this was made possible, see some code, and point and laugh at my miserable failed attempts? Want to know why metaclasses were absolutely vital - or not? Register for Software Passion, where I'll be talking about this. If you use the promotion code 'BLAAG' when registering, you'll even get a 10% discount!
Stefan Scherfke: Designing and Testing PyZMQ Applications – Part 1
ZeroMQ (or ØMQ or ZMQ) is an intelligent messaging framework, described as “sockets on steroids”. That is, they look like normal TCP sockets but actually work as you’d expect sockets to work. PyZMQ adds even more convenience to them, which makes it a really good choice if you want to implement a distributed application. Another big plus for ØMQ is that you can integrate sub-systems written in C, Java or any other language ØMQ supports (and there are a lot).
If you’ve never heard of ØMQ before, I recommend reading ZeroMQ an Introduction by Nicholas Piël before you go on with this article.
The ØMQ Guide and PyZMQ’s documentation are really good, so you can easily get started. However, when we began to implement a larger application with it (a distributed simulation framework), several questions arose which were not covered by the documentation:
- What’s the best way to design our application?
- How can we keep it readable, flexible and maintainable?
- How do we test it?
I didn’t find anything like a best-practices article that answered my questions. So in this series of articles, I’m going to talk about what I’ve learned during the last months. I’m not a PyZMQ expert (yet ;-)), but what I’ve done so far works quite well, and I have never had more tests in a project than I have now.
You’ll find the source for the examples at bitbucket. They are written in Python 3.2 and tested under Mac OS X Lion, Ubuntu 11.10 and Windows 7, 64 bit in each case. If you have any suggestions or improvements, please fork me or just leave a comment.
In this first article, I’m going to talk a bit about how you could generally design your application to be flexible, maintainable and testable. The second part will be about unit testing, and finally, I’ll cover process and system testing.
Comparison of Different Approaches
There are basically three possible ways to implement a PyZMQ application: one that’s easy but of limited practical use, one that’s more flexible but not really pythonic, and one that needs a bit more setup but is flexible and pythonic.
All three examples feature a simple ping process and a pong process with varying complexity. I use multiprocessing to run the pong process, because that’s what you should usually do in real PyZMQ applications (you don’t want to use threads and if both processes are running on the same machine, there’s no need to invoke both of them separately).
All of the examples will have the following output:
```
(zmq)$ python blocking_recv.py
Pong got request: ping 0
Ping got reply: pong 0
...
Pong got request: ping 4
Ping got reply: pong 4
```
Let’s start with the easy one. You just use one of the socket’s recv methods in a loop:
```python
# blocking_recv.py
import multiprocessing

import zmq


addr = 'tcp://127.0.0.1:5678'


def ping():
    """Sends ping requests and waits for replies."""
    context = zmq.Context()
    sock = context.socket(zmq.REQ)
    sock.bind(addr)

    for i in range(5):
        sock.send_unicode('ping %s' % i)
        rep = sock.recv_unicode()  # This blocks until we get something
        print('Ping got reply:', rep)


def pong():
    """Waits for ping requests and replies with a pong."""
    context = zmq.Context()
    sock = context.socket(zmq.REP)
    sock.connect(addr)

    for i in range(5):
        req = sock.recv_unicode()  # This also blocks
        print('Pong got request:', req)
        sock.send_unicode('pong %s' % i)


if __name__ == '__main__':
    pong_proc = multiprocessing.Process(target=pong)
    pong_proc.start()

    ping()

    pong_proc.join()
```
So this is very easy and not much code. The problem is that it only works well if your process uses only one socket. Unfortunately, that is rarely the case in larger applications.
A way to handle multiple sockets per process is polling. In addition to your context and socket(s), you need a poller. You also have to tell it which events on which socket you are going to poll:
```python
# polling.py (excerpt)
import zmq

addr = 'tcp://127.0.0.1:5678'


def pong():
    """Waits for ping requests and replies with a pong."""
    context = zmq.Context()
    sock = context.socket(zmq.REP)
    sock.bind(addr)

    # Create a poller and register the events we want to poll
    poller = zmq.Poller()
    poller.register(sock, zmq.POLLIN | zmq.POLLOUT)

    for i in range(10):
        # Get all sockets that can do something
        socks = dict(poller.poll())

        # Check if we can receive something (events are a bit mask)
        if socks.get(sock, 0) & zmq.POLLIN:
            req = sock.recv_unicode()
            print('Pong got request:', req)

        # Check if we can send something
        if socks.get(sock, 0) & zmq.POLLOUT:
            sock.send_unicode('pong %s' % (i // 2))

    poller.unregister(sock)
```
As you can see, our pong function got pretty ugly. You need 10 iterations to do five ping-pongs, because in each iteration you can either receive or send. And each socket you add to your process adds two more if-statements. You could improve that design by creating a base class that wraps the polling loop, so that inheriting classes just register sockets and callbacks.
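A base class like the one just suggested might look roughly like this. It is a sketch, not code from the article's repository: PollingLoop, register(), poll_once() and the Pong subclass are hypothetical names, and it uses send_string()/recv_string(), the names newer pyzmq releases prefer over send_unicode()/recv_unicode(). All the polling if-statements live in one place, and subclasses only register sockets with callbacks.

```python
import zmq


class PollingLoop:
    """Base class that owns a zmq.Poller and dispatches to callbacks."""

    def __init__(self):
        self.poller = zmq.Poller()
        self.handlers = {}  # socket -> callback(socket)
        self.running = False

    def register(self, sock, callback):
        """Call *callback(sock)* whenever *sock* has a message to read."""
        self.handlers[sock] = callback
        self.poller.register(sock, zmq.POLLIN)

    def poll_once(self, timeout=None):
        """Poll once (timeout in ms); return how many sockets were handled."""
        events = dict(self.poller.poll(timeout))
        handled = 0
        for sock, mask in events.items():
            if mask & zmq.POLLIN:
                self.handlers[sock](sock)
                handled += 1
        return handled

    def run(self):
        """Dispatch events until a handler calls stop()."""
        self.running = True
        while self.running:
            self.poll_once()

    def stop(self):
        self.running = False


class Pong(PollingLoop):
    """Hypothetical subclass: it only registers sockets and handlers."""

    def __init__(self, sock, max_pings=5):
        super().__init__()
        self.count = 0
        self.max_pings = max_pings
        self.register(sock, self.handle_ping)

    def handle_ping(self, sock):
        req = sock.recv_string()
        print('Pong got request:', req)
        sock.send_string('pong %s' % self.count)
        self.count += 1
        if self.count == self.max_pings:
            self.stop()
```

Adding another socket now means one more register() call and one handler method instead of two more if-statements in the loop.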
That brings us to our final example. PyZMQ comes with an adapted Tornado event loop that handles the polling and works with ZMQStreams, which wrap sockets and add some functionality:
```python
# eventloop.py
import multiprocessing

import zmq
from zmq.eventloop import ioloop, zmqstream


addr = 'tcp://127.0.0.1:5678'


class Pong(multiprocessing.Process):
    """Waits for ping requests and replies with a pong."""
    def __init__(self):
        super().__init__()
        self.loop = None
        self.stream = None
        self.i = 0

    def run(self):
        """
        Initializes the event loop, creates the sockets/streams and
        starts the (blocking) loop.
        """
        context = zmq.Context()
        self.loop = ioloop.IOLoop.instance()  # This is the event loop

        sock = context.socket(zmq.REP)
        sock.bind(addr)
        # We need to create a stream from our socket and
        # register a callback for recv events.
        self.stream = zmqstream.ZMQStream(sock, self.loop)
        self.stream.on_recv(self.handle_ping)

        # Start the loop. It runs until we stop it.
        self.loop.start()

    def handle_ping(self, msg):
        """Handles ping requests and sends back a pong."""
        # msg is a list of byte objects (one per message frame)
        req = msg[0].decode()
        print('Pong got request:', req)
        self.stream.send_unicode('pong %s' % self.i)

        # We’ll stop the loop after 5 pings
        self.i += 1
        if self.i == 5:
            self.stream.flush()
            self.loop.stop()
```
This adds even more boilerplate code, but it will pay off if you use more sockets, and most of the stuff in run() can be moved into a base class. Another drawback is that the IOLoop only uses recv_multipart(), so you always get a list of byte strings, which you have to decode or deserialize on your own. However, you can use all the send methods a socket offers (like send_unicode() or send_json()). You can also stop the loop from within a message handler.
In the next sections, I’ll discuss how you could implement a PyZMQ process that uses the event loop.
Communication Design
Before you start to implement anything, you should think about what kind of processes you need in your application and which messages they exchange. You should also decide what kind of message format and serialization you want to use.
PyZMQ has built-in support for Unicode (send sends plain C strings which map to Python byte objects, so there’s a separate method to send Unicode strings), JSON and Pickle.
JSON is nice because it’s fast and lets you integrate processes written in other languages into your application. It’s also a bit safer, because you cannot receive arbitrary objects as you can with pickle. The most straightforward syntax for JSON messages is to make them triples [msg_type, args, kwargs], where msg_type maps to a method name and args and kwargs get passed as positional and keyword arguments.
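A minimal sketch of that [msg_type, args, kwargs] convention, using only the standard json module (on a real socket, PyZMQ’s send_json()/recv_json() would handle the serialization). encode(), Dispatcher and PongLogic are hypothetical names. The explicit message_types whitelist is an added assumption, so that a peer cannot call arbitrary methods just by naming them.

```python
import json


def encode(msg_type, *args, **kwargs):
    """Serialize a message as the JSON triple [msg_type, args, kwargs]."""
    return json.dumps([msg_type, args, kwargs]).encode('utf-8')


class Dispatcher:
    """Calls the method named by msg_type with the decoded args/kwargs."""

    message_types = set()  # Subclasses list the methods peers may call.

    def dispatch(self, frame):
        msg_type, args, kwargs = json.loads(frame.decode('utf-8'))
        if msg_type not in self.message_types:
            raise RuntimeError('Received unknown message type: %s' % msg_type)
        return getattr(self, msg_type)(*args, **kwargs)


class PongLogic(Dispatcher):
    """Hypothetical application logic, independent of PyZMQ."""

    message_types = {'ping'}

    def ping(self, count, text='pong'):
        return [text, count]
```

Because the args list and kwargs dict map straight onto a Python call, adding a message type is just adding a method and listing its name.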
I strongly recommend documenting each chain of messages your application sends to perform a certain task. I do this with fancy PowerPoint graphics and with even fancier ASCII art in Sphinx. Here is how I would document our ping-pong:
```rst
Sending pings
-------------

* If the ping process sends a *ping*, the pong process responds with
  a *pong*.
* The number of pings (and pongs) is counted. The current ping count is
  sent with each message.

::

    PingProc        PongProc
    [REQ] ---1--> [REP]
          <--2---

    1 IN : ['ping, count']
    1 OUT: ['ping, count']
    2 IN : ['pong, count']
    2 OUT: ['pong, count']
```
First, I write some bullet points that explain how the processes behave and why they behave this way. This is followed by a kind of sequence diagram that shows when which process sends which message using which socket type. Finally, I write down what the messages look like: # IN is what you would pass to send_multipart, and # OUT is what the other side receives via recv_multipart (where # is the step number from the diagram). If one of the participating sockets is a ROUTER or DEALER, IN and OUT will differ (though that’s not the case in this example). Everything in single quotation marks (') represents a JSON-serialized list.
If our pong process used a ROUTER socket instead of the REP socket, it would look like this:
```
1 IN : ['ping, count']
1 OUT: [ping_uuid, '', 'ping, count']
2 IN : [ping_uuid, '', 'pong, count']
2 OUT: ['pong, count']
```
This seems like a lot of tedious work, but trust me, it really helps when you need to change something a few weeks later!
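Why IN and OUT differ here: the ROUTER socket prepends the sender’s identity, and a REQ peer adds an empty delimiter frame. A reply must carry that envelope back so the ROUTER knows where to route it. Here is a sketch of that bookkeeping on plain lists of frames; split_envelope() and make_reply() are hypothetical helpers. With real sockets you would feed them the result of recv_multipart() and pass the reply to send_multipart().

```python
import json


def split_envelope(frames):
    """Split ROUTER frames into (envelope, payload).

    The envelope is everything up to and including the first empty
    delimiter frame (one identity frame per hop, then b'').
    Raises ValueError if there is no delimiter.
    """
    delim = frames.index(b'')
    return frames[:delim + 1], frames[delim + 1:]


def make_reply(request_frames, reply):
    """Build the frames for a ROUTER reply to *request_frames*."""
    envelope, _payload = split_envelope(request_frames)
    # Keep the envelope so the ROUTER knows which peer gets the reply.
    return envelope + [json.dumps(reply).encode('utf-8')]
```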
Application Design
In the examples above, the Pong process was responsible for setting everything up, for receiving/sending messages and for the actual application logic (counting incoming pings and creating a pong).
Obviously, this is not a very good design. Instead, we can put most of that nasty setup code into a base class that all our processes inherit from, and move the actual application logic into a separate (PyZMQ-independent) class.
ZmqProcess – The Base Class for all Processes
The base class basically implements two things:
- a setup method that creates a context and a loop
- a stream factory method for streams with an on_recv callback. It creates a socket and can connect/bind it to a given address or bind it to a random port (that’s why it returns the port number in addition to the stream itself).
It also inherits from multiprocessing.Process, so that it is easier to spawn it as a sub-process. Of course, you can also just call its run() method from your main().
```python
# zmqproc.py
import multiprocessing

from zmq.eventloop import ioloop, zmqstream
import zmq


class ZmqProcess(multiprocessing.Process):
    """
    This is the base for all processes and offers utility functions
    for setup and creating new streams.

    """
    def __init__(self):
        super().__init__()

        self.context = None
        """The ØMQ :class:`~zmq.Context` instance."""

        self.loop = None
        """PyZMQ's event loop (:class:`~zmq.eventloop.ioloop.IOLoop`)."""

    def setup(self):
        """
        Creates a :attr:`context` and an event :attr:`loop` for the process.

        """
        self.context = zmq.Context()
        self.loop = ioloop.IOLoop.instance()

    def stream(self, sock_type, addr, bind, callback=None, subscribe=b''):
        """
        Creates a :class:`~zmq.eventloop.zmqstream.ZMQStream`.

        :param sock_type: The ØMQ socket type (e.g. ``zmq.REQ``)
        :param addr: Address to bind or connect to formatted as *host:port*,
                *(host, port)* or *host* (bind to random port).
                If *bind* is ``True``, *host* may be:

                - the wild-card ``*``, meaning all available interfaces,
                - the primary IPv4 address assigned to the interface, in its
                  numeric representation or
                - the interface name as defined by the operating system.

                If *bind* is ``False``, *host* may be:

                - the DNS name of the peer or
                - the IPv4 address of the peer, in its numeric representation.

                If *addr* is just a host name without a port and *bind* is
                ``True``, the socket will be bound to a random port.
        :param bind: Binds to *addr* if ``True`` or tries to connect to it
                otherwise.
        :param callback: A callback for
                :meth:`~zmq.eventloop.zmqstream.ZMQStream.on_recv`, optional
        :param subscribe: Subscription pattern for *SUB* sockets, optional,
                defaults to ``b''``.
        :returns: A tuple containing the stream and the port number.

        """
        sock = self.context.socket(sock_type)

        # addr may be 'host:port' or ('host', port)
        if isinstance(addr, str):
            addr = addr.split(':')
        host, port = addr if len(addr) == 2 else (addr[0], None)

        # Bind/connect the socket
        if bind:
            if port:
                sock.bind('tcp://%s:%s' % (host, port))
            else:
                port = sock.bind_to_random_port('tcp://%s' % host)
        else:
            sock.connect('tcp://%s:%s' % (host, port))

        # Add a default subscription for SUB sockets
        if sock_type == zmq.SUB:
            sock.setsockopt(zmq.SUBSCRIBE, subscribe)

        # Create the stream and add the callback
        stream = zmqstream.ZMQStream(sock, self.loop)
        if callback:
            stream.on_recv(callback)

        return stream, int(port)
```
PongProc – The Actual Process
The PongProc inherits from ZmqProcess and is the main class for our process. It creates the streams, starts the event loop and dispatches all messages to the appropriate handlers:
```python
# pongproc.py
from zmq.utils import jsonapi as json
import zmq

import zmqproc


host = '127.0.0.1'
port = 5678


class PongProc(zmqproc.ZmqProcess):
    """
    Main process for the Ponger. It handles ping requests and sends back
    a pong.

    """
    def __init__(self, bind_addr):
        super().__init__()

        self.bind_addr = bind_addr
        self.rep_stream = None

        # Make sure this is pickle-able (e.g., not using threads)
        # or it won't work on Windows. If it's not pickle-able, instantiate
        # it in setup().
        self.ping_handler = PingHandler()

    def setup(self):
        """Sets up PyZMQ and creates all streams."""
        super().setup()

        self.rep_stream, _ = self.stream(zmq.REP, self.bind_addr, bind=True,
                callback=self.handle_rep_stream)

    def run(self):
        """Sets up everything and starts the event loop."""
        self.setup()
        self.loop.start()

    def stop(self):
        """Stops the event loop."""
        self.loop.stop()

    def handle_rep_stream(self, msg):
        """
        Handles messages from a Pinger:

        *ping*
            Send back a pong.

        *plzdiekthxbye*
            Stop the ioloop and exit.

        """
        msg_type, data = json.loads(msg[0])

        if msg_type == 'ping':
            rep = self.ping_handler.make_pong(data)
            self.rep_stream.send_json(rep)

        elif msg_type == 'plzdiekthxbye':
            self.stop()

        else:
            raise RuntimeError('Received unknown message type: %s' % msg_type)
```
There are a couple of things to note here:
I instantiated the PingHandler in the process’ __init__ method. If you are going to start this process as a sub-process via start(), make sure everything you instantiate in __init__ is pickle-able, or it won’t work on Windows. (Linux and Mac OS X use fork to create a sub-process, and fork just makes a copy of the main process and gives it a new process ID. On Windows, there is no fork, so the process object, including everything it references, is pickled and sent to the sub-process.)
In setup, call super().setup() before you create a stream or you won’t have a loop instance for them. You don’t call setup in the process’ __init__, because the context must be created within the new system process. So we call setup in run.
The stop method is not really necessary in this example, but it can be used to send stop messages to sub-processes when the main process terminates and to do other kinds of clean-up. You can also call it if you catch a KeyboardInterrupt after calling run.
handle_rep_stream is the message dispatcher for the process’ REP stream. It parses the message and calls the appropriate handler for that message (or raises an error if the message type is invalid). If your if and elif statements all do the same thing, you might consider replacing them with a dict that contains the handlers for each message type:
```python
# Inside handle_rep_stream(), after parsing msg_type and data:
handlers = {
    'msg': self.handler_for_msg,
}
# Look the handler up first, so that a KeyError raised *inside* a handler
# isn't misreported as an unknown message type.
try:
    handler = handlers[msg_type]
except KeyError:
    raise RuntimeError('Received unknown message type: %s' % msg_type)

rep = handler(data)
self.rep_stream.send_multipart(rep)
```
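To check the pickling caveat from the first note before it bites you on Windows, you can simply try pickling the objects yourself. A minimal sketch; is_picklable() and the two example classes are hypothetical. Calling multiprocessing.set_start_method('spawn') (Python 3.4+) on Linux makes your processes start the Windows way, which is a more realistic test, and since Python 3.8 macOS uses spawn by default as well.

```python
import pickle
import threading


def is_picklable(obj):
    """Return True if *obj* survives pickle.dumps()."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
    return True


class PlainHandler:
    """Only holds plain data, so it can be sent to a spawned process."""

    def __init__(self):
        self.count = 0


class ThreadedHandler:
    """Holds a lock (as thread-based code often does): not picklable."""

    def __init__(self):
        self.lock = threading.Lock()
```

Anything that fails this check belongs in setup(), which runs inside the new process.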
The PingHandler contains the actual application logic (which is not much, in this example). The make_pong method just gets the number of pings sent with the ping message and creates a new pong message. The serialization is done by PongProc, so our Handler does not depend on PyZMQ:
```python
class PingHandler(object):

    def make_pong(self, num_pings):
        """Creates and returns a pong message."""
        print('Pong got request number %s' % num_pings)

        return ['pong', num_pings]
```
Summary
Okay, that’s it for now. I showed you three ways to use PyZMQ. If you have a very simple process with only one socket, you can easily use its blocking recv methods. If you need more than one socket, I recommend using the event loop. And polling … you don’t want to use that.
If you decide to use PyZMQ’s event loop, you should separate the application logic from all the PyZMQ stuff (like creating streams, sending/receiving messages and dispatching them). If your application consists of more than one process (which is usually the case), you should also create a base class with shared functionality for them.
In the next part, I’m going to talk about how you can test your application.
Stefan Scherfke: Designing and Testing PyZMQ Applications – Part 2
This is the second part of the series Designing and Testing PyZMQ Applications. In the first part, I wrote about designing a PyZMQ application, so this time it’s all about (unit) testing (remember, if it’s not tested, it’s broken). I also updated the repository for this article with the new code examples.
My favorite testing tools are pytest by Holger Krekel and Mock by Michael Foord. Pytest is particularly awesome because of its re-evaluation of assert statements: if your test contains assert spam == 'eggs' and the assert fails, pytest re-evaluates it and prints the value of spam. This is really helpful, and you don’t need any boilerplate code for it. Mock is really nice for mocking external dependencies and asserting that your code called them in the correct way.
If you cloned the repository for this article, just run py.test from its root directory:
```
$ pip install pytest mock
...
Successfully installed pytest mock
Cleaning up...
$ py.test
=================== test session starts ====================
platform darwin -- Python 3.2.2 -- pytest-2.2.3
collected 11 items

test/test_pongproc.py .......
test/test_zmqproc.py ....

================ 11 passed in 0.12 seconds =================
```
Unit Testing
The probability that PyZMQ works correctly is very high. The probability that your code will call a PyZMQ function in such a way that it blocks forever and halts your test runner is also very high. Therefore, it’s a good idea to mock everything PyZMQ-related for your unit tests. And since your application logic might also not be implemented when you start testing your process, you should mock that, too.
What you’ll actually end up testing is the following:
- Does your message handler call your application logic in the right way given a certain input message?
- Does your message handler create and send the correct reply based on the return value of your application logic?
Let’s start with ZmqProcess again. After all, everything else depends on it. Testing its setup method is easy. We just check that it creates a context and a loop:
```python
# test/test_zmqproc.py
from zmq.eventloop import ioloop
import mock
import pytest
import zmq

import zmqproc


class TestZmqProcess(object):
    """Tests for :class:`zmqproc.ZmqProcess`."""

    def test_setup(self):
        zp = zmqproc.ZmqProcess()
        zp.setup()

        assert isinstance(zp.context, zmq.Context)
        assert isinstance(zp.loop, ioloop.IOLoop)
```
Testing stream is more complicated. We need to test whether it can handle various address formats, whether it connects or binds correctly, and whether it performs a default subscription for SUB sockets.
Pytest 2.2 introduced a parametrize decorator, which lets you call a test multiple times with varying inputs. You just define one or more arguments for your test function and a list of values for these arguments. For test_stream, I only need a kwargs parameter containing the parameters for the stream call:
```python
    # test/test_zmqproc.py (method of TestZmqProcess)
    @pytest.mark.parametrize('kwargs', [
        dict(sock_type=23, addr='127.0.0.1:1234', bind=True,
             callback=mock.Mock()),
        dict(sock_type=23, addr='127.0.0.1', bind=True,
             callback=mock.Mock()),
        dict(sock_type=zmq.SUB, addr=('localhost', 1234), bind=False,
             callback=mock.Mock(), subscribe=b'ohai'),
    ])
    def test_stream(self, kwargs):
```
The next step is to create an instance of ZmqProcess and patch some of its attributes. We also need to set a defined return value for the socket’s bind_to_random_port method:
```python
        # test/test_zmqproc.py (test_stream, continued)
        zp = zmqproc.ZmqProcess()

        # Patch the ZmqProcess instance
        zp.context = mock.Mock(spec_set=zmq.Context)
        zp.loop = mock.Mock(spec_set=ioloop.IOLoop)
        sock_mock = zp.context.socket.return_value
        sock_mock.bind_to_random_port.return_value = 42
```
For the actual test, we also need to patch ZMQStream. Although mock.patch can work as a function decorator, we need to use it as a context manager if we also use pytest funcargs (e.g., via the parametrize decorator; I don’t know if it’s even possible to use both mock.patch as a decorator and pytest funcargs in one test).
```python
        # test/test_zmqproc.py (test_stream, continued)
        # Patch ZMQStream and start testing
        with mock.patch('zmq.eventloop.zmqstream.ZMQStream') as zmqstream_mock:
            stream, port = zp.stream(**kwargs)
```
Finally, we can check the return values of our stream method and that it made the correct calls to create the stream:
```python
        # test/test_zmqproc.py (test_stream, continued)
        # Assert that the return values are correct
        assert stream is zmqstream_mock.return_value
        if isinstance(kwargs['addr'], tuple):
            assert port == kwargs['addr'][1]
        elif ':' in kwargs['addr']:
            assert port == int(kwargs['addr'][-4:])
        else:
            assert port == sock_mock.bind_to_random_port.return_value

        # Check that the socket was created correctly
        assert zp.context.socket.call_args == ((kwargs['sock_type'],), {})
        if kwargs['bind'] and ':' in kwargs['addr']:
            assert sock_mock.bind.call_args == (
                    ('tcp://%s' % kwargs['addr'],), {})
        elif kwargs['bind']:
            assert sock_mock.bind_to_random_port.call_args == (
                    ('tcp://%s' % kwargs['addr'],), {})
        else:
            assert sock_mock.connect.call_args == (
                    ('tcp://%s:%s' % kwargs['addr'],), {})

        # Check creation of the stream
        assert zmqstream_mock.call_args == ((sock_mock, zp.loop), {})
        assert zmqstream_mock.return_value.on_recv.call_args == (
                (kwargs['callback'],), {})

        # Check default subscription
        if 'subscribe' in kwargs:
            assert sock_mock.setsockopt.call_args == (
                    (zmq.SUBSCRIBE, kwargs['subscribe']), {})
```
Note: You may have noticed that I use assert my_mock.call_args == ... rather than my_mock.assert_called_with(...). The reason is simply that assert statements are syntax-highlighted but ordinary function calls are not, which makes it easier for me to find all assertions in a test.
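The two styles are interchangeable. A minimal sketch with unittest.mock, which is the Mock library as merged into the standard library in Python 3.3 (notify() is a hypothetical function under test):

```python
from unittest import mock


def notify(sender, user, text):
    """Hypothetical code under test: forwards a message to a collaborator."""
    sender.send(user, text=text.upper())


sender = mock.Mock()
notify(sender, 'alice', 'hi')

# The style used in this article: a plain, highlighted assert.
assert sender.send.call_args == (('alice',), {'text': 'HI'})
# The equivalent helper; it raises AssertionError on a mismatch.
sender.send.assert_called_with('alice', text='HI')
```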
PongProc
Testing the PongProc is not much different from testing its base class. pytest_funcarg__pp will instantiate a PongProc instance for each test that has a pp argument. The tests for setup, run and stop are easy to do. We create a few mocks and then ask them whether the tested function called them correctly:
```python
# test/test_pongproc.py
from zmq.utils import jsonapi as json
import mock
import pytest
import zmq

import pongproc


host, port = '127.0.0.1', 5678


# Old-style funcarg factory; newer pytest versions use @pytest.fixture.
def pytest_funcarg__pp(request):
    """Creates a PongProc instance."""
    return pongproc.PongProc((host, port))


class TestPongProc(object):
    """Tests :class:`pongproc.PongProc`."""

    def test_setup(self, pp):
        # Fake stream(): return the socket type instead of a real stream,
        # so we can check which stream ended up in which attribute.
        pp.stream = mock.Mock(side_effect=lambda *a, **k: (a[0], mock.Mock()))
        with mock.patch('zmqproc.ZmqProcess.setup') as setup_mock:
            pp.setup()
            assert setup_mock.call_count == 1

        # Assert that all streams were created
        assert pp.stream.call_args_list == [
            ((zmq.REP, (host, port)), dict(bind=True,
                    callback=pp.handle_rep_stream)),
        ]
        assert pp.rep_stream == zmq.REP

    def test_run(self, pp):
        pp.setup = mock.Mock()
        pp.loop = mock.Mock()

        pp.run()

        assert pp.setup.call_count == 1
        assert pp.loop.start.call_count == 1

    def test_stop(self, pp):
        pp.loop = mock.Mock()
        pp.stop()
        assert pp.loop.stop.call_count == 1
```
The callbacks for streams (e.g., PongProc.handle_rep_stream in our case) can get a bit more complicated, so I’ve split the test up into one test per message type plus one extra test that checks whether invalid messages are handled correctly. If all your callbacks behave the same in that case (e.g., they all raise an error or just print something), you can handle them with one test case and the parametrize decorator:
```python
    # test/test_pongproc.py (method of TestPongProc)
    @pytest.mark.parametrize(('handler', 'msg'), [
        ('handle_rep_stream', ['["spam", []]']),
        # You can add more handlers here
    ])
    def test_handle_bad_msg(self, pp, handler, msg):
        pytest.raises(RuntimeError, getattr(pp, handler), msg)
```
Testing if stop and ping messages are handled correctly is now straightforward. We perform some mocking (for the application logic and the stream that sends the reply), pass our message to the handler and then just check if it did the right things right:
```python
    # test/test_pongproc.py (methods of TestPongProc)
    def test_stop_msg(self, pp):
        pp.stop = mock.Mock()
        pp.handle_rep_stream([b'["plzdiekthxbye", null]'])
        assert pp.stop.call_count == 1

    def test_ping(self, pp):
        msg = ['ping', 1]  # Input message
        retval = 'spam'  # Return value for PingHandler.make_pong

        pp.ping_handler = mock.Mock(spec_set=pongproc.PingHandler)
        pp.ping_handler.make_pong.return_value = retval
        pp.rep_stream = mock.Mock()

        pp.handle_rep_stream([json.dumps(msg)])

        assert pp.ping_handler.make_pong.call_args == ((msg[1],), {})
        assert pp.rep_stream.send_json.call_args == ((retval,), {})
```
PingHandler
When we are done with all that network stuff, we can finally test the application logic. Easy-peasy in our case:
```python
# test/test_pongproc.py
def pytest_funcarg__ph(request):
    """Creates a PingHandler instance."""
    return pongproc.PingHandler()


class TestPingHandler(object):

    def test_make_pong(self, ph):
        ping_num = 23
        ret = ph.make_pong(ping_num)
        assert ret == ['pong', ping_num]
```
Summary
Thanks to the Mock library, unit testing PyZMQ apps is really not that hard and not much different from normal unit testing. However, all we know so far is that our process should work in theory. We haven’t yet started it and sent real messages to it.
The next and final part of this series will show you how you can automate testing complete processes. Until then, you should get your test coverage up to 100% to protect yourself from nasty surprises when you start with process testing.
Manuel de la Pena Saenz: Get/Set proxy settings in Gnome with GObject introspection
With GObject introspection, it is very simple to change your system’s settings through Python. First, let’s use the command line to find out our current settings:
```
gsettings list-recursively org.gnome.system.proxy
```
The following script allows you to retrieve the HTTP proxy settings that you are currently using:
```python
from gi.repository import Gio


def get_settings():
    """Get proxy settings."""
    http_settings = Gio.Settings.new('org.gnome.system.proxy.http')
    host = http_settings.get_string('host')
    port = http_settings.get_int('port')
    if http_settings.get_boolean('use-authentication'):
        # GSettings key names use hyphens, not underscores
        username = http_settings.get_string('authentication-user')
        password = http_settings.get_string('authentication-password')
    else:
        username = password = None
    return host, port, username, password
```
Setting them is as easy as getting them:
from gi.repository import Gio


def set_settings(host, port, username=None, password=None):
    """Set proxy settings."""
    http_settings = Gio.Settings.new('org.gnome.system.proxy.http')
    http_settings.set_string('host', host)
    http_settings.set_int('port', port)
    if username is not None:
        http_settings.set_boolean('use-authentication', True)
        http_settings.set_string('authentication-user', username)
        http_settings.set_string('authentication-password', password)
    else:
        # Turn authentication off so stale credentials are not reused.
        http_settings.set_boolean('use-authentication', False)

This is not terribly complicated, but I noticed that there are not many examples out there, so there you go. None of this code can be considered hard, but I’d like to point out that if you use the get_value method of the Settings object, you will have to call the appropriate get_* method on the returned GVariant; that is:
host = http_settings.get_string('host')

is equal to the following:

host = http_settings.get_value('host').get_string()

PyPy Development: Comparing Partial Evaluation and Tracing, Part 1
As part of writing my PhD I am currently thinking about the relationship between PyPy's meta-tracing approach and various previous ideas for automatically getting a (JIT-)compiler from only an interpreter of a language. One of the most-researched ideas along these lines is that of partial evaluation. Partial evaluation has basically the same goals as PyPy when it comes to compilers: write an interpreter, and get a compiler for free. The methods for reaching that goal are a bit different. In this series of blog posts, I am trying to explore the similarities and differences of partial evaluation and PyPy's meta-tracing.
A Flowgraph Language

To be able to clearly understand what "partial evaluation" is and what "meta-tracing" is, I will show an "executable model" of both. To that end, I am defining a small imperative language and will then show what a partial evaluator and a tracer for that language look like. All this code will be implemented in Prolog. (Any pattern-matching functional language would do, but I happen to know Prolog best. Backtracking is not used, so you can read things simply as functional programs.) In this post I will start with the definition of the language, and a partial evaluator for it. The code written in this blog post can be found fully here: http://paste.pocoo.org/show/541004/
The language is conceptually similar to PyPy's flow graphs, but a bit more restricted. It does not have function calls, only labelled basic blocks that consist of a series of linearly executed operations, followed by a conditional or an unconditional jump. Every operation assigns a value to a variable; the value is computed by applying some operation to some arguments.
A simple program to raise x to the yth power in that language looks like this:
power:
    res = 1
    if y goto power_rec else goto power_done

power_rec:
    res = res * x
    y = y - 1
    if y goto power_rec else goto power_done

power_done:
    print_and_stop(res)

To represent the same program as Prolog data structures, we use the following Prolog code:
block(power, op1(res, same, const(1),
                 if(y, power_rec, power_done))).
block(power_rec, op2(res, mul, var(res), var(x),
                     op2(y, sub, var(y), const(1),
                         if(y, power_rec, power_done)))).
block(power_done, print_and_stop(var(res))).

Every rule of block declares one block by first giving the label of the block, followed by the code. Code is a series of op1 or op2 statements terminated by a jump, an if or a print_and_stop. op1 statements are operations with one argument of the form op1(res_variable, operation_name, argument, next_statement); op2 statements take two arguments, in the form op2(res_variable, operation_name, argument1, argument2, next_statement). Arguments can be either variables in the form var(name) or constants in the form const(value).
To run programs in this flowgraph language, we first need some helper functionality. The first few helper functions are concerned with the handling of environments, the data structures the interpreter uses to map variable names occurring in the program to the variables' current values. In Python, dictionaries would be used for this purpose, but in Prolog we have to emulate them with lists of key/value pairs (not very efficient, but good enough):
lookup(X, [], _) :- throw(key_not_found(X)).
lookup(Key, [Key/Value | _], Value) :- !.
lookup(Key, [_ | Rest], Value) :- lookup(Key, Rest, Value).

write_env([], X, V, [X/V]).
write_env([Key/_ | Rest], Key, Value, [Key/Value | Rest]) :- !.
write_env([Pair | Rest], Key, Value, [Pair | NewRest]) :- write_env(Rest, Key, Value, NewRest).

remove_env([], _, []).
remove_env([Key/_ | Rest], Key, Rest) :- !.
remove_env([Pair | Rest], Key, [Pair | NewRest]) :- remove_env(Rest, Key, NewRest).

resolve(const(X), _, X).
resolve(var(X), Env, Y) :- lookup(X, Env, Y).

The implementation of these functions is not too important. The lookup function finds a key in an environment list, the write_env function adds a new key/value pair to an environment (replacing the old value if the key is already present), and remove_env removes a key. The resolve function is used to take either a constant or a variable and return a value. If it's a constant, the value of that constant is returned; if it's a variable, it is looked up in the environment. Note how the last argument of lookup and resolve is actually a return value, which is the typical approach in Prolog.
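For comparison, the same helpers can be sketched in Python. This is an illustration, not part of the original code: environments are assumed to be immutable tuples of (name, value) pairs so that, as in Prolog, every update returns a new environment, and arguments are encoded as ('const', value) or ('var', name) tuples.

```python
def lookup(key, env):
    """Return the value bound to key; raise KeyError (cf. key_not_found)."""
    for k, v in env:
        if k == key:
            return v
    raise KeyError(key)


def write_env(env, key, value):
    """Return a new env binding key to value.

    An existing binding is replaced in place; otherwise the pair is appended."""
    for i, (k, _) in enumerate(env):
        if k == key:
            return env[:i] + ((key, value),) + env[i + 1:]
    return env + ((key, value),)


def remove_env(env, key):
    """Return a new env without the first binding for key."""
    for i, (k, _) in enumerate(env):
        if k == key:
            return env[:i] + env[i + 1:]
    return env


def resolve(arg, env):
    """Turn a ('const', v) or ('var', name) argument into a concrete value."""
    kind, payload = arg
    if kind == 'const':
        return payload
    return lookup(payload, env)


env = write_env(write_env((), 'x', 10), 'y', 3)
print(env)  # (('x', 10), ('y', 3))
```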
So far we have not specified what the primitive operations that can occur in the program actually mean. For that we define a do_op function which executes primitive operations:
do_op(same, X, X).
do_op(mul, X, Y, Z) :- Z is X * Y.
do_op(add, X, Y, Z) :- Z is X + Y.
do_op(sub, X, Y, Z) :- Z is X - Y.
do_op(eq, X, Y, Z) :- X == Y -> Z = 1; Z = 0.
do_op(ge, X, Y, Z) :- X >= Y -> Z = 1; Z = 0.
do_op(readlist, L, I, X) :- nth0(I, L, X).
do_op(Op, _, _, _) :- throw(missing_op(Op)).

Again the last argument is an output variable.
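In Python the primitive operations could be sketched as a dispatch table. This is an assumed translation, not the author's code; as in the Prolog rules, comparisons return 1 or 0 instead of booleans, and an unknown operation is an error.

```python
import operator

# Two-argument primitives; eq and ge yield 1/0 like the Prolog version.
BINARY_OPS = {
    'mul': operator.mul,
    'add': operator.add,
    'sub': operator.sub,
    'eq': lambda x, y: 1 if x == y else 0,
    'ge': lambda x, y: 1 if x >= y else 0,
    'readlist': lambda lst, i: lst[i],  # zero-based, like nth0/3
}


def do_op(op, *args):
    """Execute a primitive operation and return its result."""
    if op == 'same' and len(args) == 1:
        return args[0]
    if len(args) == 2 and op in BINARY_OPS:
        return BINARY_OPS[op](*args)
    raise ValueError('missing_op: %s' % op)
```

Here the result is an ordinary return value rather than a trailing output argument.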
Now we can start executing simple operations. For that an interp predicate is defined. It takes as its first argument the current environment and as the second argument the operation to execute. E.g. to execute primitive operations with one or two arguments:
interp(op1(ResultVar, Op, Arg, Rest), Env) :-
    resolve(Arg, Env, RArg),
    do_op(Op, RArg, Res),
    write_env(Env, ResultVar, Res, NEnv),
    interp(Rest, NEnv).

interp(op2(ResultVar, Op, Arg1, Arg2, Rest), Env) :-
    resolve(Arg1, Env, RArg1),
    resolve(Arg2, Env, RArg2),
    do_op(Op, RArg1, RArg2, Res),
    write_env(Env, ResultVar, Res, NEnv),
    interp(Rest, NEnv).

First the arguments are resolved into values. Afterwards the operation is executed, and the result is written back into the environment. Then interp is called on the rest of the program. Similarly easy are the unconditional jump and print_and_stop:
interp(jump(L), Env) :-
    block(L, Block),
    interp(Block, Env).

interp(print_and_stop(Arg), Env) :-
    resolve(Arg, Env, Val),
    print(Val), nl.

In the unconditional jump we simply get the target block and continue executing that. To execute print_and_stop we resolve the argument, print the value and then are done.
The conditional jump is only slightly more difficult:
interp(if(V, L1, L2), Env) :-
    lookup(V, Env, Val),
    (Val == 0 ->
        block(L2, Block)
    ;
        block(L1, Block)
    ),
    interp(Block, Env).

First the variable is looked up in the environment. If the variable is zero, execution continues at the second block, otherwise it continues at the first block.
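Readers more at home in Python may find a transliteration of the whole interpreter helpful. The following is a sketch under assumptions of our own: Prolog terms become nested tuples, the block/2 facts become a BLOCKS dictionary, environments are plain dicts that are copied on every write, and a loop replaces the tail recursion so that long-running programs do not hit Python's recursion limit.

```python
import operator

# The power program encoded as nested tuples mirroring the Prolog terms:
# ('op1', res, op, arg, rest), ('op2', res, op, a1, a2, rest),
# ('jump', label), ('if', var, l1, l2), ('print_and_stop', arg).
BLOCKS = {
    'power': ('op1', 'res', 'same', ('const', 1),
              ('if', 'y', 'power_rec', 'power_done')),
    'power_rec': ('op2', 'res', 'mul', ('var', 'res'), ('var', 'x'),
                  ('op2', 'y', 'sub', ('var', 'y'), ('const', 1),
                   ('if', 'y', 'power_rec', 'power_done'))),
    'power_done': ('print_and_stop', ('var', 'res')),
}

OPS = {
    'same': lambda a: a,
    'mul': operator.mul,
    'add': operator.add,
    'sub': operator.sub,
    'eq': lambda a, b: 1 if a == b else 0,
    'ge': lambda a, b: 1 if a >= b else 0,
}


def resolve(arg, env):
    kind, payload = arg
    return payload if kind == 'const' else env[payload]


def interp(code, env):
    """Execute code in env (a dict); print and return the final value."""
    while True:
        tag = code[0]
        if tag == 'op1':
            _, res, op, arg, code = code
            env = {**env, res: OPS[op](resolve(arg, env))}
        elif tag == 'op2':
            _, res, op, a1, a2, code = code
            env = {**env, res: OPS[op](resolve(a1, env), resolve(a2, env))}
        elif tag == 'jump':
            code = BLOCKS[code[1]]
        elif tag == 'if':
            _, var, l1, l2 = code
            # Zero means "false": continue at the second label.
            code = BLOCKS[l2] if env[var] == 0 else BLOCKS[l1]
        elif tag == 'print_and_stop':
            value = resolve(code[1], env)
            print(value)
            return value
        else:
            raise ValueError('unknown statement: %r' % (tag,))
```

Calling interp(BLOCKS['power'], {'x': 10, 'y': 10}) prints 10000000000, matching the Prolog session below.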
Given this interpreter, we can execute the above example program like this, on a Prolog console:
$ swipl -s cfglang.pl
?- block(power, Block), interp(Block, [x/10, y/10]).
10000000000

Partial Evaluation of the Flowgraph Language

Let's look at what a partial evaluator for this simple flowgraph language would look like. Partial evaluation (PE), also called specialization, is a program manipulation technique. PE takes an input program and transforms it into a (hopefully) simpler and faster output program. It does this by assuming that some variables in the input program are constants. All operations that act only on such constants can be folded away. All other operations need to remain in the output program (called the residual program). Thus the partial evaluator proceeds much like an interpreter, except that it cannot actually execute some operations. Also, its output is not just a value, but also a list of the remaining operations that could not be optimized away.
The partial evaluator cannot use normal environments, because unlike the interpreter, it does not know the values of all variables. It will therefore work on partial environments, which store just the known variables. For these partial environments, some new helper functions are needed:
plookup(Key, [], var(Key)).
plookup(Key, [Key/Value | _], const(Value)) :- !.
plookup(Key, [_ | Rest], Value) :- plookup(Key, Rest, Value).

presolve(const(X), _, const(X)).
presolve(var(V), PEnv, X) :- plookup(V, PEnv, X).

The function plookup takes a variable and a partial environment and returns either const(Value) if the variable is found in the partial environment or var(Key) if it is not. Analogously, presolve is like resolve, except that it uses plookup instead of lookup.
With these helpers we can start writing a partial evaluator. The following two rules are where the main optimization in the form of constant folding happens. The idea is that when the partial evaluator sees an operation that involves only constant arguments, it can constant-fold the operation, otherwise it can't:
pe(op1(ResultVar, Op, Arg, Rest), PEnv, NewOp) :-
    presolve(Arg, PEnv, RArg),
    (RArg = const(C) ->
        do_op(Op, C, Res),
        write_env(PEnv, ResultVar, Res, NEnv),
        RestResidual = NewOp
    ;
        remove_env(PEnv, ResultVar, NEnv),
        NewOp = op1(ResultVar, Op, RArg, RestResidual)
    ),
    pe(Rest, NEnv, RestResidual).

pe(op2(ResultVar, Op, Arg1, Arg2, Rest), PEnv, NewOp) :-
    presolve(Arg1, PEnv, RArg1),
    presolve(Arg2, PEnv, RArg2),
    (RArg1 = const(C1), RArg2 = const(C2) ->
        do_op(Op, C1, C2, Res),
        write_env(PEnv, ResultVar, Res, NEnv),
        RestResidual = NewOp
    ;
        remove_env(PEnv, ResultVar, NEnv),
        NewOp = op2(ResultVar, Op, RArg1, RArg2, RestResidual)
    ),
    pe(Rest, NEnv, RestResidual).

The pe predicate takes the current operation and a partial environment, and potentially returns a new operation. To partially evaluate a simple operation, its arguments are looked up in the partial environment. If all the arguments are constants, the operation can be executed, and no new operation is produced. Otherwise, we need to produce a new residual operation which is exactly like the one currently being looked at. Also, the result variable needs to be removed from the partial environment, because it was just overwritten by an unknown value.
The potentially generated residual operation is stored into the output argument NewOp. The output argument of the recursive call is the last argument of the newly created residual operation, which will then be filled by the recursive call. This is a typical approach in Prolog, but may look strange if you are not familiar with it.
Note how the first case of these two rules is just like interpretation. The second case doesn't really do anything, it just produces a residual operation. This relationship between normal evaluation and partial evaluation is very typical.
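The constant-folding rules can be mirrored in Python to make the "interpret if known, copy otherwise" pattern explicit. This sketch is restricted to straight-line op1/op2 code ending in print_and_stop (jumps and ifs need the code cache introduced below), uses the tuple encoding ('op2', res, op, arg1, arg2, rest) as an assumption, and represents the partial environment as a dict.

```python
import operator

OPS = {'same': lambda a: a, 'add': operator.add,
       'sub': operator.sub, 'mul': operator.mul}


def presolve(arg, penv):
    """Constants stay; a variable becomes a constant if penv knows it."""
    if arg[0] == 'var' and arg[1] in penv:
        return ('const', penv[arg[1]])
    return arg


def pe(code, penv):
    """Partially evaluate straight-line op1/op2 code ending in print_and_stop."""
    tag = code[0]
    if tag in ('op1', 'op2'):
        res, op, args, rest = code[1], code[2], code[3:-1], code[-1]
        rargs = tuple(presolve(a, penv) for a in args)
        if all(a[0] == 'const' for a in rargs):
            # Every input is known: execute now, remember the result, emit nothing.
            value = OPS[op](*(a[1] for a in rargs))
            return pe(rest, {**penv, res: value})
        # Some input is unknown: emit a residual copy and forget res,
        # because it has just been overwritten by an unknown value.
        penv = {k: v for k, v in penv.items() if k != res}
        return (tag, res, op) + rargs + (pe(rest, penv),)
    if tag == 'print_and_stop':
        return ('print_and_stop', presolve(code[1], penv))
    raise ValueError('unsupported statement: %r' % (tag,))


# a = x + 1; b = y * 2; c = a + b; print_and_stop(c)
CODE = ('op2', 'a', 'add', ('var', 'x'), ('const', 1),
        ('op2', 'b', 'mul', ('var', 'y'), ('const', 2),
         ('op2', 'c', 'add', ('var', 'a'), ('var', 'b'),
          ('print_and_stop', ('var', 'c')))))
```

With y known to be 3, b is folded to 6 and only the operations that depend on x remain; with both x and y known, the whole program collapses to printing the constant 8.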
The unconditional jump and print_and_stop are not much more complex:
pe(jump(L), PEnv, jump(LR)) :-
    do_pe(L, PEnv, LR).

pe(print_and_stop(Arg), Env, print_and_stop(RArg)) :-
    presolve(Arg, Env, RArg).

To partially evaluate an unconditional jump we again produce a jump. The target label of that residual jump is computed by asking the partial evaluator to produce residual code for the label L with the given partial environment. print_and_stop is simply turned into a print_and_stop. We will see the code for do_pe soon.
Conditional jumps are more interesting:
pe(if(V, L1, L2), PEnv, NewOp) :-
    plookup(V, PEnv, Val),
    (Val = const(C) ->
        (C = 0 ->
            L = L2
        ;
            L = L1
        ),
        do_pe(L, PEnv, LR),
        NewOp = jump(LR)
    ;
        do_pe(L1, PEnv, L1R),
        do_pe(L2, PEnv, L2R),
        NewOp = if(V, L1R, L2R)
    ).

First we look up the value of the condition variable. If it is a constant, we can produce better code, because we know statically that only one path is reachable. Thus we produce code for that path, and then emit an unconditional jump there. If the condition variable is not known at partial evaluation time, we need to partially evaluate both paths and produce a conditional jump in the residual code.
This rule is the one that causes the partial evaluator to potentially do much more work than the interpreter, because after an if sometimes both paths need to be explored. In the worst case this process never stops, so a real partial evaluator would need to ensure somehow that it terminates. There are many algorithms for doing that, but I will ignore this problem here.
Now we need to understand what the do_pe predicate is doing. Its most important task is to make sure that we don't do the same work twice, by memoizing code that was already partially evaluated in the past. For that it keeps a mapping from pairs of label and partial environment to the label of the residual code:
do_pe(L, PEnv, LR) :-
    (code_cache(L, PEnv, LR) ->
        true
    ;
        gensym(L, LR),
        assert(code_cache(L, PEnv, LR)),
        block(L, Code),
        pe(Code, PEnv, Residual),
        assert(block(LR, Residual))
    ).

If the code cache indicates that label L was already partially evaluated with partial environment PEnv, then the previously generated residual code label LR is returned. Otherwise, a new label is generated with gensym, the code cache is informed of that new label with assert, then the block is partially evaluated and the residual code is added to the database.
For those who know partial evaluation terminology: this partial evaluator is a polyvariant online partial evaluator. "Polyvariant" means that for every label, several specialized versions of the block can be generated. "Online" means that no preprocessing is done before the partial evaluator runs.
Partial Evaluation Example

With this code we can look at the classical example of partial evaluation (it's probably the "Hello World" of partial evaluation). We can ask the partial evaluator to compute a power function, where the exponent y is a fixed number, e.g. 5, and the base x is unknown:
?- do_pe(power, [y/5], LR).
LR = power1.

To find out which code was produced, we can use listing:
?- listing(code_cache).
code_cache(power, [y/5], power1).
code_cache(power_rec, [y/5, res/1], power_rec1).
code_cache(power_rec, [y/4], power_rec2).
code_cache(power_rec, [y/3], power_rec3).
code_cache(power_rec, [y/2], power_rec4).
code_cache(power_rec, [y/1], power_rec5).
code_cache(power_done, [y/0], power_done1).

?- listing(block).
.... the block definition of the user program ....
block(power_done1, print_and_stop(var(res))).
block(power_rec5, op2(res, mul, var(res), var(x), jump(power_done1))).
block(power_rec4, op2(res, mul, var(res), var(x), jump(power_rec5))).
block(power_rec3, op2(res, mul, var(res), var(x), jump(power_rec4))).
block(power_rec2, op2(res, mul, var(res), var(x), jump(power_rec3))).
block(power_rec1, op2(res, mul, const(1), var(x), jump(power_rec2))).
block(power1, jump(power_rec1)).

The code_cache tells which residual labels correspond to which original labels under which partial environments. Thus, power1 contains the code of power under the assumption that y is 5. Looking at the block listing, the label power1 corresponds to code that simply multiplies res by x five times without using the variable y at all. The loop that was present in the original program has been fully unrolled, and the loop variable y has disappeared. Hopefully this is faster than the original program.
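The whole partial evaluator, including the code cache, can also be sketched in Python. The assumptions are ours: the tuple encoding from the interpreter sketch, a dict for the partial environment, an order-independent cache key (a sorted tuple of the environment's items), and a per-label counter imitating SWI-Prolog's gensym/2. Run with y = 5, it produces the same seven residual blocks as the listing above.

```python
import operator
from collections import defaultdict
from itertools import count

# The power program as nested tuples (same encoding as the Prolog terms).
PROGRAM = {
    'power': ('op1', 'res', 'same', ('const', 1),
              ('if', 'y', 'power_rec', 'power_done')),
    'power_rec': ('op2', 'res', 'mul', ('var', 'res'), ('var', 'x'),
                  ('op2', 'y', 'sub', ('var', 'y'), ('const', 1),
                   ('if', 'y', 'power_rec', 'power_done'))),
    'power_done': ('print_and_stop', ('var', 'res')),
}

OPS = {'same': lambda a: a, 'mul': operator.mul, 'sub': operator.sub}


class PartialEvaluator(object):
    """Polyvariant online partial evaluator for the flowgraph language."""

    def __init__(self, program):
        self.program = program
        self.code_cache = {}  # (label, sorted penv items) -> residual label
        self.residual = {}    # residual label -> residual code
        self._counters = defaultdict(lambda: count(1))

    def gensym(self, label):
        # Imitates gensym/2: power -> power1, power2, ...
        return '%s%d' % (label, next(self._counters[label]))

    def do_pe(self, label, penv):
        key = (label, tuple(sorted(penv.items())))
        if key in self.code_cache:
            return self.code_cache[key]
        new_label = self.gensym(label)
        # Register the label before specializing, so that reaching the same
        # (label, penv) pair again reuses it instead of recursing forever.
        self.code_cache[key] = new_label
        self.residual[new_label] = self.pe(self.program[label], penv)
        return new_label

    def presolve(self, arg, penv):
        if arg[0] == 'var' and arg[1] in penv:
            return ('const', penv[arg[1]])
        return arg

    def pe(self, code, penv):
        tag = code[0]
        if tag in ('op1', 'op2'):
            res, op, args, rest = code[1], code[2], code[3:-1], code[-1]
            rargs = tuple(self.presolve(a, penv) for a in args)
            if all(a[0] == 'const' for a in rargs):
                value = OPS[op](*(a[1] for a in rargs))
                return self.pe(rest, {**penv, res: value})
            penv = {k: v for k, v in penv.items() if k != res}
            return (tag, res, op) + rargs + (self.pe(rest, penv),)
        if tag == 'jump':
            return ('jump', self.do_pe(code[1], penv))
        if tag == 'print_and_stop':
            return ('print_and_stop', self.presolve(code[1], penv))
        if tag == 'if':
            _, var, l1, l2 = code
            val = self.presolve(('var', var), penv)
            if val[0] == 'const':
                # Condition known statically: only one successor is reachable.
                target = l2 if val[1] == 0 else l1
                return ('jump', self.do_pe(target, penv))
            return ('if', var, self.do_pe(l1, penv), self.do_pe(l2, penv))
        raise ValueError('unknown statement: %r' % (tag,))


spec = PartialEvaluator(PROGRAM)
entry = spec.do_pe('power', {'y': 5})
```

Here spec.residual plays the role of the asserted block facts, and a second call to spec.do_pe('power', {'y': 5}) returns 'power1' straight from the cache.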
Conclusion

In this blog post we saw an interpreter for a simple flow graph language in Prolog, together with a partial evaluator for it. The partial evaluator essentially duplicates every rule of the interpreter. If all the arguments of the current operation are known, it acts like the interpreter, otherwise it simply copies the operation into the residual code.
Partial evaluation can be used for a variety of applications, but the most commonly cited one is that of applying it to an interpreter. To do that, the program that the interpreter runs is assumed to be constant by the partial evaluator. Thus a specialized version of the interpreter is produced that does not use the input program at all. That residual code can be seen as a compiled version of the input program.
In the next blog post in this series we will look at writing a simple tracer for the same flowgraph language.
PyPy Development: Optimizing Traces of the Flow Graph Language
This is the third blog post in a series about comparing partial evaluation and tracing. In the first post of the series I introduced a small flow-graph language together with an interpreter for it. Then I showed a partial evaluator for the language. In the second post of the series I showed how a tracer for the same language works and how it relates to both execution and to partial evaluation. Then I added support for promotion to that tracer.
In this post I will show how to optimize the traces that are produced by the tracer and compare the structure of the optimizer to that of partial evaluation.
The code from this post can be found here: http://paste.pocoo.org/show/547304/
Optimizing Traces

In the last post we saw how to produce a linear trace with guards by interpreting a control flow graph program in a special mode. A trace always ends with a loop statement, which jumps to the beginning. The tracer is just logging the operations that are done while interpreting, so the trace can contain superfluous operations. On the other hand, the trace also contains some of the runtime values through promotions, as well as some decisions made on them, which can be exploited by optimization. An example of this is the trace produced by the promotion example from the last post:
op2(c,ge,var(i),const(0),
guard_true(c,[],l_done,
guard_value(x,5,[],b2,
op2(x2,mul,var(x),const(2),
op2(x3,add,var(x2),const(1),
op2(i,sub,var(i),var(x3),
loop))))))

After the guard_value(x, 5, ...) operation, x is known to be 5: if it isn't 5, execution falls back to the interpreter. Therefore, operations on x after the guard can be constant-folded. To do that sort of constant-folding, an extra optimization step is needed. That optimization step walks along the trace and uses a partial environment to remember which variables are constants and what their values are. The optimizer removes operations that have only constant arguments and leaves the others in the trace. This process is actually remarkably similar to partial evaluation: some variables are known to be constants, operations on only constant arguments are optimized away, and the rest remains.
The code for optimizing operations looks as follows:
optimize(op1(ResultVar, Op, Arg, Rest), PEnv, NewOp) :-
    presolve(Arg, PEnv, RArg),
    (RArg = const(C) ->
        do_op(Op, C, Res),
        write_env(PEnv, ResultVar, Res, NEnv),
        NewOp = RestResidual
    ;
        remove_env(PEnv, ResultVar, NEnv),
        NewOp = op1(ResultVar, Op, RArg, RestResidual)
    ),
    optimize(Rest, NEnv, RestResidual).

optimize(op2(ResultVar, Op, Arg1, Arg2, Rest), PEnv, NewOp) :-
    presolve(Arg1, PEnv, RArg1),
    presolve(Arg2, PEnv, RArg2),
    (RArg1 = const(C1), RArg2 = const(C2) ->
        do_op(Op, C1, C2, Res),
        write_env(PEnv, ResultVar, Res, NEnv),
        NewOp = RestResidual
    ;
        remove_env(PEnv, ResultVar, NEnv),
        NewOp = op2(ResultVar, Op, RArg1, RArg2, RestResidual)
    ),
    optimize(Rest, NEnv, RestResidual).

Just like partial evaluation! It even reuses the helper function presolve from the partial evaluator, as well as a partial environment PEnv. When the arguments of the operation are known constants in the partial environment, the operation can be executed at optimization time and removed from the trace. Otherwise, the operation has to stay in the output trace. The result variable (as in the partial evaluator) needs to be removed from the partial environment, because it was just overwritten by an unknown result.
Now we need to deal with guards in the trace.
optimize(guard_true(V, [], L, Rest), PEnv, NewOp) :-
    plookup(V, PEnv, Val),
    (Val = const(C) ->
        NewOp = RestResidual
    ;
        NewOp = guard_true(V, PEnv, L, RestResidual)
    ),
    optimize(Rest, PEnv, RestResidual).

optimize(guard_false(V, [], L, Rest), PEnv, NewOp) :-
    plookup(V, PEnv, Val),
    (Val = const(C) ->
        NewOp = RestResidual,
        NEnv = PEnv
    ;
        write_env(PEnv, V, 0, NEnv),
        NewOp = guard_false(V, PEnv, L, RestResidual)
    ),
    optimize(Rest, NEnv, RestResidual).

When the variable that is being guarded is actually known to be a constant, we can remove the guard. Note that it is not possible that the guard of that constant fails: The tracer recorded the operation while running with real values, therefore the guards have to succeed for values the optimizer discovers to be constant.
guard_false is slightly different from guard_true: after the former we know that the argument is actually 0. After guard_true we only know that it is not equal to zero, but not which precise value it has.
Another point to note in the optimization of guards is that the second argument of the guard operation, which was so far always just an empty list, is now replaced by the partial environment PEnv. I will discuss further down why this is needed.
Optimizing guard_value is very similar, except that it really gives precise information about the variable involved:
optimize(guard_value(V, C, [], L, Rest), PEnv, NewOp) :-
    plookup(V, PEnv, Val),
    (Val = const(C1) ->
        NewOp = RestResidual,
        NEnv = PEnv
    ;
        write_env(PEnv, V, C, NEnv),
        NewOp = guard_value(V, C, PEnv, L, RestResidual)
    ),
    optimize(Rest, NEnv, RestResidual).

This operation is the main way in which the optimizer gains constant variables that it then exploits to do constant-folding on later operations. This is a key difference from partial evaluation: there, the partial evaluator knows the values of some variables from the start. When optimizing traces, no variable's value is known at the beginning. Knowledge about some variables is only gained later, through guards.
Now we are missing what happens with the loop statement. In principle, it is turned into a loop statement again. However, at the loop statement a few additional operations need to be emitted. The reason is that we optimized away operations and thus assignments when the result value of the variable was a constant. That means the involved variable still potentially has some older value. The next iteration of the loop would continue with this older value, which is obviously wrong. Therefore we need to emit some assignments before the loop statement, one per entry in the partial environment:
optimize(loop, PEnv, T) :-
    generate_assignments(PEnv, T).

generate_assignments([], loop).
generate_assignments([Var/Val | Tail], op1(Var, same, const(Val), T)) :-
    generate_assignments(Tail, T).

As an example of how generate_assignments works, let's look at the following example. When the partial environment is [x/5, y/10], the following assignments are generated:
?- generate_assignments([x/5, y/10], Out).
Out = op1(x, same, const(5), op1(y, same, const(10), loop)).

That's all the code of the optimizer. While the basic structure is quite similar to partial evaluation, it's a lot less complex as well. What made the partial evaluator hard was that it needs to deal with control flow statements and with making sure that code is reused if the same block is partially evaluated with the same constants. Here, all these complexities go away. The tracer has already removed all control flow and replaced it with guards and one loop operation at the end. Thus, the optimizer can simply do one pass over the operations, removing some (with some extra care around the loop statement).
With this machinery in place, we can optimize the trace from the promotion example of the last post:
?- optimize(
    guard_value(x,3,[],b2,
    op2(x2,mul,var(x),const(2),
    op2(x3,add,var(x2),const(1),
    op2(i,sub,var(i),var(x3),
    op2(c,ge,var(i),const(0),
    guard_true(c,[],l_done, loop)))))),
    [],
    LoopOut).
LoopOut = guard_value(x, 3, [], b2, op2(i, sub, var(i), const(7), op2(c, ge, var(i), const(0), guard_true(c, [x/3, x2/6, x3/7], l_done, op1(x, same, const(3), op1(x2, same, const(6), op1(x3, same, const(7), loop)))))))

More readably, the optimized version is:
guard_value(x, 3, [], b2,
op2(i, sub, var(i), const(7),
op2(c, ge, var(i), const(0),
guard_true(c, [x/3, x2/6, x3/7], l_done,
op1(x, same, const(3),
op1(x2, same, const(6),
op1(x3, same, const(7),
loop)))))))

As intended, the operations on x after the guard_value have all been removed. However, some additional assignments (to x, x2, x3) at the end have been generated as well. The assignments look superfluous, but the optimizer does not have enough information to easily recognize this. That can be fixed, but only at the cost of additional complexity. (A real system would transform the trace into static single assignment form to answer such questions.)
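For comparison, here is a sketch of the complete trace optimizer in Python. The encoding is an assumption made for illustration: trace operations are nested tuples such as ('guard_value', var, const, resume, label, rest), loop is ('loop',), the partial environment is a dict whose insertion order plays the role of the Prolog list order, and the resume data attached to guards is a tuple of (name, value) pairs. Running it on the trace from the example above reproduces the optimized output.

```python
import operator

OPS = {'same': lambda a: a, 'mul': operator.mul, 'add': operator.add,
       'sub': operator.sub, 'ge': lambda a, b: 1 if a >= b else 0}


def presolve(arg, penv):
    if arg[0] == 'var' and arg[1] in penv:
        return ('const', penv[arg[1]])
    return arg


def generate_assignments(penv):
    """Re-materialize every known variable, then jump back with loop."""
    code = ('loop',)
    for var, val in reversed(list(penv.items())):
        code = ('op1', var, 'same', ('const', val), code)
    return code


def optimize(trace, penv):
    tag = trace[0]
    if tag in ('op1', 'op2'):
        res, op, args, rest = trace[1], trace[2], trace[3:-1], trace[-1]
        rargs = tuple(presolve(a, penv) for a in args)
        if all(a[0] == 'const' for a in rargs):
            value = OPS[op](*(a[1] for a in rargs))
            return optimize(rest, {**penv, res: value})
        penv = {k: v for k, v in penv.items() if k != res}
        return (tag, res, op) + rargs + (optimize(rest, penv),)
    if tag == 'loop':
        return generate_assignments(penv)
    # Surviving guards record the current penv as their resume data.
    resume = tuple(penv.items())
    if tag == 'guard_true':
        _, var, _, label, rest = trace
        if var in penv:  # known constant: the guard cannot fail
            return optimize(rest, penv)
        return ('guard_true', var, resume, label, optimize(rest, penv))
    if tag == 'guard_false':
        _, var, _, label, rest = trace
        if var in penv:
            return optimize(rest, penv)
        # After guard_false the variable is known to be 0.
        return ('guard_false', var, resume, label,
                optimize(rest, {**penv, var: 0}))
    if tag == 'guard_value':
        _, var, const, _, label, rest = trace
        if var in penv:
            return optimize(rest, penv)
        return ('guard_value', var, const, resume, label,
                optimize(rest, {**penv, var: const}))
    raise ValueError('unknown trace operation: %r' % (tag,))


TRACE = ('guard_value', 'x', 3, (), 'b2',
         ('op2', 'x2', 'mul', ('var', 'x'), ('const', 2),
          ('op2', 'x3', 'add', ('var', 'x2'), ('const', 1),
           ('op2', 'i', 'sub', ('var', 'i'), ('var', 'x3'),
            ('op2', 'c', 'ge', ('var', 'i'), ('const', 0),
             ('guard_true', 'c', (), 'l_done', ('loop',)))))))

OPTIMIZED = optimize(TRACE, {})
```

As in the Prolog version, this is a single forward pass over the trace; nothing needs to be memoized, because the tracer has already removed all control flow.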
Resuming to the Interpreter

Why does the code above need to add the partial environment to the guards that cannot be optimized away? The reason is related to why we needed to generate assignments before the loop statement. The problem is that the optimizer removes assignments to variables when it knows the values of these variables. That means that when switching back from running the optimized trace to the interpreter, a number of variables are not updated in the environment, making the execution in the interpreter incorrect.
In the example above, this applies to the variables x2 and x3. When the second guard fails, they have not been assigned in the optimized case. Therefore, the guard lists them and their (always constant) values.
When switching back these assignments need to be made. Thus we need to adapt the resume_interp function from the last blog post as follows:
write_resumevars([], Env, Env).
write_resumevars([Key / Value | Rest], Env, NEnv) :-
    write_env(Env, Key, Value, Env1),
    write_resumevars(Rest, Env1, NEnv).

resume_interp(Env, ResumeVars, L) :-
    write_resumevars(ResumeVars, Env, NEnv),
    block(L, Block),
    interp(Block, NEnv).

On resuming, the ResumeVars (a former partial environment) are simply added back to the normal environment before going back to the interpreter.
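A Python sketch of the resume step, under assumptions of our own: environments are dicts, the resume data is a tuple of (name, value) pairs as attached to a guard, and the block table and interpreter are passed in as parameters instead of being global as in the Prolog version.

```python
def write_resumevars(resume_vars, env):
    """Return a copy of env with the guard's recorded values written back."""
    new_env = dict(env)
    for key, value in resume_vars:
        new_env[key] = value
    return new_env


def resume_interp(env, resume_vars, label, blocks, interp):
    """Restore optimized-away assignments, then interpret block `label`.

    blocks maps labels to code and interp(code, env) runs it; both are
    parameters of this sketch."""
    return interp(blocks[label], write_resumevars(resume_vars, env))


# Example: guard_true(c, [x/3, x2/6, x3/7], l_done, ...) fails once i < 0.
# The optimized trace never assigned x2 and x3, so env lacks them.
stale_env = {'i': -4, 'x': 3, 'c': 0}
resume_vars = (('x', 3), ('x2', 6), ('x3', 7))
restored = write_resumevars(resume_vars, stale_env)
```

Copying the environment instead of mutating it mirrors the Prolog version, where write_env always produces a new list.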
The data attached to guards about what needs to be done to resume to the interpreter when the guard fails is often a very complex part of a tracing system. The data can become big, yet most guards never fail. Therefore, most real systems try hard to compress the attached data or try to share it between subsequent guards.
Summary

In this post we have shown how to optimize traces by applying a variant of the partial evaluation principle: Perform all the operations that have only constant arguments, leave the others alone. However, optimizing traces is much simpler, because no control flow is involved. All the questions about control flow have already been solved by the tracing component.
In the next and final post of the series I will show a larger example of how tracing and partial evaluation can be used to optimize a small bytecode interpreter.
Michael Bayer: Patterns Implemented by SQLAlchemy
When I first created SQLAlchemy, I knew I wanted to create something significant. It was by no means the first ORM or database abstraction layer I'd written; by 2005, I'd probably written about a dozen abstraction layers in several languages, including in Java, Perl, C and C++ (really bad C and even worse C++, one that talked to ODBC and another that communicated with Microsoft's ancient DB-LIB directly). All of these abstraction layers were in the range of awful to mediocre, and certainly none were anywhere near release-quality; even by late-90's to early-2000's standards. They were all created for closed-source applications written on the job, but each one did its job very well.
It was the repetitive creation of the same patterns over and over again that made apparent the kinds of things a real toolkit should have, and that increased my urge to actually go through with it, so that I wouldn't have to invent new database interaction layers for every new project, or worse, be compelled by management to use whatever mediocre product they had read about the week before (keep in mind I was made to use such disasters as EJB 1.0). But at the same time it was apparent to me that I was going to need to do some research up front as well. The primary book I used for this research was Patterns of Enterprise Application Architecture by Martin Fowler. When reading this book, I found that about half the patterns were ones that I'd already used implicitly, and the other half were ones that I was previously not entirely aware of.
Sometimes I read comments from new users expressing confusion or frustration with SQLAlchemy's concepts. Maybe some of these users are not only new to SQLAlchemy but are new to database abstraction layers in general, and some maybe even to relational databases themselves. What I'd like to lay out here is just how many of POEAA's patterns SQLAlchemy is built upon. If you're new to SQLAlchemy, my hope is that this list might help to de-mystify where these patterns come from.
These links are from the Catalog of Patterns of Enterprise Application Architecture.
- Data Mapper - The key to this pattern is that object-relational mapping is applied to a user-defined class in a transparent way, keeping the details of persistence separate from the public interface of the class. SQLAlchemy's classical mapping system, which is the usage of the mapper() function to link a class with table metadata, implemented this pattern as fully as possible. In modern SQLAlchemy, we use the Declarative pattern which combines table metadata with the class' declaration as a shortcut to using mapper(), but the persistence API remains separate.
- Unit of Work - This pattern is where the system transparently keeps track of changes to objects and periodically flushes all those pending changes out to the database. SQLAlchemy's Session implements this pattern fully in a manner similar to that of Hibernate.
- Identity Map - This is an essential pattern that establishes unique identities for each object within a particular session, based on database identity. No ORM should be without this feature, as working with object structures and applications of the most moderate complexity is vastly simplified and made more efficient with this pattern in place.
- Metadata Mapping - this chapter in the book is where the name MetaData comes from. The exact correspondence to Fowler's pattern would be the combination of mapper() and Table.
- Query Object - Both the ORM Query and the Core select() construct are built on this pattern.
- Repository - An interface that serves as the gateway to the database, in terms of object-relational mappings. This is the SQLAlchemy Session.
- Lazy Load - Load a related collection or object as you need it. SQLAlchemy, like Hibernate, has a lot of options in how attributes can load things.
- Identity Field - Represent the primary key of a table's row within the object that represents it.
- Foreign Key Mapping - Database foreign keys are represented using relationships in the object model.
- Association Table Mapping - A class can be mapped that represents information about how two objects are related to each other. Use the Association Object for this pattern.
- Embedded Value - a value inline on an object represents multiple columns. SQLAlchemy provides the Composite pattern here.
- Serialized LOB - Sometimes you just want to stuff all the objects into a BLOB. Use the PickleType or roll a JSON type.
- Inheritance Mappers - Represent class hierarchies within database tables. See Inheritance Mapping.
- Optimistic Offline Lock - Set up a version id on your mapping to enable this feature in SQLAlchemy.
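To make one of these patterns concrete, here is a toy sketch of the Identity Map in plain Python. It is an illustration of the pattern only, not SQLAlchemy's implementation (the Session maintains its own identity map internally); the User class, the ROWS dictionary and the load_user function are hypothetical stand-ins for a mapped class, a table and a database query.

```python
class IdentityMap(object):
    """Keeps at most one in-memory object per (class, primary key)."""

    def __init__(self):
        self._objects = {}

    def get(self, cls, pk, load):
        """Return the cached object, calling load(pk) only on first access."""
        key = (cls, pk)
        if key not in self._objects:
            self._objects[key] = load(pk)
        return self._objects[key]


class User(object):
    def __init__(self, id, name):
        self.id = id
        self.name = name


ROWS = {1: 'ed', 2: 'wendy'}  # hypothetical stand-in for a users table
load_calls = []


def load_user(pk):
    load_calls.append(pk)  # record each simulated database hit
    return User(pk, ROWS[pk])


session_map = IdentityMap()
a = session_map.get(User, 1, load_user)
b = session_map.get(User, 1, load_user)
```

Both lookups return the very same User object, and the simulated database is hit only once; this is what lets changes made through one reference be visible through every other reference to the same row.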
Thanks for reading!
Yuval Greenfield: Statistics on reddit’s top 10,000 titles with NLTK
Drawing inspiration from this blog post on title virality, I wanted to investigate what makes these top 10,000 titles the best of their breed. Which are the best superlatives? Who/what’s the most popular subject? Let’s start with some statistics:
- On Feb. 03, 14:10:45 (UTC) the all-time top 10,000 submissions on reddit (/r/all) had a total of 82,751,429 upvotes and 62,655,532 downvotes (56.9% liked it).
- 5.2 years between the oldest and newest submission
- 8,331,382 comments. That’s about 833 comments per submission.
- The #1 post has 26,758 – 4,882 = 21,876 points
- The #10,000 post has 15,166 - 13,679 = 1,487 points
- And now some graphs….
- President Obama’s new campaign poster
- Upvoting everything just to see the new pineapples
- New approach to China
- Ricky Gervais has an idea that would not only make the Golden Globes watchable, it would make it the best show of the year
- Best picture of a dog getting hit in the crotch with a tennis ball you will see all day. Yup that’s my dog.
- CSI: Modern computer technology at its best.
- Dear Old People. We don’t want to kill you. You’re our parents and grandparents and we love you. But if you throw a cranky fit and keep us from getting decent, affordable health care, you can figure out how to work your own goddamn PCs and cable boxes and remote controls from now on.
- How I got an uncooperative eBay buyer to pay for her purchase. Was it unethical?
- How to report the News
- Supreme Court ruling comes down – Corporations are people with free speech and the protected right to bribe politicians. Let’s not even pretend anymore folks, democracy in America is dead.
- We, the People of the United States of America, reject the U.S. Supreme Court’s ruling in Citizens United, and move to amend our Constitution to firmly establish that money is not speech, and that human beings, not corporations, are persons entitled to constitutional rights.
- 14 out of 14 people found this review helpful (PIC)
- This is a news website article about a scientific finding
- I work in News. This is how you stop SOPA.
- Can we all agree, that this is NOT an accident? Fuck you, Fox News. [pic]
- I hate my job…
- Reddit, I don’t give a damn about your aunt, uncle, boyfriend, girlfriend, boss or toothless rabies infested dog who reads Reddit. Less personal crap and more articles please.
- I’ve had a vision and I can’t shake it: Colbert needs to hold a satirical rally in DC.
- I’m the only Caucasian in my part of town. I found this note on my windshield today…
I’m pretty sure you don’t need example links for these…
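The headline figures in the statistics list are simple arithmetic on the quoted vote and comment counts, and are easy to double-check. The numbers below are copied from that list; nothing else is assumed.

```python
# Figures as quoted in the statistics list.
upvotes, downvotes = 82_751_429, 62_655_532
comments, submissions = 8_331_382, 10_000

liked = upvotes / (upvotes + downvotes)
print(f"{liked:.1%} liked it")                                   # 56.9% liked it
print(f"{comments / submissions:.0f} comments per submission")   # 833 comments per submission

top, bottom = (26_758, 4_882), (15_166, 13_679)                  # (ups, downs)
print(top[0] - top[1], bottom[0] - bottom[1])                    # 21876 1487
```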
The top 10,000 seem to come mostly from 17:00 UTC and rarely from around 12:00 UTC. This isn’t exactly the probability of reaching the front page, as it’s not clear at what time the submission count is highest. But it’s something.
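Bucketing submissions by hour of day is a small `Counter` job. This sketch assumes each submission carries a Unix timestamp in UTC, as reddit's `created_utc` field does. `hour_histogram` and the sample timestamps are hypothetical and are not taken from the author's source code.

```python
from collections import Counter
from datetime import datetime, timezone


def hour_histogram(created_utc_timestamps):
    """Count submissions per UTC hour of day (0-23) from Unix timestamps."""
    return Counter(
        datetime.fromtimestamp(ts, tz=timezone.utc).hour
        for ts in created_utc_timestamps
    )


# Hypothetical timestamps: 17:00, 18:00 and 12:00 UTC on 2012-02-03, plus 17:00 the next day.
sample = [1328288400, 1328292000, 1328270400, 1328288400 + 86400]
print(sorted(hour_histogram(sample).items()))  # [(12, 1), (17, 2), (18, 1)]
```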
An apology
This is my first time using NLTK, and though I’m OK at coding, I most certainly have no idea how to parse natural language. Here’s hoping this was somewhat insightful.
More graphs and data
- Imgur album with a total of 11 more bar graphs
- The source code to continue parsing the data yourself
- Python Reddit API
- Python Natural Language Toolkit
- http://stackoverflow.com/questions/526469/practical-examples-of-nltk-use
Daniel Greenfeld: Announcing Consumer Notebook!
Let's drill down and take a closer look at one of the items on the page, in this case Doug Hellmann's amazing The Python Standard Library by Example. The product detail pages include the ability to add pros and cons and attach said products to comparison grids and specialized lists like 'my wishlist' and 'my possessions'.
Speaking of wishlists, check out my own:
In order to add items, like footy pajamas, I click on the 'add' button and paste the Amazon (or BestBuy) URL into the form:
At this time we just handle Amazon USA and BestBuy USA. In the future we plan on adding more affiliate providers, including non-USA providers to support our non-USA friends.
There's a lot more than that...
In addition to weekly infographics, comparison grids, lists, and products, Consumer Notebook also awards points, coins, badges, and a growing privilege set to participating users. We even implemented an energy bar which regenerates over time, designed to match the pace of human users and serve as one of the brakes on scripts and bots.
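The post doesn't say how the energy bar works internally. A common way to build a regenerating allowance like this is a token bucket. The class below is a hypothetical sketch of that idea, not Consumer Notebook's code, and the capacity and regeneration rate are made-up parameters.

```python
import time


class EnergyBar:
    """Hypothetical regenerating energy bar, implemented as a token bucket."""

    def __init__(self, capacity, regen_per_second, clock=time.monotonic):
        self.capacity = capacity
        self.regen_per_second = regen_per_second
        self.clock = clock                 # injectable so the behaviour can be tested
        self.energy = float(capacity)      # start with a full bar
        self.last = clock()

    def _regenerate(self):
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Energy refills at a steady rate but never exceeds the cap.
        self.energy = min(self.capacity, self.energy + elapsed * self.regen_per_second)

    def try_spend(self, cost=1):
        """Spend energy if enough has built up; False means 'slow down'."""
        self._regenerate()
        if self.energy >= cost:
            self.energy -= cost
            return True
        return False


t = [0.0]                                  # fake clock for the example
bar = EnergyBar(capacity=3, regen_per_second=0.5, clock=lambda: t[0])
print([bar.try_spend() for _ in range(4)])  # [True, True, True, False]
t[0] += 2.0                                 # 2 s at 0.5 per second regenerates 1 unit
print(bar.try_spend())                      # True
```

A human clicking at a normal pace rarely drains the bar, while a script firing requests back to back hits the cap almost immediately.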
Technology
I built this with Audrey Roy using Python, Django, jQuery, PostgreSQL, Memcached, and RabbitMQ. I'll be blogging in depth about the technical side in an upcoming post.
Genesis
It was the summer of 2010 and we were brainstorming ideas for a coding contest called Django Dash. The one we settled on was a listing and comparison site for Django called Django Packages. The result has been a very useful tool for the Django community. Eventually, with the help of several dozen people, we turned the code into the Open Comparison framework and launched Pyramid and Plone implementations. Time permitting, this year we plan to do Python, Flask, Twisted, Node, jQuery, and other implementations.
Since then we've wanted to do something similar, but in the context of products. And we wanted to do it right - elegant design combined with an ad-free space. So we cooked up Consumer Notebook, launching today!
We'll be adding features and enhancements in the months to come. We've acquired a community manager, and even have a blog. We would love for you to check out the site, share it with your friends and family, and send us your commentary, suggestions, and advice.
PyCharm: PyCharm 2.0.2 update available
We’ve just released a second bugfix update for PyCharm 2.0, version 2.0.2.
The update includes a number of Django specific fixes and minor features, improvements for the debugger and some important fixes in the IDE platform. Check out the full release notes.
As usual, the new version is available for download from the JetBrains site.
And if you’re looking forward to new features and not just bugfixes, stay tuned — the Early Access for PyCharm 2.1 is coming soon!
S. Lott: Multiprocessing Goodness -- Part 2 -- Class Definitions
The function must be designed to work with Queues or Pipelines or other synchronization techniques.
There's an advantage, however, to defining a class which gracefully handles generator functions. If we have Generator-Aware multi-processing, we can (1) write our algorithms as generators and then (2) trivially connect Processes with Queues to improve scalability.
We're looking at creating processing "pipelines" using Queues. That way we can easily handle multiple-producer and multiple-consumer (fan-in, fan-out) processing that enhances concurrency.
We have three use cases: Producer, Consumer and Consumer-Producer.
Producer
A Producer gets data from somewhere and populates a queue with it. This is the source that feeds data into the pipeline.
from multiprocessing import Process

class ProducerProcess( Process ):
    """Produces items into a Queue.
    The "target" must be a generator function which yields
    picklable items.
    """
    def __init__( self, group=None, target=None, name=None, args=None, kwargs=None, output_queue=None, consumers=0 ):
        super( ProducerProcess, self ).__init__( name=name )
        self.target= target
        self.args= args if args is not None else []
        self.kwargs= kwargs if kwargs is not None else {}
        self.output_queue= output_queue
        self.consumers= consumers
    def run( self ):
        target= self.target
        for item in target(*self.args, **self.kwargs):
            self.output_queue.put( item )
        # One None sentinel per downstream consumer signals end-of-data.
        for x in range(self.consumers):
            self.output_queue.put( None )
        self.output_queue.close()
This class will wrap a "target" function which must be a generator. Every value yielded is put into the "output_queue". When the source data runs out, enough sentinel tokens are put into the queue to satisfy all consumers.
Consumer
A Consumer gets data from a queue and does some final processing. Perhaps it loads a database, or writes a file. It is the sink that consumes data on the pipeline.
class ConsumerProcess( Process ):
    """Consumes items from a Queue.
    The "target" must be a function which expects an iterable as its
    only argument. Therefore, the args value is not used here.
    """
    def __init__( self, group=None, target=None, name=None, kwargs=None, input_queue=None, producers=0 ):
        super( ConsumerProcess, self ).__init__( name=name )
        self.target= target
        self.kwargs= kwargs if kwargs is not None else {}
        self.input_queue= input_queue
        self.producers= producers
    def items( self ):
        # Keep reading until every upstream producer has sent its sentinel.
        while self.producers != 0:
            for item in iter( self.input_queue.get, None ):
                yield item
            self.producers -= 1
    def run( self ):
        target= self.target
        target( self.items(), **self.kwargs )
This class will wrap a "target" function which must be ready to work with any iterable. Every value from the queue will be provided to the target function for processing. When enough sentinel tokens have been consumed from producers, it terminates processing.
Consumer-Producer
The middle of a processing pipeline consists of consumer-producer processes, which consume from one queue and then produce to another queue.
class ConsumerProducerProcess( Process ):
    """Consumes items from a Queue and produces items onto a Queue.
    The "target" must be a generator function which yields
    picklable items and which expects an iterable as its
    only argument. Therefore, the args value is not used here.
    """
    def __init__( self, group=None, target=None, name=None, kwargs=None, input_queue=None, producers=0, output_queue=None, consumers=0 ):
        super( ConsumerProducerProcess, self ).__init__( name=name )
        self.target= target
        self.kwargs= kwargs if kwargs is not None else {}
        self.input_queue= input_queue
        self.producers= producers
        self.output_queue= output_queue
        self.consumers= consumers
    def items( self ):
        # Count one sentinel per upstream producer before stopping.
        while self.producers != 0:
            for item in iter( self.input_queue.get, None ):
                yield item
            self.producers -= 1
    def run( self ):
        target= self.target
        for item in target(self.items(), **self.kwargs):
            self.output_queue.put( item )
        # Then alert each downstream consumer with its own sentinel.
        for x in range(self.consumers):
            self.output_queue.put( None )
        self.output_queue.close()
This class will wrap a "target" function which must be a generator function that consumes an iterable.
Every value from the queue is provided to the target generator. Every value yielded by the generator is sent to the output queue. The input side counts sentinels to know when to stop. The output side produces enough sentinels to alert downstream processes.
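To see the sentinel arithmetic of fan-out and fan-in in action without spawning processes, here is a sketch that swaps threads and `queue.Queue` for `Process` and `multiprocessing.Queue` while keeping the same protocol. The helper names (`drain`, `run_producer`, `run_consumer_producer`, `numbers`, `square`) are hypothetical and are not part of the classes above.

```python
import queue
import threading


def drain(in_q, producers):
    """Yield items until one None sentinel per upstream producer has arrived."""
    while producers != 0:
        for item in iter(in_q.get, None):
            yield item
        producers -= 1


def run_producer(target, out_q, consumers):
    for item in target():
        out_q.put(item)
    for _ in range(consumers):          # one sentinel per downstream consumer
        out_q.put(None)


def run_consumer_producer(target, in_q, producers, out_q, consumers):
    for item in target(drain(in_q, producers)):
        out_q.put(item)
    for _ in range(consumers):
        out_q.put(None)


def numbers():                          # illustrative producer target
    yield from range(10)


def square(source):                     # illustrative consumer-producer target
    for n in source:
        yield n * n


def run_pipeline():
    q1, q2 = queue.Queue(), queue.Queue()
    # One producer fans out to two workers, so it emits 2 sentinels.
    threads = [threading.Thread(target=run_producer, args=(numbers, q1, 2))]
    # Each worker has 1 upstream producer and feeds 1 downstream consumer.
    threads += [
        threading.Thread(target=run_consumer_producer, args=(square, q1, 1, q2, 1))
        for _ in range(2)
    ]
    for t in threads:
        t.start()
    # The final consumer fans in from 2 workers, so it waits for 2 sentinels.
    results = sorted(drain(q2, producers=2))
    for t in threads:
        t.join()
    return results


print(run_pipeline())  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Results are sorted because the two workers interleave their output in no particular order.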
Target Functions
A producer function must be a generator function of this form:
def prod( *args ):
    for item in some_function(*args):
        yield item
A consumer function looks like this:
def cons( source ):
    for item in source:
        final_disposition(item)
Finally, a consumer-producer function looks like this.
def cons_prod( source ):
    for item in source:
        next_value= transform(item)
        yield next_value
These functions can be tested and debugged like this.
for item in cons_prod( prod( *args ) ):   # inspect what the middle stage yields
    print( item )
cons( cons_prod( prod( *args ) ) )          # then run the whole chain in one process
That way we're confident that our algorithm is correct before attempting to scale it with multiprocessing.
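Here is the same test-first idea with concrete, purely illustrative bodies for `prod`, `cons_prod` and `cons`. The squaring transform and the list used as the final disposition are assumptions, standing in for `some_function`, `transform` and `final_disposition`.

```python
def prod(limit):
    # Producer: a generator that yields the raw items.
    for n in range(limit):
        yield n


def cons_prod(source):
    # Consumer-producer: transforms each item and yields the result.
    for n in source:
        yield n * n


collected = []


def cons(source):
    # Consumer: the final disposition here is appending to a list.
    for item in source:
        collected.append(item)


cons(cons_prod(prod(5)))
print(collected)  # [0, 1, 4, 9, 16]
```

Once this single-process chain gives the right answer, each function can be handed unchanged to the matching Process class as its target.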
Mike C. Fletcher: Kid replaced with Genshi for DirectDocs
I've pretty much finished replacing Kid with Genshi for directdocs (the project that generates the PyOpenGL documentation by combining the upstream docbook files with the pydoc-like introspection of PyOpenGL, as well as generating the OpenGLContext tutorial files). It was pretty much painless. I got caught for a bit on a ${ [ do(x) for x in y] } operation not working as expected where do() is a py:def function; it was returning generator objects in Genshi instead of producing text. A few py:for tags solved that. I also spent a bit of time figuring out how to copy lxml-produced ETree nodes directly into the result: import ET in a processing instruction,
<?python
from genshi.input import ET
?>
and then insert the elements with ${ET(element)}.
ShiningPanda: Selenium, Python and Jenkins on Debian - 2/3
Sample project
This tutorial is based on the poll application from the Django tutorial. It describes all the required steps to run Selenium tests on a Django project. The source code can be found here.
This sample project is organized as follows:
The following features are implemented:
- Poll management via the Django admin site,
- Vote submission.
Don't forget to install Django!
pip install Django
Testing tools
To write and execute Selenium tests, the following tools are used:
- nose: an extensible test runner,
- selenose: a package using the Python bindings for Selenium to provide some useful Selenium related plugins for nose,
- djangosanetesting: a set of nose plugins dedicated to Django,
- CherryPy: used by djangosanetesting to start a web server,
- coverage: to enable code coverage reporting.
Install them all by typing:
pip install selenose djangosanetesting CherryPy coverage
Configure tests
The test configuration is located in tests/setup.cfg and will be loaded by nose at runtime:
[nosetests]
with-xunit = true
with-coverage = true
cover-package = djangotutorial,polls
with-django = true
with-cherrypyliveserver = true
with-selenium-server = true
with-selenium-driver = true
- with-xunit generates test result reports,
- with-coverage and cover-package generate coverage reports for the djangotutorial and polls packages,
- with-django enables djangosanetesting, which, for instance, sets up the test database,
- with-cherrypyliveserver starts a CherryPy server on djangotutorial for Selenium tests,
- with-selenium-server starts a Selenium Server,
- with-selenium-driver provides a Selenium Web Driver to the tests.
The test database is configured in djangotutorial/settings.py with TEST_DATABASE_NAME (used by djangosanetesting for live server) and TEST_NAME:
TEST_DATABASE_NAME = 'djangotutorialtest.db'
DATABASES = {
'default': {
'ENGINE': 'django.db.backends.sqlite3',
# Other configuration
# ...
'TEST_NAME': TEST_DATABASE_NAME,
}
}
Write tests
Selenium tests are located in the tests folder. Here is a test template:
from selenose.cases import SeleniumTestCase
from djangosanetesting.cases import HttpTestCase
class SampleTestCase(SeleniumTestCase, HttpTestCase):
    def test(self):
        self.driver.get('http://localhost:8000/')
        # Your test here...
This test inherits from:
- selenose.cases.SeleniumTestCase to flag this test as a Selenium test for the Selenium Driver Plugin of selenose (the one providing a Web Driver),
- djangosanetesting.cases.HttpTestCase to flag this test as an HTTP test for djangosanetesting, to start the CherryPy server.
Then write tests as usual, with the benefit of having the self.driver directly available.
The following sections contain the test examples of two major features:
- vote submission,
- login on admin site.
Test vote submission
The project defines a sample poll How is selenose? in the djangotutorial/polls/fixtures/initial_data.json fixture which is loaded at database initialization.
This test verifies that the voting process is working for this poll:
import time

from selenose.cases import SeleniumTestCase
from djangosanetesting.cases import HttpTestCase

class PollsTestCase(SeleniumTestCase, HttpTestCase):
    def test_vote(self):
        self.driver.get('http://localhost:8000/polls/')
        poll = self.driver.find_element_by_link_text('How is selenose?')
        poll.click()
        time.sleep(2) # Should use accurate WebDriverWait
        choices = self.driver.find_elements_by_name('choice')
        self.assertEqual(2, len(choices))
        choices[1].click()
        choices[1].submit()
        lis = self.driver.find_elements_by_tag_name('li')
        self.assertEqual(2, len(lis))
        self.assertEqual('Cool? -- 0 votes', lis[0].text)
        self.assertEqual('Super cool? -- 1 vote', lis[1].text)
- First open the page listing the polls: http://localhost:8000/polls/,
- Find the link pointing to the How is selenose? poll, and click on it,
- Verify that two choices are available, then select the last one before submitting the form,
- Finally verify the vote results on the poll result page.
Test admin site login
The project also defines an administrator admin/admin in the djangotutorial/polls/fixtures/initial_data.json fixture which is loaded at database initialization.
This test checks that the admin user can log into the admin site:
import time

from selenose.cases import SeleniumTestCase
from djangosanetesting.cases import HttpTestCase

class AdminTestCase(SeleniumTestCase, HttpTestCase):
    def test_login(self):
        self.driver.get('http://localhost:8000/admin/')
        self.driver.find_element_by_id('id_username').send_keys('admin')
        password = self.driver.find_element_by_id('id_password')
        password.send_keys('admin')
        password.submit()
        time.sleep(2) # Should use accurate WebDriverWait
        self.assertTrue(self.driver.find_element_by_id('user-tools').text.startswith('Welcome'))
- First open the administration interface login page: http://localhost:8000/admin/,
- Then look for the username field (its id is id_username) and type admin,
- Then look for the password field (its id is id_password) and type admin before submitting the form,
- Finally assert that the user is welcomed.
Run tests
The script tests/run.py is the entry point to run the tests. It:
- Adds / (folder of the project) and /djangotutorial (for the polls module) to sys.path,
- Exports DJANGO_SETTINGS_MODULE in os.environ to define the Django settings module to use in tests,
- Changes the current directory to the tests folder,
- Calls nose.
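The post doesn't reproduce tests/run.py itself. A minimal sketch of those four steps might look like the following. The settings module name `djangotutorial.settings` is an assumption about the project layout, and changing into tests/ is presumably what lets nose pick up tests/setup.cfg.

```python
import os
import sys


def configure(tests_dir):
    """Prepare sys.path, settings and cwd as tests/run.py is described as doing."""
    tests_dir = os.path.abspath(tests_dir)
    project_dir = os.path.dirname(tests_dir)
    # Make the project folder and djangotutorial/ (for the polls module) importable.
    for path in (project_dir, os.path.join(project_dir, "djangotutorial")):
        if path not in sys.path:
            sys.path.insert(0, path)
    # Assumed settings module name; adjust it to your own project layout.
    os.environ.setdefault("DJANGO_SETTINGS_MODULE", "djangotutorial.settings")
    # nose reads setup.cfg from the current directory, hence the chdir.
    os.chdir(tests_dir)


def main():
    configure(os.path.dirname(os.path.abspath(__file__)))
    import nose  # imported after configure() so the paths above are in place
    # nose.main() parses sys.argv, so options such as --selenium-driver=firefox pass through.
    nose.main()
```

In tests/run.py, call main() under an `if __name__ == "__main__":` guard.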
To run all tests with the firefox Web Driver environment (see selenose documentation for more information on Web Driver environments), execute:
$ python tests/run.py --selenium-driver=firefox
Or to run a single test:
$ python tests/run.py --selenium-driver=firefox test_admin.py
What's next?
In our next blog post, how to integrate this project with Jenkins!
Python 4 Kids: Welcome back, Class Recap
Happy New Year
Welcome back to Python4Kids! I am sorry for being a little slack since the end of last year, but hopefully I’m over that now and we can start powering on again!
Towards the end of last year we started working on classes and a GUI toolkit called Tkinter. In this tutorial we will recap classes. Hopefully, we’ll recap Tkinter next. If you have data and functions which are related to each other in some way, classes allow you to group them together. This makes managing them easier, particularly as your programs get bigger.
Example:
>>> class allDads(object):
...     def __init__(self,age=28):
...         self.age = age
...
>>> dad1 = allDads()
>>> dad1.age
28
>>> dad2 = allDads(35)
>>> dad2.age
35
>>> dad1.age
28
This class has one method (called __init__()) and one attribute (called age). It inherits from object.
Recap bit
The main things we have learned about classes are:
- they are based on (“inherit from”) some Python object (usually object, which here means the Python thing called object, as opposed to an object in its ordinary sense)
- classes are defined with the class statement. The class statement creates an archetype* – that is, a kind of template – from which specific instances are created. Thus the class allDads was representing all dads in the world, while myDad was a specific instance of a dad – that is, my dad in particular.
- data which is stored in a class or instance is called an attribute.
- functions which are part of a class or instance are called methods
- if you have an instance of a class (eg myDad), then the attributes and methods of myDad are identified by the dot operator “.”. So, the appearance attribute of the instance myDad is identified by myDad.appearance (see the dot?). If it had a method called makesARobot() (the parentheses indicate it is a function or method), that would be identified by myDad.makesARobot().
- the attributes of an instance are (typically) specific to that instance. If two instances have an attribute of the same name (which will always be the case for attributes defined by the class), changing the attribute for an instance does not affect the attribute of another instance.
- classes usually define a special method called __init__() (“dunder init“, among other pronunciations). This method is run whenever an instance of the class is created. It INITialises the instance.
- classes make use of a special variable called self. self is a bit tricky to explain, but it will become clear when you start using it a bit. self is used when defining the class as a way for each instance to refer to that instance of the class, rather than to the class as a whole.
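Here is a small example that puts several of those points together: an attribute, a method, the dot operator, self, and instances that keep their own attributes. The appearance attribute and the makesARobot() method follow the names used in the recap; what they contain is made up for the example.

```python
class allDads(object):
    def __init__(self, appearance="tall"):
        self.appearance = appearance  # an attribute, stored on this instance

    def makesARobot(self):  # a method; self is the instance it was called on
        return "My dad, who is " + self.appearance + ", made a robot!"


myDad = allDads("tall")
yourDad = allDads("short")
print(myDad.appearance)       # tall
print(myDad.makesARobot())    # My dad, who is tall, made a robot!
yourDad.appearance = "very short"
print(myDad.appearance)       # tall (changing yourDad does not change myDad)
```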
Python’s classes are a core component of the language. You will end up using them all the time. In fact, they are also a core part of object oriented programming. I have introduced classes at the same time as Tkinter because Tkinter needs them. In order to program Tkinter, you basically need to use classes!
Homework:
Part 1:
Create a class with a method called __init__(self, firstName = None). Make the __init__ method assign the value of the firstName parameter to an attribute which is also called firstName. (Hint: check the example above, and use self.firstName.)
Part 2:
Create some instances of your class, passing your first name as a string (i.e. put it in quotation marks) to the class.
Part 3:
Print the firstName attribute of the instances you have created.
Notes:
* an “archetype” is “a universally understood symbol or term or pattern of behavior, a prototype upon which others are copied, patterned, or emulated.“
Mike C. Fletcher: pip install --link-system?
Just ran into the little wonder that is "pip install PIL", which results in a useless PIL (i.e. one without JPEG, PNG or TrueType support, but which still imports, so it doesn't trigger the "I don't have PIL" code paths). There are other cases, e.g. PostgreSQL drivers or complex GUI libraries, where if you don't happen to have a toolchain installed you are out of luck on the install, and really, you'd rather just use the .deb file version anyway. What I'd really like is something like this:
pip install --link-system PIL psycopg2 pygame
where the three packages there would be looked up in the system Python corresponding to the Python in the virtualenv, and links would be created from venv/lib/pythonX.Y/site-packages to the system-level packages (if it's more reliable, copy the files; I really don't mind).
Does something already do this elegantly/cleanly? I have a stupid little script that works for the few libraries I have (basically just imports them, printing out the paths, which I then script to create the links), but I really want something where I can just add a flag (or whatever) to the requirements file so that the whole thing is transparent (save that if the library isn't installed at the system level the user gets told to install it manually (or something)).
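The "stupid little script" approach can be sketched without actually importing anything. This version uses importlib.util.find_spec() to locate each top-level module or package and symlinks it into a virtualenv's site-packages. It is a hypothetical helper, not the author's script and not a pip feature, and it links only the code itself. Packaging metadata such as .egg-info or .dist-info is not linked.

```python
import importlib.util
import os


def link_system_packages(names, venv_site_packages):
    """Symlink top-level modules/packages found by this interpreter into a venv.

    Run it with the *system* Python so find_spec() resolves to system-level installs.
    """
    linked = []
    for name in names:
        spec = importlib.util.find_spec(name)
        if spec is None or not spec.has_location:
            print("Could not find %s on disk; install it manually." % name)
            continue
        if spec.submodule_search_locations is not None:
            source = os.path.dirname(spec.origin)  # a package: link its directory
        else:
            source = spec.origin                   # a single .py or extension module
        dest = os.path.join(venv_site_packages, os.path.basename(source))
        if not os.path.lexists(dest):
            os.symlink(source, dest)
        linked.append(dest)
    return linked
```

For example, running link_system_packages(["PIL", "psycopg2", "pygame"], "venv/lib/pythonX.Y/site-packages") with the system interpreter, where pythonX.Y is replaced by the real version directory, would link whichever of the three it can find and report the rest.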
Mike C. Fletcher: TTFQuery Updated, PyVRML97 Bug-fix
TTFQuery has been updated to feel a tiny bit more modern. It's still pretty old-school, but it now has some actual documentation, and should be more friendly for pip installation.
Found the bug that has been holding up my testing on PyOpenGL/OpenGLContext. The shadow demos had been broken, which I considered a blocker. It turns out not to have been related to the double->float conversion; instead, I had left out the rotation axis normalization in the new transformation-matrix-calculation library. Duh!
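The post doesn't show the fix, but it's easy to see why normalization matters: the axis-angle (Rodrigues) rotation formula assumes a unit-length axis. Below is a generic sketch in plain Python, not code from PyOpenGL or OpenGLContext. rotation_matrix is a hypothetical helper.

```python
import math


def rotation_matrix(axis, angle):
    """3x3 rotation matrix for `angle` radians about `axis` (Rodrigues' formula)."""
    x, y, z = axis
    length = math.sqrt(x * x + y * y + z * z)
    if length == 0:
        raise ValueError("rotation axis must be non-zero")
    x, y, z = x / length, y / length, z / length  # the normalization step
    c, s = math.cos(angle), math.sin(angle)
    t = 1 - c
    return [
        [t * x * x + c,     t * x * y - s * z, t * x * z + s * y],
        [t * x * y + s * z, t * y * y + c,     t * y * z - s * x],
        [t * x * z - s * y, t * y * z + s * x, t * z * z + c],
    ]


# A quarter turn about a non-unit z axis still maps +x to +y.
R = rotation_matrix((0, 0, 2), math.pi / 2)
print([round(row[0], 6) for row in R])  # [0.0, 1.0, 0.0]
```

Without the normalization line, the axis (0, 0, 2) would put 4 rather than 1 on the z diagonal for this quarter turn, so the matrix would no longer be a pure rotation.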
