Chiradeep Vittal: Design patterns in orchestrators: transfer of desired state (part 3/N)

Planet Apache - Fri, 2017-06-23 01:40

Most datacenter automation tools operate on the basis of desired state. Desired state describes what should be the end state but not how to get there. To simplify a great deal, if the thing being automated is the speed of a car, the desired state may be “60mph”. How to get there (braking, accelerator, gear changes, turbo) isn’t specified. Something (an “agent”) promises to maintain that desired speed.

The desired state and changes to the desired state are sent from the orchestrator to various agents in a datacenter. For example, the desired state may be “two apache containers running on host X”. An agent on host X will ensure that the two containers are running. If one or more containers die, then the agent on host X will start enough containers to bring the count up to two. When the orchestrator changes the desired state to “3 apache containers running on host X”, then the agent on host X will create another container to match the desired state.
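As a rough sketch, the agent's job is a reconciliation loop; in this toy Python version the list entries stand in for real containers (none of the names below are a real agent's API):

```python
def reconcile(desired_count, running):
    """Toy reconciliation: drive the actual container count toward
    the desired count (list entries stand in for real containers)."""
    while len(running) < desired_count:
        running.append("apache")   # stand-in for starting a container
    while len(running) > desired_count:
        running.pop()              # stand-in for stopping a container
    return running

# One of two containers died; the agent restores the desired state.
running = reconcile(2, ["apache"])
print(len(running))   # -> 2
```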

Transfer of desired state is another way to achieve idempotence (a problem described here).

We can see that there are two sources of changes that the agent has to react to:

  1. changes to desired state sent from the orchestrator and
  2. drift in the actual state due to independent / random events.

Let’s examine #1 in greater detail. There are a few ways to communicate the change in desired state:

  1. Send the new desired state to the agent (a “command” pattern). This approach works most of the time, except when the size of the state is very large. For instance, consider an agent responsible for storing a million objects. Deleting a single object would involve sending the whole desired state (999999 items). Another problem is that the command may not reach the agent (“the network is not reliable”). Finally, the agent may not be able to keep up with the rate of change of desired state and start to drop some commands.  To fix this issue, the system designer might be tempted to run more instances of the agent; however, this usually leads to race conditions and out-of-order execution problems.
  2. Send just the delta from the previous desired state. This is fraught with problems. This assumes that the controller knows for sure that the previous desired state was successfully communicated to the agent, and that the agent has successfully implemented the previous desired state. For example, if the first desired state was “2 running apache containers” and the delta that was sent was “+1 apache container”, then the final actual state may or may not be “3 running apache containers”. Again, network reliability is a problem here. The rate of change is an even bigger potential problem here: if the agent is unable to keep up with the rate of change, it may drop intermediate delta requests. The final actual state of the system may be quite different from the desired state, but the agent may not realize it! Idempotence in the delta commands helps in this case.
  3. Send just an indication of change (“interrupt”). The agent has to perform the additional step of fetching the desired state from the controller. The agent can compute the delta and change the actual state to match the delta. This has the advantage that the agent is able to combine the effects of multiple changes (“interrupt debounce”). By coalescing the interrupts, the agent is able to limit the rate of change. Of course the network could cause some of these interrupts to get “lost” as well. Lost interrupts can cause the actual state to diverge from the desired state for long periods of time. Finally, if the desired state is very large, the agent and the orchestrator have to coordinate to efficiently determine the change to the desired state.
  4. The agent could poll the controller for the desired state. There is no problem of lost interrupts; the next polling cycle will always fetch the latest desired state. The polling rate is critical here: if it is too fast, it risks overwhelming the orchestrator even when there are no changes to the desired state; if too slow, it will not converge the actual state to the desired state quickly enough.

To summarize the potential issues:

  1. The network is not reliable. Commands or interrupts can be lost, or agents can restart / disconnect: there has to be some way for the agent to recover the desired state.
  2. The desired state can be prohibitively large. There needs to be some way to efficiently but accurately communicate the delta to the agent.
  3. The rate of change of the desired state can strain the orchestrator, the network and the agent. To preserve the stability of the system, the agent and orchestrator need to coordinate to limit the rate of change, the polling rate and to execute the changes in the proper linear order.
  4. Only the latest desired state matters. There has to be some way for the agent to discard all the intermediate (“stale”) commands and interrupts that it has not been able to process.
  5. Delta computation (the difference between two consecutive sets of desired state) can sometimes be more efficiently performed at the orchestrator, in which case the agent is sent the delta. Loss of the delta message or reordering of execution can lead to irrecoverable problems.
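Point 4 is commonly handled by versioning the desired state, so an agent can discard anything older than what it already has. A toy sketch (the `Agent` class and message shape here are illustrative, not any real orchestrator's protocol):

```python
class Agent:
    """Toy agent that applies only the newest desired state."""

    def __init__(self):
        self.version = 0
        self.state = None

    def receive(self, version, desired_state):
        # Discard stale or duplicate messages (issue #4 above).
        if version <= self.version:
            return False
        self.version = version
        self.state = desired_state
        return True

agent = Agent()
agent.receive(2, "3 apache containers")              # applied
applied = agent.receive(1, "2 apache containers")    # stale: ignored
print(agent.state, applied)   # -> 3 apache containers False
```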

A persistent message queue can solve some of these problems. The orchestrator sends its commands or interrupts to the queue and the agent reads from the queue. The message queue buffers commands or interrupts while the agent is busy processing a desired state request.  The agent and the orchestrator are nicely decoupled: they don’t need to discover each other’s location (IP/FQDN). Message framing and transport are taken care of (no more choosing between Thrift or text or HTTP or gRPC etc).

There are tradeoffs however:

  1. With the command pattern, if the desired state is large, then the message queue could reach its storage limits quickly. If the agent ends up discarding most commands, this can be quite inefficient.
  2. With the interrupt pattern, a message queue is not adding much value since the agent will talk directly to the orchestrator anyway.
  3. It is not trivial to operate / manage / monitor a persistent queue. Messages may need to be aggressively expired / purged, and the promise of persistence may not actually be realized. Depending on the scale of the automation, this overhead may not be worth the effort.
  4. A message queue with “at most once” semantics can still lose messages. With “at least once” semantics, the message queue could deliver multiple copies of the same message: the agent has to be able to determine if a message is a duplicate. The orchestrator and agent still have to solve some of the end-to-end reliability problems.
  5. Delta computation is not solved by the message queue.

OpenStack (using RabbitMQ) and CloudFoundry (using NATS) have adopted message queues to communicate desired state from the orchestrator to the agent.  Apache CloudStack doesn’t have any explicit message queues, although if one digs deeply, there are command-based message queues simulated in the database and in memory.

Others solve the problem with a combination of interrupts and polling – interrupt to execute the change quickly, poll to recover from lost interrupts.
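A toy sketch of that combined loop using asyncio (the names are illustrative; where `fetch_state` appears, a real agent would call the orchestrator):

```python
import asyncio

async def agent_loop(fetch_state, interrupts, poll_interval=0.05, rounds=2):
    """Interrupt-driven with a polling fallback: react quickly when an
    interrupt arrives, but poll anyway so a lost interrupt only delays
    convergence by one poll interval instead of preventing it."""
    seen = []
    for _ in range(rounds):
        try:
            await asyncio.wait_for(interrupts.get(), timeout=poll_interval)
            # Coalesce any queued interrupts: only the latest state matters.
            while not interrupts.empty():
                interrupts.get_nowait()
        except asyncio.TimeoutError:
            pass  # no (or lost) interrupt: the periodic poll still runs
        seen.append(await fetch_state())  # fetch desired state and reconcile

    return seen

async def main():
    q = asyncio.Queue()
    await q.put("changed")   # round 1 is driven by an interrupt...
    states = iter(["2 containers", "3 containers"])
    return await agent_loop(lambda: asyncio.sleep(0, next(states)), q)

result = asyncio.run(main())   # ...round 2 recovers via the poll timeout
print(result)
```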

Kubernetes is one such framework. There are no message queues, and it uses an explicit interrupt-driven mechanism to communicate desired state from the orchestrator (the “API Server”) to its agents (called “controllers”).

(Image courtesy: https://blog.heptio.com/core-kubernetes-jazz-improv-over-orchestration-a7903ea92ca)

Developers can use (but are not forced to use) a controller framework to write new controllers. An instance of a controller embeds an “Informer” whose responsibility is to watch for changes in the desired state and execute a controller function when there is a change. The Informer takes care of caching the desired state locally and computing the delta state when there are changes. The Informer leverages the “watch” mechanism in the Kubernetes API Server (an interrupt-like system that delivers a network notification when there is a change to a stored key or value). The deltas to the desired state are queued internally in the Informer’s memory. The Informer ensures the changes are executed in the correct order.

  • Desired states are versioned, so it is easier to decide to compute a delta, or to discard an interrupt.
  • The Informer can be configured to do a periodic full resync from the orchestrator (“API Server”) – this should take care of the problem of lost interrupts.
  • Apparently, there is no problem of the desired state being too large, so Kubernetes does not explicitly handle this issue.
  • It is not clear if the Informer attempts to rate-limit itself when there are excessive watches being triggered.
  • It is also not clear if at some point the Informer “fast-forwards” through its queue of changes.
  • The watches in the API Server use Etcd watches in turn. The watch server in the API server only maintains a limited set of watches received from Etcd and discards the oldest ones.
  • Etcd itself is a distributed data store that is more complex to operate than say, an SQL database. It appears that the API server hides the Etcd server from the rest of the system, and therefore Etcd could be replaced with some other store.
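The Informer's cache-and-diff behavior can be modeled with a toy sketch (illustrative only; the real Informer in Kubernetes' client-go also handles update events, resource versions, and much more):

```python
class ToyInformer:
    """Toy model of an Informer's local cache plus delta computation."""

    def __init__(self, on_add, on_delete):
        self.cache = {}
        self.on_add, self.on_delete = on_add, on_delete

    def resync(self, desired):
        # Compare the full desired state against the local cache
        # and invoke the controller callbacks only for the deltas.
        for key, obj in desired.items():
            if key not in self.cache:
                self.on_add(key, obj)
        for key in list(self.cache):
            if key not in desired:
                self.on_delete(key)
        self.cache = dict(desired)

events = []
inf = ToyInformer(lambda k, o: events.append(("add", k)),
                  lambda k: events.append(("delete", k)))
inf.resync({"pod-a": 1, "pod-b": 2})
inf.resync({"pod-b": 2, "pod-c": 3})
print(events)
```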

I wrote a Network Policy Controller for Kubernetes using this framework and it was the easiest integration I’ve written.

It is clear that the Kubernetes creators put some thought into the architecture, based on their experiences at Google. The Kubernetes design should inspire other orchestrator-writers, or perhaps, should be re-used for other datacenter automation purposes. A few issues to consider:

  • The agents (“controllers”) need direct network reachability to the API Server. This may not be possible in all scenarios, needing another level of indirection.
  • The API server is not strictly an orchestrator; it is better described as a choreographer. I hope to describe this difference in a later blog post, but note that the API server never explicitly carries out a step-by-step flow of operations.

Categories: FLOSS Project Planets

Brad Lucas: Python Virtualenv

Planet Python - Fri, 2017-06-23 00:00

Virtualenv supports the creation of isolated Python environments. This allows you to create your project with all of its dependencies in one place. Not only does this allow for a simpler deployment path when you release your project, but it also makes trying different versions of libraries and experimenting safer.

The following is a good intro for virtualenv.



Step 1 is to install virtualenv. Running the following installs it on your machine.

$ pip install virtualenv

You should be able to execute the command virtualenv afterwards.

Once that works, you can move on.


Here is how I use virtualenv

I create my project directory such as:

$ cd /home/brad/projects/
$ mkdir example
$ cd example

Then I set up a virtual environment for this project with the following command:

$ virtualenv env

This differs from the site mentioned above. I do the same for every project so when I'm in each of my project directories they all have an env directory. Then everything is the same. This keeps things simple and easily remembered.

To start the virtual env you run the following from the project directory

$ source ./env/bin/activate

Once your virtual environment is activated anything you install will be put into the environment. If you look you'll notice new directories inside of ./env/lib/python2.7/site-packages after each install.

Now, install what you need. For example,

$ pip install zipline

To see all the libraries in your environment you can run pip list.

$ pip list

When you run python in your active environment you'll have access to all the libraries you've just installed.

Lastly, to get out of the virtual env you can run deactivate.

$ deactivate
Categories: FLOSS Project Planets

Hynek Schlawack: Sharing Your Labor of Love: PyPI Quick and Dirty

Planet Python - Thu, 2017-06-22 20:00

A completely incomplete guide to packaging a Python module and sharing it with the world on PyPI.

Categories: FLOSS Project Planets

Justin Mason: Links for 2017-06-22

Planet Apache - Thu, 2017-06-22 19:58
Categories: FLOSS Project Planets

Gaël Varoquaux: Scikit-learn Paris sprint 2017

Planet Python - Thu, 2017-06-22 18:00

Two weeks ago, we held in Paris a large international sprint on scikit-learn. It was incredibly productive and fun, as always. We are still busy merging in the work, but I think that now is a good time to try to summarize the sprint.

A massive workforce

We had a mix of core contributors and newcomers, which is a great combination, as it enables us to be productive, but also to foster the new generation of core developers. Present were:

  • Albert Thomas
  • Alexandre Abadie
  • Alexandre Gramfort
  • Andreas Mueller
  • Arthur Imbert
  • Aurélien Bellet
  • Bertrand Thirion
  • Denis Engemann
  • Elvis Dohmatob
  • Gael Varoquaux
  • Jan Margeta
  • Joan Massich
  • Joris Van den Bossche
  • Laurent Direr
  • Lemaitre Guillaume
  • Loic Esteve
  • Mohamed Maskani Filali
  • Nathalie Vauquier
  • Nicolas Cordier
  • Nicolas Goix
  • Olivier Grisel
  • Patricio Cerda
  • Paul Lagrée
  • Raghav RV
  • Roman Yurchak
  • Sebastien Treger
  • Sergei Lebedev
  • Thierry Guillemot
  • Thomas Moreau
  • Tom Dupré la Tour
  • Vlad Niculae
  • Manoj Kumar (could not come to Paris because of visa issues)

And many more people participated remotely, and I am pretty certain that I forgot some.

Support and hosting

Hosting: As the sprint extended through a French bank holiday and the weekend, we were hosted in a variety of venues:

  • La paillasse, a Paris bio-hacker space
  • Criteo, a French company doing worldwide ad-banner placement. The venue there was absolutely gorgeous, with a beautiful terrace on the roofs of Paris. And they even had a social event with free drinks one evening.

Guillaume Lemaître did most of the organization, and at Criteo Ibrahim Abubakari was our host. We were treated like kings during the whole stay, each host welcoming us as well as they could.

Financial support by France is IA: Beyond our hosts, we need to thank France is IA, which funded the sprint, covering some of the lunches, accommodation, and travel expenses to bring in our contributors from abroad (3000 euros for travel & accommodation, and 1000 euros for food and a venue during the weekend).

Some achievements during the sprint

It would be hard to list everything that we did during the sprint (have a look at the development changelog if you’re curious). Here are some highlights:

  • Quantile transformer, to transform the data distribution into uniform, or Gaussian distributions (PR, example):



  • Memory savings by avoiding casting to float64 if X is given as float32: we are slowly making sure that, as much as possible, all models avoid using internal representations of dtype float64 when the data is given as float32. This significantly reduces memory usage and can give speed-ups of up to a factor of two.

  • API tests on instances rather than classes. This is to facilitate testing packages in scikit-learn-contrib.

  • Many small API fixes to ensure better consistency of models, as well as cleaning the codebase, making sure that examples display well under matplotlib 2.x.

  • Many bug fixes, including fixing corner cases in our average precision, which was dear to me (PR).

Work soon to be merged

  • ColumnTransformer (PR): from pandas dataframe to feature matrix, by applying different transformers to different columns.
  • Fixing t-SNE (PR): our t-SNE implementation was extremely memory-inefficient, and on top of this had minor bugs. We are fixing it.

There is a lot more pending work that the sprint helped move forward. You can also glance at the monthly activity report on GitHub.

Joblib progress

Joblib, the parallel-computing engine used by scikit-learn, is getting extended to work in distributed settings, for instance using dask distributed as a backend. At the sprint, we made progress running a grid-search on Criteo’s Hadoop cluster.

Categories: FLOSS Project Planets

Tarek Ziade: Advanced Molotov example

Planet Python - Thu, 2017-06-22 18:00

Last week, I blogged about how to drive Firefox from a Molotov script using Arsenic.

It is pretty straightforward if you are doing some isolated interactions with Firefox and if each worker in Molotov lives its own life.

However, if you need to have several "users" (==workers in Molotov) running in a coordinated way on the same web page, it gets a little bit tricky.

Each worker is its own coroutine and triggers the execution of one scenario by calling the coroutine that was decorated with @scenario.

Let's consider this simple use case: we want to run five workers in parallel that all visit the same etherpad lite page with their own Firefox instance through Arsenic.

One of them is adding some content in the pad and all the others are waiting on the page to check that it is updated with that content.

So we want four workers to wait on a condition (=pad written) before they make sure and check that they can see it.

Moreover, since Molotov can call a scenario many times in a row, we need to make sure that everything was done in the previous round before changing the pad content again. That is, four workers did check the content of the pad.

To do all that synchronization, Python's asyncio offers primitives that are similar to the one you would use with threads. asyncio.Event can be used for instance to have readers waiting for the writer and vice-versa.

In the example below, a class wraps two Events and exposes simple methods to do the syncing by making sure readers and writer are waiting for each other:

class Notifier(object):

    def __init__(self, readers=5):
        self._current = 1
        self._until = readers
        self._readers = asyncio.Event()
        self._writer = asyncio.Event()

    def _is_set(self):
        return self._current == self._until

    async def wait_for_writer(self):
        await self._writer.wait()

    async def one_read(self):
        if self._is_set():
            return
        self._current += 1
        if self._current == self._until:
            self._readers.set()

    def written(self):
        self._writer.set()

    async def wait_for_readers(self):
        await self._readers.wait()

Using this class, the writer can call written() once it has filled the pad and the readers can wait for that event by calling wait_for_writer() which blocks until the write event is set.

one_read() is then called for each read. This second event is used by the next writer to make sure it can change the pad content after every reader did read it.

So how do we use this class in a Molotov test? There are several options and the simplest one is to create one Notifier instance per run and set it in a variable:

@molotov.scenario(1)
async def example(session):
    get_var = molotov.get_var
    notifier = get_var('notifier' + str(session.step), factory=Notifier)
    wid = session.worker_id

    if wid != 4:
        # I am NOT worker 4! I read the pad
        # wait for worker #4 to edit the pad
        await notifier.wait_for_writer()
        # <.. pad reading here...>
        # notify that we've read it
        await notifier.one_read()
    else:
        # I am worker 4! I write in the pad
        if session.step > 1:
            # waiting for the previous readers to have finished
            # before we start a new round
            previous_notifier = get_var('notifier' + str(session.step - 1))
            await previous_notifier.wait_for_readers()
        # <... writes in the pad...>
        # informs that the write task was done
        notifier.written()

A lot is going on in this scenario. Let's look at each part in detail. First of all, the notifier is created lazily via get_var() with a Notifier factory. Its name contains the session step.

The step value is incremented by Molotov every time a worker is running a scenario, and we can use that value to create one distinct Notifier instance per run. It starts at 1.

Next, the session.worker_id value gives each distinct worker a unique id. If you run molotov with 5 workers, you will get values from 0 to 4.

We are making the last worker (worker id== 4) the one that will be in charge of writing in the pad.

For the other workers (=readers), they just use wait_for_writer() to sit and wait for worker 4 to write the pad. worker 4 notifies them with a call to written().

The last part of the script allows Molotov to run the script several times in a row using the same workers. When the writer starts its work, if the step value is greater than one, it means that we have already run the test at least once.

The writer, in that case, gets back the Notifier from the previous run and verifies that all the readers did their job before changing the pad.

All of this syncing work sounds complicated, but once you understand the pattern, it lets you run advanced scenarios in Molotov where several concurrent "users" need to collaborate.

You can find the full script at https://github.com/tarekziade/molosonic/blob/master/loadtest.py

Categories: FLOSS Project Planets

Steve McIntyre: -1, Trolling

Planet Debian - Thu, 2017-06-22 17:59

Here's a nice comment I received by email this morning. I guess somebody was upset by my last post?

From: Tec Services <tecservices911@gmail.com>
Date: Wed, 21 Jun 2017 22:30:26 -0700
To: steve@einval.com
Subject: its time for you to retire from debian

unbelievable..your the quality guy and fucked up the installer! i cant ever remember in the hostory of computing someone releasing an installer that does not work!! wtf!!! you need to be retired...due to being retarded.. and that this was dedicated to ian...what a disaster..you should be ashames..he is probably roling in his grave from shame right now....

It's nice to be appreciated.

Categories: FLOSS Project Planets

Stephen Ferg: Unicode for dummies — Encoding

Planet Python - Thu, 2017-06-22 17:27

Another entry in an irregular series of posts about Unicode.
Typos fixed 2012-02-22. Thanks Anonymous, and Clinton, for reporting the typos.

This is a story about encoding and decoding, with a minor subplot involving Unicode.

As our story begins — on a dark and stormy night, of course — we find our protagonist deep in thought. He is asking himself “What is an encoding?”

What is an encoding?

The basic concepts are simple. First, we start with the idea of a piece of information — a message — that exists in a representation that is understandable (perspicuous) to a human being. I’m going to call that representation “plain text”. For English-language speakers, for example, English words printed on a page, or displayed on a screen, count as plain text.

Next, (for reasons that we won’t explore right now) we need to be able to translate a message in a plain-text representation into some other representation (let’s call that representation the “encoded text”), and we need to be able to translate the encoded text back into plain text. The translation from plain text to encoded text is called “encoding”, and the translation of encoded text back into plain text is called “decoding”.

There are three points worth noting about this process.

The first point is that no information can be lost during encoding or decoding. It must be possible for us to send a message on a round-trip journey — from plain text to encoded text, and then back again from encoded text to plain text — and get back exactly the same plain text that we started with. That is why, for instance, we can’t use one natural language (Russian, Chinese, French, Navaho) as an encoding for another natural language (English, Hindi, Swahili). The mappings between natural languages are too loose to guarantee that a piece of information can make the round-trip without losing something in translation.

The requirement for a lossless round-trip means that the mapping between the plain text and the encoded text must be very tight, very exact. And that brings us to the second point.

In order for the mapping between the plain text and the encoded text to be very tight — which is to say: in order for us to be able to specify very precisely how the encoding and decoding processes work — we must specify very precisely what the plain text representation looks like.

Suppose, for example, we say that plain text looks like this: the 26 upper-case letters of the Anglo-American alphabet, plus the space and three punctuation symbols: period (full stop), question mark, and dash (hyphen). This gives us a plain-text alphabet of 30 characters. If we need numbers, we can spell them out, like this: “SIX THOUSAND SEVEN HUNDRED FORTY-THREE”.

On the other hand, we may wish to say that our plain text looks like this: 26 upper-case letters, 26 lower-case letters, 10 numeric digits, the space character, and a dozen types of punctuation marks: period, comma, double-quote, left parenthesis, right parenthesis, and so on. That gives us a plain-text alphabet of 75 characters.

Once we’ve specified exactly what a plain-text representation of a message looks like — a finite sequence of characters from our 30-character alphabet, or perhaps our 75-character alphabet — then we can devise a system (a code) that can reliably encode and decode plain-text messages written in that alphabet. The simplest such system is one in which every character in the plain-text alphabet has one and only one corresponding representation in the encoded text. A familiar example is Morse code, in which “SOS” in plain text corresponds to

... --- ...

in encoded text.
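That round-trip guarantee is easy to demonstrate with a tiny Morse-style codec in Python (only the two letters needed for “SOS” are mapped, purely for illustration):

```python
MORSE = {"S": "...", "O": "---"}
DECODE = {code: letter for letter, code in MORSE.items()}

def encode(plain):
    # One and only one encoded representation per plain-text character.
    return " ".join(MORSE[ch] for ch in plain)

def decode(encoded):
    return "".join(DECODE[sym] for sym in encoded.split(" "))

msg = "SOS"
assert decode(encode(msg)) == msg   # the round trip loses nothing
print(encode(msg))   # -> ... --- ...
```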

In the real world, of course, the selection of characters for the plain-text alphabet is influenced by technological limitations on the encoded text. Suppose we have several available technologies for storing encoded messages: one technology supports an encoded alphabet of 256 characters, another technology supports only 128 encoded characters, and a third technology supports only 64 encoded characters. Naturally, we can make our plain-text alphabet much larger if we know that we can use a technology that supports a larger encoded-text alphabet.

And the reverse is also true. If we know that our plain-text alphabet must be very large, then we know that we must find — or devise — a technology capable of storing a large number of encoded characters.

Which brings us to Unicode.


Unicode was devised to be a system capable of storing encoded representations of every plain-text character of every human language that has ever existed. English, French, Spanish. Greek. Arabic. Hindi. Chinese. Assyrian (cuneiform characters).

That’s a lot of characters.

So the first task of the Unicode initiative was simply to list all of those characters, and count them. That’s the first half of Unicode, the Universal Character Set. (And if you really want to “talk Unicode”, don’t call plain-text characters “characters”. Call them “code points”.)

Once you’ve done that, you’ve got to figure out a technology for storing all of the corresponding encoded-text characters. (In Unicode-speak, the encoded-text characters are called “code values”.)

In fact Unicode defines not one but several methods of mapping code points to code values. Each of these methods has its own name. Some of the names start with “UTF”, others start with “UCS”: UTF-8, UTF-16, UTF-32, UCS-2, UCS-4, and so on. The naming convention is “UTF-<number of bits>” and “UCS-<number of bytes>”. Some (e.g. UCS-4 and UTF-32) are functionally equivalent. See the Wikipedia article on Unicode.

The most important thing about these methods is that some are fixed-width encodings and some are variable-width encodings. The basic idea is that the fixed-width encodings are very long — UCS-4 and UTF-32 are 4 bytes (32 bits) long — long enough to hold the biggest code value that we will ever need.

In contrast, the variable-width encodings are designed to be short, but expandable. UTF-8, for example, can use as few as 8 bits (one byte) to store ASCII character code points. But it also has a sort of “continued on the next byte” mechanism that allows it to use 2, 3, or even 4 bytes when it needs to (Chinese characters, for example, take 3). For Western programmers, that means that UTF-8 is both efficient and flexible, which is why UTF-8 is the de facto standard encoding for exchanging Unicode text.
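Python 3 makes the variable width easy to observe: encoding characters from different scripts shows UTF-8 spending 1, 2, 3, or 4 bytes as needed.

```python
# UTF-8 spends extra bytes only when it needs them.
widths = {ch: len(ch.encode("utf-8")) for ch in ("A", "é", "中", "😀")}
print(widths)   # -> {'A': 1, 'é': 2, '中': 3, '😀': 4}
```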

There is, then, no such thing as THE Unicode encoding system or method. There are several encoding methods, and if you want to exchange text with someone, you need explicitly to specify which encoding method you are using.

Is it, say, this.

Or this.

Or something else.

Which brings us back to something I said earlier.

Why encode something in Unicode?

At the beginning of this post I said

We start with the idea of a piece of information — a message — that exists in a representation that is understandable (perspicuous) to a human being.

Next, (for reasons that we won’t explore right now) we need to be able to translate a message in a plain-text representation into some other representation. The translation from plain text to encoded text is called “encoding”, and the translation of encoded text back into plain text is called “decoding”.

OK. So now it is time to explore those reasons. Why might we want to translate a message in a plain-text representation into some other representation?

One reason, of course, is that we want to keep a secret. We want to hide the plain text of our message by encrypting and decrypting it — basically, by keeping the algorithms for encoding and decoding secret and private.

But that is a completely different subject. Right now, we’re not interested in keeping secrets; we’re Python programmers and we’re interested in Unicode. So:

Why — as a Python programmer — would I need to be able to translate a plain-text message into some encoded representation… say, a Unicode representation such as UTF-8?

Suppose you are happily sitting at your PC, working with your favorite text editor, writing the standard Hello World program in Python (specifically, in Python 3+). This single line is your entire program.

print("Hello, world!")

Here, “Hello, world!” is plain text. You can see it on your screen. You can read it. You know what it means. It is just a string and you can (if you wish) do standard string-type operations on it, such as taking a substring (a slice).

But now suppose you want to put this string — “Hello, world!” — into a file and save the file on your hard drive. Perhaps you plan to send the file to a friend.

That means that you must eject your poor little string from the warm, friendly, protected home in your Python program, where it exists simply as plain-text characters. You must thrust it into the cold, impersonal, outside world of the file system. And out there it will exist not as characters, but as mere 1’s and 0’s, a jumble of dits and dots, charged and uncharged particles. And that means that your happy little plain-text string must be represented by some specific configuration of 1s and 0s, so that when somebody wants to retrieve that collection of 1s and 0s and convert it back into readable plain text, they can.

The process of converting a plain text into a specific configuration of 1s and 0s is a process of encoding. In order to write a string to a file, you must encode it using some encoding system (such as UTF-8). And to get it back from a file, you must read the file and decode the collection of 1s and 0s back into plain text.

The need to encode/decode strings when writing/reading them from/to files isn’t something new — it is not an additional burden imposed by Python 3’s new support for Unicode. It is something you have always done. But it wasn’t always so obvious. In earlier versions of Python, the encoding scheme was ASCII. And because, in those olden times, ASCII was pretty much the only game in town, you didn’t need to specify that you wanted to write and read your files in ASCII. Python just assumed it by default and did it. But — whether or not you realized it — whenever one of your programs wrote or read strings from a file, Python was busy behind the scene, doing the encoding and decoding for you.

So that’s why you — as a Python programmer — need to be able to encode and decode text into, and out of, UTF-8 (or some other encoding: UTF-16, ASCII, whatever). You need to encode your strings as 1s and 0s so you can put those 1s and 0s into a file and send the file to someone else.
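Concretely, in Python 3 you can pass the encoding when opening the file. A small self-contained sketch (using a temporary directory) that writes a string out as UTF-8 bytes and reads it back:

```python
import os
import tempfile

text = "Hello, world!"
path = os.path.join(tempfile.mkdtemp(), "hello.txt")

# Writing encodes: str -> UTF-8 bytes on disk.
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Reading decodes: UTF-8 bytes on disk -> str.
with open(path, "r", encoding="utf-8") as f:
    print(f.read() == text)  # True
```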

What is plain text?

Earlier, I said that there were three points worth noting about the encoding/decoding process, and I discussed the first two. Here is the third point.

The distinction between plain text and encoded text is relative and context-dependent.

As programmers, we think of plain text as being written text. But it is possible to look at matters differently. For instance, we can think of spoken text as the plain text, and written text as the encoded text. From this perspective, writing is encoded speech. And there are many different encodings for speech as writing. Think of Egyptian hieroglyphics, Mayan hieroglyphics, the Latin alphabet, the Greek alphabet, Arabic, Chinese ideograms, wonderfully flowing Devanagari देवनागरी, sharp pointy cuneiform wedges, even shorthand. These are all written encodings for the spoken word. They are all, as Thomas Hobbes put it, “Marks by which we may remember our thoughts”.

Which reminds us that, in a different context, even speech itself — language — may be regarded as a form of encoding. In much of early modern philosophy (think of Hobbes and Locke) speech (or language) was basically considered to be an encoding of thoughts and ideas. Communication happens when I encode my thought into language and say something — speak to you. You hear the sound of my words and decode it back into ideas. We achieve communication when I successfully transmit a thought from my mind to your mind via language. You understand me when — as a result of my speech — you have the same idea in your mind as I have in mine. (See Ian Hacking, Why Does Language Matter to Philosophy?)

Finally, note that in other contexts, the “plain text” isn’t even text. Where the plain text is soundwaves (e.g. music), it can be encoded as an mp3 file. Where the plain text is an image, it can be encoded as a gif, or png, or jpg file. Where the plain text is a movie, it can be encoded as a wmv file. And so on.

Everywhere, we are surrounded by encoding and decoding.


I’d like to recommend Eli Bendersky’s recent post on The bytes/str dichotomy in Python 3, which prodded me — finally — to put these thoughts into writing. I especially like this passage in his post.

Think of it this way: a string is an abstract representation of text. A string consists of characters, which are also abstract entities not tied to any particular binary representation. When manipulating strings, we’re living in blissful ignorance. We can split and slice them, concatenate and search inside them. We don’t care how they are represented internally and how many bytes it takes to hold each character in them. We only start caring about this when encoding strings into bytes (for example, in order to send them over a communication channel), or decoding strings from bytes (for the other direction).

I strongly recommend Charles Petzold’s wonderful book Code: The Hidden Language of Computer Hardware and Software.

And finally, I’ve found Stephen Pincock’s Codebreaker: The History of Secret Communications a delightful read. It will tell you, among many other things, how the famous WWII Navaho codetalkers could talk about submarines and dive bombers… despite the fact that there are no Navaho words for “submarine” or “dive bomber”.

Categories: FLOSS Project Planets

LevelTen Interactive: Travel Websites Built with Drupal

Planet Drupal - Thu, 2017-06-22 16:46

This summer, LevelTen brought back the Web & Drupal Developer Internship program, and we've brought on three up-and-coming developer interns! Today's post features Anima Bajracharya and her research assignment on Drupal travel websites:

After researching some of the case studies of Travel sites created in Drupal, I found out that Drupal can help businesses across any industry to create rich digital experiences. It is no surprise that more than one million sites trust Drupal today. With benefits such as scalability, free modules, responsive design, flexible APIs and one of the...Read more

Categories: FLOSS Project Planets

ISO Image Writer

Planet KDE - Thu, 2017-06-22 15:14

ISO Image Writer is a tool I’m working on which writes .iso files onto a USB disk ready for installing your lovely new operating system.  Surprisingly, many distros don’t have very slick recommendations for how to do this, but they’re all welcome to try this.

It’s based on ROSA Image Writer, which has served KDE neon and other projects well for some time.  This version adds ISO verification to automatically check the digital signatures or checksums; currently supported are KDE neon, Kubuntu, and Netrunner.  It also uses KAuth so it doesn’t run the UI as root; only a simple helper binary does the writing.  And it uses KDE Frameworks goodness so the UI feels nice.

First alpha 0.1 is out now.

Download from https://download.kde.org/unstable/isoimagewriter/

Signed by release manager Jonathan Riddell with 0xEC94D18F7F05997E. Git tags are also signed by the same key.

It’s in KDE Git at kde:isoimagewriter and in bugs.kde.org, so please do try it out and report any issues.  If you’d like a distro added to the verification, please let me know and/or submit a patch. (The code to do this is currently a bit verbose; it needs tidying up.)

I’d like to work out how to make AppImages, Windows and Mac installs for this but for now it’s in KDE neon developer editions and available as source.


Categories: FLOSS Project Planets

Philip Semanchuk: Analyzing the Anglo-Saxonicity of the Baby BNC

Planet Python - Thu, 2017-06-22 14:46

This is a followup to an earlier post about using Python to measure the “Anglo-Saxonicity” of a text. I’ve used my code to analyze the Baby version of the British National Corpus, and I’ve found some interesting results.

How to Measure Anglo-Saxonicity – With a Ruler or Yardstick?


Thanks to a suggestion from Ben Sizer, I decided to analyze the British National Corpus. I started with the ‘baby’ corpus which, as you might imagine, is smaller than the full corpus.

It’s described as a “100 million word snapshot of British English at the end of the twentieth century”. It categorizes text samples into four groups: academic, conversations, fiction, and news. Below are stack plots showing the percentage of Anglo-Saxon, non-Anglo-Saxon, and unknown words for each document in each of the four groups. The Y axis shows the percentage of words in each category. The numbers along the X axis identify individual documents within the group.

I’ve deliberately given the charts non-specific names of Group A, B, C, and D so that we can play a game. :-)

Before we get to the game, here are the averages for each group in table form. (The numbers might not add exactly to 100% due to rounding.)

Group      Anglo-Saxon (%)   Non-Anglo-Saxon (%)   Unknown (%)
Group A         67.0                17.7                15.3
Group B         56.1                25.8                18.1
Group C         72.9                13.2                13.9
Group D         58.6                22.0                19.3
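Percentages like these can be computed with a sketch along the following lines (the `ORIGINS` lookup table and the function name are hypothetical — the real analysis uses the author's etymology database, which maps far more words):

```python
from collections import Counter

# Toy etymology lookup; a stand-in for the real database.
ORIGINS = {
    "the": "anglo-saxon",
    "word": "anglo-saxon",
    "journal": "non-anglo-saxon",
}

def origin_percentages(words):
    """Return the percentage of words in each etymological category."""
    counts = Counter(ORIGINS.get(w.lower(), "unknown") for w in words)
    total = len(words)
    return {cat: 100 * counts[cat] / total
            for cat in ("anglo-saxon", "non-anglo-saxon", "unknown")}

print(origin_percentages(["The", "journal", "zyzzyva", "word"]))
# {'anglo-saxon': 50.0, 'non-anglo-saxon': 25.0, 'unknown': 25.0}
```

Words missing from the lookup fall into the "unknown" bucket, which is why a sparse database depresses the other two categories.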

Keep in mind that “unknown” words represent shortcomings in my database more than anything else.

The Game

The Baby BNC is organized into groups of academic, conversations, fiction, and news. Groups A, B, C, and D each represent one of those groups. Which do you think is which?

Below are the answers to the game and a discussion of the results.


Answers

Group              Anglo-Saxon (%)   Non-Anglo-Saxon (%)   Unknown (%)
A = Fiction             67.0                17.7                15.3
B = Academic            56.1                25.8                18.1
C = Conversations       72.9                13.2                13.9
D = News                58.6                22.0                19.3

Discussion

With the hubris that only 20/20 hindsight can provide, I’ll say that I don’t find these numbers terribly surprising. Conversations have the highest proportion of Anglo-Saxon (72.9%) and the lowest of non-Anglo-Saxon (13.2%). Conversations are apt to use common words, and the 100 most common words in English are about 95% Anglo-Saxon. The relatively fast pace of conversation doesn’t encourage speakers to pause to search for those uncommon words lest they bore their listener or lose their chance to speak. I think the key here is not the fact that conversations are spoken, but that they’re impromptu. (Impromptu if you’re feeling French, off-the-cuff if you’re more Middle-English-y, or extemporaneous if you want to go full bore Latin.)

Academic writing is on the opposite end of the statistics, with the lowest portion of Anglo-Saxon words (56.1%) and the highest non-Anglo-Saxon (25.8%). Academic writing tends to be more ambitious and precise. Stylistically, it doesn’t shy away from more esoteric words because its audience is, by definition, well-educated. It doesn’t need to stick to the common core of English to get its point across. In addition, those who shaped academia were the educated members of society, and for many centuries education was tied to the church or limited to the gentry, and both spoke a lot of Latin and French. That has probably influenced even the modern day culture of academic writing.

Two of the academic samples managed to use fewer than half Anglo-Saxon words. They are a sample from Colliding Plane Waves in General Relativity (a subject Anglo-Saxons spent little time discussing, I’ll wager) and a sample from The Lancet, the British medical journal (49% and 47% Anglo-Saxon, respectively). It’s worth noting that these samples also displayed the highest and fifth-highest percentages of words of unknown etymology (26% and 21%, respectively) of the 30 samples in this category. A higher proportion of unknowns depresses the results in the other two categories.

Fiction rests in the middle of this small group of 4 categories, and I’m a little surprised that the percentage of Anglo-Saxon is as high as it is. I feel like fiction lends itself to the kind of description that tends to use more non-Anglo-Saxon words, but in this sample it’s not all that different from conversation.

News stands out for having barely more Anglo-Saxon words than academic writing, and also the highest percentage of words of unknown etymological origin. The news samples are drawn principally from The Independent, The Guardian, The Daily Telegraph, The Belfast Telegraph, The Liverpool Daily Post and Echo, The Northern Echo, and The Scotsman. It would be interesting to analyze each of these groups independently to see if they differ significantly.


My hypothesis that conversations have a high percentage of Anglo-Saxon words because they’re off-the-cuff rather than because they’re spoken is something I can challenge with another experiment. Speeches are also spoken, but they’re often written in advance, without the pressure of immediacy, so the author would have time to reach for a thesaurus. I predict speeches will have an Anglo-Saxon/non-Anglo-Saxon profile closer to that of fiction than of either of the extremes in this data. It might vary dramatically based on speaker and audience, so I’ll have to choose a broad sample to smooth out biases.

I would also like to work with the American National Corpus.

Stay tuned, and let me know in the comments if you have observations or suggestions!





Categories: FLOSS Project Planets

Django Weblog: DjangoCon US Schedule Is Live

Planet Python - Thu, 2017-06-22 13:46

We are less than two months away from DjangoCon US in Spokane, WA, and we are pleased to announce that our schedule is live! We received an amazing number of excellent proposals, and the reviewers and program team had a difficult job choosing the final talks. We think you will love them. Thank you to everyone who submitted a proposal or helped to review them.

Tickets for the conference are still on sale! Check out our website for more information on which ticket type to select. We have also announced our tutorials. They are $150 each, and may be purchased at the same place as the conference tickets.

DjangoCon US will be held August 13-18 at the gorgeous Hotel RL in downtown Spokane. Our hotel block rate expires July 11, so reserve your room today!

Categories: FLOSS Project Planets

Mike Driscoll: Book Review: Software Architecture with Python

Planet Python - Thu, 2017-06-22 13:15

Packt Publishing approached me about being a technical reviewer for the book Software Architecture with Python by Anand Balachandran Pillai. It sounded pretty interesting, so I agreed to do the review. Packt released the book in April 2017.

Quick Review
  • Why I picked it up: Packt Publishing asked me to do a technical review of the book
  • Why I finished it: Frankly because this was a well written book covering a broad range of topics
  • I’d give it to: Someone who is learning how to put together a large Python based project or application
  • Book Formats

    You can get this as a physical softcover, as a Kindle edition on Amazon, or in various other eBook formats via Packt Publishing’s website.

    Book Contents

    This book has 10 chapters and is 556 pages long.

    Full Review

    The focus of this book is to educate the reader on how to design and architect a highly scalable, robust application in Python. The first chapter starts off by going over the author’s ideas on the “principles of software architecture” and what they are. This chapter has no code examples whatsoever; it is all theory and sets up what the rest of the book will cover.

    Chapter two is all about writing readable code that is easy to modify. It teaches some techniques for writing readable code and touches on recommendations regarding documentation, PEP8, refactoring, etc. It also teaches the fundamentals of writing modifiable code. Some of the techniques demonstrated in this chapter include abstracting common services, using inheritance, and late binding. It also discusses the topic of code smells.

    Chapter three clocks in at almost 50 pages and is focused on making testable code. While you can’t really teach testing in just one chapter, it does talk about such things as unit testing, using nose2 and py.test, code coverage, mocking and doctests. There is also a section on test driven development.

    In chapter four, we learn about getting good performance from our code. This chapter is about timing code, code profiling and high performance containers in Python. It covers quite a few modules / packages, such as cProfile, line profiler, memory profiler, objgraph and Pympler.

    For chapter five, we dig into the topic of writing applications that can scale. This chapter has some good examples and talks about the differences between concurrency, parallelism, multithreading vs multiprocessing and Python’s new asyncio module. It also discusses the Global Interpreter Lock (GIL) and how it affects Python’s performance in certain situations. Finally, the reader will learn about scaling for the web and using queues, such as Celery.

    If you happen to be interested in security in Python, then chapter 6 is for you. It covers various types of security vulnerabilities in software in general and then talks about what the author sees as security problems in Python itself. It also discusses various coding strategies that help the developer write secure code.

    Chapter seven delves in to the subject of design patterns and is over 70 pages long. You will learn about such things as the singleton, factory, prototype, adapter, facade, proxy, iterator, observer and state patterns. This chapter does a nice job of giving an overview of design patterns, but I think a book that focuses a chapter per design pattern would be really interesting and really help drive the point home.

    Moving on, we get to chapter 8 which talks about “architectural patterns”. Here we learn about Model View Controller (MVC), which is pretty popular in the web programming sphere. We also learn a bit about event driven programming using twisted, eventlet, greenlet and Gevent. I usually think of a user interface using something like PyQt or wxPython when I think of event driven programming, but either way the concepts are the same. There is also a section on microservices in this chapter.

    Chapter nine’s focus is on deploying your Python applications. Here you will learn about using pip, virtualenv, PyPI, and PyPA. You will also learn a little about Fabric and Ansible in this chapter.

    The last chapter covers the techniques for debugging your applications. He starts with the basic print statement and moves on to using mocks and the logging module. He also talks about using pdb and similar tools such as iPdb and pdb++. The chapter is rounded out with sections on the trace module, the lptrace package and the strace package.

    This book is a bit different from your usual Python book in that it’s not really focused on the beginner. Instead, we have a professional software developer with nearly two decades of experience outlining some of the techniques he has used in creating his own applications at big companies. While there are a few minor grammatical issues here and there, overall I found this to be a pretty interesting book. I’m not saying that because I was a technical reviewer of the book; I have panned some of the books I have been a technical reviewer for in the past. This one is actually quite good, and I would recommend it to anyone who wants to learn more about real world software development. It’s also good for people who want to learn about concurrency or design patterns.

    Software Architecture with Python

    by Anand Balachandran Pillai

    Amazon, Packt Publishing

    Other Book Reviews

    Categories: FLOSS Project Planets

    EuroPython: EuroPython 2017: Call for on-site volunteers

    Planet Python - Thu, 2017-06-22 11:29

    Would you like to be more than a participant and contribute to make this 2017 edition of EuroPython a smooth success? Help us!

    We have a few tasks that are open for attendees who would like to volunteer: fancy helping at the registration desk? Willing to chair a session? Find out how you can contribute and which task you can commit to. 

    What kind of qualifications do you need?

    English is a requirement. More languages are an advantage. Check our webpage or write us for any further information. 

    The conference ticket is a requirement. We cannot give you a free ticket, but we would like to thank you with one of our volunteer perks.

    How do you sign up?

    You can sign up for activities on our EuroPython 2017 Volunteer App.

    We really appreciate your help!


    EuroPython 2017 Team
    EuroPython Society
    EuroPython 2017 Conference

    Categories: FLOSS Project Planets

    PyCharm: PyCharm Edu 4 EAP: Integration with Stepik for Educators

    Planet Python - Thu, 2017-06-22 10:52

    PyCharm Educational Edition rolls out an Early Access Program update – download PyCharm Edu 4 EAP2 (build 172.3049).

    Integration with Stepik for Educators

    In 2016 we partnered with Stepik, a learning management and MOOC platform, to announce the Adaptive Python course. But if you want to create your own Python course with the help of PyCharm Edu, integration with Stepik can help you easily maintain your learning materials and share them with your students.

    Let’s take a simple example based on the Creating a Course with Subtasks tutorial and look at the integration features in more detail.

    Uploading a New Course

    Assume you’ve created a new course, added some lessons and checked the tasks:

    Now you want to test the new course and share it with your students. Using Stepik as a course platform is a great choice, thanks to integration with PyCharm Edu. First, you’ll need to create an account and log in:

    Going back to PyCharm Edu, you can now see a special Stepik icon in the Status Bar:

    Use the link Log in to Stepik to be redirected to Stepik.org and authorize PyCharm Edu:

    The Stepik Status Bar icon will be enabled after you authorize the course:

    Now you can upload the course to Stepik:

    Updating a Course

    Once a course is created and uploaded to Stepik, you can always add or change lessons or add subtasks to it, as we do in our example:

    The whole course, a lesson, or just a single task can be updated any time you want to save your changes on Stepik:

    Sharing a Course with Learners

    Stepik allows educators to manage their courses: you can make your course visible to everyone, or invite your students privately (students need to have a Stepik account):

    Learners that have been invited to join the course can go to PyCharm Edu Welcome Screen | Browse Courses and log in to Stepik with a special link:

    The course is now available in the list:

    There you go. Let us know how you like this workflow! Share your feedback here in the comments or report your findings on YouTrack, to help us improve PyCharm Edu.

    To get all EAP builds as soon as we publish them, set your update channel to EAP (go to Help | Check for Updates, click the ‘Updates’ link, and then select ‘Early Access Program’ in the drop-down). To keep all your JetBrains tools updated, try JetBrains Toolbox App!

    Your PyCharm Edu Team

    Categories: FLOSS Project Planets

    Using Compiler Explorer with Qt

    Planet KDE - Thu, 2017-06-22 10:14

    One of my preferred developer tools is a web tool called Compiler Explorer. The tool itself is excellent and useful when trying to optimize your code.
    The author of the tool describes it in the Github repository as:

    Compiler Explorer is an interactive compiler. The left-hand pane shows editable C/C++/Rust/Go/D/Haskell code. The right, the assembly output of having compiled the code with a given compiler and settings. Multiple compilers are supported, and the UI layout is configurable (the Golden Layout library is used for this). There is also an ispc compiler for a C variant with extensions for SPMD.

    The main problem I found with the tool is that it does not let you write Qt code. I needed to remove all the Qt includes, and modify and remove a lot of code…

    So I decided to modify the tool to be able to find the Qt headers. To do that, first of all we need to clone the source code:

    git clone git@github.com:mattgodbolt/compiler-explorer.git

    The application is written using node.js, so make sure you have it installed before starting.

    The next step is to modify the options line in etc/config/c++.defaults.properties:

    -fPIC -std=c++14 -isystem /opt/qtbase_dev/include -isystem /opt/qtbase_dev/include/QtCore

    You need to replace /opt/qtbase_dev with your own Qt build path.

    Then simply call make in the root folder, and the application starts running on port 10240 (by default).

    And the mandatory screenshots:

    The post Using Compiler Explorer with Qt appeared first on Qt Blog.

    Categories: FLOSS Project Planets

    John Goerzen: First Experiences with Stretch

    Planet Debian - Thu, 2017-06-22 09:19

    I’ve done my first upgrades to Debian stretch at this point. The results have been overall good. On the laptop my kids use, I helped my 10-year-old do it, and it worked flawlessly. On my workstation, I got a kernel panic on boot. Hmm.

    Unfortunately, my system has to use the nv drivers, which leaves me with an 80×25 text console. It took some finagling (break=init in grub, then manually insmoding the appropriate stuff based on modules.dep for nouveau), but finally I got a console so I could see what was breaking. It appeared that init was crashing because it couldn’t find liblz4. A little digging shows that liblz4 is in /usr, and /usr wasn’t mounted. I’ve filed the bug on systemd-sysv for this.

    I run root on ZFS, and further digging revealed that I had datasets named like this:

    • tank/hostname-1/ROOT
    • tank/hostname-1/usr
    • tank/hostname-1/var

    This used to be fine. The mountpoint property of the usr dataset put it at /usr without incident. But it turns out that this won’t work now, unless I set ZFS_INITRD_ADDITIONAL_DATASETS in /etc/default/zfs for some reason. So I renamed them so usr was under ROOT, and then the system booted.

    Then I ran into samba not liking something in my bind interfaces line (to be fair, it did still say eth0 instead of br0). rpcbind was failing in postinst, though a reboot seems to have helped that. More annoying was that I had trouble logging into my system because resolv.conf was left empty (despite dns-* entries in /etc/network/interfaces and the presence of resolvconf). I eventually repaired that, and found that it kept removing my “search” line. Eventually I removed resolvconf.

    Then mariadb’s postinst was silently failing. I eventually discovered it was sending info to syslog (odd), and /etc/init.d/apparmor teardown let it complete properly. It seems like there may have been an outdated /etc/apparmor.d/cache/usr.sbin.mysql out there for some reason.

    Then there was XFCE. I use it with xmonad, and the session startup was really wonky. I had to zap my sessions, my panel config, etc. and start anew. I am still not entirely sure I have it right, but at least I do have a usable system now.

    Categories: FLOSS Project Planets

    Drupal Association blog: Announcement: Board Meeting and Executive Session - June 28, 2017

    Planet Drupal - Thu, 2017-06-22 08:53

    On June 28, 2017 at 12:00 PDT/20:00 BST, the Drupal Association will host a one-hour virtual board meeting for the public to attend. It will be followed by an executive session, which is a private session for the board members. We invite the public to join our board meeting via Zoom, or you can dial in with the following information:

    Board Meeting Agenda

    The Board Meeting Agenda includes:

    • An Executive Update covering the following topics and speakers

      • Community Discussions update from Whitney Hess

      • DrupalCon Baltimore Wrap

      • DrupalCon RFP update

      • Marketing Initiative / Review of Drupal.org privacy policy

      • Drupal.org Infrastructure RFP Update

    • Financial Update from Summit CPA

    • Q&A with the Drupal Association board

    • Q&A with the community attendees

    • The Board votes to approve Jan - April 2017 financial statements

    After the meeting, we will post a blog that shares more details about the meeting, and we will post the board materials and meeting minutes here.

    Executive Session Agenda

    While the Executive Session is a private meeting amongst board members, we want to provide insight into what the agenda topics will be.

    • The Finance Committee will provide an overview of the 2016 financial audit and answer questions.

    • Discuss Drupal Association Board Executive Committee composition for 2017-2018 term.

    • The Governance Committee will provide an update and recommendation on how the Drupal Association can continue to support the community as they determine how to evolve community governance.

    • The Nominating Committee will provide an update on the progress with identifying new board member candidates for the three seats that expire in November 2017. Learn more about the Drupal Association board here.

    We hope you can join us to learn more about Drupal Association operations and to have your questions answered by the Drupal Association Board and staff.

    Categories: FLOSS Project Planets

    Amazee Labs: Lead Developer UK Conference 2017, Day 2

    Planet Drupal - Thu, 2017-06-22 08:34

    This is part 2 of my summary from the Lead Developer UK conference. If you want to refresh your memory about what happened on Day 1 you can skip back for part 1, or alternatively continue reading about my highlights from the second day of this outstanding conference.


    In Fail Fast, Fail Smart, Succeed, Kevin Goldsmith started day two with the recommendation that we shouldn’t punish failure, but we should make sure that we learn from our mistakes. Nothing can be more harmful than a culture that prevents talking about failure. Instead, when we learn to talk about our mistakes, we and others will be able to get better much faster. I liked Kevin’s recommendation about creating a shared repository for the team to collect learnings they have made along the way.

    Fail Safe, Fail Smart, Succeed from Kevin Goldsmith

    In Building and Scaling a Distributed and Inclusive Team, Mathias Meyer gave some valuable insights into his experience at Travis CI. Having the team distributed across continents creates challenges, such as when cultural mentalities differ, i.e. some would expect more direct communication while others are used to talking less directly about issues (remember ask vs. guess cultures from part 1?).

    I liked the idea of setting up a lot of decision making processes asynchronously via github pull requests, so that team members can contribute at their individual pace. Also, Travis is using special incident response channels for teams on Slack where they collaborate on important tasks in a timely manner.

    In Implementers, Solvers, and Finders: Rethinking the Developer Career Path, Randall Koutnik encouraged the audience to think beyond the classical categories of junior, regular, and senior developers. At the first stage, an implementer is given a solution specification and makes it happen.

    To level up, developers become solvers, who come up with their own solutions to given problems; at the final stage, they find their own problems. Think of it as providing context, like a problem space or a given product: you delegate more responsibility to that person, so she will need to find possible problems herself.

    In Mentoring Junior Engineers @ Slack HQ, Carly Robinson shared her personal career path and how she was mentored as a junior. Small startups often struggle with the task of providing the necessary mentorship for their juniors, so it was great to see such a success story. Carly mentioned that, for her, mentorship is a relationship, and you need to establish a good foundation upfront between the mentor and the mentee. Setting goals, tracking progress and acknowledging success are important tools for successful mentorship.

    Similarly, being aware of your own emotions is important when reviewing another person’s work. Your initial reaction might be “This is dumb, I know how to fix this.” Being able to step back from that reaction and reframe it into something like “Why did that person do that thing?” allows you to reflect, discover the underlying issues, and come to a solution more collaboratively.

    Overall, I came back from the Lead Developer conference with a lot of inspiration. It’s great to see that so many successful leaders talk about the same topics, and it’s worthwhile focusing on the problems I face and trying to tackle them every day. For me, growing leadership skills is a continuous effort that takes a lot of self-reflection and discipline. It might be easy to agree that points like “giving positive feedback” are the right thing to do, but implementing them in one’s own daily practice takes effort and practice.

    Slides of all the talks mentioned above, and more, can be found on the conference website. I’d like to thank the whole organizing team for setting up an incredible line-up and making sure the code of conduct didn’t feel like something added as an afterthought; ensuring diversity & inclusion was really at the core of the Lead Developer conference. Next year’s events will happen in Austin, New York and London.

    Categories: FLOSS Project Planets

    Drupal Association blog: Drupal Association Q4 2016 Financial Update

    Planet Drupal - Thu, 2017-06-22 08:24

    As part of the Drupal Association Board’s duty, board members met in April and approved the Q4 2016 financial statements. Now we are able to share them with the community to provide transparency and clarity. You can find the financial statements here; they include the Income Statement and Balance Sheet for each month. Our cash balances are located on the balance sheet, in the Assets section (the first part of the balance sheet), under “cash and cash equivalents.”

    In this blog post, we will answer the following questions:

    1. How did we perform financially this quarter?

    2. How did we perform financially through the end of 2016?

    3. How can we perform better in 2017?

    Setting Performance Metrics

    To answer #1 and #2, we need to know what success looks like. As the saying goes, if you can’t measure it, you can’t manage it. The Drupal Association works with a virtual CFO firm, Summit CPA, which creates our monthly financial reports and sets our financial KPIs, making sure we are working toward goals that ensure the Drupal Association’s sustainability.

    Because the Drupal Association’s cash reserves were depleted by investments in Drupal.org improvements, especially those supporting Drupal 8’s release, Summit recommends that we rebuild our reserves and achieve a cash reserve KPI of 15%-30% of estimated twelve-month revenue. Because our revenue and expenditures fluctuate drastically from month to month due to DrupalCon’s large cash outlay, a cash reserve goal closer to 30% is ideal.

    To rebuild our cash reserves, we need to generate an operating profit to fill the reserve. To that end, Summit recommends a second KPI: a Net Income Margin of 10%.
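    The two KPIs described above reduce to simple ratios. As a sketch (the dollar figures below are invented for illustration, not the Association’s actual numbers):

```python
def cash_reserve_ratio(cash_reserves, estimated_annual_revenue):
    """Cash reserve KPI: reserves as a fraction of estimated twelve-month revenue."""
    return cash_reserves / estimated_annual_revenue

def net_income_margin(net_income, revenue):
    """Net income margin KPI: net income as a fraction of revenue."""
    return net_income / revenue

# Hypothetical figures only:
annual_revenue = 3_000_000
reserves = 600_000
net_income = 300_000

print(f"Cash reserve ratio: {cash_reserve_ratio(reserves, annual_revenue):.0%}")  # 20% -- inside the 15%-30% target band
print(f"Net income margin: {net_income_margin(net_income, annual_revenue):.0%}")  # 10% -- meets the target
```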

    Q4 2016 Performance

    Since Q4 2016 is near the beginning of our financial turnaround, we will see improvements on both KPIs over time. It is also important to note that Q4 is historically when our cash is lowest: it is the period between DrupalCon Europe, which operated at a loss, and DrupalCon North America, a profitable event that rebuilds our cash.

    Below is our KPI progress in Q4 2016.

    Table: Q4 2016 KPI Progress
    2016 End of Year Performance

    2016 was a challenging year financially as we drastically reduced costs by laying off 40% of our staff and eliminating our Portland office. While these corrections were difficult, they set the organization on a sustainable path.

    While we continued to remain cash positive by the end of 2016 (see Cash Flow chart below), we operated at a loss, which was anticipated. In positive news, we reduced the losses by about $145,000 (see Forecast vs Actual table below).

    Chart: Cash Flow

    (This chart shows the Drupal Association’s cash flow. It uses actual data from January 2015 to December 2016 and forecasted data from January 2017 to April 2017.)

    Table: 2016 Actual vs Forecast

    Areas of focus in 2017

    With these 2016 improvements in place, 2017 is positioned to be a healthier year financially for the Drupal Association. To ensure a stronger year, we conducted a margin analysis of our programs to see where we need to focus.

    From this study, we found several areas to focus in 2017 that create value for the community while also improving our financial health. Areas of focus include:

    • Make DrupalCon Europe a financially sustainable event that continues to provide value.

    • Grow DrupalCon North America attendance through improved marketing and attracting more end users with customer content such as industry summits and case studies.

    • Create more value for Supporting Partners to grow participation and create a program for End Users to join.

    • Improve the Drupal adoption journey off of the Drupal.org front page by including content from Drupal businesses that provide value for the visitors and branding or leads for the Drupal businesses who provide the content.

    • Reduce costs associated with Drupal.org by studying the sites and services the Drupal Association provides and identifying where those costs can be cut.

    We are hard at work making the above improvements and starting to see encouraging results. Starting with our 2017 quarterly updates, we will provide more clarity into our financial portfolio and how each program performed.

    File attachments:  image1.png image2.png Drupal Association - Q4 2016 Financial Statements (1).pdf