FLOSS Project Planets

Paul Everitt: Faster relevance ranking didn’t make it into PostgreSQL 9.4

Planet Python - Wed, 2014-10-29 13:34

Alas, the patch for the one big feature we really needed apparently got rejected.

PostgreSQL has a nice little full-text search story, especially when you combine it with other parts of our story (security-aware filtering of results, transactional integrity, etc.). Searches are very, very fast.

However, the next step — ranking the results — isn’t so fast. It requires a table scan (likely to TOAST files, meaning read a file and gunzip its contents) on every row that matched.

In our case, we’re doing prefix searches, and lots and lots of rows match. Lots. And the performance is, well, horrible. Oleg and friends had a super-fast speedup for this ready for PostgreSQL 9.4, but it apparently got rejected.

So we’re stuck. It’s too big a transition to switch to ElasticSearch or something. The customer probably should bail on prefix searching (autocomplete) but they won’t. We have an idea for doing this the right way (convert prefixes to candidate full words, as Google does, using PG’s built-in lexeme tools) but that is also too much for budget. Finally, we don’t have the option to throw SSDs at it.
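The "convert prefixes to candidate full words" idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the sorted in-memory word list stands in for whatever lexeme vocabulary PostgreSQL would provide, and `expand_prefix` is a hypothetical helper name, not part of any PostgreSQL tooling.

```python
from bisect import bisect_left


def expand_prefix(sorted_words, prefix, limit=10):
    """Return up to `limit` full words from `sorted_words` that start with `prefix`.

    `sorted_words` must already be sorted; it is a stand-in for a real
    lexeme vocabulary (an assumption made for this sketch).
    """
    # Binary search for the first word that could match the prefix.
    start = bisect_left(sorted_words, prefix)
    matches = []
    for word in sorted_words[start:]:
        # Matching words are contiguous in sorted order, so stop at the first miss.
        if not word.startswith(prefix) or len(matches) >= limit:
            break
        matches.append(word)
    return matches


words = sorted(["post", "postgres", "postgresql", "python", "rank"])
candidates = expand_prefix(words, "post")
```

The resulting candidates could then be searched as ordinary full words, so ranking only has to consider rows that match a handful of complete terms rather than every row that matches the prefix.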


Categories: FLOSS Project Planets

Metal Toad: Seeing Long Term Technology Adoption as Evolution

Planet Drupal - Wed, 2014-10-29 13:31

Much like an evolutionary tree, our goal in technology adoption is to continue to move forward and evolve, rather than getting caught in a dead end.  In the natural world, becoming bigger can be good but can lead to extinction events should the environment or food source change.  Right now we are in a technology Jurassic...


Andriy Kornatskyy: wheezy web: RESTful API Design

Planet Python - Wed, 2014-10-29 13:18
In this article we are going to explore a simple RESTful API created with the wheezy.web framework. The demo implements CRUD for tasks and includes entity validation, content caching with dependencies and functional test cases. The source code is structured with well defined actors (you can read more about it here).

Design

The following convention is used with respect to operation, HTTP method (verb

PyPy Development: A Field Test of Software Transactional Memory Using the RSqueak Smalltalk VM

Planet Python - Wed, 2014-10-29 12:55
Extending the Smalltalk RSqueakVM with STM

by Conrad Calmez, Hubert Hesse, Patrick Rein and Malte Swart supervised by Tim Felgentreff and Tobias Pape

Introduction

After pypy-stm we can announce that, through the RSqueakVM (which used to be called SPyVM), a second VM implementation supports software transactional memory. RSqueakVM is a Smalltalk implementation based on the RPython toolchain. We have added STM support based on the STM tools from RPython (rstm). The benchmarks indicate that linear scale-up is possible; in some situations, however, the STM overhead limits the speedup.

The work was done as a master's project at the Software Architecture Group of Professor Robert Hirschfeld at the Hasso Plattner Institut at the University of Potsdam. We - four students - worked about one and a half days per week for four months on the topic. The RSqueakVM was originally developed during a sprint at the University of Bern. When we started the project we were new to the topic of building VMs / interpreters.

We would like to thank Armin, Remi and the #pypy IRC channel, who supported us over the course of our project. We would also like to thank Toni Mattis and Eric Seckler, who provided us with an initial code base.

Introduction to RSqueakVM

Like the original Smalltalk implementation, the RSqueakVM executes a given Squeak Smalltalk image, containing the Smalltalk code and a snapshot of formerly created objects and active execution contexts. These execution contexts are scheduled inside the image (greenlets) and not mapped to OS threads. Therefore the non-STM RSqueakVM runs on only one OS thread.

Changes to RSqueakVM

The core adjustments to support STM were inside the VM and transparent from the view of a Smalltalk user. Additionally we added Smalltalk code to influence the behavior of the STM. As the RSqueakVM has run in one OS thread so far, we added the capability to start OS threads. Essentially, we added an additional way to launch a new Smalltalk execution context (thread). But in contrast to the original one this one creates a new native OS thread, not a Smalltalk internal green thread.

STM (with automatic transaction boundaries) already solves the problem of concurrent access to a single value, as this is protected by the STM transactions (to be more precise, one instruction). But there are cases where the application relies on the fact that a bigger group of changes is executed either completely or not at all (atomically). Without further information, transaction borders could fall in the middle of such a set of atomic statements. rstm allows aggregating multiple statements into one higher-level transaction. To let the application mark the beginning and the end of these atomic blocks (high-level transactions), we added two more STM-specific extensions to Smalltalk.
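To make the idea of an atomic block concrete, here is a rough Python analogy. It is not the Smalltalk interface described above and it is not STM: a single lock stands in for the high-level transaction, whereas a real STM would run optimistically and retry on conflict. The names `Account`, `transfer` and `_atomic` are hypothetical.

```python
import threading

# Stand-in for a high-level transaction: one re-entrant lock guards the block.
# (Assumption for illustration only; rstm does not work by taking a global lock.)
_atomic = threading.RLock()


class Account(object):
    def __init__(self, balance):
        self.balance = balance


def transfer(source, target, amount):
    """Move `amount` between accounts as one indivisible group of changes."""
    with _atomic:  # beginning of the atomic block
        if source.balance < amount:
            return False  # nothing has been changed yet, so nothing to undo
        source.balance -= amount
        target.balance += amount
        return True  # end of the atomic block
```

Without the block, another thread could observe the state between the two assignments, where the amount has left one account but not yet arrived in the other.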

Benchmarks

RSqueak was executed in a single OS thread so far. rstm enables us to execute the VM using several OS threads. Using OS threads we expected a speed-up in benchmarks which use multiple threads. We measured this speed-up by using two benchmarks: a simple parallel summation where each thread sums up a predefined interval and an implementation of Mandelbrot where each thread computes a range of predefined lines.
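The structure of the parallel summation benchmark can be sketched in Python as follows. This is only an illustration of "each thread sums up a predefined interval"; the actual benchmark is written in Smalltalk, and `parallel_sum` is a hypothetical name.

```python
import threading


def parallel_sum(upper, thread_count):
    """Sum the integers 1..upper, giving each thread one contiguous interval."""
    partial = [0] * thread_count
    chunk = upper // thread_count

    def worker(index):
        start = index * chunk + 1
        # The last thread also takes any remainder of the division.
        end = upper if index == thread_count - 1 else (index + 1) * chunk
        partial[index] = sum(range(start, end + 1))

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(thread_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    # Each thread wrote only to its own slot, so no locking is needed here.
    return sum(partial)
```

Because every thread works on its own interval and its own result slot, the threads share almost no data, which is what makes the problem embarrassingly parallel.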

To assess the speed-up, we used one RSqueakVM compiled with rstm enabled, running the benchmarks once with OS threads and once with Smalltalk green threads. The workload always remained the same and only the number of threads increased. To assess the overhead imposed by the STM transformation we also ran the green threads version on an unmodified RSqueakVM. All VMs were translated with the JIT optimization and all benchmarks were run once before the measurement to warm up the JIT. As the JIT optimization is working, it is likely to be adopted by VM creators (the baseline RSqueakVM did that), so results with this optimization are more relevant in practice than those without it. We measured the execution time by getting the system time in Squeak. The results are:

Parallel Sum Ten Million Benchmark (Parallel Sum 10,000,000)

Thread Count  RSqueak        RSqueak/STM    RSqueak/STM  Slow down from RSqueak     Speed up from RSqueak/STM
              green threads  green threads  OS threads   green threads to           green threads to
                                                         RSqueak/STM green threads  RSqueak/STM OS threads
1             168.0 ms       240.0 ms       290.9 ms     0.70                       0.83
2             167.0 ms       244.0 ms       246.1 ms     0.68                       0.99
4             167.8 ms       240.7 ms       366.7 ms     0.70                       0.66
8             168.1 ms       241.1 ms       757.0 ms     0.70                       0.32
16            168.5 ms       244.5 ms       1460.0 ms    0.69                       0.17

Parallel Sum One Billion Benchmark (Parallel Sum 1,000,000,000)

Thread Count  RSqueak        RSqueak/STM    RSqueak/STM  Slow down from RSqueak     Speed up from RSqueak/STM
              green threads  green threads  OS threads   green threads to           green threads to
                                                         RSqueak/STM green threads  RSqueak/STM OS threads
1             16831.0 ms     24111.0 ms     23346.0 ms   0.70                       1.03
2             17059.9 ms     24229.4 ms     16102.1 ms   0.70                       1.50
4             16959.9 ms     24365.6 ms     12099.5 ms   0.70                       2.01
8             16758.4 ms     24228.1 ms     14076.9 ms   0.69                       1.72
16            16748.7 ms     24266.6 ms     55502.9 ms   0.69                       0.44

Mandelbrot Iterative Benchmark (Mandelbrot)

Thread Count  RSqueak        RSqueak/STM    RSqueak/STM  Slow down from RSqueak     Speed up from RSqueak/STM
              green threads  green threads  OS threads   green threads to           green threads to
                                                         RSqueak/STM green threads  RSqueak/STM OS threads
1             724.0 ms       983.0 ms       1565.5 ms    0.74                       0.63
2             780.5 ms       973.5 ms       5555.0 ms    0.80                       0.18
4             781.0 ms       982.5 ms       20107.5 ms   0.79                       0.05
8             779.5 ms       980.0 ms       113067.0 ms  0.80                       0.01

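The two ratio columns in the results above are not defined explicitly. Judging from the numbers, each appears to be the baseline time divided by the compared time, so values below 1.0 mean the compared configuration is slower. The sketch below reproduces a few entries under that assumption; `ratio` is a hypothetical helper name.

```python
def ratio(baseline_ms, compared_ms):
    """Baseline time divided by compared time, rounded to two decimals.

    Values above 1.0 mean the compared configuration is faster.
    """
    return round(baseline_ms / compared_ms, 2)


# Parallel Sum 10,000,000, one thread:
slow_down = ratio(168.0, 240.0)   # RSqueak green threads vs. RSqueak/STM green threads
speed_up = ratio(240.0, 290.9)    # RSqueak/STM green threads vs. RSqueak/STM OS threads

# Parallel Sum 1,000,000,000, four threads (the best OS-thread result):
best_speed_up = ratio(24365.6, 12099.5)
```

Read this way, the STM build costs about 30% in single-threaded green-thread performance, and the four-thread run of the large summation roughly doubles throughput relative to STM green threads.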
Discussion of benchmark results

First of all, the ParallelSum benchmarks show that the parallelism is actually paying off, at least for sufficiently large embarrassingly parallel problems. Thus RSqueak can also benefit from rstm.

On the other hand, our Mandelbrot implementation shows the limits of our current rstm integration. We implemented two versions of the algorithm: one using one low-level array and one using two nested collections. In both versions, one job only calculates a distinct range of rows, and both lead to a slowdown. The summary of the state of rstm transactions shows that there are a lot of inevitable transactions (transactions which must be completed). One reason might be the interactions between the VM and its low-level extensions, so-called plugins. We have to investigate this further.

Limitations

Although the current VM setup is working well enough to support our benchmarks, the VM still has limitations. First of all, as it is based on rstm, it has the current limitation of only running on 64-bit Linux.

Besides this, we also have two major limitations regarding the VM itself. First, the atomic interface exposed in Smalltalk is currently not working when the VM is compiled using the just-in-time compiler transformation. Simple examples such as concurrent parallel sum work fine, while more complex benchmarks such as chameneos fail. The reasons for this are currently beyond our understanding. Second, Smalltalk supports green threads, which are managed by the VM and are not mapped to OS threads. We currently support starting new Smalltalk threads as OS threads instead of starting them as green threads. However, existing threads in a Smalltalk image are not migrated to OS threads, but remain running as green threads.

Future work for STM in RSqueak

The work we presented revealed some interesting problems. We propose the following problem statements for further analysis:
  • Inevitable transactions in benchmarks. This looks like it could limit other applications too, so it should be solved.
  • Collection implementation aware of STM: The current implementation of collections can cause a lot of STM collisions due to their internal memory structure. We believe there is potential for performance improvements if we replace these collections in an STM-enabled interpreter with implementations that cause fewer STM collisions. As already proposed by Remi Meier, bags, sets and lists are of particular interest.
  • Finally, we exposed STM through language features such as the atomic method, which is provided by the VM. Originally, it was possible to model STM transaction barriers implicitly by using clever locks; now it's exposed via the atomic keyword. From a language design point of view, the question arises whether this is a good solution and what features an STM-enabled interpreter must provide to the user in general. Of particular interest are, for example, access to the transaction length and hints for transaction borders, along with their performance impact.
    Details for the technically inclined
    • Adjustments to the interpreter loop were minimal.
    • STM works at bytecode granularity; that is, there is an implicit transaction border after every bytecode executed. Possible alternatives: only break transactions after certain bytecodes, or break transactions one abstraction layer above, e.g. at object methods (setter, getter).
    • rstm calls were exposed using primitives (a way to expose native code in Smalltalk); this was mainly used for atomic.
    • Starting and stopping OS threads is exposed via primitives as well. Threads are started from within the interpreter.
    • For STM-enabled Smalltalk code we currently have different image versions. However, another way to add, load and replace code in the Smalltalk code base is required to make switching between STM and non-STM code simple.
      Details on the project setup

      From a non-technical perspective, a problem we encountered was the huge round-trip times (on our machines up to 600s, 900s with JIT enabled). This led to a tendency towards bigger code changes ("Before we compile, let's also add this"), lost flow ("What were we doing before?") and different compiled interpreters in parallel testing ("How is this version different from the others?"). As a consequence, it was harder to test and correct errors. While this is not as much of a problem for other RPython VMs, RSqueakVM needs to execute the entire image, which makes running it untranslated even slower.

      Summary

      The benchmarks show that a speedup is possible, but also that the STM overhead can in some situations eat up the speedup. The resulting STM-enabled VM still has some limitations: as rstm currently only runs on 64-bit Linux, so does the RSqueakVM. Even though it is now possible for us to create new threads that map to OS threads within the VM, the migration of existing Smalltalk threads remains problematic.

      We showed that an existing VM code base can benefit from STM in terms of scaling up. Furthermore, it was relatively easy to enable STM support. This may also be valuable to VM developers considering adding STM support to their VMs.


      Patrick Matthäi: geoip and geoip-database news!

      Planet Debian - Wed, 2014-10-29 11:43

      Hi,

      geoip version 1.6.2-2 and geoip-database version 20141027-1 are now available in Debian unstable/sid, with some news of more free databases available :)

      geoip changes:

      * Add patch for geoip-csv-to-dat to add support for building GeoIP city DB. Many thanks to Andrew Moise for contributing!
      * Add and install geoip-generator-asn, which is able to build the ASN DB. It is a modified version from the original geoip-generator. Much thanks for contributing also to Aaron Gibson!
      * Bump Standards-Version to 3.9.6 (no changes required).

      geoip-database changes:

      * New upstream release.
      * Add new databases GeoLite city and GeoLite ASN to the new package geoip-database-extra. Also bump build depends on geoip to 1.6.2-2.
      * Switch to xz compression for the orig tarball.

      Many thanks to both contributors!


      Stefane Fermigier: Abilian announces the commercial launch of its business extranet dedicated to clusters and pôles de compétitivité (competitiveness clusters)

      Planet Apache - Wed, 2014-10-29 11:03

      On the occasion of Open World Forum 2014, Abilian, a publisher of open source solutions serving business competitiveness, of which I am the founder and CEO, announces the commercial launch of its offering dedicated to clusters and pôles de compétitivité (competitiveness clusters): Abilian SICS-PC (Système d'Information Collaboratif Sécurisé pour les Pôles de Compétitivité, i.e. a secure collaborative information system for competitiveness clusters).

      Read more on the Abilian website.


      Ed Crewe: Fixing third party Django packages for Python 3

      Planet Python - Wed, 2014-10-29 10:44
      With the release of Django 1.7 it could be argued that the balance has finally tipped towards Python 3 being its preferred platform. Given that Python 2.7 is the last 2.* release, it's probably time we all thought about moving to Python 3 for our Django deployments.

      The problem is those pesky third party package developers, because unless you are a determined wheel reinventor (unlikely if you use Django!) you are bound to have a range of third party eggs in your Django sites. As one of those pesky third party developers myself, it is about time I added Python 3 compatibility to my Django open source packages.

      There are a number of resources related to porting Python from 2 to 3, including specifically for Django, but hopefully this post may still prove useful as a summarised approach for doing it for your Django projects or third party packages. Hopefully it isn't too much work and if you have been writing Python as long as me, it may also get you out of any legacy syntax  habits you have.

      So let's get started. The first thing is to set up Django 1.7 with Python 3.
      For repeatable builds we want pip and virtualenv, so install them if they are not already there.
      On a Linux platform such as Ubuntu you will have python3 installed as standard (although not yet as the default python), so if you just add pip3, that lets you add the rest ...

      Install Python 3 and Django for testing
      sudo apt-get install python3-pip
      (OR sudo easy_install3 pip)
      sudo pip3 install virtualenv


      So now you can run virtualenv with python3 in addition to the default python (2.*)

      virtualenv --python=python3 myenv3
      cd myenv3
      bin/pip install django


      Then add a src directory to hold the egg you want to make compatible with Python 3. Here is an example using git (of course you can do this as one pip line if the source is in git):


      mkdir src
      git clone https://github.com/django-pesky src/django-pesky
      bin/pip install -e src/django-pesky


      Then run the django-pesky tests (assuming nobody uses an egg without any tests!)
      so the command to run pesky's test may be something like the following ...

      bin/django-admin.py test pesky.tests --settings=pesky.settings

      One rather disconcerting thing that you will notice with tests is that the default assertEqual message is truncated in Python 3, where it wasn't in Python 2, with a count of the missing characters in square brackets, e.g.

      AssertionError: Lists differ: ['Failed to open file /home/jango/myenv/sr[85 chars]tem'] != []


      Common Python 2 to Python 3 errors
      And wait for those errors. The most common ones are:

      1. print statement without brackets
      2. except Error as err (NOT except Error, err)
      3. File open and file methods differ.
        Text files require better quality encoding - so more files default to bytes because strings in Python 3 are all stored in unicode
        (On the down side this may need more work for initial encoding clean up *,
        but on the plus side functional errors due to bad encoding are less likely to occur)
      4. There is no unicode() function in Python 3 since all strings are now unicode - i.e. it has become str() and hence strings no longer need the u'string' marker
      5. Since unicode is not available as a function, it is not used for Django models' default representation. Hence just using
        def __str__(self):
                return self.name
        is the future proofed method. I actually found that models with __unicode__ and __str__ methods may not return any representation, rather than the __str__ one being used, as one might assume, in Django 1.7 and Python 3
      6. dictionary has_key has gone, must use in (if key in dict)

      * I found more raw strings were treated as bytes by Python 3 and these then required raw_string.decode(charset) to avoid them going into the database string (eg. varchar) fields as pseudo-bytes, ie. strings that held 'élément' as '\xc3\xa9l\xc3\xa9ment' rather than bytes, ie. b'\xc3\xa9l\xc3\xa9ment'
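A few of the fixes from the list above, written so that the same code runs on Python 2.7 and Python 3. This is a minimal sketch; `describe`, `parse_port` and `to_text` are hypothetical helper names chosen for illustration and do not come from any of the packages mentioned.

```python
def describe(settings, key):
    # `key in dict` replaces dict.has_key(key), which Python 3 removed.
    if key in settings:
        return "{0}={1}".format(key, settings[key])
    return "{0} is not set".format(key)


def parse_port(raw):
    try:
        return int(raw), None
    except ValueError as err:  # not: except ValueError, err
        return None, str(err)


def to_text(raw, charset="utf-8"):
    # Decode bytes explicitly so that text fields never receive pseudo-bytes.
    if isinstance(raw, bytes):
        return raw.decode(charset)
    return raw
```

Decoding at the boundary, as `to_text` does, is one way to keep byte strings from reaching varchar fields undecoded; the right charset still has to be known or guessed for each input.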

      Ideally you will want to maintain one version but keep it compatible with Python 2 and 3,
      since this is both less work and gets you into the habit of writing transitional Python :-)

      Test the same code against Python 2 and 3
      So to do that you want to be running your tests with builds in both Pythons.
      So repeat the above but with virtualenv --python=python2 myenv2
      and just symlink the src/django-pesky to the Python 2 src folder.

      Now you can run the tests for both versions against the same egg code -
      and make sure when you fix for 3 you don't break for 2.

      For current Django 1.7 you would just need to support the latest Python 2.7
      and so the above changes are all compatible except for use of unicode() and how you call open().

      Version specific code
      However in some cases you may need to write code that is specific to 2 or 3.
      If that occurs you can either try the code for the latest version first and fall back to anything else (cross fingers)

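      try:
          # latest version compatible code (e.g. Python 3 - Django 1.7), for example:
          from urllib.parse import urlparse
      except ImportError:
          # older version compatible code (e.g. Python 2 - Django < 1.7), for example:
          from urlparse import urlparse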

      Or you can use specific version targeting ...

      import sys
      import django

      django_version = django.get_version().split('.')

      # sys.version_info is a named tuple and get_version() returns a string,
      # so compare attributes and string parts rather than dict keys and ints
      if sys.version_info.major == 3 and django_version[1] == '7':
          pass  # latest version
      elif sys.version_info.major == 2 and django_version[1] == '6':
          pass  # older django version
      else:
          pass  # older version

      where ...

      django.get_version() -> '1.6' or '1.7.1'
      sys.version_info -> sys.version_info(major=3, minor=4, micro=0, releaselevel='final', serial=0)
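Version checks like the one above are easier to get right when both versions are compared as integer tuples. The sketch below shows one way to do that without importing Django, so it can be tried in isolation. `major_minor` and `pick_code_path` are hypothetical names, and the parser assumes plain release strings such as '1.6' or '1.7.1' (a pre-release string like '1.7b1' would raise ValueError).

```python
def major_minor(version_string):
    """Return (major, minor) as integers from a string such as '1.7.1'."""
    major, minor = version_string.split(".")[:2]
    return int(major), int(minor)


def pick_code_path(python_version, django_version_string):
    """Choose a branch from a Python version tuple and a Django version string."""
    django_version = major_minor(django_version_string)
    if python_version[0] == 3 and django_version == (1, 7):
        return "latest"
    if python_version[0] == 2 and django_version == (1, 6):
        return "older django"
    return "older"
```

In a real project the call would look like `pick_code_path(sys.version_info, django.get_version())`.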

      Summary

      So how did I get on with my first egg, django-csvimport ? ... it actually proved quite time consuming since the csv.reader library was far more sensitive to bad character encoding in Python 3 and so a more thorough manual alternative had to be implemented for those important edge cases - which the tests are aimed to cover. After all if a CSV file is really well encoded and you already have a model for it - it hardly needs a pesky third party egg for CSV imports - just a few django shell lines using the csv library will do the job.


      A. Jesse Jiryu Davis: Toro 0.7 Released

      Planet Python - Wed, 2014-10-29 10:09

      I've just released version 0.7 of Toro. Toro provides semaphores, locks, events, conditions, and queues for Tornado coroutines. It enables advanced coordination among coroutines, similar to what you do in a multithreaded application. Get the latest version with "pip install --upgrade toro". Toro's documentation, with plenty of examples, is on ReadTheDocs.

      There is one bugfix in this release. Semaphore.wait() is supposed to wait until the semaphore can be acquired again:

      from tornado import gen
      import toro

      @gen.coroutine
      def coro():
          sem = toro.Semaphore(1)
          assert not sem.locked()

          # A semaphore with initial value of 1 can be acquired once,
          # then it's locked.
          sem.acquire()
          assert sem.locked()

          # Wait for another coroutine to release the semaphore.
          yield sem.wait()

      ... however, there was a bug and the semaphore didn't mark itself "locked" when it was acquired, so "wait" always returned immediately. I'm grateful to "abing" on GitHub for noticing the bug and contributing a fix.


      Code Karate: Finding the right brand

      Planet Drupal - Wed, 2014-10-29 07:28

      If you have been around CodeKarate.com for awhile you have noticed that our branding has been, we


      Mike Gabriel: Join us at "X2Go: The Gathering 2014"

      Planet Debian - Wed, 2014-10-29 07:27

      TL;DR: Those of you who are not able to join "X2Go: The Gathering 2014"... Join us on IRC (#x2go on Freenode) over the coming weekend. We will provide information, URLs to our TinyPads, etc. there. Spontaneous visitors are welcome during the working sessions (please let us know if you plan to come around), but we don't have spare beds anymore for accommodation. (We are still trying hard to set up some sort of video coverage, be it live streaming or recorded sessions. This is still open; people who can offer help, see below.)

      Our event "X2Go: The Gathering 2014" is approaching quickly. We will meet with a group of 13-15 people (number of people is still slightly fluctuating) at Linux Hotel, Essen. Thanks to the generous offerings of the Linux Hotel [1] to FLOSS community projects, costs of food and accommodation could be kept really low and affordable to many people.

      We are very happy that people from outside Germany are coming to that meeting (Michael DePaulo from the U.S., Kjetil Fleten (http://fleten.net) from Denmark / Norway). And we are also proud that Martin Wimpress (Mr. Ubuntu MATE Remix) will join our gathering.

      In advance, I want to send a big THANK YOU to all people who will sponsor our weekend, either by sending gift items, covering travel expenses or providing help and knowledge to make this event a success for the X2Go project and the community around it.



      resolv_wrapper 1.0.0 – the new cwrap tool

      Planet KDE - Wed, 2014-10-29 07:24

      I’ve released a new preloadable wrapper named resolv_wrapper which can be used for nameserver redirection or DNS response faking. It can be used in a testing environment to route DNS queries to a real nameserver separate from resolv.conf, or to a fake one with a simple config file. We tested it on Linux, FreeBSD and Solaris. It should work on other UNIX flavors too.

      You can download resolv_wrapper here.


      Acquia: Drupal in the Philippines, Own your Own Code & More - Luc Bézier

      Planet Drupal - Wed, 2014-10-29 07:00
      On being an open source developer

      "Like a lot of people, I did both sides of technology; working on paid, proprietary systems [and open source]. There is a big difference. I can't imagine myself going back to any proprietary system where I have to pay; I can't share the code I am doing with anyone; I have to ask a company about the right tool to use. I love the way that everybody contributes to the same piece of code, trying to make it the best ... and for free!"


      Python Anywhere: Maintenance release: trusty + process listings

      Planet Python - Wed, 2014-10-29 03:51

      Hi All,

      A maintenance release today, so nothing too exciting. Still, a couple of things you may care about:

      • We've updated to Ubuntu Trusty. Although we weren't vulnerable to shellshock, it's nice to have the updated Bash, and to be on an LTS release

      • We've added an oft-requested feature to be able to view all your running console processes. You'll find it at the bottom of the consoles page. The UI probably needs a bit of work, you need to hit refresh to update the list, but it's a solution for when you think you have some detached processes chewing up your CPU quota! Let us know what you think.

      Other than that, we've updated our client-side for our Postgres beta to 9.4, and added some of the PostGIS utilities. (Email us if you want to check out the beta). We also fixed an issue where a redirect loop would break the "reload" button on web apps, and we've added weasyprint and python-svn to the batteries included.


      Chris Hostetter: Economic Stimulus from Washington: Prizes for Stumping The Chump!

      Planet Apache - Tue, 2014-10-28 22:09

      Most of the time, if you see “Washington”, “November” & “$” in the same article, you are probably reading about Elections, Campaign Finance Reform, Super-PACs, Attack Ads, and maybe even Criminal Investigations.

      This is not one of those articles.

      Today I’m here to remind you that on November 13th, you can “Win, Win! Win!!!” big prizes if you have a tough Lucene/Solr question that manages to Stump The Chump!

      • 1st Prize: $100 Amazon gift certificate
      • 2nd Prize: $50 Amazon gift certificate
      • 3rd Prize: $25 Amazon gift certificate

      To enter: just email your tough question to our panel of judges via stump@lucenerevolution.org any time until the day of the session. Even if you won’t be able to attend the conference in D.C., you can still participate — and maybe win a prize — by emailing in your tricky questions.

      To keep up with all the “Chump” news fit to print, you can subscribe to this blog (or just the “Chump” tag).

      The post Economic Stimulus from Washington: Prizes for Stumping The Chump! appeared first on Lucidworks.


      Flavio Percoco: Hiding unnecessary complexity

      Planet Python - Tue, 2014-10-28 20:29

      This post does not represent a strong opinion but something I've been thinking about for a bit. The content could be completely wrong or it could even make some sense. Regardless, I'd like to throw it out there and hopefully gather some feedback from people interested in this topic.

      Before I get into the details, I'd like to share why I care. Since I started programming, I've had the opportunity to work with experienced and inexperienced folks in the field. This allowed me to learn from others the things I needed and to teach others the things they wanted to learn that I knew already. Lately, I've dedicated way more time to teaching others and welcoming new people to our field. Whether they already had some experience or not is not relevant. What is indeed relevant, though, is that there's something that needed to be taught, which required a base knowledge to exist.

      As silly as it may sound, I believe the process of learning, or simply the steps we follow to acquire new knowledge, can be represented in a directed graph. We can't learn everything at once, we must follow an order. When we want to learn something, we need to start somewhere and dig into the topic of our interest one step at a time.

      The thing I've been questioning lately is how deep someone needs to go to consider something as learned. When does the required knowledge to do/learn X end? Furthermore, I'm most interested in what we - as developers or creators of these abstractions that will then be consumed - can do to define this.

      Learning new things is fascinating, at least for me. When I'm reading about a topic I know nothing about, I'd probably read until I feel satisfied with what I've discovered whereas when I'm digging into something I need to know to do something else, I'd probably read until I hit that a-ha moment and I feel I know enough to complete my task. Whether I'll keep digging afterwards or not depends on how interesting I think the topic is. However, the important bit here is that I'll focus on what I need to know and I leave everything else aside.

      I believe the same thing happens when we're consuming an API - regardless of whether it's a library, a RESTful API, an RPC API, etc. We'll read the documentation - or just the API - and then we'll start using it. There's no need to read how it was implemented and, hopefully, no further reading will be necessary either. If we know enough and/or the API is simple enough - in terms of how it exposes the internal implementation, vocabulary, pattern, etc - we won't need to dig into any other topics that we may not know already.

      Whenever we are writing an API, we tend to either expose too many things or too few things. Finding the right balance between the things that should be kept private and the ones that should be made public is a never-ending crusade. Moreover, keeping the implementation simple and yet flexible becomes harder as we move on writing the API. Should we expose all the underlying context? What is the feeling a consumer of this API should have?

      By now, you are probably thinking that I just went nuts and this is all nonsense and you're probably right but I'll ignore that and I'll keep going. Let me try to explain what I mean by using some, hopefully more realistic, examples.

      Imagine you're writing an API for a messaging system - you saw this example coming, didn't you? - that is supposed to be simple, intuitive and yet powerful in terms of features and semantics. Now, before thinking about the API you should think about the things you want this service to support. As a full featured messaging service, you probably want it to support several messaging patterns. For the sake of this post, let's make a short list:

      • Producer/Consumer
      • Publish/Subscribe

      These are the 2 messaging patterns - probably the most common ones - that you'd like to have support for in your API. Now, think about how you'd implement them.

      For the Producer/Consumer case you'd probably expose endpoints that will allow your users to post messages and get messages. So far so good, it's quite simple and straightforward. To make things a little bit more complicated, let's say you'd like to support grouping for messages. That is, you'd like to provide a simple way to keep a set of messages separated from another set of messages. A very simple way to do that is by supporting the concept of queues. However, a queue is probably a more complex type of resource which implicitly brings some properties into your system. For example, by adding queues to your API you're implicitly saying that messages have an order, therefore it's possible to walk through them - pagination, if you will - and these messages cannot - or shouldn't - be accessed randomly. You probably know all this, which makes the implementation quite simple and intuitive for you, but does the consumer of the API know this? Will consuming the API be as simple and intuitive as implementing it was for you? Should the consumer actually care about what a queue is? Keep in mind the only thing you wanted to add is grouping for messages.

      You may argue saying that you could use lightweight queues or just call it something else to avoid bringing all these properties in. You could, for example, call them topics or even just groups. The downside of doing this is that you'd be probably reinventing a concept that already exists and assigning to it a different name and custom properties. Nothing wrong with that, I guess.

      You've a choice to make now. Are you going to expose queues through the API for what they are? Or are you going to expose them in a simpler way and keep them as queues internally? Again, should your users actually care? What is it that they really need to know to use your API?

      As far as your user is concerned, the important bit of your API is that messages can be grouped, posting messages is a matter of sending data to your server and getting them is a matter of asking for messages. Nonetheless, many messaging services with support for queues would require the user to have a queue instance where messages should be posted but again: should users actually care?

      Would it be better for your API to be something like:

      MyClient.Queue('bucket').post('this is my message')

      or would it be simpler and enough to be something like:

      MyClient.post('this is my message', group='bucket')

      See the difference? Am I finally making a point? Leave aside CS and OOP technicality, really, should the final user care?
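
To make the contrast concrete, here is a minimal sketch of the second style. The in-memory storage, the default group name and the get method are assumptions made up for illustration, not part of any real messaging library; the only point is that the queue stays an internal detail while the user sees a plain group keyword.

```python
from collections import defaultdict, deque


class MyClient:
    """Hypothetical client: queues exist internally but are never exposed."""

    def __init__(self):
        # One FIFO queue per group; an implementation detail the user never sees.
        self._queues = defaultdict(deque)

    def post(self, message, group="default"):
        """Send a message, optionally tagging it with a group."""
        self._queues[group].append(message)

    def get(self, group="default", limit=10):
        """Return up to `limit` messages from a group, oldest first."""
        queue = self._queues[group]
        return [queue.popleft() for _ in range(min(limit, len(queue)))]


client = MyClient()
client.post('this is my message', group='bucket')
messages = client.get(group='bucket')
```

Ordering and pagination still exist inside the client, but nobody has to learn the word "queue" to send or read a message.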

      Let's move on to the second messaging pattern we would like to have support for, publish/subscribe. At this point, you've some things already implemented that you could re-use. For instance, you already have a way to publish messages and the only thing you have to figure out for the publishing part of the messaging pattern is how to route the message being published to the right class. This shouldn't be hard to implement, the thing to resolve is how to expose it through the API. Should the user know this is a different messaging pattern? Should the user actually know that this is a publisher and that messages are going to be routed once they hit the server? Is there a way all these concepts can be hidden from the user?

      What about the subscriber? The simplest form of subscription for a messaging API is the one that does not require a connection to persist. That is, you expose an API that allows users to subscribe an external endpoint - HTTP, APN, etc - that will receive messages as they're pushed by the messaging service.

      You could implement the subscription model by exposing a subscribe endpoint that users would call to register the above-mentioned receivers. Again, should this subscriber concept be hidden from the user? What about asking the user where messages published to group G should be forwarded to instead of asking the users to register subscribers for the publish/subscribe pattern?
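
As a rough sketch of that last idea, assume a hypothetical in-memory Messaging class in which plain callables stand in for external HTTP or APN endpoints. The class and method names are invented for illustration. The user only says where messages for a group should go and never meets the words publisher or subscriber.

```python
from collections import defaultdict


class Messaging:
    """Hypothetical service facade; callables stand in for external endpoints."""

    def __init__(self):
        self._targets = defaultdict(list)   # group -> receivers to push to
        self._stored = defaultdict(list)    # group -> messages kept for later reads

    def forward(self, group, to):
        """Ask for every message posted to `group` to be pushed to `to` as well."""
        self._targets[group].append(to)

    def post(self, message, group="default"):
        """Store the message and push it to whatever the group forwards to."""
        self._stored[group].append(message)
        for receiver in self._targets[group]:
            receiver(message)


received = []
service = Messaging()
service.forward("G", to=received.append)
service.post("hello", group="G")
service.post("not forwarded", group="other")
```

Posting looks exactly the same as in the Producer/Consumer case; routing only happens because somebody asked for group G to be forwarded.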

      Think about how emails - I hate myself for bringing emails as a comparison - work. You've an inbox where all your emails are organized. Your inbox will normally be presented as a list. You can also send an email to some user - or group of users - and they'll receive that email as you receive others' emails. In addition to this, your email service also provides a way to forward email, filter email and re-organize it. Do you see where I'm going with this? Have you ever dug into how your email service works? Have you ever wondered how all these things are implemented server side? Is your email provider using a queue or just a simple database? You may have wondered all these things but, were they essential for you to understand how to use your email client? I'd bet they weren't.

      Does the above make any sense? Depending on how you read the above it may seem like a silly and unnecessary way of reinventing concepts, theories and things that already exist or it may be a way to just ask the users to know what they really need to know to use your service as opposed to forcing them to dig into things they don't really need - or even care about. The more you adapt your service - or API - to what the user is expected to know, the easier it'll be for them to actually use it.

      If you got to this point, I'm impressed. I'm kinda feeling I may be really going nuts but I think this post has got me to sort of a fair conclusion and probably an open question.

      As much as purists may hate this, I think there's no need to force 'knowledge' into users just for the sake of it. People curious enough will dig into problems, concepts, implementations, etc. The rest of the people will do what you expect them to do, they'll use your API - for better or for worse - and they shouldn't care about the underlying implementation, theories or complexities. All these things should be hidden from the user.

      Think about newcomers and how difficult it could be for a person not familiar at all with messaging systems to consume a library that requires Producers and Consumers to be instantiated separately. Think about this newcomer trying to understand why there are producers, consumers, publishers and subscribers. What if this newcomer just wanted to send a message?

      As a final note, I'd probably add that the whole point here is not to use silly names for every well-known concept just to make lazy people happy. If that were the case, we wouldn't have sets and everything would be an array with random properties attached to it. The point being made here is that we tend to expose through our APIs lots of unnecessary theories and concepts that users couldn't care less about. When working on the APIs our users will consume, we should probably ask ourselves how likely it is for them to know all this already and how we can hide unnecessary concepts from them without preventing them from digging deeper into it.

      Although all this may sound like "Writing APIs 101", I don't believe it is as obvious for everyone as it seems.

      Categories: FLOSS Project Planets

      Justin Mason: Links for 2014-10-28

      Planet Apache - Tue, 2014-10-28 19:58
      • David Malone planning a commemoration of Dublin Mean Time next year

        Dublin had its own time zone, 25 minutes off what would become GMT, until 1916

        (tags: 1916 dublin rising time dublin-mean-time dmt gmt perfidious-albion dunsink)

      • Roshiak

        a Riak-based clone of Roshi, the CRDT server built on top of Redis. some day I’ll write up the CRDT we use on top of Voldemort in $work. Comments: https://lobste.rs/s/tim5xc

        (tags: riak roshi crdt redis storage time-series-data)

      • Vodafone UK, Verizon add mandatory device-tracking token on all web requests

        ‘Verizon Wireless is monitoring users’ mobile internet traffic, using a token slapped onto web requests, to facilitate targeted advertising even if a user has opted out. The unique identifier token header (UIDH) was launched two years ago, and has caused an uproar in tech circles after it was re-discovered Thursday by Electronic Frontier Foundation staffer Jacob Hoffman-Andrews. The Relevant Mobile Advertising program, under which the UIDH was used, allowed a restaurant to advertised to locals only or for retail websites to promote to previous visitors, according to Verizon Wireless.’

        (tags: uidh verizon vodafone privacy tracking http cookies advertising)

      • Cuckoo Filters

        ‘In many networking systems, Bloom filters are used for high-speed set membership tests. They permit a small fraction of false positive answers with very good space efficiency. However, they do not permit deletion of items from the set, and previous attempts to extend “standard” Bloom filters to support deletion all degrade either space or performance. We propose a new data structure called the cuckoo filter that can replace Bloom filters for approximate set membership tests. Cuckoo filters support adding and removing items dynamically while achieving even higher performance than Bloom filters. For applications that store many items and target moderately low false positive rates, cuckoo filters have lower space overhead than space-optimized Bloom filters. Our experimental results also show that cuckoo filters out-perform previous data structures that extend Bloom filters to support deletions substantially in both time and space.’

        (tags: algorithms cs coding cuckoo-filters bloom-filters sets data-structures)

      • Irish government in favour of ISDS court-evasion for multinationals

        This has _already_ been used to trump national law. As Simon McGarr noted at https://twitter.com/Tupp_Ed/statuses/526103760041680898 : ‘Philip Morris initiated a dispute under the Australia-Hong Kong Bilateral Investment Treaty to force #plainpacks repeal and compensation’. “Plain packs” anti-smoking is being bitterly fought at the moment here in Ireland. More from the US point of view: http://www.washingtonpost.com/opinions/harold-meyerson-allowing-foreign-firms-to-sue-nations-hurts-trade-deals/2014/10/01/4b3725b0-4964-11e4-891d-713f052086a0_story.html : ‘The Obama administration’s insistence on ISDS may please Wall Street, but it threatens to undermine some of the president’s landmark achievements in curbing pollution and fighting global warming, not to mention his commitment to a single standard of justice. It’s not worthy of the president, and he should join Europe in scrapping it.’

        (tags: isds national-law law ireland sovereignty multinationals philip-morris us-politics eu free-trade)

      • Jonathan Bergknoff: Building good docker images

        Good advice

        (tags: devops reference docker tips ops containers linux)

      • Game Day Exercises at Stripe: Learning from `kill -9`

        We’ve started running game day exercises at Stripe. During a recent game day, we tested failing over a Redis cluster by running kill -9 on its primary node, and ended up losing all data in the cluster. We were very surprised by this, but grateful to have found the problem in testing. This result and others from this exercise convinced us that game days like these are quite valuable, and we would highly recommend them for others. Excellent post. Game days are a great idea. Also: massive Redis clustering fail

        (tags: game-days redis testing stripe outages ops kill-9 failover)

      • The Laborers Who Keep Dick Pics and Beheadings Out of Your Facebook Feed | WIRED

        “Everybody hits the wall, generally between three and five months,” says a former YouTube content moderator I’ll call Rob. “You just think, ‘Holy shit, what am I spending my day doing? This is awful.’”

        (tags: facebook wired beheadings moderation nsfw google youtube social-media filtering porn abuse)

      Categories: FLOSS Project Planets

      Put LXQt to work on Fedora 21 and epel 7

      Planet KDE - Tue, 2014-10-28 19:35

      After I joined Red Hat some time ago, I slowly started to get more and more involved in Qt and KDE matters, even becoming more active by joining the Fedora KDE SIG, as a newbie :-).

      One recent internal discussion led to the need to bring at least one lightweight desktop to a polished state.

      Since LXQt finally moved to Qt5 with 0.8.0, and we’re actively working on Fedora 21, Qt5 builds and KDE Frameworks, and there was a need to get the project to at least a first usable state, I jumped on the wagon.

      Eugene Pivnev was already working on initial packaging for Fedora, but I was already well advanced in the work when the lead KDE packager for Fedora, Rex Dieter, told me about it. At least I could use part of his work on the sysstat and qtxdg packages. I borrowed his full packages.

      It was only two days of work and it is still in baby steps, which is why I packaged everything on the Fedora copr build system until we have a proper review. We still have no group for installing the whole desktop, so packages need to be installed by hand, and there’s an explicit dependency on openbox as the window manager, but that was a quick decision to make things work fast.

      I’m still deciding how to deal with the lxqt-admin package due to some dependencies, but it is the only one missing from the 0.8.0 series on http://www.lxqt.org

      So, if you want to try it, recompile, help me, complain to me, you can find the work here:

      https://heliocastro.fedorapeople.org/lxqt/

      For the repository on copr:

      • Fedora 21 – dnf copr enable heliocastro/lxqt or yum copr enable heliocastro/lxqt
      • Epel 7 – yum copr enable heliocastro/lxqt 

      P.S. I do not intend to compile/enable Qt 4 builds, only Qt 5 builds, because this is the direct goal.

      Categories: FLOSS Project Planets

      The Cherry Hill Company: Islandora Camp Colorado - Bringing Islandora and Drupal closer

      Planet Drupal - Tue, 2014-10-28 18:40

      From October 13 - 16, 2014, I had the opportunity to go to (and the privilege to present at) Islandora Camp Colorado (http://islandora.ca/camps/co2014). These were four fairly intensive days, including a last day workshop looking to the future with Fedora Commons 4.x. We had a one day introduction to Islandora, a day of workshops, and a final day of community presentations on how Libraries (and companies that work with Libraries such as ours) are using Islandora. The future looks quite interesting for the relationship between Fedora Commons and Drupal.

      • The new version of Islandora allows you to regenerate derivatives on the fly. You can specify which datastreams are derivatives of (what I am calling) parent datastreams. As a result, the new feature allows you to regenerate a derivative through the UI or possibly via Drush, which is something the Colorado Alliance is working to have working with the ...
      Categories: FLOSS Project Planets

      Mikko Ohtamaa: Safe evaluation of math expressions in pure Python

      Planet Python - Tue, 2014-10-28 18:29

      This blog post discusses how to evaluate user inputted math expressions safely in Python without using any third party libraries.

      E.g. the user can write

      a + max(3, b) / c

      … and then you run this on the server side with variable substitutions to get the actual result.

      Table Of Content

      1. Background

      2. Protecting your service

      2. Sandbox escape

      2. CPU burning attack

      2. Memory burning attack

      3. The implementation

      1. Background

      There are use cases where you want to input math expressions or equations from the user as an alternative to fixed numbers. Examples include

      • In a vector drawing application, the user creates object coordinates procedurally to automate complex layouts
      • In an eCommerce application, the product price is not fixed, but computed dynamically from external inputs (market price, date, etc.)

      Traditionally math expressions are handled by writing your own small domain specific language using a custom BNF grammar and a library like pyparsing. You parse the equation and write some kind of evaluation loop, desperately trying to cover all cases of nested functions, forward and backward looking of arguments and so on. You can see an example here.

      However, the Python REPL itself is a pretty human-friendly calculator. Why bother doing all the parsing and evaluation yourself when you can just use Python in Python?

      The approach presented here is based on the example of Aleksi Torhamo and his Python code verification research. No third party libraries needed.

      StackOverflow, always so helpful with moderators who understand. Luckily I got a tip towards solving this privately. I kindly ask you to vote to reopen this question, so that an answer can be posted.

      2. Protecting your service

      User input is one of the web application attack vectors. It is especially problematic when you let the user run something, like math expressions, on the server side - this opens many options for malicious actors to cause harm. Below I have listed some potential attacks and how to mitigate them.

      2. Sandbox escape

      This is the traditional rationale against using eval(). You cannot simply let the user run arbitrary code on the server, as they could simply do import os ; os.remove("/home"). The history of Python has various hardened eval and sandbox implementations coming up now and then, but the general advice is that “Python eval() can never be safe“. There are even a few nice blog posts dedicated to how to escape a Python sandbox. Only heavyish multiprocessing approaches, like pysandbox, which enforces the security at the operating system process level, could be considered.

      In our approach, the trick is that we only do the first half of the eval(). We use the Python compile() function to turn the Python expression into bytecode ready for evaluation. Then we don’t actually eval(). Instead, we use custom opcode handlers and evaluate the opcodes in a loop, one after another. There cannot be a sandbox escape, because opcodes with such functionality are not implemented. Because compile() does the job of generating the bytecode, we are saved from the headache of writing a custom parser. The Python dis module helps us minimize the code needed for a stack-based virtual machine.
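
To see what compile() hands us, here is a small sketch that compiles an expression in eval mode and inspects the result without ever running it. The helper name opcode_names and the file name "<user-expression>" are just illustrative choices, and the exact opcode names you get depend on the Python version.

```python
import dis


def opcode_names(expr):
    """Compile an expression in eval mode and list its opcode names without running it."""
    code = compile(expr, "<user-expression>", "eval")
    return [instruction.opname for instruction in dis.get_instructions(code)]


code = compile("a + max(3, b) / c", "<user-expression>", "eval")
names = set(code.co_names)   # variables and functions the expression refers to
ops = opcode_names("a + max(3, b) / c")
```

Nothing was evaluated here: a, b, c and max were never looked up, which is why an interpreter of our own can decide which names and opcodes it is willing to handle.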

      2. CPU burning attack

      If you let the user input arbitrary calculations, they can try to make the calculations so complex that your process keeps calculating the math expression forever, mounting a denial of service attack against the service. In Python’s case this means that math with really large numbers can be expensive, as Python 3 allows arbitrarily large integers by default. Initially I was thinking the compile() approach could be easily secured against such attacks. One could simply limit the input value sizes in math opcodes. But unfortunately it turned out to be a little trickier.

      For example, if you compile() the Python expression 2**9999999999**9999999 (a very big exponent), the Python prompt keeps pushing the bits forever. This is because Python compile() evaluates constants at compile time, so you are stuck in the compiling step forever.
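
The compile-time evaluation is easy to observe on a harmless input. This is only a sketch of CPython behaviour as I understand it: constant folding is an implementation detail, and newer CPython releases appear to skip folding when the result would be very large, so do not treat it as a guarantee either way.

```python
# With two constants the compiler does the arithmetic itself: the finished
# result is stored among the constants of the code object.
folded = compile("2**10", "<user-expression>", "eval")

# With a variable involved there is nothing to fold, so only the literal 10
# is stored and the power is left for run time.
not_folded = compile("a**10", "<user-expression>", "eval")
```

Looking at folded.co_consts and not_folded.co_consts shows the difference: the first already contains 1024 although nothing was evaluated by us.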

      Even if we cap the power operation, which can be easily exploited to generate very large numbers, malicious actors are quite creative at coming up with other slow calculations. The only real mitigation for the execution time issue is having a hard limit for calculation time. We do this using Python’s multiprocessing module. The calculation is run in a separate child process. This process is terminated if it doesn’t finish in time. In our case, we’ll give 100 milliseconds for the expression calculations to finish.

      As a side note, threads didn’t work here because it turned out compile() does not release the GIL and thus thread.start() never returns if thread.run() contains a complex compile() – the original thread does not get the GIL back.

      We are also protected against infinite loops. Python compile() in eval mode accepts only a single expression, so for and while statements are ruled out (even though they could be written on a single line). We do not implement looping opcodes either.
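
The hard time limit boils down to a few lines. The following is a stripped-down sketch and not the decorator used in the full listing: run_with_deadline, _call_and_report and spin_forever are names invented here, and the code assumes the "fork" start method, which exists on POSIX systems only.

```python
import multiprocessing


def _call_and_report(queue, func, args):
    """Runs in the child: send back (True, result) or (False, exception)."""
    try:
        queue.put((True, func(*args)))
    except Exception as exc:
        queue.put((False, exc))


def run_with_deadline(func, args=(), seconds=0.1):
    """Run func(*args) in a child process and give up after `seconds`."""
    # Assumption: the "fork" start method (POSIX only).
    ctx = multiprocessing.get_context("fork")
    queue = ctx.Queue(maxsize=1)
    proc = ctx.Process(target=_call_and_report, args=(queue, func, args))
    proc.start()
    proc.join(seconds)
    if proc.is_alive():
        proc.terminate()   # still calculating after the deadline: kill it
        proc.join()
        raise TimeoutError("calculation exceeded {} seconds".format(seconds))
    success, value = queue.get()
    if success:
        return value
    raise value            # re-raise the exception caught in the child


def spin_forever():
    """Stand-in for an expression that never finishes."""
    while True:
        pass
```

Calling run_with_deadline(pow, (2, 10), seconds=5) returns the result as usual, while run_with_deadline(spin_forever, seconds=0.2) raises TimeoutError after the child has been terminated.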
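
A quick way to convince yourself of the statement-versus-expression split is to try compiling a few inputs in eval mode. is_expression is a helper name made up for this sketch.

```python
def is_expression(source):
    """Return True if `source` compiles in eval mode, i.e. it is a single expression."""
    try:
        compile(source, "<user-expression>", "eval")
    except SyntaxError:
        return False
    return True


loop_rejected = not is_expression("for i in range(3): pass")
while_rejected = not is_expression("while True: pass")
math_accepted = is_expression("a + max(3, b) / c")
comprehension_accepted = is_expression("[i for i in range(3)]")
```

Note that comprehensions are expressions too, so they pass this check; it is the opcodes our interpreter does not implement that have to stop them.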

      2. Memory burning attack

      A memory burning attack is similar to a CPU burning attack. Instead of CPU cycles, the attacker tries to make the process run out of memory. One could input a very long math expression. Alternatively, the power operation could be used to create very large numbers that do not fit in the process memory.

      The example code does not hard limit the memory usage of the child process. This could be done at the operating system level, e.g. by capping the maximum allowed child process memory to a few megabytes. The main rationale for skipping it for now is that I did not (yet) find a nice portable way to do this with the Python multiprocessing module. Probably the child process could set its own memory cap at the beginning of the process. All tips are welcome.

      Instead, a naive approach is chosen. The input size and the resulting bytecode size are checked against a code size limit. If somebody knows how to make our restricted math operations program eat a lot of memory, please let me know and I’ll test against this too.

      3. The implementation

      Below is the sample code. The core code is ~200 lines + extras. The tests at the end of the file and having the timeout decorator consume some lines.

      If you feel creative and you find a way to destroy this nice piece of code, I’d be interested to hear about the possible attack methods.

      You can also find the code as a GitHub gist.

      import opcode
      import dis
      import sys
      import multiprocessing
      import time

      # Python 3 required
      assert sys.version_info[0] == 3, "No country for old snakes"


      class UnknownSymbol(Exception):
          """ There was a function or constant in the expression we don't support. """


      class BadValue(Exception):
          """ The user tried to input dangerously big value. """

          MAX_ALLOWED_VALUE = 2**63


      class BadCompilingInput(Exception):
          """ The user tried to input something which might cause compiler to slow down. """


      class TimeoutException(Exception):
          """ It took too long to compile and execute. """


      class RunnableProcessing(multiprocessing.Process):
          """ Run a function in a child process.

          Pass back any exception received.
          """

          def __init__(self, func, *args, **kwargs):
              self.queue = multiprocessing.Queue(maxsize=1)
              args = (func,) + args
              multiprocessing.Process.__init__(self, target=self.run_func, args=args, kwargs=kwargs)

          def run_func(self, func, *args, **kwargs):
              try:
                  result = func(*args, **kwargs)
                  self.queue.put((True, result))
              except Exception as e:
                  self.queue.put((False, e))

          def done(self):
              return self.queue.full()

          def result(self):
              return self.queue.get()


      def timeout(seconds, force_kill=True):
          """ Timeout decorator using Python multiprocessing.

          Courtesy of http://code.activestate.com/recipes/577853-timeout-decorator-with-multiprocessing/
          """
          def wrapper(function):
              def inner(*args, **kwargs):
                  now = time.time()
                  proc = RunnableProcessing(function, *args, **kwargs)
                  proc.start()
                  proc.join(seconds)
                  if proc.is_alive():
                      if force_kill:
                          proc.terminate()
                      runtime = time.time() - now
                      raise TimeoutException('timed out after {0} seconds'.format(runtime))
                  assert proc.done()
                  success, result = proc.result()
                  if success:
                      return result
                  else:
                      raise result
              return inner
          return wrapper


      def disassemble(co):
          """ Loop through Python bytecode and match instructions with our internal opcodes.

          :param co: Python code object
          """
          # NOTE: this walk assumes the pre-3.6 bytecode layout
          # (1-byte opcode, optionally followed by a 2-byte argument).
          code = co.co_code
          n = len(code)
          i = 0
          extended_arg = 0
          result = []
          while i < n:
              op = code[i]
              curi = i
              i = i+1
              if op >= dis.HAVE_ARGUMENT:
                  # Python 2
                  # oparg = ord(code[i]) + ord(code[i+1])*256 + extended_arg
                  oparg = code[i] + code[i+1] * 256 + extended_arg
                  extended_arg = 0
                  i = i+2
                  if op == dis.EXTENDED_ARG:
                      # Python 2
                      #extended_arg = oparg*65536L
                      extended_arg = oparg*65536
              else:
                  oparg = None

              # print(opcode.opname[op])

              # Look up the handler class named after the opcode;
              # an unimplemented opcode raises KeyError here.
              opv = globals()[opcode.opname[op].replace('+', '_')](co, curi, i, op, oparg)

              result.append(opv)

          return result


      # For the opcodes see dis.py
      # (Copy-paste)
      # https://docs.python.org/2/library/dis.html

      class Opcode:
          """ Base class for our internal opcodes. """
          args = 0
          pops = 0
          pushes = 0

          def __init__(self, co, i, nexti, op, oparg):
              self.co = co
              self.i = i
              self.nexti = nexti
              self.op = op
              self.oparg = oparg

          def get_pops(self):
              return self.pops

          def get_pushes(self):
              return self.pushes

          def touch_value(self, stack, frame):
              assert self.pushes == 0
              for i in range(self.pops):
                  stack.pop()


      class OpcodeArg(Opcode):
          args = 1


      class OpcodeConst(OpcodeArg):
          def get_arg(self):
              return self.co.co_consts[self.oparg]


      class OpcodeName(OpcodeArg):
          def get_arg(self):
              return self.co.co_names[self.oparg]


      class POP_TOP(Opcode):
          """Removes the top-of-stack (TOS) item."""
          pops = 1

          def touch_value(self, stack, frame):
              stack.pop()


      class DUP_TOP(Opcode):
          """Duplicates the reference on top of the stack."""
          # XXX: +-1
          pops = 1
          pushes = 2

          def touch_value(self, stack, frame):
              stack[-1:] = 2 * stack[-1:]


      class ROT_TWO(Opcode):
          """Swaps the two top-most stack items."""
          pops = 2
          pushes = 2

          def touch_value(self, stack, frame):
              stack[-2:] = stack[-2:][::-1]


      class ROT_THREE(Opcode):
          """Lifts second and third stack item one position up, moves top down to position three."""
          pops = 3
          pushes = 3
          direct = True

          def touch_value(self, stack, frame):
              v3, v2, v1 = stack[-3:]
              stack[-3:] = [v1, v3, v2]


      class ROT_FOUR(Opcode):
          """Lifts second, third and fourth stack item one position up, moves top down to position four."""
          pops = 4
          pushes = 4
          direct = True

          def touch_value(self, stack, frame):
              # Four items are rotated, so slice the top four (the original sliced three)
              v4, v3, v2, v1 = stack[-4:]
              stack[-4:] = [v1, v4, v3, v2]


      class UNARY(Opcode):
          """Unary Operations take the top of the stack, apply the operation, and push the result back on the stack."""
          pops = 1
          pushes = 1


      class UNARY_POSITIVE(UNARY):
          """Implements TOS = +TOS."""
          def touch_value(self, stack, frame):
              stack[-1] = +stack[-1]


      class UNARY_NEGATIVE(UNARY):
          """Implements TOS = -TOS."""
          def touch_value(self, stack, frame):
              stack[-1] = -stack[-1]


      class BINARY(Opcode):
          """Binary operations remove the top of the stack (TOS) and the second top-most stack item (TOS1) from the stack. They perform the operation, and put the result back on the stack."""
          pops = 2
          pushes = 1


      class BINARY_POWER(BINARY):
          """Implements TOS = TOS1 ** TOS."""
          def touch_value(self, stack, frame):
              TOS1, TOS = stack[-2:]
              if abs(TOS1) > BadValue.MAX_ALLOWED_VALUE or abs(TOS) > BadValue.MAX_ALLOWED_VALUE:
                  raise BadValue("The value for exponent was too big")

              stack[-2:] = [TOS1 ** TOS]


      class BINARY_MULTIPLY(BINARY):
          """Implements TOS = TOS1 * TOS."""
          def touch_value(self, stack, frame):
              TOS1, TOS = stack[-2:]
              stack[-2:] = [TOS1 * TOS]


      class BINARY_DIVIDE(BINARY):
          """Implements TOS = TOS1 / TOS when from __future__ import division is not in effect."""
          def touch_value(self, stack, frame):
              TOS1, TOS = stack[-2:]
              stack[-2:] = [TOS1 / TOS]


      class BINARY_MODULO(BINARY):
          """Implements TOS = TOS1 % TOS."""
          def touch_value(self, stack, frame):
              TOS1, TOS = stack[-2:]
              stack[-2:] = [TOS1 % TOS]


      class BINARY_ADD(BINARY):
          """Implements TOS = TOS1 + TOS."""
          def touch_value(self, stack, frame):
              TOS1, TOS = stack[-2:]
              stack[-2:] = [TOS1 + TOS]


      class BINARY_SUBTRACT(BINARY):
          """Implements TOS = TOS1 - TOS."""
          def touch_value(self, stack, frame):
              TOS1, TOS = stack[-2:]
              stack[-2:] = [TOS1 - TOS]


      class BINARY_FLOOR_DIVIDE(BINARY):
          """Implements TOS = TOS1 // TOS."""
          def touch_value(self, stack, frame):
              TOS1, TOS = stack[-2:]
              stack[-2:] = [TOS1 // TOS]


      class BINARY_TRUE_DIVIDE(BINARY):
          """Implements TOS = TOS1 / TOS when from __future__ import division is in effect."""
          def touch_value(self, stack, frame):
              TOS1, TOS = stack[-2:]
              stack[-2:] = [TOS1 / TOS]


      class BINARY_LSHIFT(BINARY):
          """Implements TOS = TOS1 << TOS."""
          def touch_value(self, stack, frame):
              TOS1, TOS = stack[-2:]
              stack[-2:] = [TOS1 << TOS]


      class BINARY_RSHIFT(BINARY):
          """Implements TOS = TOS1 >> TOS."""
          def touch_value(self, stack, frame):
              TOS1, TOS = stack[-2:]
              stack[-2:] = [TOS1 >> TOS]


      class BINARY_AND(BINARY):
          """Implements TOS = TOS1 & TOS."""
          def touch_value(self, stack, frame):
              TOS1, TOS = stack[-2:]
              stack[-2:] = [TOS1 & TOS]


      class BINARY_XOR(BINARY):
          """Implements TOS = TOS1 ^ TOS."""
          def touch_value(self, stack, frame):
              TOS1, TOS = stack[-2:]
              stack[-2:] = [TOS1 ^ TOS]


      class BINARY_OR(BINARY):
          """Implements TOS = TOS1 | TOS."""
          def touch_value(self, stack, frame):
              TOS1, TOS = stack[-2:]
              stack[-2:] = [TOS1 | TOS]


      class RETURN_VALUE(Opcode):
          """Returns with TOS to the caller of the function."""
          pops = 1
          final = True

          def touch_value(self, stack, frame):
              value = stack.pop()
              return value


      class LOAD_CONST(OpcodeConst):
          """Pushes co_consts[consti] onto the stack."""
          # consti
          pushes = 1

          def touch_value(self, stack, frame):
              # XXX moo: Validate type
              value = self.get_arg()
              assert isinstance(value, (int, float))
              stack.append(value)


      class LOAD_NAME(OpcodeName):
          """Pushes the value associated with co_names[namei] onto the stack."""
          # namei
          pushes = 1

          def touch_value(self, stack, frame):
              # XXX moo: Get name from dict of valid variables/functions
              name = self.get_arg()
              if name not in frame:
                  raise UnknownSymbol("Does not know symbol {}".format(name))
              stack.append(frame[name])


      class CALL_FUNCTION(OpcodeArg):
          """Calls a function. The low byte of argc indicates the number of positional
          parameters, the high byte the number of keyword parameters. On the stack, the
          opcode finds the keyword parameters first. For each keyword argument, the value
          is on top of the key. Below the keyword parameters, the positional parameters
          are on the stack, with the right-most parameter on top. Below the parameters,
          the function object to call is on the stack. Pops all function arguments, and
          the function itself off the stack, and pushes the return value."""
          # argc
          pops = None
          pushes = 1

          def get_pops(self):
              args = self.oparg & 0xff
              kwargs = (self.oparg >> 8) & 0xff
              return 1 + args + 2 * kwargs

          def touch_value(self, stack, frame):
              argc = self.oparg & 0xff
              kwargc = (self.oparg >> 8) & 0xff
              assert kwargc == 0
              if argc > 0:
                  args = stack[-argc:]
                  stack[:] = stack[:-argc]
              else:
                  args = []
              func = stack.pop()

              assert func in frame.values(), "Uh-oh somebody injected bad function. This does not happen."

              result = func(*args)
              stack.append(result)


      def check_for_pow(expr):
          """ Python evaluates power operator during the compile time if its on constants.

          You can do CPU / memory burning attack with ``2**999999999999999999999**9999999999999``.
          We mainly care about memory now, as we catch timeoutting in any case.

          We just disable pow and do not care about it.
          """
          if "**" in expr:
              raise BadCompilingInput("Power operation is not allowed")


      def _safe_eval(expr, functions_and_constants={}, check_compiling_input=True):
          """ Evaluate a Pythonic math expression and return the result.

          The expr is limited to 1024 characters / 1024 operations
          to prevent CPU burning or memory stealing.

          :param functions_and_constants: Supplied "built-in" data for evaluation
          """

          # Some safety checks
          assert len(expr) < 1024

          # Check for potential bad compiler input
          if check_compiling_input:
              check_for_pow(expr)

          # Compile Python source code to Python code for eval()
          code = compile(expr, '', 'eval')

          # Dissect bytecode back to Python opcodes
          ops = disassemble(code)
          assert len(ops) < 1024

          stack = []
          for op in ops:
              value = op.touch_value(stack, functions_and_constants)

          return value


      @timeout(0.1)
      def safe_eval_timeout(expr, functions_and_constants={}, check_compiling_input=True):
          """ Hardened compile + eval for long running maths.

          Mitigate against CPU burning attacks.

          If some nasty user figures out a way around our limitations to make really really slow calculations.
          """
          return _safe_eval(expr, functions_and_constants, check_compiling_input)


      if __name__ == "__main__":

          # Run some self testing

          def test_eval(expected_result, *args):
              result = safe_eval_timeout(*args)
              if result != expected_result:
                  raise AssertionError("Got: {} expected: {}".format(result, expected_result))

          test_eval(2, "1+1")
          test_eval(2, "1 + 1")
          test_eval(3, "a + b", dict(a=1, b=2))
          test_eval(2, "max(1, 2)", dict(max=max))
          test_eval(2, "max(a, b)", dict(a=1, b=2, max=max))
          test_eval(3, "max(a, c, b)", dict(a=1, b=2, c=3, max=max))
          test_eval(3, "max(a, max(c, b))", dict(a=1, b=2, c=3, max=max))
          test_eval("2", "str(1 + 1)", dict(str=str))
          test_eval(2.5, "(a + b) / c", dict(a=4, b=1, c=2))

          try:
              test_eval(None, "max(1, 0)")
              raise AssertionError("Should not be reached")
          except UnknownSymbol:
              pass

          # CPU burning
          try:
              test_eval(None, "2**999999999999999999999**9999999999")
              raise AssertionError("Should not be reached")
          except BadCompilingInput:
              pass

          # CPU burning, see our timeoutter works
          try:
              safe_eval_timeout("2**999999999999999999999**9999999999", check_compiling_input=False)
              raise AssertionError("Should not be reached")
          except TimeoutException:
              pass

          try:
              test_eval(None, "1 / 0")
              raise AssertionError("Should not be reached")
          except ZeroDivisionError:
              pass

          try:
              test_eval(None, "(((((((((((((((()")
              raise AssertionError("Should not be reached")
          except SyntaxError:
              # SyntaxError: invalid syntax
              pass

          try:
              test_eval(None, "")
              raise AssertionError("Should not be reached")
          except SyntaxError:
              # SyntaxError: unexpected EOF while parsing
              pass

          # compile() should not allow multiline stuff
          # http://stackoverflow.com/q/12698028/315168
          try:
              test_eval(None, "for i in range(0, 100):\n pass", dict(i=-1))
              raise AssertionError("Should not be reached")
          except SyntaxError:
              # for i in range(0, 100):
              # ^
              # SyntaxError: invalid syntax
              pass

          # No functions allowed
          # Note: the input is misspelled ("lamdba"), which is what raises SyntaxError here
          try:
              test_eval(None, "lamdba x: x+1")
              raise AssertionError("Should not be reached")
          except SyntaxError:
              pass

       

       

       


      Categories: FLOSS Project Planets

      Accessibility is alive (QtSpeech progress, Jovie's deprecation)

      Planet KDE - Tue, 2014-10-28 18:05

      I've been considering what to do about Jovie, previously known as ktts (KDE Text To Speech), for some time. Since before the first KDE Frameworks release, in fact, because kdelibs used to host the definition of the KSpeech dbus interface that ktts and then Jovie implemented. I have a qt5 frameworks branch of Jovie, but it didn't make much sense to port it, since a lot of it is or could become part of the upcoming QtSpeech module. So Jovie has no official qt5 port and won't be getting one either.

      What will Okular, KNotify, and other applications that want to speak to users do instead? The answer is QtSpeech. QtSpeech is a project started by Frederik Gladhorn to bring speech APIs to all the platforms that Qt supports. It is still in its infancy, but is quickly improving. A few weeks ago, when I built my kf5 stack with kdesrc-build, I noticed that kdepim(libs?) was depending on it even though it hasn't been released yet, so I got motivated to send some improvements to qt-project. Frederik and Laurent Montel have been pushing fixes and improving it as well. It is as easy to use as the KSpeech dbus API, if not easier (and it doesn't require dbus either), and so far it can speak text on Linux/Unix, OS X, Windows, and Android. If you are an expert on any of these platforms, please send patches that implement the API in their backends; the more eyes on this project, the faster we can get it solidified and released.

      You may be asking, "But what about feature X in Jovie that I will miss desperately?" Yes, there are a few things that QtSpeech will not do that Jovie did. These will either need to be done in individual applications, or we can create a small framework to add these features (or possibly add them to QtSpeech itself if they make sense there). The features I'm thinking of are:

      1. Filtering - Changing "<jpwhiting>: Hey QtSpeech is really coming along now" to "jpwhiting says 'Hey QtSpeech is really coming along now'" for KNotifications and the like. This could likely be implemented easily in knotify itself and exposed in the notifications configuration dialog.
      2. Voice switching - Changing which voice to use based on the text, or the application it is coming from or anything else. This might make sense in QtSpeech itself, but time will tell if it's a wanted/needed feature.
      3. User configuration - Jovie had a decent UI (not ideal, but functional) for setting voice preferences such as which voice to use, pitch, volume, speed, gender, etc. This will be the only part of Jovie that gets ported, as a KDE Control Module for speech-dispatcher settings. This may also change over time, since speech-dispatcher itself may grow a UI for its settings.
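
      The filtering in item 1 is a small text rewrite done before the text reaches the speech engine. Here is a minimal sketch of that idea in Python. It is not Jovie's or knotify's actual implementation: the function name `filter_for_speech` and the assumed IRC-style "<nick>: message" input format are both hypothetical, chosen only for illustration.

```python
import re

# Hypothetical input format: an IRC-style "<nick>: message" notification line.
_NICK_MESSAGE = re.compile(r"^<(?P<nick>[^<>\s]+)>:\s*(?P<message>.*)$")


def filter_for_speech(text):
    """Rewrite "<nick>: message" as "nick says 'message'".

    Text that does not match the assumed format is returned unchanged.
    """
    match = _NICK_MESSAGE.match(text)
    if match is None:
        return text
    return "{} says '{}'".format(match.group("nick"), match.group("message"))


if __name__ == "__main__":
    print(filter_for_speech("<jpwhiting>: Hey QtSpeech is really coming along now"))
```

      Returning unmatched text untouched means a filter like this could sit in front of any notification source without needing to know which sources produce nick-prefixed messages.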

      All in all, progress is being made. I expect QtSpeech to be ready for release with Qt 5.5, but we'll see what happens.

      Categories: FLOSS Project Planets