Feeds
Is MariaDB replacing MySQL?
Over the past month there has been a steady trickle of announcements from organisations switching from MySQL to MariaDB.
From next month, MariaDB will replace MySQL as the default database in Fedora. And now Red Hat has announced it's doing the same. Even Wikimedia has started using it.
So what is MariaDB, why is the switch happening, and what are the implications?
MariaDB is a fork of MySQL intended to be, as much as possible, a drop-in replacement for MySQL in most cases. It has essentially the same features, plus some extras, and it claims performance improvements over the original.
However, I don’t think it's really the pros and cons of the technology that matter so much as the backstory.
MySQL was acquired by Sun Microsystems in 2008. However, one of the original developers of MySQL, ‘Monty’ Widenius, was unhappy with the way things were working out at Sun and left to start his own company, and his own fork of MySQL, MariaDB. Widenius also viewed Oracle's later acquisition of Sun negatively, and issued a call to arms to “save MySQL”.
However, Widenius wasn’t the only one concerned about the future of MySQL, and other forks appeared such as the Facebook MySQL fork and Drizzle.
When comparing MariaDB and MySQL, we’re comparing cultures of development as well as feature sets.
Oracle is perceived – rightly or wrongly – as having a conflict of interest as the owners of MySQL and also suppliers of its major closed-source competitor, leading people to suspect that the company is being slow to develop the database further.
It is also seen as being less open in its development process, and as losing the trust of the community. For example, where Oracle have added major new capabilities to MySQL, these are often closed-source extensions. Oracle also stopped using a public bug tracker and switched to its internal system instead. These moves, and others, are cited as examples of Oracle moving away from the open development model that made MySQL successful in the first place.
But is MariaDB any more “open” than Oracle?
MariaDB is managed by the MariaDB Foundation, which has been developing its community governance model. The board of MariaDB includes some well-known figures from the Open Source world, including Simon Phipps of the Open Source Initiative and Andrew Katz of Moorcrofts; in April, Phipps was elected as the MariaDB Foundation CEO. On the commercial side, MariaDB service providers include Monty Program, the company founded by Widenius, and SkySQL, founded by other former MySQL employees.
This separation in governance between service providers and the development community counters the perception of “conflict of interest” found with Oracle and MySQL, and also means that the Foundation can implement policies and practices that support openness and transparency in the running of the project.
Are openness and transparency reason enough to switch?
One of the claims made by MariaDB is that it applies community patches and implements new features much more rapidly than Oracle. If MariaDB is iterating faster than MySQL, fixing bugs and implementing new technologies, without sacrificing stability and quality, then this is a key product differentiator.
So for some, switching to MariaDB may mean being more comfortable with the governance and management of the community. For others, it’s the benefits that stem from that openness that will drive them to switch.
(It’s difficult to determine which is the prime motivator, as even where companies switch out of a concern over Oracle’s stewardship of MySQL they are more likely to couch this in terms of the technical advantages of MariaDB rather than blatantly come out and say it.)
What this story also shows is how the strategic use of forking – the “nuclear option” of open source – can be a viable strategy once a project becomes perceived by its community as going down the wrong path. And probably a better option than waiting until the company discontinues the product, and then attempting to resurrect it.
Will MariaDB replace MySQL in the long run? Widenius certainly thinks so. However, as Daniel Bartholomew points out, “the competition between MariaDB and MySQL can only be good.”
And whatever happens at least we won’t need to stop using the acronym LAMP.
Are you looking to switch to MariaDB? If so, why? Let us know in the comments.
Photo by bnorwood, used under a CC-BY Creative Commons License
Shalin Shekhar Mangar: First Bangalore Lucene/Solr Meetup Report
The first Bangalore Lucene/Solr meetup was held on Saturday, 8th June 2013, thanks to the initiative of Anshum Gupta and Varun Thacker. Although I joined in as a co-organizer, honestly I did nothing except tweet about it and show up with some slides.
I must say that I was pleasantly surprised at how fast the group went from zero to a hundred members (it stands at 132 members as of this writing). Our initial limit for the venue was fifty, but it was increased to seventy-five once the size of the venue was confirmed. Microsoft Accelerator was gracious enough to provide a conference room and refreshments for the attendees. 50+ people showed up, which is pretty good considering that the meetup clashed with some other popular meetups. The agenda was dominated by presentations, but quite a bit of time was spent on Q&A.
Vinodh Kumar R (Head of BloomReach India) gave a talk on ranking models (adversarial vs implicit vs real-time news) applicable to different kinds of search applications.
Varun Thacker talked about search and faceting in Solr focusing on e-commerce applications. He introduced term, date and range facets and went on to cover multi-select faceting showing examples and use-cases for each.
Dikchant Sahi talked about using Solr’s DataImportHandler to index databases and XML files in Solr. He also gave a live demo of full and delta imports of a sample music data set into Solr.
I gave a presentation on SolrCloud and Shard Splitting, which Anshum Gupta and I have been working on for the past few months.
Here are the slides that I presented:
SolrCloud and Shard Splitting from Shalin Mangar
At the start of my presentation, I solicited an informal poll from the audience to gauge their interest:
- Everyone in attendance was familiar with the projects
- Most of the attendees were already using Solr for search
- Everyone in attendance was using Solr 3.x and no one was on SolrCloud yet
- Almost everyone was evaluating, prototyping or testing SolrCloud
I met Umesh Prasad from Flipkart at the meetup and we chatted quite a bit on Solr’s performance under heavy bulk re-indexing workloads and also about accommodating large elevation and synonym files in SolrCloud. I’m happy to know that Flipkart uses Apache Solr for their excellent search. I also met a couple of search enthusiasts who have used Solr in the past and want to contribute back to the community.
All in all, I think it was a good first step towards establishing a strong Lucene/Solr community in Bangalore. I hope the next meetup leaves more time for one-to-one interactions and focused conversations around search issues. It’d be nice to have more about Lucene at the next meetup. A lot of people inquired about training on Apache Solr, so we may organize a workshop for the next meetup.
Drop me a line if you are interested in attending a Solr training in Bangalore. Also, if you’re in Bangalore and interested in Lucene/Solr or search in general, do join the Bangalore Lucene/Solr Meetup group.
4.11 beta 1 packages available for openSUSE 12.3
As a consequence of the recent changes in the repositories, the openSUSE KDE team is happy to announce the availability of packages containing the first beta of the KDE Platform, Workspaces and Applications 4.11. Packages are available in the KDE:Distro:Factory repository. As it is beta software, it may have not-yet-discovered bugs, and its use is recommended only if you are willing to test packaging (reporting bugs to Novell’s bugzilla) or the software (reporting bugs directly to KDE). For specific queries on the 4.11 beta not related to specific openSUSE packaging, use the KDE Community Forums 4.11 Beta/RC area. Have a good test!
Gunnar Wolf: Cultural objects/goods: When a superhero is too famous for his own good
I found the following news item; if you can read Spanish, you will most probably prefer the original version in the Proceso magazine's site. The subject? The federal police (PGR) and army arrest 17 artisans for «making money out of» Spiderman.
The following translation is mine. I did it past midnight, while quite tired, so that this news item can reach a broader audience. All errors are mine (except those committed by the security forces, that is).
June 13, 2013
Cuernavaca, Morelos. Police officers from the Attorney General's Office (Procuraduría General de la República, PGR) and the Army entered and searched the "3 de mayo" neighbourhood, in the municipality of Emiliano Zapata, detaining 17 ceramist artisans who sold candies, dolls and piñatas shaped like Spiderman.
The search took place last Wednesday afternoon, around 16:00. Federal ministerial police and army soldiers closed off a street with several informal stores and detained workers who were selling goods depicting this Marvel Comics character, following a complaint filed by that company.
As a result of this operation, 17 artisans were detained, although five of them were released the same day. The police also seized 12 bags of candies, piñatas, ceramics and wooden figures of the superhero.
The PGR closed down 11 stores where ceramics with this same figure were being sold, accusing the detainees of plagiarizing Spiderman's image, which is protected under copyright law.
The 12 who remained in detention were handed over to the federal courts, which prompted hundreds of sellers from "3 de mayo" to go to the PGR building this Thursday, around 10 AM, to demand the release of their friends, who were facing bail of up to 200,000 pesos (~USD$18,000).
Outraged because, they said, they were treated as if they were part of a drug ring, hundreds of artisans intermittently blocked Avenida Cuauhnáhuac, where the PGR's Morelos state office is located.
The artisans' pressure got the bail lowered from MX$200,000 to MX$16,000, and so they were set free.
Francisco Fernández Flores, president of the Ceramists Association, criticized the operation because, he said, it was as forceful as if they were "drug dealers".
The artisans explained that they don't even make the Spiderman figures; they are made by the inmates of the Centro Estatal de Reinserción Social de Atlacholoaya (a state prison) in the Xochitepec municipality, who offered them to the ceramists to sell.
"The Atlacholoaya inmates make them, we buy them to support them, and it turns out we are the criminals now", said Miriam Monroy, sister of one of the detainees.
This information was contradicted by Jesús Valencia Valencia, head of Morelos' state prison system, who stated that no ceramics are made in that prison.
Fernández Flores insisted, though, that piñatas, candies and "piggy banks" in Spiderman's shape are being offered from inside the prison.
José Luis Pozo, vice president of the Ceramists Union, said that to avoid more federal operations over copyright breaches, they have committed not to produce or sell Marvel superhero figures, or any other characters the authorities demand.
"We do commit that, from now on, the products singled out to us will not be sold", he said.
Pozo said that the PGR operation caused losses not just to the detained producers and sellers, but to over 200 ceramists who had to close their stores in solidarity with their friends.
According to the artisans, the products were a success until the PGR came, seized them and detained the sellers.
And yes, the copyright insanity does not stop. By now, Spiderman is clearly part of popular culture. Marvel brilliantly succeeded in creating a popular icon that everybody recognizes and identifies with, and that everybody should be able to recreate.
We are not talking about brand protection. Marvel does not, and never will, commercialize piñatas, ceramics or wooden toys. And even if they were plastic-cast: while Spiderman is still protected by copyright as the Berne Convention defines it (and, of course, under the much stricter Mexican law), that does not mean that any and every product resembling Spiderman should be protected. Many ceramists and piñata makers create unique pieces of art (OK, handicraft). But by reading the copyright law this strictly, Spiderman is being treated more as a trademark than as a copyrighted work. And it is a trademark that should be declared to have passed into the public domain.
Using Spinlocks in Linux: Example
In the following example we create two threads, thread1 and thread2. We initialize a spinlock "my_lock", and both threads try to acquire the lock.
Thread1 is made to get to the lock first. After acquiring the lock thread1 waits for one minute before releasing the lock.
During this time thread2 attempts to acquire the lock too. We use the function spin_trylock, which never waits: it returns immediately with a non-zero value if it acquired the lock, and with zero if the lock is already held.
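Thus we will define a spinlock, starting in the unlocked state, using
static DEFINE_SPINLOCK(my_lock);
Older kernels also accepted static spinlock_t my_lock = SPIN_LOCK_UNLOCKED; but that initializer has since been removed from the kernel.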
We will create the two threads thread_fn1 and thread_fn2 using the function kthread_create.
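The first thread function, thread_fn1, takes the lock and busy-waits for approximately one minute before releasing it. Its code is:
static int thread_fn1(void *data)
{
	unsigned long j0, j1;
	int delay = 60 * HZ;	/* one minute, in jiffies */

	j0 = jiffies;
	j1 = j0 + delay;

	spin_lock(&my_lock);
	/* Busy-wait: code holding a spinlock must not sleep or call schedule().
	 * Holding a spinlock this long may also trigger soft-lockup warnings. */
	while (time_before(jiffies, j1))
		cpu_relax();
	printk(KERN_INFO "In thread1\n");
	spin_unlock(&my_lock);

	/* Stay alive until kthread_stop() is called from module exit */
	while (!kthread_should_stop())
		msleep(100);
	return 0;
}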
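The loop compares with time_before() rather than writing jiffies < j1, because jiffies is an unsigned counter that eventually wraps around. As a user-space illustration only, the sketch below reproduces the signed-difference idea behind the kernel macro. The names ticks_before and ticks_before_demo are hypothetical and are not kernel functions:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Sketch of the idea behind the kernel's time_before(a, b): look at the sign
   of the wrapped difference instead of comparing a < b directly, so the
   result stays right when the unsigned tick counter overflows. */
bool ticks_before(unsigned long a, unsigned long b)
{
    return (long)(a - b) < 0;
}

/* Hypothetical self-check: a deadline set just before the counter wraps. */
void ticks_before_demo(void)
{
    unsigned long start = ULONG_MAX - 5;   /* counter about to wrap */
    unsigned long deadline = start + 60;   /* wraps around to 54 */

    assert(deadline == 54);
    assert(ticks_before(start, deadline));          /* still before the deadline */
    assert(!(start < deadline));                    /* a naive comparison gets it wrong */
    assert(!ticks_before(deadline + 1, deadline));  /* one tick past the deadline */
}
```

The cast to long relies on two's-complement wraparound, which the kernel macro also assumes. The comparison stays correct as long as the two tick values are less than half the counter range apart.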
The second thread function, thread_fn2, tries to take the lock using spin_trylock. A sleep of 100 ms is added at the beginning to make sure that the first thread gets to the lock first. Thus the code for the second thread function will be:
static int thread_fn2(void *data)
{
	int ret = 0;

	msleep(100);	/* let thread1 take the lock first */
	/* spin_trylock() returns non-zero if it got the lock, 0 if the lock is busy */
	ret = spin_trylock(&my_lock);
	if (!ret) {
		printk(KERN_INFO "Unable to hold lock\n");
	} else {
		printk(KERN_INFO "Lock acquired\n");
		spin_unlock(&my_lock);
	}

	while (!kthread_should_stop())
		msleep(100);
	return 0;
}
If the thread_fn2 is able to get the lock we will see the message "Lock acquired" in the kernel log. If the thread_fn2 is unable to get the lock we will see the message "Unable to hold lock" in the kernel log.
The complete code will be
spinlock_example.c
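#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/jiffies.h>
#include <linux/kthread.h>	// for threads
#include <linux/sched.h>	// for task_struct
#include <linux/delay.h>	// for msleep()
#include <linux/spinlock.h>

static struct task_struct *thread1, *thread2;
static DEFINE_SPINLOCK(my_lock);	/* starts unlocked */

static int thread_fn1(void *data)
{
	unsigned long j1 = jiffies + 60 * HZ;	/* one minute from now */

	spin_lock(&my_lock);
	/* Busy-wait: no sleeping or schedule() while holding a spinlock */
	while (time_before(jiffies, j1))
		cpu_relax();
	spin_unlock(&my_lock);

	/* Wait for kthread_stop() so the task still exists when it is called */
	while (!kthread_should_stop())
		msleep(100);
	return 0;
}

static int thread_fn2(void *data)
{
	msleep(100);	/* let thread1 take the lock first */
	if (!spin_trylock(&my_lock)) {
		printk(KERN_INFO "Unable to hold lock\n");
	} else {
		printk(KERN_INFO "Lock acquired\n");
		spin_unlock(&my_lock);
	}

	while (!kthread_should_stop())
		msleep(100);
	return 0;
}

static int __init thread_init(void)
{
	/* kthread_create() returns an ERR_PTR() value on failure, never NULL */
	thread1 = kthread_create(thread_fn1, NULL, "thread1");
	if (IS_ERR(thread1))
		return PTR_ERR(thread1);
	wake_up_process(thread1);

	thread2 = kthread_create(thread_fn2, NULL, "thread2");
	if (IS_ERR(thread2)) {
		kthread_stop(thread1);
		return PTR_ERR(thread2);
	}
	wake_up_process(thread2);
	return 0;
}

static void __exit thread_cleanup(void)
{
	if (!kthread_stop(thread1))
		printk(KERN_INFO "Thread stopped\n");
	kthread_stop(thread2);
}

MODULE_LICENSE("GPL");
module_init(thread_init);
module_exit(thread_cleanup);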
Save the file as spinlock_example.c
Makefile required for compilation.
ifneq ($(KERNELRELEASE),)
obj-m := spinlock_example.o
else
KERNELDIR ?= /lib/modules/$(shell uname -r)/build
PWD := $(shell pwd)

default:
	$(MAKE) -C $(KERNELDIR) M=$(PWD) modules
endif

clean:
	$(MAKE) -C $(KERNELDIR) M=$(PWD) clean
Compile and load the file
$ make
$ insmod spinlock_example.ko
To check whether the second thread was able to get the lock, run dmesg:
$ dmesg
Unable to hold lock
Thus we can see that spin_trylock failed to acquire the lock and returned 0, so the second thread printed the error message. If we use spin_lock instead of spin_trylock, the second thread will spin on the lock until it becomes available.
Note: Do not try changing the code from spin_trylock to spin_lock if you have a uniprocessor system. It might freeze your system and force you to do a hard reboot.
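The difference between spin_trylock and spin_lock can also be tried outside the kernel. The following is a minimal user-space sketch, assuming a C11 compiler with <stdatomic.h>. toy_spinlock and its functions are hypothetical names, not a kernel or POSIX API. toy_trylock follows the kernel's spin_trylock convention (non-zero on success, zero when the lock is busy). toy_lock simply retries until it succeeds, which is what spin_lock does:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space spinlock; not a kernel or POSIX API. */
typedef struct {
    atomic_bool held;
} toy_spinlock;

#define TOY_SPINLOCK_INIT { false }

/* Same convention as the kernel's spin_trylock(): returns true (non-zero)
   if the lock was taken, false (zero) if it was already held. Never waits. */
bool toy_trylock(toy_spinlock *l)
{
    /* atomic_exchange returns the previous value; false means the lock was free. */
    return !atomic_exchange_explicit(&l->held, true, memory_order_acquire);
}

/* Like spin_lock(): keep retrying until the holder releases the lock. */
void toy_lock(toy_spinlock *l)
{
    while (!toy_trylock(l)) {
        /* Wait with relaxed loads so waiters don't keep writing the cache line. */
        while (atomic_load_explicit(&l->held, memory_order_relaxed))
            ;
    }
}

void toy_unlock(toy_spinlock *l)
{
    bool was_held = atomic_exchange_explicit(&l->held, false, memory_order_release);
    assert(was_held && "unlocking a spinlock that is not held");
    (void)was_held;
}
```

POSIX spinlocks use the opposite convention: pthread_spin_trylock() returns 0 on success and EBUSY when the lock is already held.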
Install FlareGet Download Manager on Ubuntu / Linux Mint
There is a module for that!: Ingesting the Facebook stream using Feeds and OAuth 2.0 (Updated for Drupal 7)
Now that Twitter 1.1 and Feeds are buddies, time to move to other data sources. Next up: Facebook. Using trusty Feeds and friends, I was able to ingest my own Facebook home feed. Here's how to replicate this:
For the impatient, attached is a feature that should get you set up quickly.
Coordinate Systems in KStars
This post describes a few of the coordinate systems that KStars uses to keep track of the positions of various astronomical objects, and how they relate to one another.
All of the points used in KStars can be thought of as lying on a sphere, because it really makes no difference how far away a sky object like a star is – we only care about the direction. We can then imagine that these points “live” on the celestial sphere, an imaginary sphere surrounding the Earth. The problem of rendering a map of the night sky is then the problem of figuring out how to transform this sphere onto the screen.
Horizontal Coordinates
Horizontal coordinates are defined strictly relative to the observer. We usually use two angles: the altitude, which measures the angle from the horizon to the point, so that 90° is straight up, 0° is on the horizon, and -90° is straight down; and the azimuth, which measures the angle eastward from the north direction to the direction of the point.
Equatorial Coordinates
There are two important lines on the celestial sphere that we’ll use to define our coordinate systems. The first is the celestial equator, which is just the Earth’s equator, extended outwards and projected onto the celestial sphere. The second is the ecliptic, which is the intersection of the Earth’s orbital plane with the celestial sphere. Since the Earth’s axis is tilted, these two lines are distinct, and they intersect at two points, as seen in this drawing:
From the point of view of the Earth, the Sun travels along the ecliptic. When the ecliptic intersects the equator, it means that the Sun is directly above the equator, so night and day have the same length everywhere on earth. Thus, these points are the spring (vernal) and fall (autumnal) equinoxes.
The equatorial coordinate system describes a point p in terms of its relation to the vernal equinox, usually by means of two angles, called right ascension and declination. The right ascension, RA or α, is the angle running eastward from the vernal equinox along the equator, while the declination is the angle from the object to the equator.
Wikipedia provides a nice summary in GIF format:
J2000 and standard epochs
One thing to note about this system is that the Earth’s rotational axis is not fixed in space. The Earth has precession and nutation, which are respectively a slow drift of the Earth’s rotational axis and a slight, periodic wobbling motion superimposed on it. These motions cause the positions of the equinoxes to shift slightly, which means that the equatorial coordinates of even “fixed” objects vary over time, since they’re defined in reference to a moving point!
To solve this problem, astronomers agree on particular times (“epochs”) to reference their coordinates. The most common is the J2000 epoch: J2000 coordinates are equatorial coordinates measured relative to the equator and vernal equinox as they were at noon (Terrestrial Time) on January 1, 2000.
These coordinates have the time already specified, so we’re not missing any information, and we can use them in catalogs and databases.
As an example, the Andromeda galaxy currently has equatorial coordinates of RA = 00h 43m 29s, Dec = 41° 20′ 22″, but its catalog (J2000) coordinates are RA = 00h 42m 44s, Dec = 41° 16′ 08″.
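Catalog values like these are written in sexagesimal notation, so a program first converts them to decimal degrees. A minimal sketch follows; hms_to_degrees and dms_to_degrees are illustrative names. The declination sign is passed separately so that values such as -0° 30′ are not lost:

```c
#include <assert.h>

/* Right ascension in hours/minutes/seconds -> degrees (1 h = 15 degrees). */
double hms_to_degrees(int h, int m, double s)
{
    assert(h >= 0 && h < 24 && m >= 0 && m < 60 && s >= 0.0 && s < 60.0);
    return 15.0 * (h + m / 60.0 + s / 3600.0);
}

/* Declination as sign (+1 or -1), degrees, arcminutes, arcseconds -> degrees. */
double dms_to_degrees(int sign, int d, int m, double s)
{
    assert(sign == 1 || sign == -1);
    assert(d >= 0 && d <= 90 && m >= 0 && m < 60 && s >= 0.0 && s < 60.0);
    return sign * (d + m / 60.0 + s / 3600.0);
}
```

With these, the catalog position of Andromeda becomes RA ≈ 10.6833°, Dec ≈ 41.2689°. The 45 s of right ascension separating the two positions quoted above corresponds to 0.1875°.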
Ecliptic coordinates
Ecliptic coordinates are pretty similar to equatorial coordinates, but they’re defined in terms of the ecliptic plane, instead of the equatorial plane. We describe a point in terms of its ecliptic longitude, which is measured eastward from the vernal equinox, and its ecliptic latitude, which is the angle from the point to the ecliptic plane.
The same points about time dependence hold for ecliptic coordinates as well as equatorial coordinates, but in KStars we don’t store any data in ecliptic coordinates, so it’s not as important to define reference epochs for ecliptic coordinates.
Galactic coordinates
Finally, there’s the galactic coordinate system. This system places the Sun at the centre and uses the disk of the Milky Way to define the galactic equator. It describes a point in terms of its galactic longitude, measured eastward from the direction of the galactic centre, and its galactic latitude, which is the angle to the galactic plane.
KStars doesn’t actually make use of these, but we have the ability to calculate them, and this is exposed to the user as part of KStars’ tools collection.
Converting between coordinate systems
A lot of the work that KStars does goes into converting between these coordinate systems. We have a lot of different types of objects, which are grouped into SkyComponents. Currently, KStars deals with the problem of different objects needing different calculations by making each sky object have its own class, with its own object-oriented code for computation. This is really, really bad for performance in a lot of ways, because it has fragmented memory access patterns, a virtual function call for each point, and a lot of duplicated work.
Simplifying this requires carefully laying out what kinds of computation we do for each kind of sky object, which I’ll do in a future post. For now, I’ll just give a brief description of what goes into the conversion from one coordinate system to another.
To convert from horizontal coordinates to equatorial coordinates, we need to know the time and location of the observer. Going back to the description we had of the equatorial coordinate system, we have a (relatively) fixed sphere, with the Earth rotating inside it. All we need to do is compute the rotation of the Earth and apply that rotation to all of the points we want to convert. (Interestingly enough, at the moment KStars computes this rotation separately for each point, because we never compute multiple conversions at the same time.) If we want to be very precise, however, we need to compute the effects of atmospheric refraction, which makes our points appear higher in the sky than they actually are.
As we noted, equatorial coordinates change over time due to the effects of precession and nutation. Converting equatorial coordinates in one epoch to another epoch requires computing the effects of precession and nutation. But there’s an extra complication there, too: we also need to compute the effects of aberration, which arises because light travels at a finite speed while the Earth moves around the Sun. This motion shifts the apparent positions of the stars slightly in the direction of the Earth’s velocity.
Converting from equatorial to ecliptic coordinates is a simple rotation, and we just need to know the angle between the Earth’s equator and the ecliptic plane. This varies over time, so we need to compute precession and nutation as before.
However, this isn’t the full story: some objects have extra calculations that need to be done, and we also want to be able to do these computations in a way that avoids trigonometry as much as possible. To be continued…
Vasudev Ram: XMLtoPDFBook now supports chapter numbers and names
By Vasudev Ram
I've added support for chapter numbers and names to XMLtoPDFBook, which I blogged about recently. XMLtoPDFBook enables you to create simple PDF ebooks from chapters stored as text in an XML file.
The chapter numbers and names are printed in the header of the generated PDF file. Chapter numbers are added automatically, starting from 1 and incremented by 1 for each chapter. For chapter names, you have to update each chapter element in the XML file: the earlier format had no attributes on the chapter element, and the new one adds an attribute called 'name' whose value is the chapter name.
Earlier format for the chapter element:
<chapter>
New format for the chapter element:
<chapter name="chapter_name">
where you replace "chapter_name" with the name of each chapter, as desired.
That is the only change needed. The (updated) XMLtoPDFBook program takes care of the rest.
Chapter names, though supported, are optional. If a chapter element has no name attribute, it is not an error. No chapter name will be printed in the header for that chapter.
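The numbering rule is simple enough to sketch. The post doesn't show the exact header text XMLtoPDFBook prints, and the tool itself is written in Python. The "Chapter N: name" format and the function names below are therefore assumptions for illustration only, as is treating an empty name attribute like a missing one:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical header text for a 1-based chapter number. name comes from the
   optional name attribute: NULL (or empty) means "no name", so only the
   number is printed. Returns what snprintf() returns. */
int format_chapter_header(char *buf, size_t size, int number, const char *name)
{
    assert(number >= 1);
    if (name != NULL && name[0] != '\0')
        return snprintf(buf, size, "Chapter %d: %s", number, name);
    return snprintf(buf, size, "Chapter %d", number);
}

/* Number chapters automatically, starting at 1 and adding 1 per chapter. */
void print_chapter_headers(const char *const names[], size_t count)
{
    char buf[128];
    for (size_t i = 0; i < count; i++) {
        format_chapter_header(buf, sizeof buf, (int)(i + 1), names[i]);
        puts(buf);
    }
}
```

For example, passing the names { "Introduction", NULL, "Editing" } would print headers for chapters 1, 2 and 3, with no name on chapter 2.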
You can run XMLtoPDFBook the same way as I said in my first post about it:
python XMLtoPDFBook.py vi_quickstart.xml vi_quickstart.pdf
For viewing the PDF file, you may want to try either Foxit PDF Reader or NitroReader. I've used Foxit Reader a lot, and it is fairly good. I've just started trying NitroReader (*).
Here is a screenshot of page 1 of the generated PDF file, vi_quickstart.pdf, in NitroReader:
And here is a screenshot of page 5 of the same PDF file, vi_quickstart.pdf, in Foxit PDF Reader:
I also added some more error handling to the program.
I've uploaded XMLtoPDFBook to my Bitbucket repository for xtopdf, since it is now a part of my xtopdf toolkit. You can download it from here.
Incidentally, I saw on the NitroReader site that it was PDF's birthday this month; the PDF format is now 20 years old.
(*) And finally, it was a bit interesting to me to remember that NitroPDF (from the same company as NitroReader) was one of the topics of my second-ever blog post, on my earlier blog, jugad's Journal :-). I ran that blog for about 3 years before moving to this one on Blogger (which you are reading now), due to the takeover of LiveJournal by another company.
- Vasudev Ram - Dancing Bison Enterprises
Freedesktop Summit
A few days back I attended the first freedesktop summit/sprint, where a few hackers from different free desktops met with the objective of working together. We were people from Razor-qt, GNOME, Unity and of course KDE.
Even though we did not have the chance to discuss all the topics I was especially interested in, like Notifications or Session Inhibition, I did get involved in other equally interesting topics, like the shared Desktop Files cache or the “Trash size cache”, which will provide a cross-desktop way of caching the size of the Trash folder and improve performance across desktops.
The social part of these kinds of events is important as well: even though I already knew Ryan and David, a week of working together makes collaboration smoother, and of course I also met new people, like Lars or Jeft.
I’m quite happy to have pushed for this event together with Ryan. We definitely moved collaboration between desktops forward, and even though freedesktop is still far from perfect, I do believe we took a step in the right direction.
Can’t wait for the next Fd.o Summit.
Tim Retout: Sophie
It's my first Father's Day! Sophie was born 2 months ago (3345g or 7lb 6oz), and I've been on a blogging hiatus for quite a bit longer than that. She's very cute.
I am getting into the swing of fatherhood - lots of nappy changing. :) I took my two weeks of paternity leave, but spread the second "week" over two weeks by working just afternoons, which gave me lots of time with mummy and baby. We watched a DVD called "The Happiest Baby on the Block", and mastered the techniques therein (mainly swaddling and white noise). So all things considered, we're getting quite a bit of sleep.
Sophie is very curious about my typing, and leans towards anything she's interested in... so she's currently suspended at an angle beside me. Maybe she'll be interested in what her parents do, when she grows up. :) But for now, we're enjoying that she's learned to smile.
Daniel Pocock: Monitoring with Ganglia: an O'Reilly community book project
I recently had the opportunity to contribute to an O'Reilly community book project, developing the book Monitoring with Ganglia in collaboration with other members of the Ganglia team.
The project itself, as a community book, pays no royalties to the contributors, as we have chosen to donate all proceeds to charity. People who contributed to the book include:
Robert Alexander, Jeff Buchbinder, Frederiko Costa, Alex Dean, Dave Josephsen, Bernard Li, Matt Massie, Brad Nicholes, Peter Phaal and Vladimir Vuksan and we also had generous assistance from various members of the open source community who assisted in the review process.
Ganglia itself started at the University of California, Berkeley, as an initiative of Matt Massie, for monitoring HPC cloud infrastructure.
My own contact with Ganglia only began in 2008, when I was offered the opportunity to work full-time on the enterprise-wide monitoring systems for a large investment bank. Ganglia had been chosen for this huge project due to its small footprint, its support for many platforms and its ability to work on a heterogeneous network, as well as its dedicated features for the bank's HPC grid.
This brings me to one important point about Ganglia: it's not just about HPC any more. While it is extremely useful for clusters, grids and clouds, it is also quite suitable for a mixed network of web servers, mail servers, databases and all the other applications you may find in a small business, education or ISP environment.
Instantly up and running with packages
One of the most compelling features, even for small sites with fewer than 10 nodes, is the ease of installation: install the packages on Debian, Ubuntu, Fedora, OpenCSW and some other platforms, and it just works. Ganglia nodes find each other over multicast instantly, with no manual configuration changes necessary. The web interface must be installed on one of the nodes for viewing the statistics. Dare I say it: it is so easy, you hardly even need the book for a small installation.
Where the book is really compelling is if you have hundreds or thousands of nodes, if you want custom charts or custom metrics or anything else beyond just installing the package. If monitoring is more than 10% of your job, the book is probably a must-have.
Excellent open source architecture
Ganglia's simplicity is largely thanks to the way it leverages other open source projects such as Tobi Oetiker's RRDtool and PHP.
Anybody familiar with these tools will find Ganglia is particularly easy to work with and customise.
Custom metrics: IO service times
One of my own contributions to the project has been the creation of ganglia-modules-linux, some plugins for Linux-specific metrics, and ganglia-modules-solaris, providing similar metrics for Solaris.
These projects on GitHub provide an excellent base for people to fork and implement their own custom metrics in C or C++.
The book provides a more detailed account of how to work with the various APIs for Python, C/C++, gmetric (command line/shell scripts) and Java.
The new web interface
For people who had tried earlier versions of Ganglia (and for those people who installed versions < 3.3.0 and still haven't updated), the new web interface is a major improvement and well worth the effort to install.
It is available in the most recent packages (for example, it is in Debian 7 (wheezy) but not in Debian 6).
It was originally promoted as a standalone project (code-named gweb2) but was adopted as the official Ganglia web interface around the release of Ganglia 3.3.0. This web page provides a useful overview of what has changed and here is the original release announcement.
[one-liner]: Securing your Subversion Password using GPG Agent
If you’ve ever dealt with Subversion on Unix, one of the annoyances is that it essentially stores its password in clear text, in files under your $HOME/.subversion/auth/svn.simple directory. Not a huge deal for a single developer or user, but if you work in a large company, or even a small one, this is a pretty bad implementation. Well, here’s a method which at least gets the password out of these clear text files.
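To check whether your own machine is affected, a small audit script can help. This is a sketch under stated assumptions: it assumes the cached credential files use Subversion's length-prefixed "hash dump" layout (K <len>, key, V <len>, value, repeated, then END) and that an entry contains a password key only when the password is cached in clear text. The function names parse_svn_hash and plaintext_credentials are hypothetical.

```python
from pathlib import Path


def parse_svn_hash(data):
    """Parse a 'hash dump' file: K <len>/key/V <len>/value pairs ending in END."""
    entries = {}
    pos = 0
    while True:
        eol = data.index(b'\n', pos)
        header = data[pos:eol].decode('ascii')
        pos = eol + 1
        if header == 'END':
            return entries
        kind, length = header.split(' ')
        if kind != 'K':
            raise ValueError('expected a K line, got %r' % header)
        key = data[pos:pos + int(length)].decode('utf-8')
        pos += int(length) + 1          # skip the key and its trailing newline
        eol = data.index(b'\n', pos)
        kind, length = data[pos:eol].decode('ascii').split(' ')
        if kind != 'V':
            raise ValueError('expected a V line after key %r' % key)
        pos = eol + 1
        entries[key] = data[pos:pos + int(length)].decode('utf-8')
        pos += int(length) + 1          # skip the value and its trailing newline


def plaintext_credentials(auth_dir=None):
    """Return (realm, username) pairs whose cached entry holds a clear-text password."""
    if auth_dir is None:
        auth_dir = Path.home() / '.subversion' / 'auth' / 'svn.simple'
    auth_dir = Path(auth_dir)
    if not auth_dir.is_dir():
        return []
    found = []
    for path in sorted(auth_dir.iterdir()):
        if path.is_file():
            entry = parse_svn_hash(path.read_bytes())
            if 'password' in entry:
                found.append((entry.get('svn:realmstring', path.name),
                              entry.get('username', '?')))
    return found


if __name__ == '__main__':
    for realm, user in plaintext_credentials():
        print('clear-text password cached for %s at %s' % (user, realm))
```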
Solution
The solution came up while I was researching something else, as is usually the case. I found a paper titled GPG-agent based secure password cache for Subversion Version Control System. The paper covers work that was done on behalf of CollabNet (the original owners of the Subversion project).
Image of GNOME Keyring
I haven’t had a chance to try this out, but this post is meant as a reminder to me, and to others, that this is (finally) a reality with the stock Subversion software. It looks to be built in with the 1.8 release. This commit to the Subversion trunk highlights the new auth capability and how it works, along with several security considerations if you plan on using it.
NOTE: For further details regarding my one-liner blog posts, check out my one-liner style guide primer.
Dariusz Suchojad: Use Zato to integrate Django with exchange rate web services in 10 lines of code
(This is a re-post from Zato Blog as Planet Python doesn’t syndicate that one yet)
Summary: The post introduces Zato, an open-source integration platform in Python, and shows you how to integrate Django, or indeed any piece of Python software, with Zato and external web services using nothing but plain Python objects.
Applications in any programming language can be integrated using Zato, but since Zato is itself written in Python, it offers a convenience client for Python software, and that client will be used throughout the text.
Zato is a lightweight, yet complete, ESB (Enterprise Service Bus). And the project’s goal is to become a powerful, yet lightweight, one.
Start here for a gentle introduction to what ESB and SOA (Service-Oriented Architecture) are about, but in short, they let you integrate multiple applications, each potentially using different formats, protocols and programming languages, with the aim of supporting the interesting processes you need to automate. And with Zato this is all in pure Python, with as few headaches as possible.
How things should stand
As a Python programmer, about the only thing I feel I should need in order to invoke web services exposed by any sort of system is a simple API based on dicts or other dict-like objects, like Bunch.
It should always be possible to write code like the snippet below and expect it to just work, regardless of the complexity of the underlying protocols and data transports.
request = {'from':'EUR', 'to':'HRK'}
response = get_exchange_rate(request)
print(response.rate) # 1 EUR = 7.4680 HRK as of Jun 13, 2013, 5:00PM GMT

Given that this is a blog of the Zato project, it won’t come as a surprise that I am about to tell you that Zato lets you achieve just that: to think in terms of services and dictionaries without having to worry about how everything is actually implemented underneath.
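The response.rate attribute access above is what a dict-like object such as Bunch provides. The sketch below is a tiny stand-in written for illustration only. It is not the Bunch library itself, and get_exchange_rate is a hypothetical offline stub that returns the rate quoted in the comment above instead of calling a remote service.

```python
class AttrDict(dict):
    """A tiny dict subclass with attribute access, similar in spirit to Bunch."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        self[name] = value


def get_exchange_rate(request):
    # Hypothetical stub: a real implementation would invoke a remote service.
    canned = {('EUR', 'HRK'): '7.4680'}
    return AttrDict(rate=canned[(request['from'], request['to'])])


response = get_exchange_rate({'from': 'EUR', 'to': 'HRK'})
print(response.rate)      # attribute access...
print(response['rate'])   # ...and plain dict access both work
```

Because AttrDict is still a dict, it can be passed anywhere plain dicts are expected, for example straight into a template context.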
You delegate the job of the actual integration to Zato, which becomes the component responsible for dealing with protocols and data formats, fetching information, straightening it and returning a unified view to you. This lets you focus on your own job only, and nicely follows the UNIX philosophy of separating software into clearly defined blocks that interoperate to achieve an interesting result. Not to mention that this is what the integration industry has been using to tackle such scenarios for decades now.
This way you can focus on your own app, not on data integration. Someone else takes care of it.
The overall scheme
The diagram depicts what we will achieve:
A user enters a currency code in an HTML form to find the EUR exchange rate to that currency
A Django application invokes a Zato client providing a Python dictionary with currencies selected on input.
Behind the scenes, the dictionary is converted into an HTTP JSON call but this is completely transparent to you as a Django programmer.
Zato receives the call already converted to a Bunch instance and invokes 3 web services provided by:
- Yahoo! Finance
- Google Calculator pseudo-JSON API
- European Central Bank XML API
Output from 3 different sources is converted to a clean Pythonic response sent back to Django
Django app receives a list of dictionaries on output ready to use in a template which is shown to the user
Implementation: Django side
First, clone this repository (we’ll call the directory you clone it to DJANGO_APP_DIR) and run DJANGO_APP_DIR/install.sh – this will install or upgrade distribute and virtualenv, and use pip/buildout to download a couple of dependencies and install everything under virtualenv.
DJANGO_APP_DIR$ git clone git://github.com/zatosource/zato-django-integration.git .
DJANGO_APP_DIR$ ./install.sh
[snip]
DJANGO_APP_DIR$ ./bin/py sampleapp/src/run.py

You can now go to http://127.0.0.1:8188 and witness an ‘[Errno 111] Connection refused’ error. This is OK. Zato is not running yet.
What you can already look at, though, is the Django code. Basically, a middleware class is used to inject a Zato client, and the client is used to invoke a service which will be defined in the next steps.
Let’s see, this is what the middleware and the view using it look like:
from zato.client import AnyServiceInvoker

class ZatoMiddleware(object):
    def process_request(self, req):
        req.zato_client = AnyServiceInvoker(
            'http://localhost:17010', '/django/sample',
            ('django-app', 'django-password'))

from django.template.response import TemplateResponse

def home(req):

    # A dictionary of input data read from HTTP GET. If no input was given
    # we translate from EUR to HRK.
    to = req.GET.get('to', 'HRK')
    request = {'from':'EUR', 'to':to}

    # Pass the dictionary into the client's invoke method along with the name
    # of a service you want to invoke
    response = req.zato_client.invoke('exchangerates.get-exchange-rate-list', request)

    # response.data has a bunch of attributes that can be fed to the template as is
    return TemplateResponse(req, 'rates.html', {'data':response.data['rates'], 'to':to})

If this were a project where you were doing Django programming only, you could congratulate yourself now. The code shown above is everything you need to write to invoke a Zato service and fetch the exchange rates.
This is 10 lines of Python code, counting imports and class definitions. Without the boilerplate, only 2 or 3 lines of code are needed to invoke web services.
OK, there’s also a trivial piece of HTML, the gist of which is here, but that’s it. There is nothing else on the Django side, job well done!
Zato side
First things first: read at least the first part of the tutorial. This will install Zato and create a quickstart cluster. Done? OK, let’s continue. Save the code below as exchangerates.py:
# -*- coding: utf-8 -*-

from __future__ import absolute_import, division, print_function, unicode_literals

# stdlib
from datetime import datetime
from traceback import format_exc

# anyjson
from anyjson import loads

# lxml
from lxml import etree

# Zato
from zato.server.service import Service

class GetExchangeRateList(Service):

    class SimpleIO:
        response_elem = 'rates'
        input_required = ('from', 'to')
        output_required = ('provider', 'rate', 'ts')
        output_repeated = True

    def get_yahoo(self, from_, to):

        # Response template
        out = {'provider':'Yahoo! Finance', 'rate':None, 'ts':None}

        # Grab a connection by its name
        conn = self.outgoing.plain_http.get('Yahoo! Finance').conn

        # Y! Finance needs a query string in that format
        # ?s=HRKEUR=X&f=snl1d1t1ab
        url_params = {'s':'{}{}=X'.format(from_, to), 'f':'snl1d1t1ab'}

        # Invoking the .get method issues a GET request
        response = conn.get(self.cid, url_params)

        # Y! gives us a CSV response
        response = response.text.split(',')

        # The string we receive is something like
        # u'"EURHRK=X","EUR to HRK",7.4608,"6/14/2013","5:55pm",7.4629,7.4588\r\n'
        # and we need the 3rd item.
        out['rate'] = response[2]
        out['ts'] = datetime.utcnow().isoformat()

        return out

    def get_google(self, from_, to):

        out = {'provider':'Google', 'rate':None, 'ts':None}

        # Grab a connection by its name
        conn = self.outgoing.plain_http.get('Google Calculator').conn

        # Google needs a query string in that format
        # ?q=1EUR=HRK
        url_params = {'q': '1{}={}'.format(from_, to)}

        # Invoking the .get method issues a GET request
        response = conn.get(self.cid, url_params)

        # Convert the pseudo-JSON from
        # {lhs: "1 Euro",rhs: "7.46464923 Croatian kune",error: "",icc: true} ->
        # {"lhs": "1 Euro","rhs": "7.46464923 Croatian kune","error": "","icc": true}
        # so it can be parsed as JSON.
        json = response.text
        replace = ('lhs', 'rhs', 'error', 'icc')
        for name in replace:
            json = json.replace(name, '"{}"'.format(name))

        rate = loads(json)['rhs'].split()[0]
        out['rate'] = rate
        out['ts'] = datetime.utcnow().isoformat()

        return out

    def get_ecb(self, from_, to):

        out = {'provider':'European Central Bank', 'rate':None, 'ts':None}

        # Grab a connection by its name
        conn = self.outgoing.plain_http.get('European Central Bank').conn

        response = conn.get(self.cid)

        xml = etree.fromstring(response.text.encode('utf-8'))
        ns = {'xref': 'http://www.ecb.int/vocabulary/2002-08-01/eurofxref'}
        rate = xml.xpath(
            "//xref:Cube[@currency='{}']/@rate".format(to), namespaces=ns)[0]

        out['rate'] = rate
        out['ts'] = datetime.utcnow().isoformat()

        return out

    def handle(self):
        from_ = self.request.input.get('from')
        to = self.request.input.to

        for func in (self.get_yahoo, self.get_google, self.get_ecb):
            try:
                rate = func(from_, to)
            except Exception:
                # Log the full traceback and carry on with the remaining providers
                self.logger.warning('Caught an exception {}'.format(format_exc()))
            else:
                self.response.payload.append(rate)

and hot-deploy it onto a running server:
$ cp exchangerates.py ~/tmp/qs-1/server1/pickup-dir/

Both servers will now confirm the deployment, each in its own log (~/tmp/qs-1/server1/logs/server.log and ~/tmp/qs-1/server2/logs/server.log):
INFO - Uploaded package id:[1], payload_name:[exchangerates.py]

The service is there, but it can’t be used yet.
The way Zato is designed, unless you insist on it, your services never need to deal directly with any addresses; they only need to fetch a connection by its name (‘Yahoo! Finance’, ‘Google Calculator’ and ‘European Central Bank’), and it’s Zato’s job to manage it. You only need to think about overall processes and I/O, not about where an external service to invoke is located. If the location ever changes, you update it using the GUI, CLI or API, and servers pick up the changes automatically, without any restarts.
Another point to make is that with Zato, your code never exposes your own services over any specific transport (HTTP, AMQP and so on). This is also done via the GUI, CLI or API.
In fact, if you’re using SimpleIO (SIO), the very same service can be exposed over HTTP, AMQP, JMS WebSphere MQ or ZeroMQ, with JSON, XML or SOAP (and CSV is coming soon), without any code changes at all. The details depend on what the service does, for instance whether it is synchronous or asynchronous, but that’s the principle.
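The principle is easy to illustrate outside of Zato: as long as a service only produces plain Python data, turning that data into JSON or XML is a separate, mechanical step. The sketch below uses only the standard library and is not how SimpleIO is implemented; the function names to_json and to_xml and the element names are assumptions for illustration, and the sample rate is the Yahoo! value quoted in the service code.

```python
import json
import xml.etree.ElementTree as ET


def to_json(payload):
    # Deterministic key order makes the output easy to compare.
    return json.dumps(payload, sort_keys=True)


def to_xml(root_name, item_name, items):
    # Each item is a flat dict; each key becomes a child element.
    root = ET.Element(root_name)
    for item in items:
        elem = ET.SubElement(root, item_name)
        for key, value in item.items():
            ET.SubElement(elem, key).text = str(value)
    return ET.tostring(root, encoding='unicode')


# The service itself only ever produces plain data like this...
rates = [{'provider': 'Yahoo! Finance', 'rate': '7.4608'}]

# ...and the transport layer decides how it goes over the wire.
print(to_json({'rates': rates}))
print(to_xml('rates', 'item', rates))
```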
Also note that most of the abstractions Zato uses are convenience wrappers around the best Python libraries out there.
For instance, you can use Python dicts, but you can also always use the underlying requests library directly for HTTP calls. You’re never forced to use what Zato believes will be enough for you; nothing prevents you from customizing things to your liking with tools Zato doesn’t offer out of the box.
Likewise, say Zato doesn’t have something by default, like SMTP connections. Given that you’re using Python, you can still send out emails in 5 lines of code. (And by the way, SMTP will be added to Zato soon, so this will become 1 line of code.)
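For example, a minimal sketch of sending an email with nothing but the standard library (Python 3's email.message and smtplib), assuming an SMTP server listening on localhost:25. This is plain Python, not a Zato API, and the smtp_factory parameter is a hypothetical hook added only so the function can be exercised without a real server.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    msg = EmailMessage()
    msg['From'] = sender
    msg['To'] = recipient
    msg['Subject'] = subject
    msg.set_content(body)
    return msg


def send(msg, host='localhost', port=25, smtp_factory=smtplib.SMTP):
    # smtplib.SMTP works as a context manager and closes the connection on exit.
    with smtp_factory(host, port) as server:
        server.send_message(msg)
```

In production you would call send(build_message(...)) and leave smtp_factory at its default.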
Zato GUI
Let’s fill out a couple of forms in Zato’s GUI to make all the resources needed by the service available. Note that all of this can be done in JSON and stored in a config repository of your liking, but let’s use the GUI here.
Log in at http://localhost:8183 and create a couple of server objects:
- HTTP Basic Auth definition
- Plain HTTP channel for Django to invoke
- 3 outgoing plain HTTP connections to
- Yahoo! Finance
- Google Calculator
- European Central Bank
You don’t need to restart the servers after creating any of these objects.
HTTP Basic Auth definition
Create a new definition and update its password to ‘django-password’ after it’s created – by default passwords are set to randomly generated UUID4s (there are no default passwords in Zato at all).
Plain HTTP channel
Create a new channel object and assign the newly created security definition to it. Note that this particular Python client requires the channel’s service to be ‘zato.service.invoke’; it is this service that, in turn, invokes yours.
Outgoing plain HTTP connections
An outgoing connection encapsulates the information that has to do with the particularities of a given transport method: everything a service shouldn’t be concerned with in its own code, such as endpoints, queues, URLs, authentication and so on. Zato deals with all of it itself, so you can focus on your own functionality.
Yahoo! Finance
Google Calculator
European Central Bank
Running it all
Now that everything has been created, you can visit the Django app at http://127.0.0.1:8188/ and play around with various currencies – this will fetch everything from the backend web services and display it in an HTML table.
What else is there?
Naturally, this isn’t everything. If you’ve already read the intro to ESB/SOA, you know the first question will be: is the service IRA? Is it Interesting, Reusable and Atomic?
- Interesting – sure, if you need exchange rates in your projects, such information will certainly be interesting on more than one occasion
- Reusable – almost; the list of providers is hard-coded, but ultimately there should be one or more default providers, and client applications should be able to specify which ones they’re interested in
- Atomic – yes, as long as it is given the feature mentioned above (default providers, with client apps saying which ones to use)
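The "default providers, client chooses" idea can be sketched in plain Python as a lookup table of provider functions plus a default list. This is a sketch, not Zato code: PROVIDERS, DEFAULT_PROVIDERS and get_rates are hypothetical names, the providers are offline stubs standing in for get_yahoo, get_google and get_ecb, and making the ECB the sole default is an arbitrary assumption.

```python
def stub_provider(name):
    # Hypothetical offline stand-in for a function that calls a remote API.
    def fetch(from_, to):
        return {'provider': name, 'from': from_, 'to': to}
    return fetch


PROVIDERS = {
    'yahoo': stub_provider('Yahoo! Finance'),
    'google': stub_provider('Google'),
    'ecb': stub_provider('European Central Bank'),
}
DEFAULT_PROVIDERS = ('ecb',)   # assumption: which providers count as default


def get_rates(from_, to, providers=None):
    """Return one result per requested provider, falling back to the defaults."""
    names = providers or DEFAULT_PROVIDERS
    unknown = [n for n in names if n not in PROVIDERS]
    if unknown:
        raise ValueError('unknown provider(s): %s' % ', '.join(unknown))
    return [PROVIDERS[n](from_, to) for n in names]
```

A client that sends no provider list gets the defaults, while one that cares can ask for, say, ['yahoo', 'google'], without any change to the service code.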
It also makes sense to use Zato’s built-in scheduler and Redis to pre-fetch the rates periodically instead of accessing remote resources for each client request.
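The pre-fetching idea boils down to a cache with an expiry time that a periodic job keeps warm. In the sketch below, a plain in-process dict stands in for Redis and refresh() is what a scheduled job would call; Zato's actual scheduler and Redis APIs are not shown, and RateCache is a hypothetical name.

```python
import time


class RateCache:
    """In-process stand-in for a Redis-backed cache of exchange rates."""

    def __init__(self, fetch, ttl=300.0, clock=time.monotonic):
        self._fetch = fetch      # callable(from_, to) -> rate, e.g. a remote call
        self._ttl = ttl          # seconds a cached rate is considered fresh
        self._clock = clock      # injectable so expiry can be tested
        self._store = {}         # (from_, to) -> (rate, fetched_at)

    def refresh(self, pairs):
        """What a periodic scheduled job would call: pre-fetch every pair."""
        now = self._clock()
        for from_, to in pairs:
            self._store[(from_, to)] = (self._fetch(from_, to), now)

    def get(self, from_, to):
        """Serve from the cache; only hit the remote source when stale or missing."""
        key = (from_, to)
        cached = self._store.get(key)
        if cached is not None and self._clock() - cached[1] < self._ttl:
            return cached[0]
        rate = self._fetch(from_, to)
        self._store[key] = (rate, self._clock())
        return rate
```

Client requests then call get(), which normally answers from the cache and only falls back to the remote services if the scheduled refresh has not run recently enough.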
The good news is, such things are trivial to add with Zato, and once you complete them, you’ll have a truly IRA service that can be reused across a wide range of projects without any code changes. And your client apps will always be able to use plain dicts only.
Can Zato do more?
There’s a whole lot more Zato can do – JSON, SOAP, AMQP, JMS WebSphere MQ, ZeroMQ, Redis, SQL, FTP, load-balancing, scheduling, statistics, hooks, GUI, CLI, API – the features are there.
Note that Django was used in the text, but the client is completely framework-agnostic; the same code will work with any Python application.
Also, Zato is written in Python, but it’s not only for integrating Python apps. As long as your application can speak any of the protocols mentioned (which is 99% of apps out there), you’re good to go.
What next?
If you haven’t done so yet, read the no-nonsense intro to ESB/SOA and visit the tutorial. They explain all the core concepts so you can get started quickly.
Thanks for your time!
Joey Hess: little disasters
Interesting times.. While the big disasters are ongoing, little ones have been spicing up my life lately.
A pleasant week by the beach ended with a tropical storm passing over the beach house. I've never experienced this before, and though Andrea was diminished by passing over land, it was still more wind than I've ever seen. I love wind, and this was thrilling, right on the edge of danger but not quite there. At least, if you have the sense to stay out of the water. Leaving the beach, I heard of someone who tried to go surfing that day, and drowned.
The night before last, I was startled to find nearly an inch of water seeping up from underneath the tile floor of the kitchen. Probably it has something to do with the pressure tank pumping system, which was repaired while I was away, and means I actually have indoor running water here. (Overrated.) This saw me scrambling to close every water valve, and out with a flashlight at 2 am closing the cutoff at the 1000 gallon water reservoir before it all drained into the house. While sopping up dozens of gallons of water from the floor at 3 am probably doesn't sound like fun, I found myself going through the motions elatedly.. Because this means I finally am coming to understand the source of the damp that infests the most earth-sheltered corner of this house. It's not condensation. It's bad plumbing!
Then yesterday, I went out to try a dip in the river, stopped by the neighborhood eatery and bait shop, and ended up sitting out on the back deck eating ribs and listening to a band with "possum playboys" in their name (which makes the full name fairly irrelevant), while looking out over the river and the old-timey green metal bridge. Which was unexpected fun, and the kind of thing you have to take in when it happens, but getting stuck in a newly installed hole in my driveway was not. My car was spinning, and I gave up and called it a night.
Here's the thing. I could feel my brain working on this stupid "underpowered car is stuck in a small rut" issue all night long. Same mental pathways activating that chew over bugs and design issues. Got up this morning with a set of plans and contingency plans all ready to go. The first one, jacking it up and putting something under the tire, was stymied; it seems I am missing a jack. But the second, digging out all around the tire, then filling in with gravel and cat litter (a tip from some offroading website I blearily surfed last night), and then riding the gas while releasing the brake, worked great.
All of which is to say, bring em on! But I still prefer my disasters in the form of software bugs.
Choqok-devel mailing list
I’m pleased to announce that the Choqok development mailing list is up and ready (thanks to the KDE sysadmins), and from now on we will talk about development stuff there. So I invite anyone who is interested in getting involved in Choqok development and contributing code to subscribe to the choqok-devel mailing list.
Ned Batchelder: 51 at MoMath
For my birthday (today), we visited the Museum of Math in New York (yesterday). I've been looking forward to getting there since it opened six months ago, and my reluctant family had to accede to my birthday destination, so all five of us spent the afternoon.
The museum is a fun place, with lots of interactive exhibits. Some were intriguing but baffling, like a polyhedron exploration device which looked great, but was impossible to control. We rode square-wheeled tricycles, made ourselves into fractal trees, explored cross-sections in the wall of fire, rolled weird shapes to make weird paths, looked at specular holography, and so on.
As is typical with these kinds of high-traffic interactive displays, a number of them were not working, which was disappointing. But overall, it was a lot of fun, and not the same feel as math class at all. One helpful museum worker kept popping up to tell us how to better use the exhibits, and Susan said, "if I had her as a math teacher, I might have learned something in high school!"
It was a great day. If you enjoy mathematical thinking, I heartily recommend the Museum of Math.
A few blocks north, we found the Museum of Sex, but decided not to go in with the kids....
Bert Boerland: 1M Drupal installs and counting! (11110100001001000000 initiative)
Within 1 year (Q1/Q2 2014?) we will have 1 million registered Drupal installs. We all know the real total is higher, mostly because of Drupal SaaS services that don't have the pingback enabled, and we should always explain that.
However, 1.000.000 is still a freaking big number (11110100001001000000 in base 2, even bigger :-) and we should use it for marketing our product. Apart from the usual suspects, there are not that many web-based solutions that come even close to one tenth of that. Big proprietary CMS vendors do very well if they sell 10.000 per year.
All the more reason to celebrate, for example with:
- an easy-to-embed history infographic,
- an interactive timeline like trends or zeitgeist, plotting the number of installs against events in time and/or place from the community,
- press releases (they never seem to work though :-( ),
- a gift for the one millionth Drupal user, for example a Drupal 1.0 install on a 5 ¼" floppy disk signed by Dries.
For resources, we can use the influence, people and money of the DA (if they are in on this), ask companies to work on this pro bono, preferably together, and hold a fundraiser: "One tenth for one", i.e. 1/10 of a monetary unit of your country per Drupal install you maintain. A dime per install for Americans, 10 euro cents per install for Europeans, 1/10th of a Rand in South Africa, etc. This shows that we are truly global, and if people donate not 10 cents but, for example, 10 dollars per install (we should hint at that, especially to the cheap Dutch :-) we could raise a fair bit as well, more than 1/10th of a million if played well.
What are your thoughts, how should we proceed? Please follow up on BAM on g.d.o.
Go Deh: Greedy Ranking Algorithm in Python
I mentioned in an earlier post that I had written my own ranker and thought I'd revisit this with some code.
I verify and ensure the safety of microprocessors for my day job. One way that very complex CPUs are tested is to create another model of the chip which can be used to generate pseudo-random instruction streams to run on the CPU. This so-called ISG can create thousands (millions!) of these tests in very little time, and the ISG is written in such a way that it can be 'tweaked' to give some control or steering over what the instruction streams will exercise on the CPU.
Now, simulating these instruction streams and gathering information on just which parts of the CPU are exercised (said to be covered) by each individual test takes time, and multiple ISG-generated tests may cover the same regions of the CPU. To increase the overall coverage of the CPU, we run what is called a regression: all the tests are run, and their coverage and the time they take to simulate are stored. At the end of the regression run you may have several thousand tests that cover only part of the CPU.
If you take the regression results and rank them you can find that subset of the tests that give all the coverage. Usually thousands of pseudo-random tests might be ranked and generate a sub-list of only hundreds of tests that when run would give the same coverage. What we then usually do is look at what isn't covered and generate some more tests by the ISG or other methods to try and fill the gaps; run the new regression and rank again in a loop to fully exercise the CPU and hit some target coverage goal.
Ranking tests is an important part of the regression flow described above, and when it works well you forget about it. Unfortunately, sometimes I want to rank other data, for which the stock ranking program from the CAD tool vendors does not fit. So here is the guts of a ranking program that will scale to handling hundreds of thousands of tests and coverage points.
Input
Normally I have to parse my input from text or HTML files of results generated by other CAD programs. It is tedious work that I will skip by providing idealised inputs in the form of a Python dict. (Sometimes the code for parsing input files can be as large as, or larger than, the ranking algorithm.)
Let us assume that each ISG test has a name, runs for a certain 'time' and, when simulated, is shown to 'cover' a set of numbered features of the design. After parsing, the gathered input data is represented by the results dict in the program.
results = {
    # 'TEST': ( TIME, set([COVERED_POINT ...])),
    'test_00': ( 2.08, set([2, 3, 5, 11, 12, 16, 19, 23, 25, 26, 29, 36, 38, 40])),
    'test_01': ( 58.04, set([0, 10, 13, 15, 17, 19, 20, 22, 27, 30, 31, 33, 34])),
    'test_02': ( 34.82, set([3, 4, 6, 12, 15, 21, 23, 25, 26, 33, 34, 40])),
    'test_03': ( 32.74, set([4, 5, 10, 16, 21, 22, 26, 39])),
    'test_04': (100.00, set([0, 1, 4, 6, 7, 8, 9, 11, 12, 18, 26, 27, 31, 36])),
    'test_05': ( 4.46, set([1, 2, 6, 11, 14, 16, 17, 21, 22, 23, 30, 31])),
    'test_06': ( 69.57, set([10, 11, 15, 17, 19, 22, 26, 27, 30, 32, 38])),
    'test_07': ( 85.71, set([0, 2, 4, 5, 9, 10, 14, 17, 24, 34, 36, 39])),
    'test_08': ( 5.73, set([0, 3, 8, 9, 13, 19, 23, 25, 28, 36, 38])),
    'test_09': ( 15.55, set([7, 15, 17, 25, 26, 30, 31, 33, 36, 38, 39])),
    'test_10': ( 12.05, set([0, 4, 13, 14, 15, 24, 31, 35, 39])),
    'test_11': ( 52.23, set([0, 3, 6, 10, 11, 13, 23, 34, 40])),
    'test_12': ( 26.79, set([0, 1, 4, 5, 7, 8, 10, 12, 13, 31, 32, 40])),
    'test_13': ( 16.07, set([2, 6, 9, 11, 13, 15, 17, 18, 34])),
    'test_14': ( 40.62, set([1, 2, 8, 15, 16, 19, 22, 26, 29, 31, 33, 34, 38])),
    }
Greedy ranking algorithm
The object of the algorithm is to select and order a subset of the tests that:
- Cover as many of the coverage points as possible by at least one test.
- After the above, reduce the number of tests needed to achieve that maximum coverage by as much as is possible.
- Generate a ranking of the tests selected to allow an even smaller set of tests to be selected if necessary.
- With lower priority than all of the above, it would also be good to reduce the total 'time' accrued by the ranked tests.
- Of course it needs to work for large sets of tests and points to cover.
If more than one test gives the same incremental additional coverage at any stage, then the test taking the least 'time' is picked.
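This tie-breaking rule needs nothing more than Python's tuple comparison. The ranker below builds one (delta_cover, -cost, test) tuple per remaining test, so taking max() of those tuples compares coverage first, then negative cost, then name. A small illustration, using three candidates from round 1 of the sample output further down:

```python
# Candidates shaped like the ranker's (delta_cover, -cost, test) tuples,
# with values taken from round 1 of the sample output.
candidates = [
    (14, -100.00, 'test_04'),
    (13, -40.62, 'test_14'),
    (14, -2.08, 'test_00'),
]

# Tuples compare element by element: the largest coverage delta wins first,
# then the largest negative cost (i.e. the smallest cost), then the name.
best = max(candidates)
print(best)   # (14, -2.08, 'test_00')
```

When both the delta and the cost are equal, max() falls through to the name and picks the test whose name sorts last, which is the "arbitrary" ordering the tutor output mentions.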
The following function implements the algorithm:
21 def greedyranker(results):
22     results = results.copy()
23     ranked, coveredsofar, costsofar, round = [], set(), 0, 0
24     noncontributing = []
25     while results:
26         round += 1
27         # What each test can contribute to the pool of what is covered so far
28         contributions = [(len(cover - coveredsofar), -cost, test)
29                          for test, (cost, cover) in sorted(results.items()) ]
30         # Greedy ranking by taking the next greatest contributor
31         delta_cover, benefit, test = max( contributions )
32         if delta_cover > 0:
33             ranked.append((test, delta_cover))
34             cost, cover = results.pop(test)
35             coveredsofar.update(cover)
36             costsofar += cost
37         for delta_cover, benefit, test in contributions:
38             if delta_cover == 0:
39                 # this test cannot contribute anything
40                 noncontributing.append( (test, round) )
41                 results.pop(test)
42     return coveredsofar, ranked, costsofar, noncontributing
Each time through the while loop (line 25), the next best test is appended to the ranking, and tests that can no longer contribute any extra coverage are discarded (lines 37-41).
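The third objective, being able to pick an even smaller set of tests when necessary, only needs the ranking the function returns. Here is a minimal sketch with a hypothetical helper name, shortest_prefix. It assumes that total_points is the maximum achievable coverage (len(coveredsofar) from the ranker) rather than the number of points in the design, and that target is a fraction between 0 and 1.

```python
from itertools import accumulate


def shortest_prefix(ranking, total_points, target):
    """Return the fewest leading tests of a greedy ranking whose combined
    coverage reaches `target` (a fraction of total_points).

    `ranking` is a list of (test, delta_cover) pairs, in ranked order.
    """
    running = accumulate(delta for _, delta in ranking)
    for count, covered in enumerate(running, start=1):
        if covered / total_points >= target:
            return [test for test, _ in ranking[:count]]
    raise ValueError('target coverage is not reachable with this ranking')
```

Because the greedy ranking front-loads the biggest contributors, a prefix of it is a natural (though not guaranteed optimal) choice for a reduced test set. With the sample data, the first four ranked tests cover 37 of the 40 reachable points, so a target of 0.925 returns just those four.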
The function above is a bit dry, so I took a bit of time to annotate it with a tutor capability that, when run, prints out just what it is doing along the way:
The function with tutor
It implements the same thing, but does it noisily:
def greedyranker(results, tutor=True):
    results = results.copy()
    ranked, coveredsofar, costsofar, round = [], set(), 0, 0
    noncontributing = []
    while results:
        round += 1
        # What each test can contribute to the pool of what is covered so far
        contributions = [(len(cover - coveredsofar), -cost, test)
                         for test, (cost, cover) in sorted(results.items()) ]
        if tutor:
            print('\n## Round %i' % round)
            print(' Covered so far: %2i points: ' % len(coveredsofar))
            print(' Ranked so far: ' + repr([t for t, d in ranked]))
            print(' What the remaining tests can contribute, largest contributors first:')
            print(' # DELTA, BENEFIT, TEST')
            deltas = sorted(contributions, reverse=True)
            for delta_cover, benefit, test in deltas:
                print(' %2i, %7.2f, %s' % (delta_cover, benefit, test))
            if len(deltas)>=2 and deltas[0][0] == deltas[1][0]:
                print(' Note: This time around, more than one test gives the same')
                print(' maximum delta contribution of %i to the coverage so far'
                      % deltas[0][0])
                if deltas[0][1] != deltas[1][1]:
                    print(' we order based on the next field of minimum cost')
                    print(' (equivalent to maximum negative cost).')
                else:
                    print(' the next field of minimum cost is the same so')
                    print(' we arbitrarily order by test name.')
            zeroes = [test for delta_cover, benefit, test in deltas
                      if delta_cover == 0]
            if zeroes:
                print(' The following test(s) cannot contribute more to coverage')
                print(' and will be dropped:')
                print(' ' + ', '.join(zeroes))

        # Greedy ranking by taking the next greatest contributor
        delta_cover, benefit, test = max( contributions )
        if delta_cover > 0:
            ranked.append((test, delta_cover))
            cost, cover = results.pop(test)
            if tutor:
                print(' Ranking %s in round %2i giving extra coverage of: %r'
                      % (test, round, sorted(cover - coveredsofar)))
            coveredsofar.update(cover)
            costsofar += cost

        for delta_cover, benefit, test in contributions:
            if delta_cover == 0:
                # this test cannot contribute anything
                noncontributing.append( (test, round) )
                results.pop(test)
    if tutor:
        print('\n## ALL TESTS NOW RANKED OR DISCARDED\n')
    return coveredsofar, ranked, costsofar, noncontributing
Every block starting with "if tutor:" above contains the added code.
Sample output
The code to call the ranker and print the results is:
totalcoverage, ranking, totalcost, nonranked = greedyranker(results)
print('''
A total of %i points were covered,
using only %i of the initial %i tests,
and should take %g time units to run.

The tests in order of coverage added:

TEST DELTA-COVERAGE'''
      % (len(totalcoverage), len(ranking), len(results), totalcost))
print('\n'.join(' %6s %i' % r for r in ranking))
The output has a lot of stuff from the tutor followed by the result at the end.
For this pseudo-randomly generated test case of 15 tests, it shows that only seven are needed to achieve the maximum total coverage. (And if you were willing to lose the coverage of the three tests that each add only one additional point, then 4 out of 15 tests would give 92.5% of the maximum coverage possible.)
## Round 1
Covered so far: 0 points:
Ranked so far: []
What the remaining tests can contribute, largest contributors first:
# DELTA, BENEFIT, TEST
14, -2.08, test_00
14, -100.00, test_04
13, -40.62, test_14
13, -58.04, test_01
12, -4.46, test_05
12, -26.79, test_12
12, -34.82, test_02
12, -85.71, test_07
11, -5.73, test_08
11, -15.55, test_09
11, -69.57, test_06
9, -12.05, test_10
9, -16.07, test_13
9, -52.23, test_11
8, -32.74, test_03
Note: This time around, more than one test gives the same
maximum delta contribution of 14 to the coverage so far
we order based on the next field of minimum cost
(equivalent to maximum negative cost).
Ranking test_00 in round 1 giving extra coverage of: [2, 3, 5, 11, 12, 16, 19, 23, 25, 26, 29, 36, 38, 40]
## Round 2
Covered so far: 14 points:
Ranked so far: ['test_00']
What the remaining tests can contribute, largest contributors first:
# DELTA, BENEFIT, TEST
12, -58.04, test_01
10, -100.00, test_04
9, -12.05, test_10
9, -26.79, test_12
9, -85.71, test_07
8, -4.46, test_05
7, -15.55, test_09
7, -16.07, test_13
7, -40.62, test_14
7, -69.57, test_06
6, -34.82, test_02
5, -5.73, test_08
5, -32.74, test_03
5, -52.23, test_11
Ranking test_01 in round 2 giving extra coverage of: [0, 10, 13, 15, 17, 20, 22, 27, 30, 31, 33, 34]
## Round 3
Covered so far: 26 points:
Ranked so far: ['test_00', 'test_01']
What the remaining tests can contribute, largest contributors first:
# DELTA, BENEFIT, TEST
7, -100.00, test_04
5, -12.05, test_10
5, -26.79, test_12
5, -85.71, test_07
4, -4.46, test_05
3, -5.73, test_08
3, -16.07, test_13
3, -32.74, test_03
3, -34.82, test_02
2, -15.55, test_09
2, -40.62, test_14
1, -52.23, test_11
1, -69.57, test_06
Ranking test_04 in round 3 giving extra coverage of: [1, 4, 6, 7, 8, 9, 18]
## Round 4
Covered so far: 33 points:
Ranked so far: ['test_00', 'test_01', 'test_04']
What the remaining tests can contribute, largest contributors first:
# DELTA, BENEFIT, TEST
4, -12.05, test_10
3, -85.71, test_07
2, -4.46, test_05
2, -32.74, test_03
1, -5.73, test_08
1, -15.55, test_09
1, -26.79, test_12
1, -34.82, test_02
1, -69.57, test_06
0, -16.07, test_13
0, -40.62, test_14
0, -52.23, test_11
The following test(s) cannot contribute more to coverage
and will be dropped:
test_13, test_14, test_11
Ranking test_10 in round 4 giving extra coverage of: [14, 24, 35, 39]
## Round 5
Covered so far: 37 points:
Ranked so far: ['test_00', 'test_01', 'test_04', 'test_10']
What the remaining tests can contribute, largest contributors first:
# DELTA, BENEFIT, TEST
1, -4.46, test_05
1, -5.73, test_08
1, -26.79, test_12
1, -32.74, test_03
1, -34.82, test_02
1, -69.57, test_06
0, -15.55, test_09
0, -85.71, test_07
Note: This time around, more than one test gives the same
maximum delta contribution of 1 to the coverage so far
we order based on the next field of minimum cost
(equivalent to maximum negative cost).
The following test(s) cannot contribute more to coverage
and will be dropped:
test_09, test_07
Ranking test_05 in round 5 giving extra coverage of: [21]
## Round 6
Covered so far: 38 points:
Ranked so far: ['test_00', 'test_01', 'test_04', 'test_10', 'test_05']
What the remaining tests can contribute, largest contributors first:
# DELTA, BENEFIT, TEST
1, -5.73, test_08
1, -26.79, test_12
1, -69.57, test_06
0, -32.74, test_03
0, -34.82, test_02
Note: This time around, more than one test gives the same
maximum delta contribution of 1 to the coverage so far
we order based on the next field of minimum cost
(equivalent to maximum negative cost).
The following test(s) cannot contribute more to coverage
and will be dropped:
test_03, test_02
Ranking test_08 in round 6 giving extra coverage of: [28]
## Round 7
Covered so far: 39 points:
Ranked so far: ['test_00', 'test_01', 'test_04', 'test_10', 'test_05', 'test_08']
What the remaining tests can contribute, largest contributors first:
# DELTA, BENEFIT, TEST
1, -26.79, test_12
1, -69.57, test_06
Note: This time around, more than one test gives the same
maximum delta contribution of 1 to the coverage so far
we order based on the next field of minimum cost
(equivalent to maximum negative cost).
Ranking test_12 in round 7 giving extra coverage of: [32]
## Round 8
Covered so far: 40 points:
Ranked so far: ['test_00', 'test_01', 'test_04', 'test_10', 'test_05', 'test_08', 'test_12']
What the remaining tests can contribute, largest contributors first:
# DELTA, BENEFIT, TEST
0, -69.57, test_06
The following test(s) cannot contribute more to coverage
and will be dropped:
test_06
## ALL TESTS NOW RANKED OR DISCARDED
A total of 40 points were covered,
using only 7 of the initial 15 tests,
and should take 209.15 time units to run.
The tests in order of coverage added:
TEST DELTA-COVERAGE
test_00 14
test_01 12
test_04 7
test_10 4
test_05 1
test_08 1
test_12 1
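The 92.5% figure quoted before the log follows directly from the DELTA-COVERAGE column of this final table. As a small check (the numbers are copied from the table; the variable names are only illustrative), summing the deltas in ranked order gives the coverage reached after each additional test:

```python
# DELTA-COVERAGE column of the final ranking above, in ranked order
ranking = [("test_00", 14), ("test_01", 12), ("test_04", 7), ("test_10", 4),
           ("test_05", 1), ("test_08", 1), ("test_12", 1)]

total = sum(delta for _, delta in ranking)  # all 40 covered points
cumulative = []  # (tests used, points covered, percent of maximum)
running = 0
for ntests, (name, delta) in enumerate(ranking, start=1):
    running += delta
    percent = 100.0 * running / total
    cumulative.append((ntests, running, percent))
    print(f"{ntests} test(s), last added {name}: "
          f"{running}/{total} points = {percent:.1f}%")
```

The fourth line of output is the 4-test, 37-of-40-point, 92.5% case; the last three tests each add a single point.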
What should be next
There is a new Unified Coverage Interoperability Standard (UCIS), which defines a database for storing test coverage data. Ideally, the greedy ranker should be hooked up to a UCIS database to get its inputs, via its C interface or perhaps its XML output, instead of parsing text files.
Addendum
Random results dict creator
As used for testing:
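```python
import random


def cover_creator(ntests=25, maxcoverpoints=100):
    """Return {test_name: (run_time, set_of_cover_points)} filled with random data."""
    results = {}
    # Each test asks for between ~1/6 and ~2/6 of all cover points (duplicates collapse)
    coveredrange = (maxcoverpoints * 1 // 6,
                    1 + maxcoverpoints * 2 // 6)
    print(coveredrange)
    for test in range(ntests):
        name = 'test_%02i' % test
        covered = sorted(set(random.randint(0, maxcoverpoints - 1)
                             for _ in range(random.randint(*coveredrange))))
        # Run time is roughly proportional to the points covered, +/- 20%
        time = len(covered) * (100 + (random.random() - 0.5) * 40) / 100.0
        results[name] = (float('%6.2f' % time), set(covered))
    return results
```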
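The ranker that produced the log above is not part of this excerpt, so here is only a minimal sketch of the same greedy idea, assuming the `{name: (cost, set_of_points)}` dict shape that `cover_creator` returns. The function name `greedy_rank` and its return value are hypothetical. Each round it computes how many new points every remaining test would add, drops tests that add nothing, and picks the largest contributor, breaking ties on the lowest cost (the "maximum negative cost" rule in the log):

```python
def greedy_rank(results):
    """Greedily order tests by extra coverage added (hypothetical helper).

    results: {name: (cost, set_of_cover_points)}
    Returns (ranked_names, covered_points).
    """
    remaining = dict(results)
    covered = set()
    ranked = []
    while remaining:
        # Extra coverage each remaining test would add to what is covered so far
        deltas = {name: len(points - covered)
                  for name, (cost, points) in remaining.items()}
        # Tests that cannot contribute anything more are dropped
        for name in [n for n, d in deltas.items() if d == 0]:
            del remaining[name]
        if not remaining:
            break
        # Largest delta wins; ties go to the cheapest test (highest -cost)
        best = max(remaining, key=lambda n: (deltas[n], -remaining[n][0]))
        covered |= remaining.pop(best)[1]
        ranked.append(best)
    return ranked, covered


ranked, covered = greedy_rank({
    'a': (5.0, {1, 2, 3}),
    'b': (1.0, {3, 4}),
    'c': (2.0, {4, 5}),
    'd': (9.0, {1}),
})
print(ranked, sorted(covered))
```

In the toy input, `d` is dropped in round two and `b` in round three, because by then everything they cover is already covered. Being greedy, this does not guarantee the smallest possible test set, only a good ordering that is cheap to compute.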
END.
Tomer Filiba: TTYs: Never gets boring
Just a short rant: I'm working on an interactive console used for debugging a computer cluster. It connects to all nodes in the cluster and provides you with a single place to run queries. It uses the new (not yet officially released) zero-deploy feature of RPyC, which sets up a secure, single-use RPyC server on a machine, requiring only SSH access. Once the client connection closes, the zero-deployed server shuts down and deletes itself from the file system.
It's a cool feature on its own (and I'll blog about it soon), but there's a reason I'm taking you through all of these details. You see, the debugging console fires up SSH subprocesses in the background, over which RPyC connections are tunneled... and then the strangest thing happened. I was running a query that was taking too long and hit Ctrl+C to kill it and return to the interpreter. The query indeed stopped, but all of my RPyC connections had died with it. Huh?
Here's a really short way to reproduce this scenario:
```python
>>> from subprocess import Popen, PIPE
>>> p = Popen(["sleep", "60"], stdin=PIPE, stdout=PIPE, stderr=PIPE)
>>>
>>> p.poll()   # poll() returns None as the process is still running in the background
>>>
>>> # now hit Ctrl+C in the interactive prompt
KeyboardInterrupt
>>>
>>> p.poll()   # and voila, `sleep` was killed by SIGINT
-2
```

It's terribly confusing at first, but it happens because child processes inherit their parent's session ID and process group. Terminal events, such as SIGINT and SIGHUP, are dispatched to all processes belonging to the terminal's (foreground) process group, so the Python interpreter is not the only process that receives the signal: every child process it spawned suffers too. In my case, it killed all of the SSH tunnels I had set up.
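To see the inheritance directly, a short POSIX-only sketch (not from the original post) can compare the child's process group and session with the parent's, using the standard `os.getpgid`, `os.getpgrp` and `os.getsid` calls. It spawns `sleep` exactly as above, just with `DEVNULL` instead of pipes:

```python
import os
from subprocess import Popen, DEVNULL

# Spawn a child the default way, as in the session above
p = Popen(["sleep", "60"], stdin=DEVNULL, stdout=DEVNULL, stderr=DEVNULL)
try:
    # The child shares our process group and session, so a Ctrl+C that the
    # terminal sends to the foreground process group reaches it as well
    same_group = os.getpgid(p.pid) == os.getpgrp()
    same_session = os.getsid(p.pid) == os.getsid(0)
    print("same process group:", same_group)
    print("same session:", same_session)
finally:
    # Clean up the background child
    p.kill()
    p.wait()
```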
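The solution is to call setsid() in the child before it execs the new program, so that the child starts its own session (and process group) and no longer receives the terminal's signals: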
```python
>>> import os
>>> p = Popen(["sleep", "60"], stdin=PIPE, stdout=PIPE, stderr=PIPE, preexec_fn=os.setsid)
>>> p.poll()
>>>
KeyboardInterrupt
>>> p.poll()
>>>
```

So I had to add this feature to plumbum, and while I was at it, I also added daemonization support. In other words, I'll have to release 1.3 soon, even though I released 1.2 not two weeks ago. Life's a bitch and TTYs are the mother of all monsters :)
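On Python 3.2 and later, `Popen` also accepts `start_new_session=True`, which makes the child call setsid() between fork and exec. The `subprocess` docs warn that `preexec_fn` is not safe in the presence of threads, so this flag is the safer spelling of the same fix. A minimal sketch, assuming a Linux host (some BSDs may refuse `getsid` on a process in another session), that checks the child now leads its own session and process group:

```python
import os
import signal
from subprocess import Popen, DEVNULL

# start_new_session=True runs setsid() in the child, like preexec_fn=os.setsid
p = Popen(["sleep", "60"], stdin=DEVNULL, stdout=DEVNULL, stderr=DEVNULL,
          start_new_session=True)
try:
    # A session leader's session ID and process group ID both equal its PID
    own_session = os.getsid(p.pid) == p.pid
    own_group = os.getpgid(p.pid) == p.pid
    # ...so it is outside the group that receives our terminal's Ctrl+C
    detached = os.getpgid(p.pid) != os.getpgrp()
    print("own session:", own_session, "own group:", own_group,
          "outside our group:", detached)
finally:
    # Because the child leads its own group, we signal that group explicitly
    os.killpg(p.pid, signal.SIGTERM)
    p.wait()
```

The flip side of detaching is that the parent now has to clean up such children itself, which is why the sketch ends with an explicit `os.killpg`.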