FLOSS Project Planets

Breeze Icons coming to LibreOffice

Planet KDE - Wed, 2015-03-25 08:27

Of course, with Free Software, there is a challenge. Help us find the missing icons.

Categories: FLOSS Project Planets

Breeze love LibreOffice

Planet KDE - Wed, 2015-03-25 06:20

Jonathan Riddell asked on Planet KDE in November 2014 whether we could make a Breeze icon set for LibreOffice. He also started a wiki page about the icons in use. At the end of November, Uri, the main icon designer, made 150 LO icons, and after that the infrastructure in LO was prepared to support another icon set (https://gerrit.libreoffice.org/#/c/13043/).

Now, four months later, I proudly present the new LO Breeze icon set. We made more than 2,500 icons, so the Breeze LO icon set is mature. In the future we can offer both Oxygen and Breeze icons for LibreOffice. What do you say? I say wow. And I thank the LibreOffice design team for giving us the opportunity to customize LO and for the warm welcome. Maybe Breeze will become the backup for Sifr, the LO monochrome icon set, because Breeze is much more complete than Sifr. You are welcome to change this ;-)

Now we have an Easter egg challenge for you. The last missing icons are sometimes not easy to find. So download the daily build (please wait until 26.03; I was too fast with the blog post) with the new Breeze icons and search for missing icons. Please comment on the blog post with a screenshot and a short description.

 

In addition to LO, we also make Breeze icons for Kdenlive, LabPlot and Yakuake, and I have now started on digiKam. If you want Breeze and Breeze Dark icons for your app, leave a post on the VDG forum or open an issue on the Breeze Git repository.

We also have a really nice resolution for icon themes, look-and-feel packages (megathemes) and app-specific icons. YOU as the app developer can say which icon set you will primarily support in your app, for example Oxygen; in that case you have to offer all of your app icons for Oxygen. If the user wants to use Breeze Dark, the app-specific icons are included in the system's Breeze Dark icon set. So the user gets a consistent look and feel on Breeze and Breeze Dark, and developers can offer what they like.

 

Of course you can join us. With monochrome icons it is sometimes difficult to find the right level of recognizability, but they are easy to design. At Google Code-in I had some really, really good icon designers like Artem (not a designer) or Buno. I hope they join us again, and you too. So leave an icon at https://github.com/NitruxSA/plasma-next-icons.

I asked my Google Code-in student Artem why he supports KDE:

My name is Artem, I'm a 17-year-old programmer from Amursk, a small town in far eastern Russia. I study in a public school and lead an ordinary life. I support KDE because… to be honest, I don't know. It's just fun. I do useful things and it's great. By the way, I don't even use Linux (I used to use it, and I hope I will switch my main OS to a KDE-based Linux distro in the future, when .NET becomes officially cross-platform).


Categories: FLOSS Project Planets

Richard Hartmann: Visiting Hong Kong and Shenzhen

Planet Debian - Wed, 2015-03-25 05:56

TSDgeos had a good idea:

Lazyweb travel recommendations.

So, dear lazyweb: What are things to do or to avoid in Hong Kong and Shenzhen if you have one and a half weeks of holiday before and after work duties? Any hidden gems to look at? What electronic markets are good? Should I take a boat trip around the waters of Hong Kong?

If you have any decent yet affordable sleeping options for 2-3 nights in Hong Kong, that would also be interesting, as my "proper" hotel stay does not start immediately. Not much in the way of comfort is needed as long as I have a safe place to lock my belongings.

In somewhat related news, this Friday's bug report stats may be early or late as I will be on a plane towards China on Friday.

Categories: FLOSS Project Planets

François Marier: Keeping up with noisy blog aggregators using PlanetFilter

Planet Python - Wed, 2015-03-25 05:55

I follow a few blog aggregators (or "planets") and it's always a struggle to keep up with the amount of posts that some of these get. The best strategy I have found so far is to filter them so that I remove the blogs I am not interested in, which is why I wrote PlanetFilter.

Other options

In my opinion, the first step in starting a new free software project should be to look for a reason not to do it. So I started by looking for another approach and by asking people around me how they dealt with the firehoses that are Planet Debian and Planet Mozilla.

It seems like a lot of people choose to "randomly sample" planet feeds and only read a fraction of the posts that are sent through there. Personally however, I find there are a lot of authors whose posts I never want to miss so this option doesn't work for me.

A better option that other people have suggested is to avoid subscribing to the planet feeds, but rather to subscribe to each of the author feeds separately and prune them as you go. Unfortunately, this whitelist approach is a high maintenance one since planets constantly add and remove feeds. I decided that I wanted to follow a blacklist approach instead.

PlanetFilter

PlanetFilter is a local application that you can configure to fetch your favorite planets and filter the posts you see.

If you get it via Debian or Ubuntu, it comes with a cronjob that looks at all configuration files in /etc/planetfilter.d/ and outputs filtered feeds in /var/cache/planetfilter/.

You can either:

  • add file:///var/cache/planetfilter/planetname.xml to your local feed reader
  • serve it locally (e.g. http://localhost/planetname.xml) using a webserver, or
  • host it on a server somewhere on the Internet.

The software will fetch new posts every hour and overwrite the local copy of each feed.

A basic configuration file looks like this:

[feed]
url = http://planet.debian.org/atom.xml

[blacklist]

Filters

There are currently two ways of filtering posts out. The main one is by author name:

[blacklist]
authors =
  Alice Jones
  John Doe

and the other one is by title:

[blacklist]
titles =
  This week in review
  Wednesday meeting for

In both cases, if a blog entry contains one of the blacklisted authors or titles, it will be discarded from the generated feed.
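
To make this concrete, here is a minimal sketch of the same blacklist idea written against the feedparser library. This is an illustration only, not PlanetFilter's actual implementation; the author names and title fragments are just the examples used above.

import feedparser

BLACKLISTED_AUTHORS = {"Alice Jones", "John Doe"}
BLACKLISTED_TITLES = ["This week in review", "Wednesday meeting for"]

feed = feedparser.parse("http://planet.debian.org/atom.xml")

kept = []
for entry in feed.entries:
    author = entry.get("author", "")
    title = entry.get("title", "")
    # Drop entries written by a blacklisted author.
    if author in BLACKLISTED_AUTHORS:
        continue
    # Drop entries whose title contains a blacklisted fragment.
    if any(fragment in title for fragment in BLACKLISTED_TITLES):
        continue
    kept.append(entry)

print("%d of %d entries kept" % (len(kept), len(feed.entries)))

Serializing the kept entries back into an Atom feed is left out here; the point is only the author/title matching.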

Tor support

Since blog updates happen asynchronously in the background, they can work very well over Tor.

In order to set that up in the Debian version of planetfilter:

  1. Install the tor and polipo packages.
  2. Set the following in /etc/polipo/config:

    proxyAddress = "127.0.0.1"
    proxyPort = 8008
    allowedClients = 127.0.0.1
    allowedPorts = 1-65535
    proxyName = "localhost"
    cacheIsShared = false
    socksParentProxy = "localhost:9050"
    socksProxyType = socks5
    chunkHighMark = 67108864
    diskCacheRoot = ""
    localDocumentRoot = ""
    disableLocalInterface = true
    disableConfiguration = true
    dnsQueryIPv6 = no
    dnsUseGethostbyname = yes
    disableVia = true
    censoredHeaders = from,accept-language,x-pad,link
    censorReferer = maybe
  3. Tell planetfilter to use the polipo proxy by adding the following to /etc/default/planetfilter:

    export http_proxy="localhost:8008"
    export https_proxy="localhost:8008"
Bugs and suggestions

The source code is available on repo.or.cz.

I've been using this for over a month and it's been working quite well for me. If you give it a go and run into any problems, please file a bug!

I'm also interested in any suggestions you may have.

Categories: FLOSS Project Planets

KnackForge: PDOException - SQLSTATE[22003] - Numeric value out of range

Planet Drupal - Wed, 2015-03-25 05:20
This blog describes how to solve "PDOException - SQLSTATE[22003] - Numeric value out of range: 1264 Out of range". You get this error when you try to store a large integer value in an 'integer' field type, because the maximum value of the 'integer' field type is exceeded.

For example, say you want to store a phone number in a content type, so you create a new field with the 'integer' field type. When you store a 10-digit phone number in this field, the "PDOException" error will be shown. MySQL allows you to store values from -2147483648 to 2147483647 in an integer field if it is signed, so you can't store a phone number in an 'integer' field type. MySQL allocates 4 bytes for the 'integer' field type, i.e.:

4 bytes = 4 x 8 = 32 bits
2^32 => 4294967296 values
2^32 - 1 => 4294967295 (if unsigned: 0 to 4294967295)
-2^(32 - 1) to (2^(32 - 1) - 1) => -2147483648 to 2147483647 (if signed)

If you want to store a large integer value, then you need to use either the 'bigint' field type or a text field. If you choose 'bigint', there is no need to add custom validation for the phone number field. On the other hand, if you choose a text field, you need to add custom validation using either hook_form_alter() or hook_form_FORM_ID_alter().

<?php
/**
 * Implements hook_form_alter().
 */
function kf_form_alter(&$form, &$form_state, $form_id) {
  if ($form_id == 'your_form_id') {
Categories: FLOSS Project Planets

Scaling

Planet KDE - Wed, 2015-03-25 04:36

I’ve visited both FOSDEM and SCALE over the last few weeks, where I spoke with dozens of people and gave talks about ownCloud 8. We’ve been getting a lot of positive feedback on the work we’re doing in ownCloud (thanks!) and that has been very motivating.

Does it scale?

A question which comes up frequently is: “What hardware should I run ownCloud on?” This sounds like a simple question, but if you give it a second thought, it is actually not so easy to answer. I had a small CuBox at the booth as a demonstration that this is one way to run ownCloud. But development boards like the Raspberry Pi and the CuBox might give the impression that ownCloud is only suitable for very small installations – while in reality, the world’s largest on-premise sync and share deployment has brought ownCloud to 500,000 students! So, ownCloud scales, and that introduces the subject of this blog.

If you look up the term scalability on Wikipedia, you get the explanation that software scales well if you get double the performance out of it when you throw twice the hardware at the problem. This is called linear scalability, and it is rarely if ever achieved.

The secret to scalability

ownCloud runs on small Raspberry Pis for your friends and family at home, but also on huge clusters of web servers where it can serve hundreds of thousands of users and petabytes of data. The current Raspberry Pi doesn’t deliver blazing-fast performance, but it works, and the new Raspberry Pi 2 announced last month should be great hardware for small ownCloud deployments. Big deployments like the one in Germany or at CERN are usually ‘spread out’ over multiple servers, which brings us to the secret sauce that makes scalable software possible.

The secret to building great scalable software is to avoid central components that can become bottlenecks and to use components that can easily be clustered by simply adding more server nodes.

How ownCloud scales

The core ownCloud server is written in PHP and usually runs together with a web server like Apache or IIS on an application server running Linux or Windows. There is zero communication needed between the application nodes, and the load can be distributed across different application servers by standard HTTP load balancers. This scales completely linearly: if you want to handle double the load because you have double the users, you can just double the number of application servers, making ownCloud perfectly scalable software.

Unfortunately an ownCloud deployment still depends on a few centralized components that have the potential to become bottlenecks to scalability. These components are typically the file system, database, load balancer and sometimes session management. Let’s talk about each of those and what can be done to address potential performance issues in scaling them.

File system scalability

The file system is where ownCloud has its data stored, and it is thus very important for performance. The good news is that file systems are usually fast enough not to slow down ownCloud. A modern SSD, RAID setup, NFS server or object store can deliver data rates that are a lot faster than typical internet uplinks, so you rarely bump into limits with data storage. And if you do, there are solutions like GlusterFS which help you scale performance quite easily. On the app server, a properly set up and configured temp directory is important to achieve good performance, as data has to flow via the ownCloud installation to the user (sync clients or web interface).

Sometimes, this isn’t enough. If you have to handle petabytes of data, ownCloud 8 offers a solution developed together with CERN. This solution lets ownCloud act as a storage broker to direct read and write requests directly to the storage node where the data resides. The result is that no actual payload flows through the ownCloud servers anymore, and the storage performance is entirely dependent upon the data storage itself. Thanks to this solution, the file system storage should never be the bottleneck for ownCloud.

Database scalability

ownCloud uses a database to store all kinds of metadata, so it depends on a fast and scalable database to keep performance up. ownCloud can use enterprise databases like MSSQL and Oracle DB which offer all kinds of fancy clustering options. Unfortunately they are proprietary and not necessarily cheap. Luckily there are open source alternatives like PostgreSQL and MySQL/MariaDB which also offer impressive clustering and scalability options. Especially MySQL combined with a Galera cluster is a very nice and fast option that is used by a lot of the larger ownCloud installations.

Note that scalability doesn’t always mean big! Scalability also means that ownCloud should run fine on very tiny systems. Some embedded systems like the first Raspberry Pi had very limited RAM. In such situations it is nice to use SQLite, which is embedded in the PHP runtime and has a very tiny memory footprint, saving precious system resources. This is all about choosing the right option for the right system size!

Load balancer scalability

If you have more than one application server, then you need a way to distribute the incoming requests to the different servers. ownCloud uses a standard protocol like HTTP, so off-the-shelf solutions can be used for load balancing. There are standard, enterprise-grade appliances from companies like F5 that are very fast and reliable if used for redundancy with a heartbeat. Nowadays there are also very good and affordable options available, like the open source HAProxy on top of a standard Linux system. This also works very well and is very fast. If you really have a lot of traffic and don’t want to buy hardware appliances, you can combine several redundant HAProxy servers with DNS round robin. This has to be done very carefully so that you don’t compromise your high availability. There are several blogs and articles out there describing how to set up a system like this.

Session management scalability

There are two fundamentally different ways to do session management, both of which are supported by ownCloud. One is local session management on the application servers. The other is centralized session management. I don’t want to discuss the pros and cons of both solutions here because there are a lot of aspects to consider. But with regard to scalability, I want to mention that the simpler solution of local session management together with sticky connections has the benefit that it does not need a central component. This means that it provides linear scalability. If centralized session management is used, then something like memcached is recommended and supported by ownCloud, because it can also scale easily internally.

Summary

ownCloud has been designed to scale from tiny embedded systems like a Raspberry Pi for a few users, to a standard Linux server for a small workgroup, to a big cluster for several hundred thousand users. A wide variety of solutions and technologies can be used to make this possible; if you are interested in how to do this, have a look at the ownCloud documentation for more information, as well as the third-party resources and white papers available on owncloud.com.

Categories: FLOSS Project Planets

KnackForge: Drupal - Updating File name

Planet Drupal - Wed, 2015-03-25 03:45

If you know the file id, it is really simple,
Code:

$file = file_load($fid);
file_move($file, 'public://new_file_name');

How it works:

We need a source file object to move the file to the new location and update the file's database entry. Moving a file is performed by copying the file to the new location and then deleting the original. To get the source file object, we can use the file_load() function.
file_load($fid) - loads a single file object from the database.
$fid - the file ID.
Returns an object representing the file, or FALSE if the file was not found.

After that, we can use the file_move() function to move the file to the new location and delete the original file.

file_move($file, 'public://new_file_name')

Parameter 1 - the source file object.
Parameter 2 - a string containing the destination that $source should be moved to. This must be a stream wrapper URI.
Parameter 3 - the replace behaviour (default value: FILE_EXISTS_RENAME).

For more details, check here.
 

Categories: FLOSS Project Planets

Kristian Polso: My talk on Drupal security in Drupal Days Milan

Planet Drupal - Wed, 2015-03-25 03:06
So I just came back from European Drupal Days in Milan. I had great fun at the event, it was well organized and filled with interesting talks. I'll be sure to attend it next year too!
Categories: FLOSS Project Planets

Python Software Foundation: Raspberry Pi 2: Even More Delicious!

Planet Python - Tue, 2015-03-24 22:26
For those of you not familiar with the Raspberry Pi Foundation, this UK-based educational charity provides fun projects and opportunities for bringing coding literacy to students in the UK and to learners all over the world. This blog previously featured two of their projects, Astro Pi and Unicef’s Pi4Learning. There are many more, including Piper, which uses the game Minecraft to teach electronics to kids, the use of Raspberry Pis on weather balloons to observe and record (from the UK) today’s solar eclipse, and Picademy, which teaches programming skills to teachers (for these projects and many more, see the RPF Blog).

The one thing these widely varied projects have in common is that they all rely on the high-performing, incredibly affordable, versatile, and fun-to-use Raspberry Pi! First produced for sale by the RP Foundation in 2011, the device has become hugely popular, with over 5 million in use around the world. And it just got even better!

The new Raspberry Pi 2 went on sale in February 2015. The reviews have begun pouring in, and the consensus is that it’s truly great! Still selling for a mere $35 USD, still the size of a credit card, and of course still pre-loaded with Python (along with Scratch, Wolfram Mathematica, and much more), the new Raspberry Pi features increased speed and functionality over the B and B+ models. With a 900MHz quad-core ARM Cortex-A7 CPU and a full 1 GB of RAM (over model B+’s 512 MB), it has been benchmarked at speeds of 6 to almost 10 times faster than the first B model (see Tao of Mac, PC World). Its 4-core processor can run all ARM GNU/Linux distributions, and the new Pi is fully compatible with the earlier models. In addition, Windows is poised to release a version 10 that will work with the Pi, thus increasing its already broad appeal and versatility (see Raspberry Pi 2).
Photo credit: da.wikipedia.org, under CC license.

Features it retains from the previous Model B+ include 4 USB ports, HDMI, Ethernet, Micro SD, Broadcom VideoCore IV graphics, and sound card outputs via HDMI and 3.5mm analogue output (see PC Pro).

Currently the ties between the PSF and the RPF are strong, with many Pythonistas using the Raspberry Pi and many Raspberry Pi projects being done in Python. We hope more people will take a look at this remarkable tool and use it to teach Python, spread programming skills, and put computing power in the hands of anyone who wants it.

I would love to hear from readers. Please send feedback, comments, or blog ideas to me at msushi@gnosis.cx.
Categories: FLOSS Project Planets

Dear Lazyweb: What to visit in Alaska?

Planet KDE - Tue, 2015-03-24 21:12

I'm holidaying in Alaska for a few weeks around June. Has anyone been there who can share the things we should totally not miss or do when visiting?

Categories: FLOSS Project Planets

Steinar H. Gunderson: GCC 5 and AutoFDO

Planet Debian - Tue, 2015-03-24 20:22

Buried in the GCC 5 release notes, you can find this:

A new auto-FDO mode uses profiles collected by low overhead profiling tools (perf) instead of more expensive program instrumentation (via -fprofile-generate). SPEC2006 benchmarks on x86-64 improve by 4.7% with auto-FDO and by 7.3% with traditional feedback directed optimization.

This comes from Google, with some more information at this git repository (https://github.com/google/autofdo) and the GCC wiki (https://gcc.gnu.org/wiki/AutoFDO), as far as I can tell. The basic idea is that you can do feedback-directed optimization by low-overhead sampling of your regular binaries instead of a specially instrumented one. It is somewhat less effective (you get approx. half the benefit of full FDO, it seems), but it means you don't need to write automated, representative benchmarks—you can just sample real use and feed that into the next build.

Now, a question: Would it be feasible to do this for all of Debian? Have people volunteer to run perf in the background every now and then (similar to popularity-contest), upload (anonymized) profiles to somewhere, and feed them into package building. (Of course, it means new challenges for reproducible builds, as you get more inputs to take care of.)

Categories: FLOSS Project Planets

Python Software Foundation: Manuel Kaufmann and Python in Argentina

Planet Python - Tue, 2015-03-24 20:15
Several recent blog posts have focused on Python-related and PSF-funded activities in Africa and the Middle East. But the Python community is truly global, and it has been exciting to witness its continued growth. New groups of people are being introduced to Python and to programming so frequently that it’s difficult to keep up with the news. Not only that, but the scope and lasting impact of work being accomplished by Pythonistas with very modest financial assistance from the PSF is astonishing. 

One example is the recent work in South America by Manuel Kaufmann. Manuel’s project is to promote the use of Python “to solve daily issues for common users." His choice of Python as the best language to achieve this end is due to his commitment to "the Software Libre philosophy,” in particular, collaboration rather than competition, as well as Python's ability "to develop powerful and complex software in an easy way."

Toward this end, one year ago, Manuel began his own project, spending his own money and giving his own time, traveling to various South American cities by car (again, his own), organizing meet-ups, tutorials, sprints, and other events to spread the word about Python and its potential to solve everyday problems (see Argentina en Python).

This definitely got the PSF's attention, so in January 2015, the PSF awarded him a $3,000 (USD) grant. With this award, Manuel has been able to continue his work, conducting events that have established new groups that are currently expanding further. This ripple effect of a small investment is something that the PSF has seen over and over again.

On January 17, Resistencia, Argentina was the setting for its first-ever Python Sprint. It was a fairly low-key affair, held at a pub/restaurant “with good internet access.” There were approximately 20 attendees (including 4 young women), who were for the most part beginners. After a general introduction, they broke into 2 work groups, with Manuel leading the beginners' group (see Resistencia, Chaco Sprint), by guiding them through some introductory materials and tutorials (e.g., Learning Python from PyAr's wiki).
Foto grupal con todos los asistentes (group photo of all attendees). 
Photo credit: Manuel Kaufmann

As can happen, momentum built, and the Sprint was followed by a Meet-up on January 30 to consolidate gains and to begin to build a local community. The Meet-up's group of 15 spent the time exploring the capabilities of Python, Brython, Javascript, Django, PHP, OpenStreet Map, and more, in relation to needed projects, and a new Python community was born (see Meetup at Resistencia, Chaco).

The next event in Argentina, the province of Formosa's first official Python gathering, was held on February 14. According to Manuel, it was a great success, attended by around 50 people. The day was structured to have more time for free discussion, which allowed for more interaction and exchange of ideas. In Manuel’s opinion, this structure really helped to forge and strengthen the community. The explicit focus on real world applications, with discussion of a Python/Django software application developed for and currently in use at Formosa’s Tourist Information Office, was especially compelling and of great interest to the attendees. See PyDay Formosa and for pictures, see PyDay Pics.

It looks as though these successes are just the beginning: Manuel has many more events scheduled:
  • 28 Mar - PyDay at Asunción (Gran Asunción, Paraguay and PyDay Asuncion); Manuel reports that registration for this event has already exceeded 100 people, after only 3 days of opening. In addition, the event organizers are working to establish a permanent “Python Paraguay” community!
  • 7 May - PyDay at Apóstoles, Misiones;
  • 20-22 May - Educational Track for secondary students at SciPy LA 2015, Posadas, Misiones, Argentina (SciPy LA and Educational Track); and
  • 30 May - PyDay at Encarnación, Itapúa, Paraguay. 
You can learn more and follow Manuel’s project at the links provided and at Twitter. And stay tuned to this blog, because I plan to cover more of his exciting journey to bring Python, open source, and coding empowerment to many more South Americans.

I would love to hear from readers. Please send feedback, comments, or blog ideas to me at msushi@gnosis.cx.
Categories: FLOSS Project Planets

ThinkShout: The How and Why of Nonprofits Contributing to Open Source

Planet Drupal - Tue, 2015-03-24 20:00

Originally published on February 23rd, 2015 on NTEN.org. Republished with permission.

For the last 15 years or so, we’ve seen consistent growth in nonprofits’ appreciation for how open source tools can support their goals for online engagement. Rarely do we run across an RFP for a nonprofit website redesign that doesn’t specify either Drupal or WordPress as the preferred CMS platform. The immediate benefits of implementing an open source solution are pretty clear:

  • With open source tools, organizations avoid costly licensing fees.

  • Open source tools are generally easier to customize.

  • Open source tools often have stronger and more diverse vendor/support options.

  • Open source platforms are often better suited for integration with other tools and services.

The list goes on… And without going down a rabbit hole, I’ll simply throw out that the benefits of open source go well beyond content management use cases these days.

But the benefits of nonprofits supporting and contributing to these open source projects and communities are a little less obvious, and sometimes less immediate. While our customers generally appreciate the contributions we make to the larger community in solving their specific problems, we still often get asked the following in the sales cycle:

"So let me get this straight: First you want me to pay you to build my organization a website. Then you want me to pay you to give away everything you built for us to other organizations, many of whom we compete with for eyeballs and donations?"

This is a legitimate question! One of the additional benefits of using an open source solution is that you get a lot of functionality "for free." You can save budget over building entirely custom solutions with open source because they offer so much functionality out of the box. So, presumably, some of that saving could be lost if additional budget is spent on releasing code to the larger community.

There are many other arguments against open sourcing. Some organizations think that exposing the tools that underpin their website is a security risk. Others worry that if they open source their solutions, the larger community will change the direction of projects they support and rely upon. But most of the time, it comes down to that first argument:

"We know our organization benefits from open source, but we’re not in a position to give back financially or in terms of our time."

Again, this is an understandable concern, but one that can be mitigated pretty easily with proper planning, good project management, and sound and sustainable engineering practices.

Debunking the Myths of Contributing to Open Source

Myth #1: "Open sourcing components of our website is a security risk."

Not really true. Presumably the concern here is that if a would-be hacker were to see the code that underlies parts of your website, they could exploit security holes in that code. While yes, that could happen, the chances are that working with a software developer who has a strong reputation for contributing to an open source project is pretty safe. More importantly, most strong open source communities, such as the Drupal community, have dedicated security teams and thousands of developers who actively review and report issues that could compromise the security of these contributions. In our experience, unreviewed code and code developed by engineers working in isolation are much more likely to present security risks. And on the off chance that someone in the community does report a security issue, more often than not, the reporter will work with you, for free, to come up with a security patch that fixes the issue.

Myth #2: "If we give away our code, we are giving away our organization’s competitive advantage."

As a software vendor that’s given away code that powers over 45,000 Drupal websites, we can say with confidence: there is no secret sauce. Trust me, all of our competitors use Drupal modules that we’ve released - and vice versa.

By leveraging open source tools, your organization can take advantage of being part of a larger community of practice. And frankly, if your organization is trying to do something new, something that’s not supported by such a community, giving away tools is a great way to build a community around your ideas.

We’ve seen many examples of this. Four years ago, we helped a local nonprofit implement a robust mobile mapping solution on top of the Leaflet Javascript library. At the time, there wasn’t an integration between this library and Drupal. So, as part of this project, we asked the client to invest 20 hours or so for us to release the barebones scaffolding of their mapping tool as a contributed Drupal module.

At first, this contributed module was simply a developer tool. It didn’t have an interface allowing Drupal site builders to use it. It just provided an easier starting point for custom map development. However, this 20 hour starting point lowered the cost for us to build mapping solutions for other clients, who also pitched in a little extra development time here and there to the open source project. Within a few months, the Leaflet module gained enough momentum that other developers from other shops started giving back. Now the module is leveraged on over 5,700 websites and has been supported by code contributions from 37 Drupal developers.

What did that first nonprofit and the other handful of early adopters get for supporting the initial release? Within less than a year of initially contributing to this Drupal module, they opened the door to many tens of thousands of dollars worth of free enhancements to their website and mapping tools.

Did they lose their competitive advantage or the uniqueness of their implementation of these online maps? I think you know what I’m gonna say: No! In fact, the usefulness of their mapping interfaces improved dramatically as those of us with an interest in these tools collaborated and iterated on each other’s ideas and design patterns.

Myth #3: "Contributing to an open source project will take time and money away from solving our organization’s specific problems."

This perception may or may not be true, depending on some of the specifics of the problems your organization is trying to solve. More importantly, this depends upon the approach you use to contribute to an open source project. We’ve definitely seen organizations get buried in the weeds of trying to do things in an open source way. We’ve seen organizations contribute financially to open source projects on spec (on speculation that the project will succeed). This can present challenges. We’ve also seen vendors try to abstract too much of what they’re building for clients up front, and that can lead to problems as well.

Generally, our preferred approach is to solve our clients' immediate problems first, and then abstract useful bits that can be reused by the community towards the end of the project. There are situations when the abstraction, or the open source contribution, needs to come first. But for the most part, we encourage our clients to solve their own problems first, and in doing so provide real-life use cases for the solutions that they open source. Then, abstraction can happen later as a way of future-proofing their investment.

Myth #4: "If we open source our tools, we’ll lose control over the direction of the technologies in which we’ve invested."

Don’t worry, this isn’t true! In fact:

Contributing to an open source project is positively selfish.

By this I mean that by contributing to an open source project, your organization actually gets to have a stronger say in the direction of that project. Most open source communities are guided by those that just get up and do, rather than by committee or council.

Our team loves the fact that so many organizations leverage our Drupal modules to meet their own needs. It’s great showing up at nonprofit technology conferences and having folks come up to us to thank us for our contributions. But what’s even better is knowing that these projects have been guided by the direct business needs of our nonprofit clients.

How to Go About Contributing to Open Source

There are a number of ways that your nonprofit organization can contribute to open source. In most of the examples above, we speak to financial contributions towards the release of open source code. Those are obviously great, but meaningful community contributions can start much smaller:

  • Participate in an open source community event. By engaging with other organizations with similar needs, you can help guide the conversation regarding how a platform like Drupal can support your organization’s needs. Events like Drupal Day at the NTC are a great place to start.

  • Host a code sprint or hackathon. Sometimes developers just need a space to hack on stuff. You’d be surprised at the meaningful connections and support that can come from just coordinating a local hackathon. One of our clients, Feeding Texas, recently took this idea further and hosted a dedicated sprint on a hunger mapping project called SNAPshot Texas. As part of this sprint, four developers volunteered a weekend to help Feeding Texas build a data visualization of Food Stamp data across the state. This effort built upon the work of Feeding America volunteers across the country and became a cornerstone of our redesign of FeedingTexas.org. Feeding Texas believes so strongly in the benefits they received from this work that they felt comfortable open sourcing their entire website on GitHub.

Of course, if your organization is considering a more direct contribution to an open source project, for example, by releasing a module as part of a website redesign, we have some advice for you as well:

  • First and foremost, solve your organization’s immediate problems first. As mentioned earlier in the article, the failure of many open source projects is that their sponsors have to handle too many use cases all at once. Rest assured that if you solve your organization’s problems, you’re likely to create something that’s useful to others. Not every contribution needs to solve every problem.

  • Know when to start with abstraction vs. when to end with abstraction. We have been involved in client-driven open source projects, such as the release of RedHen Raiser, a peer-to-peer fundraising platform, for which the open source contribution needed to be made first, before addressing our client’s specific requirements. In the case of RedHen Raiser, the Capital Area Food Bank of Washington, DC came to us with a need for a Drupal-based peer-to-peer fundraising solution. Learning that nothing like that existed, they were excited to help us get something started that they could then leverage. In this case, starting with abstraction made the most sense, given the technical complexities of releasing such a tool on Drupal. However, for the most part, the majority of open source contributions come from easy wins that are abstracted after the fact. Of course, there’s no hard and fast rule about this - it’s just something that you need to consider.

  • Celebrate your contributions and the development team! It might sound silly, but many software nerds take great pride in just knowing that the stuff they build is going to be seen by their peers. By offering to open source even just small components of your project, you are more likely to motivate your development partners. They will generally work harder and do better work, which again adds immediate value to your project.

In conclusion, I hope that this article helps you better understand that there’s a lot of value in contributing to open source. It doesn’t have to be that daunting of an effort and it doesn’t have to take you off task.

Categories: FLOSS Project Planets

Matthew Rocklin: Partition and Shuffle

Planet Python - Tue, 2015-03-24 20:00

This work is supported by Continuum Analytics and the XDATA Program as part of the Blaze Project

This post primarily targets developers.

tl;dr We partition out-of-core dataframes efficiently.

Partition Data

Many efficient parallel algorithms require intelligently partitioned data.

For time-series data we might partition into month-long blocks. For text-indexed data we might have all of the “A”s in one group and all of the “B”s in another. These divisions let us arrange work with foresight.

To extend Pandas operations to larger-than-memory data, efficient partition algorithms are critical. This is tricky when data doesn’t fit in memory.

Partitioning is fundamentally hard

Data locality is the root of all performance -- A Good Programmer

Partitioning/shuffling is inherently non-local. Every block of input data needs to separate and send bits to every block of output data. If we have a thousand partitions then that’s a million little partition shards to communicate. Ouch.

Consider the following setup

100GB dataset / 100MB partitions = 1,000 input partitions

To partition, we need to shuffle data from the input partitions to a similar number of output partitions

1,000 input partitions * 1,000 output partitions = 1,000,000 partition shards

If our communication/storage of those shards has even a millisecond of latency then we run into problems.

1,000,000 partition shards x 1ms = 18 minutes

Previously I stored the partition-shards individually on the filesystem using cPickle. This was a mistake. It was very slow because it treated each of the million shards independently. Now we aggregate shards headed for the same out-block and write out many at a time, bundling overhead. We balance this against memory constraints. This stresses both Python latencies and memory use.
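
A rough sketch of that buffering idea (illustration only, not the actual dask/pframe code): accumulate the shards headed for each output block in memory, and flush each buffer as one concatenated append once it grows past a size limit. The size limit and the flush callback here are made up for the example.

import pandas as pd
from collections import defaultdict

BUFFER_LIMIT = 10000000  # rough per-block byte budget before flushing (arbitrary)
buffers = defaultdict(list)

def add_shard(block_id, shard, flush):
    # Buffer one shard for an output block; flush many shards as a single append.
    buffers[block_id].append(shard)
    size = sum(s.memory_usage(index=True).sum() for s in buffers[block_id])
    if size > BUFFER_LIMIT:
        flush(block_id, pd.concat(buffers[block_id]))
        buffers[block_id] = []

Here flush(block_id, frame) stands in for whatever actually writes a block to disk.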

BColz, now for very small data

Fortunately we have a nice on-disk chunked array container that supports append in Cython. BColz (formerly BLZ, formerly CArray) does this for us. It wasn’t originally designed for this use case but performs admirably.

Briefly, BColz is…

  • A binary store (like HDF5)
  • With columnar access (useful for tabular computations)
  • That stores data in cache-friendly sized blocks
  • With a focus on compression
  • Written mostly by Francesc Alted (PyTables) and Valentin Haenel

It includes two main objects:

  • carray: An on-disk numpy array
  • ctable: A named collection of carrays to represent a table/dataframe
Partitioned Frame

We use carray to make a new data structure pframe with the following operations:

  • Append DataFrame to collection, and partition it along the index on known block divisions blockdivs
  • Extract DataFrame corresponding to a particular partition

Internally we invent two new data structures:

  • cframe: Like ctable this stores column information in a collection of carrays. Unlike ctable this maps perfectly onto the custom block structure used internally by Pandas. For internal use only.
  • pframe: A collection of cframes, one for each partition.

Through bcolz.carray, cframe manages efficient incremental storage to disk. PFrame partitions incoming data and feeds it to the appropriate cframe.
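
To picture the partitioning step itself, here is a small sketch (not the actual pframe code) of how incoming rows can be routed to output partitions: given known block divisions, numpy's searchsorted tells us which partition each index value belongs to.

import numpy as np
import pandas as pd

blockdivs = [5, 15]  # known block divisions, as in the example below
df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [1., 2., 3., 4.]},
                  index=[1, 4, 10, 20])

# Partition number per row: 0 for index < 5, 1 for 5 <= index < 15, 2 otherwise.
partition_ids = np.searchsorted(blockdivs, df.index.values, side='right')

# Split the DataFrame into one shard per output partition.
for i in np.unique(partition_ids):
    shard = df[partition_ids == i]
    print("partition", i)
    print(shard)

Each shard would then be appended to the corresponding cframe.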

Example

Create test dataset

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'a': [1, 2, 3, 4],
   ...:                    'b': [1., 2., 3., 4.]},
   ...:                   index=[1, 4, 10, 20])

Create pframe like our test dataset, partitioning on divisions 5, 15. Append the single test dataframe.

In [3]: from pframe import pframe

In [4]: pf = pframe(like=df, blockdivs=[5, 15])

In [5]: pf.append(df)

Pull out partitions

In [6]: pf.get_partition(0)
Out[6]:
   a  b
1  1  1
4  2  2

In [7]: pf.get_partition(1)
Out[7]:
    a  b
10  3  3

In [8]: pf.get_partition(2)
Out[8]:
    a  b
20  4  4

Continue to append data…

In [9]: df2 = pd.DataFrame({'a': [10, 20, 30, 40],
   ...:                     'b': [10., 20., 30., 40.]},
   ...:                    index=[1, 4, 10, 20])

In [10]: pf.append(df2)

… and partitions grow accordingly.

In [12]: pf.get_partition(0)
Out[12]:
    a   b
1   1   1
4   2   2
1  10  10
4  20  20

We can continue this until our disk fills up. This runs near peak I/O speeds (on my low-power laptop with admittedly poor I/O.)

Performance

I’ve partitioned the NYCTaxi trip dataset a lot this week and have been posting my results to the Continuum chat with messages like the following:

I think I've got it to work, though it took all night and my hard drive filled up.
Down to six hours and it actually works.
Three hours!
By removing object dtypes we're down to 30 minutes
20! This is actually usable.
OK, I've got this to six minutes. Thank goodness for Pandas categoricals.
Five.
Down to about three and a half with multithreading, but only if we stop blosc from segfaulting.

And that's where I am now. It’s been a fun week. Here is a tiny benchmark.

>>> import pandas as pd
>>> import numpy as np
>>> from pframe import pframe

>>> df = pd.DataFrame({'a': np.random.random(1000000),
...                    'b': np.random.poisson(100, size=1000000),
...                    'c': np.random.random(1000000),
...                    'd': np.random.random(1000000).astype('f4')}).set_index('a')

Set up a pframe to match the structure of this DataFrame. Partition the index into divisions of size 0.1.

>>> pf = pframe(like=df,
...             blockdivs=[.1, .2, .3, .4, .5, .6, .7, .8, .9],
...             chunklen=2**15)

Dump the random data into the Partition Frame one hundred times and compute effective bandwidths.

>>> for i in range(100):
...     pf.append(df)
CPU times: user 39.4 s, sys: 3.01 s, total: 42.4 s
Wall time: 40.6 s

>>> pf.nbytes
2800000000

>>> pf.nbytes / 40.6 / 1e6  # MB/s
68.9655172413793

>>> pf.cbytes / 40.6 / 1e6  # Actual compressed bytes on disk
41.5172952955665

We partition and store on disk random-ish data at 68MB/s (cheating with compression). This is on my old small notebook computer with a weak processor and hard drive I/O bandwidth at around 100 MB/s.

Theoretical Comparison to External Sort

There isn’t much literature to back up my approach. That concerns me. There is a lot of literature however on external sorting, and it often cites our partitioning problem as a use case. Perhaps we should do an external sort?

I thought I’d quickly give some reasons why I think the current approach is theoretically better than an out-of-core sort; hopefully someone smarter can come by and tell me why I’m wrong.

We don’t need a full sort, we need something far weaker. External sort requires at least two passes over the data while the method above requires one full pass through the data as well as one additional pass through the index column to determine good block divisions. These divisions should be of approximately equal size. The approximate size can be pretty rough. I don’t think we would notice a variation of a factor of five in block sizes. Task scheduling lets us be pretty sloppy with load imbalance as long as we have many tasks.
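
As a side note on that extra pass: one cheap way to pick block divisions of roughly equal size is to sample the index column and take evenly spaced quantiles. This is only a sketch of the idea, not necessarily what pframe/dask does.

import numpy as np

def approximate_blockdivs(index_sample, npartitions):
    # npartitions - 1 interior quantiles, e.g. 10%, 20%, ..., 90% for 10 blocks.
    quantiles = np.linspace(0, 100, npartitions + 1)[1:-1]
    return list(np.percentile(index_sample, quantiles))

# Example: a uniform random index sampled down to one million values.
sample = np.random.random(1000000)
print(approximate_blockdivs(sample, 10))
# Roughly [0.1, 0.2, ..., 0.9], matching the blockdivs used in the benchmark above.

Because task scheduling tolerates sloppy load balance, an approximation like this is usually good enough.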

I haven’t implemented a good external sort though so I’m only able to argue theory here. I’m likely missing important implementation details.

Links
  • PFrame code lives in a dask branch at the moment. It depends on a couple of BColz PRs (#163, #164)
Categories: FLOSS Project Planets

Justin Mason: Links for 2015-03-24

Planet Apache - Tue, 2015-03-24 19:58
Categories: FLOSS Project Planets

btmash.com: Using Dev Desktop with Composer Drush

Planet Drupal - Tue, 2015-03-24 19:55

Before recently settling into the path of using Vagrant + Ansible for site development (speaking of Ansible: I absolutely love it and need to blog about some of my fun with that), I had been using Acquia Dev Desktop. Even now, I'll use it from time to time since it is easy to work with.

planet drupal, drush
Categories: FLOSS Project Planets

Simon Josefsson: Laptop indecision

Planet Debian - Tue, 2015-03-24 18:11

I wrote last month about buying a new laptop and I still haven’t made a decision. One reason for this is because Dell doesn’t seem to be shipping the E7250. Some online shops claim to be able to deliver it, but aren’t clear on what configuration it has – and I really don’t want to end up with Dell Wifi.

Another issue has been the graphic issues with the Broadwell GPU (see the comment section of my last post). It seems unlikely that this will be fixed in time for Debian Jessie. I really want a stable OS on this machine, as it will be a work-horse and not a toy machine. I haven’t made up my mind whether the graphics issue is a deal-breaker for me.

Meanwhile, a couple more sub-1.5kg (sub-3.3lbs) Broadwell i7s have hit the market. Some of these models were suggested in comments to my last post. I have decided that the 5500U CPU would also be acceptable to me, because some newer laptops don’t come with the 5600U. The difference is that the 5500U is a bit slower (say 5-10%) and lacks vPro, which I have no need for and mostly consider a security risk. I’m not aware of any other feature differences.

Since the last round, I have tightened my weight requirement to sub-1.4kg (sub-3lbs), which excludes some recently introduced models, and actually excludes most of the models I looked at before (X250, X1 Carbon, HP 1040/810). Since I’m leaning towards the E7250, with the X250 as a “reliable” fallback option, I wanted to cut down on the number of further models to consider. Weight is a simple distinguisher. The 1.4-1.5kg (3-3.3lbs) models that I am aware of being excluded are the Asus Zenbook UX303LN, the HP Spectre X360, and the Acer TravelMate P645.

The Acer Aspire S7-393 (1.3kg) and Toshiba Kira-107 (1.26kg) would have been options if they had RJ45 ports. They may be interesting to consider for others.

The new models I am aware of are below. I’m including the E7250 and X250 for comparison, since they are my preferred choices from the first round. A column for maximum RAM is added too, since this may be a deciding factor for me. Higher weights are for models with touch screens.

Model                  Weight       Max RAM  Screen  Resolution
Toshiba Z30-B          1.2-1.34kg   16GB     13.3″   1920×1080
Fujitsu Lifebook S935  1.24-1.36kg  12GB     13.3″   1920×1080
HP EliteBook 820 G2    1.34-1.52kg  16GB     12.5″   1920×1080
Dell Latitude E7250    1.25kg       8/16GB?  12.5″   1366×768
Lenovo X250            1.42kg       8GB      12.5″   1366×768

It is unclear whether the E7250 is memory-upgradeable: some sites say max 8GB, some say max 16GB. The X250 and 820 have DisplayPort, the S935 and Z30-B have HDMI, and the E7250 has both DisplayPort and HDMI. The E7250 does not have VGA, which the rest have. All of them have 3 USB 3.0 ports except for the X250, which only has 2 ports. The E7250 and 820 claim NFC support, but Debian support is not a given. Interestingly, all of them have a smartcard reader. All support SDXC memory cards.

The S935 has an interesting modular bay which can actually fit a CD reader or an additional battery. There is a detailed QuickSpec PDF for the HP 820 G2, but I haven’t found similarly detailed information for the other models. It mentions support for Ubuntu, which is nice.

Comparing these laptops is really just academic until I have decided what to think about the Broadwell GPU issues. It may be that I’ll go back to a fourth-gen i7 laptop, and then I’ll probably pick a cheap reliable machine such as the X240.

Categories: FLOSS Project Planets

Docutils Snippets

Planet KDE - Tue, 2015-03-24 18:00

Last week I had to work with docutils, a Python library to turn reStructuredText (.rst) into documentation. I was using it to extend Sphinx-based documentation I am setting up. It was quite a frustrating experience: despite lots of searching, I could not find any simple, working examples demonstrating Docutils API usage. To save me (and possibly you) some more frustration next time I need to use this library, I am writing down a few examples.

My goal was to create a custom Docutils Directive. A Directive is a class which can be referred to from a .rst file to generate custom content. Its main method is run(), which must return a list of Docutils nodes, representing the custom content. Each node can itself contain other nodes, so run() actually returns a list of node trees.

Available nodes are listed in the Docutils Document Tree. reStructuredText is powerful and expressive, which means creating simple text structures can require quite a lot of nodes, as we shall see.

Let's start with a "Hello World", a simple paragraph:

from docutils import nodes

# ...

class HelloWorld(Directive):
    def run(self):
        para = nodes.paragraph(text='Hello World')
        return [para]
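
A related note for the Sphinx use case mentioned above: a directive like this has to be registered before it can be used from .rst files. The sketch below shows one way to do that from a Sphinx extension's setup() function; the directive name 'helloworld' is just an example I picked, and the Directive base class comes from docutils.parsers.rst (elided as "# ..." in the snippets here).

from docutils import nodes
from docutils.parsers.rst import Directive


class HelloWorld(Directive):
    def run(self):
        return [nodes.paragraph(text='Hello World')]


def setup(app):
    # Called by Sphinx when the extension is loaded; makes
    # `.. helloworld::` available in .rst documents.
    app.add_directive('helloworld', HelloWorld)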

An error I made a lot when starting was to pass the text of the paragraph as a positional argument. I kept writing that:

nodes.paragraph('Hello World')

Instead of this:

nodes.paragraph(text='Hello World')

It does not work because the first argument of paragraph() is the raw source: the string which would produce the paragraph if it came from a .rst document.

Next example, let's create some sections, the equivalent of this .rst source:

Hello
=====

Some text.

A Level 2 Title
---------------

More text.

The code:

class Sections(Directive):
    def run(self):
        section = nodes.section()
        section += nodes.title(text='Hello')
        section += nodes.paragraph(text='Some text.')

        subsection = nodes.section()
        section += subsection
        subsection += nodes.title(text='A Level 2 Title')
        subsection += nodes.paragraph(text='More text.')

        return [section]

Let's now create a bullet list, like the one which would be created by this .rst:

- Apples
- Oranges
- Bananas

This is done with a bullet_list node, which contains list_item nodes, which themselves contain paragraph nodes.

class BulletList(Directive):
    def run(self):
        fruits = ['Apples', 'Oranges', 'Bananas']
        lst = nodes.bullet_list()
        for fruit in fruits:
            item = nodes.list_item()
            lst += item
            item += nodes.paragraph(text=fruit)
        return [lst]

And now for something a bit crazier, what about a table? The rough equivalent of:

============ ========== ======== =====
Product      Unit Price Quantity Price
------------ ---------- -------- -----
Coffee       2          2        4
Orange Juice 3          1        3
Croissant    1.5        2        3
============ ========== ======== =====

This one is a bit more involved:

class TableExample(Directive):
    def run(self):
        header = ('Product', 'Unit Price', 'Quantity', 'Price')
        colwidths = (2, 1, 1, 1)
        data = [
            ('Coffee', '2', '2', '4'),
            ('Orange Juice', '3', '1', '3'),
            ('Croissant', '1.5', '2', '3'),
        ]

        table = nodes.table()
        tgroup = nodes.tgroup(cols=len(header))
        table += tgroup
        for colwidth in colwidths:
            tgroup += nodes.colspec(colwidth=colwidth)

        thead = nodes.thead()
        tgroup += thead
        thead += self.create_table_row(header)

        tbody = nodes.tbody()
        tgroup += tbody
        for data_row in data:
            tbody += self.create_table_row(data_row)

        return [table]

    def create_table_row(self, row_cells):
        row = nodes.row()
        for cell in row_cells:
            entry = nodes.entry()
            row += entry
            entry += nodes.paragraph(text=cell)
        return row

That's it for today, hope this was helpful for some of you. If you want to experiment with this, here is the source code for all these examples: docutils_snippets.py.

PS: I am no Docutils expert, this article may suggest wrong ways to do things, please leave a comment if you notice any error.

Categories: FLOSS Project Planets
Syndicate content