FLOSS Project Planets

Gocept Weblog: Move documentation from pythonhosted.org to readthedocs.io

Planet Python - Fri, 2017-09-22 02:28

Today we migrated the documentation of zodb.py3migrate from pythonhosted.org to zodbpy3migrate.readthedocs.io.

Setting up the redirect requires a directory (named redir in this example) containing a file named index.html with the following content:

<html>
  <head>
    <title>zodb.py3migrate</title>
    <meta http-equiv="refresh" content="0; url=http://zodbpy3migrate.rtfd.io" />
  </head>
  <body>
    <p>
      <a href="http://zodbpy3migrate.rtfd.io">
        Redirect to zodbpy3migrate.rtfd.io
      </a>
    </p>
  </body>
</html>
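If you have several projects to move, the redirect directory can also be generated instead of written by hand. The following is only a sketch: the helper name write_redirect and its parameters are made up for illustration and are not part of the original workflow.

```python
import html
from pathlib import Path


def write_redirect(directory, title, target_url):
    """Create <directory>/index.html redirecting browsers to target_url.

    Hypothetical helper for illustration; it mirrors the hand-written
    index.html shown above.
    """
    safe_title = html.escape(title)
    safe_url = html.escape(target_url, quote=True)
    page = (
        "<html>\n"
        "  <head>\n"
        f"    <title>{safe_title}</title>\n"
        f'    <meta http-equiv="refresh" content="0; url={safe_url}" />\n'
        "  </head>\n"
        "  <body>\n"
        f'    <p><a href="{safe_url}">Redirect to {safe_url}</a></p>\n'
        "  </body>\n"
        "</html>\n"
    )
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)  # create redir/ if missing
    index = out_dir / "index.html"
    index.write_text(page, encoding="utf-8")
    return index
```

Calling write_redirect("redir", "zodb.py3migrate", "http://zodbpy3migrate.rtfd.io") would produce a directory equivalent to the one described here.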

To upload it to pythonhosted.org I called:

py27 setup.py upload_docs --upload-dir=redir

Now pythonhosted.org/zodb.py3migrate redirects to Read the Docs.

Credits: The HTML was taken from the Trello board of the avocado-framework.


Categories: FLOSS Project Planets

Sandipan Dey: Some more Social Network Analysis with Python: Centrality, PageRank/HITS, Random Network Generation Models, Link Prediction

Planet Python - Thu, 2017-09-21 20:02
In this article, some more social networking concepts will be illustrated with a few problems. The problems appeared in the programming assignments in the Coursera course Applied Social Network Analysis in Python. The descriptions of the problems are taken from the assignments. The analysis is done using NetworkX. The following theory is going to be used to solve …

Sandipan Dey: Some more Social Network Analysis with Python

Planet Python - Thu, 2017-09-21 20:02
In this article, some more social networking concepts will be illustrated with a few problems. The problems appeared in the programming assignments in the Coursera course Applied Social Network Analysis in Python. The descriptions of the problems are taken from the assignments. The analysis is done using NetworkX.

1. Measures of Centrality

In this assignment, we explore …

Lullabot: The State of Media in Drupal Core

Planet Drupal - Thu, 2017-09-21 20:00
Matt and Mike talk with Drupal Media Initiative folks, Janez Urevc, Sean Blommaert, and Lullabot's own Marcos Cano about getting modern media management in Drupal core.

Justin Mason: Links for 2017-09-21

Planet Apache - Thu, 2017-09-21 19:58

parallel @ Savannah: GNU Parallel 20170922 ('Mexico City') released

GNU Planet! - Thu, 2017-09-21 19:09

GNU Parallel 20170922 ('Mexico City') has been released. It is available for download at: http://ftpmirror.gnu.org/parallel/

Haiku of the month:

--limit can
limit jobs dynamic'ly
given a command
--ole-tange

New in this release:

  • Use '--limit myprog' to make a dynamic job limit. Just return 0 to spawn another job, 1 to not spawn another job, and 2 to kill the youngest job.
  • PARALLEL_RSYNC_OPTS and --rsync-opts sets the options for rsync (Default: -rlDzR).
  • Bug fixes and man page updates.
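As an illustration of the exit-code protocol described for --limit, here is a sketch of the decision logic such a program could use. Everything beyond the three return values is an assumption: the choice of the one-minute load average as the signal, the thresholds tied to the number of cores, and all function names are invented for this example.

```python
import os

# Exit codes as described in the release notes for --limit.
SPAWN_ANOTHER = 0   # spawn another job
HOLD = 1            # do not spawn another job
KILL_YOUNGEST = 2   # kill the youngest job


def decide(load, soft_limit, hard_limit):
    """Map a load figure to one of the three exit codes (assumed policy)."""
    if load >= hard_limit:
        return KILL_YOUNGEST
    if load >= soft_limit:
        return HOLD
    return SPAWN_ANOTHER


def main():
    """Decide from the 1-minute load average (Unix only)."""
    cores = os.cpu_count() or 1
    one_minute_load = os.getloadavg()[0]
    # Assumed thresholds: hold at one job per core, kill at twice that.
    return decide(one_minute_load, soft_limit=cores, hard_limit=2 * cores)
```

To use this as the myprog from the release notes, the file would need to be executable and end with sys.exit(main()), so that the decision reaches GNU Parallel as the exit status.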

GNU Parallel - For people who live life in the parallel lane.

About GNU Parallel

GNU Parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. A job can also be a command that reads from a pipe. GNU Parallel can then split the input and pipe it into commands in parallel.

If you use xargs and tee today you will find GNU Parallel very easy to use as GNU Parallel is written to have the same options as xargs. If you write loops in shell, you will find GNU Parallel may be able to replace most of the loops and make them run faster by running several jobs in parallel. GNU Parallel can even replace nested loops.

GNU Parallel makes sure output from the commands is the same output as you would get had you run the commands sequentially. This makes it possible to use output from GNU Parallel as input for other programs.

You can find more about GNU Parallel at: http://www.gnu.org/s/parallel/

You can install GNU Parallel in just 10 seconds with: (wget -O - pi.dk/3 || curl pi.dk/3/) | bash

Watch the intro video on http://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

Walk through the tutorial (man parallel_tutorial). Your commandline will love you for it.

When using programs that use GNU Parallel to process data for publication please cite:

O. Tange (2011): GNU Parallel - The Command-Line Power Tool, ;login: The USENIX Magazine, February 2011:42-47.

If you like GNU Parallel:

  • Give a demo at your local user group/team/colleagues
  • Post the intro videos on Reddit/Diaspora*/forums/blogs/ Identi.ca/Google+/Twitter/Facebook/Linkedin/mailing lists
  • Get the merchandise https://www.gnu.org/s/parallel/merchandise.html
  • Request or write a review for your favourite blog or magazine
  • Request or build a package for your favourite distribution (if it is not already there)
  • Invite me for your next conference

If you use programs that use GNU Parallel for research:

  • Please cite GNU Parallel in your publications (use --citation)

If GNU Parallel saves you money:

About GNU SQL

GNU sql aims to give a simple, unified interface for accessing databases through all the different databases' command line clients. So far the focus has been on giving a common way to specify login information (protocol, username, password, hostname, and port number), size (database and table size), and running queries.

The database is addressed using a DBURL. If commands are left out you will get that database's interactive shell.
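To give a feel for how a URL can carry the login information listed above, here is a small sketch using Python's standard URL parser. The exact DBURL syntax is not spelled out here, so the scheme://user:password@host:port/database layout and the example URL are assumptions; GNU sql's real grammar may differ.

```python
from urllib.parse import urlsplit


def parse_dburl(dburl):
    """Split a DBURL-like string into its login components.

    Assumes the layout protocol://user:password@host:port/database.
    """
    parts = urlsplit(dburl)
    return {
        "protocol": parts.scheme,
        "username": parts.username,
        "password": parts.password,
        "hostname": parts.hostname,
        "port": parts.port,  # int, or None if not given
        "database": parts.path.lstrip("/") or None,
    }
```

For a made-up URL such as mysql://alice:secret@db.example.com:3306/shop this yields the protocol, username, password, hostname, port number and database name as separate fields.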

When using GNU SQL for a publication please cite:

O. Tange (2011): GNU SQL - A Command Line Tool for Accessing Different Databases Using DBURLs, ;login: The USENIX Magazine, April 2011:29-32.

About GNU Niceload

GNU niceload slows down a program when the computer load average (or other system activity) is above a certain limit. When the limit is reached the program will be suspended for some time. If the limit is a soft limit the program will be allowed to run for short amounts of time before being suspended again. If the limit is a hard limit the program will only be allowed to run when the system is below the limit.


Clint Adams: PTT

Planet Debian - Thu, 2017-09-21 18:32

“Hello,” said Adrian, but Adrian was lying.

“My name is Adrian,” said Adrian, but Adrian was lying.

“Today I took a pic of myself pulling a train,” announced Adrian.

Posted on 2017-09-21 Tags: bgs

Enrico Zini: Systemd Truelite course

Planet Debian - Thu, 2017-09-21 18:00

These are the notes of a training course on systemd I gave as part of my work with Truelite.

There is quite a lot of material, so I split them into a series of posts, running once a day for the next 9 days.

Units

Everything managed by systemd is called a unit (see man systemd.unit), and each unit is described by a configuration in ini-style format.

For example, this unit continuously plays an alarm sound when the system is in emergency or rescue mode:

[Unit]
Description=Beeps when in emergency or rescue mode
DefaultDependencies=false
StopWhenUnneeded=true

[Install]
WantedBy=emergency.target rescue.target

[Service]
Type=simple
ExecStart=/bin/sh -ec 'while true; do /usr/bin/aplay -q /tmp/beep.wav; sleep 2; done'
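Because the format is ini-style, a simple unit like this one can be inspected with a generic ini parser. The sketch below uses Python's configparser on the example above; it is not how systemd itself reads units, and it would reject real-world units that repeat a key within a section, which systemd allows.

```python
import configparser

UNIT_TEXT = """\
[Unit]
Description=Beeps when in emergency or rescue mode
DefaultDependencies=false
StopWhenUnneeded=true

[Install]
WantedBy=emergency.target rescue.target

[Service]
Type=simple
ExecStart=/bin/sh -ec 'while true; do /usr/bin/aplay -q /tmp/beep.wav; sleep 2; done'
"""


def parse_unit(text):
    """Parse unit text with a generic ini parser (illustration only)."""
    parser = configparser.ConfigParser(
        interpolation=None,   # leave any '%' in values untouched
        delimiters=("=",),    # only '=' separates key and value
    )
    parser.optionxform = str  # keep the CamelCase key names as written
    parser.read_string(text)
    return parser


unit = parse_unit(UNIT_TEXT)
# Space-separated values such as WantedBy= can be split into a list.
wanted_by = unit["Install"]["WantedBy"].split()
```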

Units can be described by configuration files, which have different extensions based on what kind of thing they describe:

  • .service: daemons
  • .socket: communication sockets
  • .device: hardware devices
  • .mount: mount points
  • .automount: automounting
  • .swap: swap files or partitions
  • .target: only dependencies, like Debian metapackages
  • .path: inotify monitoring of paths
  • .timer: cron-like activation
  • .slice: group processes for common resource management
  • .scope: group processes for common resource management

System unit files can be installed in:

  • /lib/systemd/system/: for units provided by packaged software
  • /run/systemd/system/: runtime-generated units
  • /etc/systemd/system/: for units provided by system administrators

Unit files in /etc/ override unit files in /lib/. Note that while Debian uses /lib/, other distributions may use /usr/lib/ instead.

If there is a directory with the same name as the unit file plus a .d suffix, any file *.conf it contains is parsed after the unit, and can be used to add or override configuration options.

For example:

  • /lib/systemd/system/beep.service.d/foo.conf can be used to tweak the contents of /lib/systemd/system/beep.service, so it is possible for a package to distribute a tweak to the configuration of another package.
  • /etc/systemd/system/beep.service.d/foo.conf can be used to tweak the contents of /lib/systemd/system/beep.service, so it is possible for a system administrator to extend a packaged unit without needing to replace it entirely.
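The way drop-in files from several directories combine can be modelled in a few lines. This is a simplified sketch of the behaviour documented in man systemd.unit, not systemd's own code: a file in a higher-priority directory replaces a file of the same name from a lower-priority one, and the surviving files are applied in order of their file names. The function name and its arguments are made up for this example.

```python
from pathlib import Path


def effective_dropins(unit_name, roots):
    """Return the drop-in files that would apply to unit_name, in order.

    roots lists the unit directories from lowest to highest priority,
    for example ["/lib/systemd/system", "/run/systemd/system",
    "/etc/systemd/system"].
    """
    chosen = {}
    for root in roots:
        dropin_dir = Path(root) / f"{unit_name}.d"
        if not dropin_dir.is_dir():
            continue
        for conf in dropin_dir.glob("*.conf"):
            # Same file name in a later (higher priority) root wins.
            chosen[conf.name] = conf
    # Files are applied sorted by name, whichever directory they came from.
    return [chosen[name] for name in sorted(chosen)]
```

On a real system, systemctl cat beep.service shows the unit together with the drop-ins that were actually applied.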

Similarly, a unitname.wants/ or unitname.requires/ directory can be used to extend Wants= and Requires= dependencies on other units, by placing symlinks to other units in them.



Brian Okken: What info do you need to start testing?

Planet Python - Thu, 2017-09-21 16:42

I got some feedback today from someone who said they were looking through the pythontesting.net/start-here page and were still a bit confused as to how to start testing. I totally understand that and want to improve it. I want to make the site more friendly to people new to testing. What info would help? (try […]

The post What info do you need to start testing? appeared first on Python Testing.


Short update on recent UX improvements

Planet KDE - Thu, 2017-09-21 15:51

One of the usual data visualization workflows supported by LabPlot involves the import of some external data into the application and the creation of plots. Once the data is imported, e.g. into a spreadsheet, there are basically the following steps to do in order to get the data visualized:

  • create a worksheet (if not available yet)
  • add a plot (if not available yet)
  • add a xy-curve to the plot and select the spreadsheet columns as data sources for the curve

Despite the apparently small number of steps required here, this can result in a lot of tedious work if you need to plot many different imported data sets in one or several plots.

For the upcoming release we decided to improve this and to give the user a faster way to get the data plotted. The new “Plot Data” dialog, reachable via the context menu of the spreadsheet, lets the user carry out all the intermediate steps mentioned above for many data sets in one single step:




In this dialog, the creation of all the required objects can be controlled via different options: the user can decide how many plots to create and where to add the generated curves. The dialog respects the selection in the spreadsheet, and only the selected columns are plotted. If nothing is selected in the spreadsheet, all data sets (columns) are considered:



When generating curves, LabPlot tries to automatically recognize the column that has to be used as the source for the x-data by evaluating the “plot designation” property attached to every column in the spreadsheet. This greatly helps if you need to plot a set of data series having the same x-values (usually an index or timestamp). In case there is no column with the plot designation “X”, the first selected column is used as X.
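The selection rule for the x-column can be written down compactly. The following Python sketch only models the rule as described in this post; LabPlot's actual implementation is C++ and will differ, and the Column class and function names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Column:
    """Minimal stand-in for a spreadsheet column (hypothetical)."""
    name: str
    designation: str = ""  # plot designation, e.g. "X" or "Y"


def pick_x_column(selected):
    """Return the first column designated "X", else the first selected one."""
    if not selected:
        raise ValueError("no columns to plot")
    for column in selected:
        if column.designation == "X":
            return column
    return selected[0]


def curves_to_create(selected):
    """Pair the x-column with every other selected column as (x, y) names."""
    x = pick_x_column(selected)
    return [(x.name, column.name) for column in selected if column is not x]
```

With columns t (designated X), a and b selected, this yields the curves (t, a) and (t, b); without any X designation, the first selected column takes the role of X.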

The same applies to data analysis: if you wanted to, let’s say, smooth some data in a spreadsheet and compare the result with the original data in a plot, you previously had to

  • create a worksheet (if not available yet)
  • add a plot (if not available yet)
  • add a xy-curve to the plot and select the spreadsheet columns as data sources for the curve
  • add a smoothing curve and select the spreadsheet columns as data sources for the smooth curve
  • click on the “Recalculate” button to perform the calculation of the smoothed data

This kind of task can also result in a lot of tedious manual work, which was already the reason for some user complaints. We addressed this, too. With the recent improvements in LabPlot, you simply select the corresponding analysis function from the “Analyze and plot data” sub-menu in the spreadsheet’s context menu and let LabPlot do all the required steps:




The procedure here is basically the same as described above, with the only difference that the user now has the chance to additionally decide in the “Plot Data” dialog whether to plot the result of the analysis function together with the original data or to plot the result curve only.

With the analysis workflow sketched above, the analysis was initiated from the spreadsheet. For the use cases where you already have the data plotted and now want to smooth the already visualized data, to fit a certain model to the data, etc., you can initiate the analysis step via the context menu of the curve directly in the plot:




To further speed up the creation of data visualizations, we implemented the feature that allows the user to simply drag the columns in the project explorer and to drop them on the already existing plots – either by dropping in the Project Explorer or by dropping in the worksheet:



Until the next release, planned for December this year, we still have some time for fine-tuning and fixing the remaining issues in the new features and improvements briefly described here. All of them are already available in master and can be tried out.


Mediacurrent: Infographic: Drupal vs Adobe vs Sitecore

Planet Drupal - Thu, 2017-09-21 14:50

There’s a lot of information to sift through when comparing enterprise-level Content Management Systems (CMS) — features, functionality, cost, and more. In our recent whitepaper, we give you the pros and cons of Adobe, Drupal, and Sitecore through the lens of five different stakeholder perspectives.

To help give you an overview, we put together an at-a-glance view of how these three CMS's compare. 


Continuum Analytics Blog: Back to School: Data Science & AI Open New Doors for Students

Planet Python - Thu, 2017-09-21 14:40
School is back in session and now is the time students are thinking about their future. When considering options, science-oriented students are likely thinking about what is arguably today’s hottest technology: artificial intelligence (AI).

Stack Abuse: Flask vs Django

Planet Python - Thu, 2017-09-21 11:52

In this article, we will take a look at two of the most popular web frameworks in Python: Django and Flask.

Here, we will cover how these frameworks compare in terms of their learning curves and how easy it is to get started. Next, we'll look at how the two stack up against each other, and conclude with when to use each of them.

Getting Started

One of the easiest ways to compare two frameworks is by installing them and taking note of how easily a user can get started with each, which is exactly what we will do next. We will try setting up Django and Flask on a Linux machine and create an app to see how easy (or difficult) the process is with each one.

Setting up Django

In this section, we will set up Django on a Linux-powered machine. The best way to get started with any Python framework is by using virtual environments. We will install virtualenv using pip.

$ sudo apt-get install python3-pip
$ pip3 install virtualenv
$ virtualenv --python=`which python3` ~/.virtualenvs/django_env

Note: If the pip3 command gives you an error, you may need to prefix it with sudo to make it work.

Once we're done setting up our virtual environment, which we've named django_env, we must activate it to start using it:

$ source ~/.virtualenvs/django_env/bin/activate

Once activated, we can finally install Django:

$ pip install Django

Suppose our project is called mysite. Make a new directory, enter it, and create the project with the following commands:

$ mkdir mysite
$ cd mysite
$ django-admin startproject mysite

If you inspect the resulting project, you will see the following directory structure:

mysite/
    manage.py
    mysite/
        __init__.py
        settings.py
        urls.py
        wsgi.py

Let's take a look at what is significant about each of the directories and files that were created.

  • The root mysite/ directory is the container directory for our project
  • manage.py is a command line tool that enables us to work with the project in different ways
  • mysite/ directory is the Python package of our project code
  • mysite/__init__.py is a file which informs Python that this directory should be considered a Python package
  • mysite/settings.py will contain the configuration properties for the current project
  • mysite/urls.py is a Python file which contains the URL definitions for this project
  • mysite/wsgi.py acts as an entry for a WSGI web server that forwards requests to your project
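If the idea of a WSGI entry point is new to you, it helps to see how little the interface asks for: a callable that takes the request environment and a start_response function. The sketch below is a hand-written, framework-free example for illustration only. It is not the content of Django's generated wsgi.py, which obtains its callable from get_wsgi_application() instead.

```python
def application(environ, start_response):
    """A minimal WSGI callable: receives a request, returns a response body."""
    path = environ.get("PATH_INFO", "/")
    body = f"You asked for {path}".encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    # WSGI expects an iterable of byte strings.
    return [body]
```

A WSGI server imports such a callable and invokes it once per request. For a quick local try-out, the standard library's wsgiref.simple_server.make_server("", 8000, application).serve_forever() is enough.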

From here, we can actually run the app using the manage.py tool. The following command does some system checks, checks for database migrations, and some other things before actually running your server:

$ python manage.py runserver
Performing system checks...

System check identified no issues (0 silenced).

You have unapplied migrations; your app may not work properly until they are applied.
Run 'python manage.py migrate' to apply them.

September 20, 2017 - 15:50:53
Django version 1.11, using settings 'mysite.settings'
Starting development server at http://127.0.0.1:8000/
Quit the server with CONTROL-C.

Note: Running your server in this way is meant for development only, and not production environments.

To check out your app, head to http://localhost:8000/, where you should see a page saying "It worked!".

But wait, you're still not done! To actually create any pages/functionality in your site, you need to create an app within your project. But why do you need an app? In Django, apps are web applications that do something, which could be a blog, a forum, or a commenting system. The project is a collection of your apps, as well as configuration for the apps and entire website.

So, to create your app, move into your project directory and run the following commands:

$ cd mysite
$ python manage.py startapp myapp

This will create another directory structure where you can actually manage your models, views, etc.

manage.py
myapp/
    __init__.py
    admin.py
    apps.py
    migrations/
    models.py
    tests.py
    views.py
mysite/
    __init__.py
    settings.py
    urls.py
    wsgi.py

From here, you need to set up your views in views.py and URL routing in urls.py, which we'll save for another tutorial.

But you get the point, right? It takes a few commands and quite a few files to get your Django project up and running.

Setting up Flask

Just like with Django, we will use a virtual environment with Flask, so the commands for creating and activating a virtual environment remain the same as before. After that, instead of installing Django, we'll install Flask.

$ pip install Flask

Once the installation completes, we can start creating our Flask application. Now, unlike Django, Flask doesn't have a complicated directory structure. The structure of your Flask project is entirely up to you.

Borrowing an example from the Flask homepage, you can create a runnable Flask app from just a single file:

from flask import Flask
app = Flask(__name__)

@app.route("/")
def hello():
    return "Hello World!"

And running the app is about as easy as setting it up. Assuming the file above is saved as hello.py:

$ FLASK_APP=hello.py flask run
 * Running on http://localhost:5000/

Visiting the URL http://localhost:5000/ should display the text "Hello World!" in your browser.

I'd encourage you to look for some sample apps on the Flask homepage to learn more. Learning by example is one of the best ways to get up and running quickly.

The framework that "wins" this area is really up to your needs and experience. Django may be more favorable for beginners since it makes decisions for you (i.e. how to structure your app), whereas in Flask you need to handle this yourself.

On the other hand, Flask is simpler to get running since it requires very little to get going. An entire Flask app can be composed from a single file. The trade-offs really depend on what you need most.

Learning Curve

Regarding the learning curve, as we saw in the last section, Flask was very easy to get started with. The app doesn't require a complicated directory structure where you need to remember which directory/file does what. Instead, you can add files and directories as you go according to your usage. This is what Flask is about, as a micro-framework for web development.

Django, on the other hand, has a somewhat steeper learning curve since it is more "picky" about how things are set up and work. Because of this, you need to take more time learning how to compose modules and work within the confines of the framework.

This isn't all bad though, since it allows you to easily plug 3rd party components into your app without having to do much work integrating them.

Employability

Which of these frameworks will help you land a job? For many developers, this is one of the more important questions regarding certain libraries and frameworks: which will help me get hired?

Django has quite a few large companies on its resume, because many companies that use Python for web development tend to use (or at least started off with) Django to power their site. Django, being a full-fledged framework, is often used early on in development because you get many more resources and much more power with it out of the box.

Here are just a few companies that use (or have used) Django for their sites:

  • Pinterest
  • Instagram
  • Disqus
  • NASA

Flask is a bit harder to gauge here, mostly because of the way it is used. Flask tends to be used more for microservices, which makes it harder to tell which companies are using it. Plus, companies with a microservice architecture are less likely to say their service is "powered by Flask" since they likely have many services potentially using many different frameworks.

There are, however, hints out there of who uses Flask based on job postings, tech talks, blog posts, etc. From those, we know that the following companies have used Flask somewhere in their backend infrastructure:

  • Twilio
  • LinkedIn
  • Pinterest
  • Uber
  • Mailgun

While Django may be more popular among companies, Flask is arguably more common among the more tech-focused companies as they are more likely to use microservices, and therefore micro-frameworks like Flask.

Project Size and Scope

Our comparison of each framework can become very subjective thanks to many different factors, like project scope, developer experience, type of project, etc. If the project is fairly small and it doesn't need all of the overhead that Django comes with, then Flask is the ideal choice to get started and get something done very quickly.

However, if the project is larger in duration and scope, then Django is likely the way to go as it already includes much of what you need. This basically means that many common components of a web-service/website either already come with Django or are available through 3rd party open source software. In some cases you can just create a Django project, plug in a bunch of components, create your views/templates, and you're done.

While we do praise Django for its extensibility, we can't ignore that Flask does have some extensions of its own. While they aren't quite as big in scope as Django (and many of these extensions come standard in Django), it's a step in the right direction.

Django's add-on components can be as big as a blog add-on or as small as a piece of input-validation middleware. Most of Flask's extensions are small middleware components, which is still better than nothing and very helpful, considering the average size of Flask projects.

Limitations

Every piece of tech has its problems, and these frameworks are no different. So before you choose which to use, you might want to know what disadvantages each has, which we'll be talking about in this section.

Django

So, what are the aspects of Django that count against selecting it as your framework of choice?

Django is a very large project. Once developers, especially beginners, start learning Django, it's easy for them to get lost in the source code, the built-in features, and the components it provides, without even using them in an app.

Django is a fairly large framework to deploy for simple use-cases, as it hides much of the control from you. If you want to use something that isn't "standard" in Django, then you have to put in some extra work to do so.

Understanding components in Django can be a little difficult and tricky at times and can lead to tough decisions, like deciding if an existing component will work for your use-case, or if it'll end up causing you more work than it is worth.

Flask

Now that we've seen some of the problems with Django, let's not forget about Flask. Since the Flask framework is so small, there isn't a lot to complain about. Well, except for that fact right there: It's so small.

Flask is a micro-framework, which means it only provides the bare-bones functionality to get you started. This doesn't mean it can't be powerful and can't scale, it just means that you'll have to create much of the functionality of your service yourself. This means you'll need to handle integrating your database, data validation, file serving, etc.

While this could be considered an advantage to those who want control over everything, it also means it'll take you longer to get set up with a fully-functional website.
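To make "handling it yourself" a little more concrete, here is a sketch of the kind of glue code you might end up writing in a Flask project for data validation and database access. It uses only the standard library's sqlite3 module; the table layout, the length limit, and all function names are assumptions made up for this example, and a real project might well use an ORM or a Flask extension instead.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open the database and make sure the (hypothetical) posts table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts ("
        "id INTEGER PRIMARY KEY, title TEXT NOT NULL)"
    )
    return conn


def validate_title(raw):
    """Hand-written validation: trim whitespace, reject empty or long titles."""
    title = raw.strip()
    if not title or len(title) > 200:
        raise ValueError("title must be between 1 and 200 characters")
    return title


def add_post(conn, raw_title):
    """Insert a post using a parameterized query and return its id."""
    title = validate_title(raw_title)
    # The "?" placeholder keeps user input out of the SQL text itself,
    # which is the basic defence against SQL injection.
    cursor = conn.execute("INSERT INTO posts (title) VALUES (?)", (title,))
    conn.commit()
    return cursor.lastrowid
```

A view function would then call add_post() with the submitted form value and turn a ValueError into an error response.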

Choosing Flask or Django

While it's easy to talk about what each framework does and doesn't do, let's try and make a more direct comparison of each, which we'll do in this section.

When simplicity is a factor, Flask is the way to go. It allows much more control over your app and lets you decide how you want to implement things in a project. In contrast to this, Django provides a more inclusive experience, such as providing a default admin panel for your data, an ORM on top of your database, and protection against things like SQL injection, cross-site scripting, CSRF, etc.

If you put a lot of emphasis on community support, then Django is probably better in this regard given its history. It has been around since 2005, whereas Flask was created in 2010. At the time of writing this article, Django has about 3.5x more questions/answers on Stack Overflow than Flask (about 2600 Django questions to Flask's 750).

The Flask framework is relatively lightweight. In fact, it's almost 2.5x smaller than Django in terms of amount of code. That's a big difference, especially if you need to understand the inner-workings of your web framework. In this aspect, Flask will be much easier to read and understand for most developers.

Flask should be selected for development if you need complete control over your app, such as which ORM you want to use or which database you need to integrate with. It also offers excellent opportunities to learn more about web services. Django, on the other hand, is better when there is a clearer path to creating what you want, or you're creating something that has been done before. For example, a blog would be a good use-case for Django.

Learn More

Want to learn more about either of these frameworks? There are quite a few resources out there. Here are a few courses that I've found to be pretty helpful, and will get you up to speed much quicker:


Python and Django Full Stack Web Developer Bootcamp


REST APIs with Flask and Python

Otherwise you can also get a great start by visiting each framework's respective websites:

Either way, the most important thing is you actually try them out, work through some examples, and decide on your own which is best for you.

Conclusion

In this article, we compared the two web frameworks, Django and Flask, by looking at their different properties and setting up a simple "Hello World!" app with each one.

You may find that if you're new to web development and decide to learn Django, it may take you a bit longer to truly understand what all of the underlying components do, and how to change them to actually do what you want. But there are many positives as well, and once you become proficient with Django, it'll end up saving you a lot of time in the end, given its huge list of components and vast community support.

A more advanced comparison for any frameworks can be done only with advanced use cases and scenarios. Just know that you can't really go wrong with either one, and learning either will set you up well for finding a job.

If you need a recommendation, then I'd personally go with Flask. By learning a framework that doesn't hide so much stuff from you, you can learn much, much more. Once you have a better understanding of the core concepts of web development and HTTP, you can start to use add-ons that abstract this away from you. But having that solid foundation of understanding is more important early on, in my opinion.

Which framework do you use, and why? Let us know in the comments!


Redfin Solutions: CashNET module for Ubercart / Drupal 7.x

Planet Drupal - Thu, 2017-09-21 11:51

Redfin is happy to announce that thanks to the efforts of vetchneons, we have at long last released a -dev version of the CashNET module for Ubercart in Drupal 7. CashNET is a payment processor used by a lot of institutions in the higher education realm.

We would love for any folks using Ubercart in 7 to test it out, so the module can be promoted to a stable release. 

Chris September 21, 2017

GNUnet News: gnURL 7.55.1-4 released

GNU Planet! - Thu, 2017-09-21 10:58

Today gnURL has been released in version 7.55.1-4 as a patch release.

Mainly this fixes:
You no longer have to run "./buildconf" before compiling gnURL, and therefore autoconf and automake are dropped as dependencies.


Continuum Analytics Blog: Anaconda to Present at Strata Data Conference, New York

Planet Python - Thu, 2017-09-21 10:11
Anaconda, the most popular Python data science platform provider, today announced that several company experts will present two sessions and one tutorial at The Strata Data Conference on September 26 and 27 at the Javits Center in New York City.

DataCamp: How Not To Plot Hurricane Predictions

Planet Python - Thu, 2017-09-21 08:59

Visualizations help us make sense of the world and allow us to convey large amounts of complex information, data and predictions in a concise form. Expert predictions that need to be conveyed to non-expert audiences, whether they be the path of a hurricane or the outcome of an election, always contain a degree of uncertainty. If this uncertainty is not conveyed in the relevant visualizations, the results can be misleading and even dangerous.

Here, we explore the role of data visualization in plotting the predicted paths of hurricanes. We explore different visual methods to convey the uncertainty of expert predictions and the impact on layperson interpretation. We connect this to a broader discussion of best practices with respect to how news media outlets report on both expert models and scientific results on topics important to the population at large.

No Spaghetti Plots?

We have recently seen the damage wreaked by tropical storm systems in the Americas. News outlets such as the New York Times have conveyed a great deal of what has been going on using interactive visualizations for Hurricanes Harvey and Irma, for example. Visualizations include geographical visualisation of percentage of people without electricity, amount of rainfall, amount of damage and number of people in shelters, among many other things.

One particular type of plot has understandably been coming up recently and raising controversy: how to plot the predicted path of a hurricane, say, over the next 72 hours. There are several ways to visualize predicted paths, each way with its own pitfalls and misconceptions. Recently, we even saw an article in Ars Technica called Please, please stop sharing spaghetti plots of hurricane models, directed at Nate Silver and fivethirtyeight.

In what follows, I'll compare three common ways, explore their pros and cons and make suggestions for further types of plots. I'll also delve into why these types are important, which will help us decide which visual methods and techniques are most appropriate.

Disclaimer: I am definitely a non-expert in meteorological matters and hurricane forecasting. But I have thought a lot about visual methods to convey data, predictions and models. I welcome and actively encourage the feedback of experts, along with that of others.

Visualizing Predicted Hurricane Paths

There are three common ways of creating visualizations for predicted hurricane paths. Before talking about them, I want you to look at them and consider what information you can get from each of them. Do your best to interpret what each of them is trying to tell you, in turn, and then we'll delve into what their intentions are, along with their pros and cons:

The Cone of Uncertainty

From the National Hurricane Center

Spaghetti Plots (Type I)

From South Florida Water Management District via fivethirtyeight

Spaghetti Plots (Type II)

From The New York Times. Surrounding text tells us 'One of the best hurricane forecasting systems is a model developed by an independent intergovernmental organization in Europe, according to Jeff Masters, a founder of the Weather Underground. The system produces 52 distinct forecasts of the storm’s path, each represented by a line [above].'

Interpretation and Impact of Visualizations of Hurricanes' Predicted Paths

The Cone of Uncertainty

The cone of uncertainty, a tool used by the National Hurricane Center (NHC) and communicated by many news outlets, shows us the most likely path of the hurricane over the next five days, given by the black dots in the cone. It also shows how certain they are of this path. As time goes on, the prediction is less certain and this is captured by the cone, in that there is an approximately 66.6% chance that the centre of the hurricane will fall in the bounds of the cone.

Was this apparent from the plot itself?

It wasn't to me initially and I gathered this information from the plot itself, the NHC's 'about the cone of uncertainty' page and weather.com's demystification of the cone post. There are three more salient points, all of which we'll return to:

  • It is a common initial misconception that the widening of the cone over time suggests that the storm will grow;
  • The plot contains no information about the size of the storm, only about the potential path of its centre, and so is of limited use in telling us where to expect, for example, hurricane-force winds;
  • There is essential information contained in the text that accompanies the visualization, as well as the visualization itself, such as the note placed prominently at the top, '[t]he cone contains the probable path of the storm center but does not show the size of the storm...'; when judging the efficacy of a data visualization, we'll need to take into consideration all its properties, including text (and whether we can actually expect people to read it!); note that interactivity is a property that these visualizations do not have (but maybe should).

Spaghetti Plots (Type I)

This type of plot shows several predictions in one plot. On any given Type I spaghetti plot, the visualized trajectories are predictions from models from different agencies (NHC, the National Oceanic and Atmospheric Administration and the UK Met Office, for example). They are useful in that, like the cone of uncertainty, they inform us of the general region that may be in the hurricane's path. They are wonderfully unhelpful and actually misleading in that they weight each model (or prediction) equally.

In the Type I spaghetti plot above, there are predictions with varying degrees of uncertainty from agencies that have previously made predictions with variable degrees of success. So some paths are more likely than others, given what we currently know. This information is not present. Even more alarmingly, some of the paths are barely even predictions. Take the black dotted line XTRP, which is a straight-line prediction given the storm's current trajectory. This is not even a model. Eric Berger goes into more detail in this Ars Technica article.

Essentially, this type of plot provides an ensemble model (compare with aggregate polling). Yet a key aspect of ensemble models is that each model is given an appropriate weight, and these weights need to be communicated in any data visualization. We'll soon see how to do this using a variation on Type I.

Spaghetti Plots (Type II)

These plots show many, say 50, different realizations of any given model. The point is that if we simulate (run) a model several times, it will give a different trajectory each time. Why? Nate Cohn put it well in The Upshot:

"It’s really tough to forecast exactly when a storm will make a turn. Even a 15- or 20-mile difference in when it turns north could change whether Miami is hit by the eye wall, the fierce ring of thunderstorms that include the storm’s strongest winds and surround the calmer eye."

These are perhaps my favourite of the three for several reasons:

  • By simulating multiple runs of the model, they provide an indication of the uncertainty underlying each model;
  • They give a picture of relative likelihood of the storm centre going through any given location. Put simply, if more of the plotted trajectories go through location A than through location B, then under the current model it is more likely that the centre of the storm will go through location A;
  • They are unlikely to be misinterpreted (at least compared to the cone of uncertainty and the Type I plots). All the words required on the visualization are 'Each line represents one forecast of Irma's path'.

One con of Type II is that they are not representative of multiple models but, as we'll see, this can be altered by combining them with Type I plots. Another con is that they, like the others, only communicate the path of the centre of the storm and say nothing about its size. Soon we'll also see how we can remedy this. Note that the distinction between Type I and Type II spaghetti plots is not one that I have found in the literature, but one that I created because these plots have such different interpretations and effects.

For the time being, however, note that we've been discussing the efficacy of certain types of plots without explicitly discussing their purpose, that is, why we need them at all. Before going any further, let's step back a bit and try to answer the question 'What is the purpose of visualizing the predicted path of a hurricane?' Performing such ostensibly naive tasks is often illuminating.

Why Plot Predicted Paths of Hurricanes?

Why are we trying to convey the predicted path of a tropical storm? I'll provide several answers to this in a minute.

But first, let me say what these visualizations are not intended for. We are not using these visualizations to help people decide whether or not to evacuate their homes or towns. Ordering or advising evacuation is something that is done by local authorities, after repeated consultation with experts, scientists, modelers and other key stakeholders.

The major point of this type of visualization is to allow the general populace to be as well-informed as possible about the possible paths of the hurricane and allow them to prepare for the worst if there's a chance that where they are or will be is in the path of destruction. It is not to unduly scare people. As weather.com states with respect to the function of the cone of uncertainty, '[e]ach tropical system is given a forecast cone to help the public better understand where it's headed' and '[t]he cone is designed to show increasing forecast uncertainty over time.'

To this end, I think that an important property would be for a reader to be able to look at the visualization and say that it is 'very likely/likely/50% possible/not likely/very unlikely' that their house (for example) will be significantly damaged by the hurricane.

Even better, to be able to say "There's a 30-40% chance, given the current state-of-the-art modeling, that my house will be significantly damaged".

Then we have a hierarchy of what we want our visualization to communicate:

  • At a bare minimum, we want civilians to be aware of the possible paths of the hurricane.
  • Then we would like civilians to be able to say whether it is very likely, likely, unlikely or very unlikely that their house, for example, is in the path.
  • Ideally, a civilian would look at the visualization and be able to read off quantitatively what the probability (or range of probabilities) of their house being in the hurricane's path is.

On top of this, we want our visualizations to be neither misleading nor easy to misinterpret.

The Cone of Uncertainty versus Spaghetti Plots

All three methods perform the minimum required function, to alert civilians to the possible paths of the hurricane. The cone of uncertainty does a pretty good job at allowing a civilian to say how likely it is that a hurricane goes through a particular location (within the cone, it's about two-thirds likely). At least qualitatively, Type II spaghetti plots also do a good job here, as described above, 'if more of the trajectories go through location A than through location B, then under the current model it is more likely that the centre of the storm will go through location A'.

If you plot 50 trajectories, you get a sense of where the centre of the storm will likely be, that is, if around half of the trajectories go through a location, then there's an approximately 50% chance (according to our model) that the centre of the storm will hit that location. None of these methods yet perform the 3rd function and we'll see below how combining Type I and Type II spaghetti plots will allow us to do this.
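The counting argument above can be sketched in a few lines of Python. This is only an illustration: the function name, the toy tracks and the 0.5-degree tolerance are assumptions made here, not part of the original experiment.

```python
import math

def fraction_passing_near(trajectories, location, radius_deg=0.5):
    """Share of trajectories with at least one point within radius_deg of location.

    Each trajectory is a list of (lat, lon) points. Distances use a flat
    lat/lon approximation and only the sampled points are checked, not the
    segments between them, which is good enough for a sketch.
    """
    lat0, lon0 = location
    hits = sum(
        any(math.hypot(lat - lat0, lon - lon0) <= radius_deg for lat, lon in track)
        for track in trajectories
    )
    return hits / len(trajectories)

# Toy data: four straight tracks heading north along different longitudes.
toy_tracks = [[(20.0 + step, lon) for step in range(11)]
              for lon in (-82.0, -81.0, -80.0, -79.0)]
share = fraction_passing_near(toy_tracks, location=(25.0, -80.2))
print(share)  # 0.25: one of the four toy tracks passes within 0.5 degrees
```

With 50 trajectories from a real model, the same ratio gives the model's rough estimate of the chance that the storm centre passes near the chosen location.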

The major problem with the cone of uncertainty and Type I spaghetti plots is that the former is easy to misinterpret (many people interpret the cone as a growing storm and do not appreciate the role of uncertainty) and the latter are misleading (they make all models look equally believable). These plots then don't satisfy the basic requirement that 'we want our visualizations to be neither misleading nor easy to misinterpret.'

Best Practices for Visualizing Hurricane Prediction Paths

Type II spaghetti plots are the most descriptive and the least open to misinterpretation. But they do fail at presenting the results of all models. That is, they don't aggregate over multiple models like we saw in Type I.

So what if we combined Type I and Type II?

To answer this, I did a small experiment using python, folium and numpy. You can find all the code here.

I first took one of the NHC's predicted paths for Hurricane Irma from last week, added some random noise and plotted 50 trajectories. Note that, once again, I am a non-expert in all matters meteorological. The noise that I generated and added to the predicted signal/path was not based on any models and, in a real use case, would come from the models themselves (if you're interested, I used Gaussian noise). For the record, I also found it difficult to find data concerning any of the predicted paths reported in the media. The data I finally used I found here.
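As a rough illustration of this step, here is a minimal sketch that perturbs one base path with Gaussian noise to produce 50 trajectories. It is not the code linked above: the base coordinates are hypothetical, and accumulating the noise along the track (so that the spread grows with forecast time) is an assumption made for this example. It uses only the standard library so that it runs anywhere.

```python
import random

def noisy_trajectories(base_path, n_trajectories=50, sigma=0.1, seed=0):
    """Return n_trajectories perturbed copies of base_path, a list of (lat, lon).

    Gaussian steps are accumulated along each track, so the spread grows with
    forecast time. This noise model is an assumption made for illustration.
    """
    rng = random.Random(seed)
    tracks = []
    for _ in range(n_trajectories):
        d_lat = d_lon = 0.0  # every track starts at the current position
        track = [base_path[0]]
        for lat, lon in base_path[1:]:
            d_lat += rng.gauss(0.0, sigma)
            d_lon += rng.gauss(0.0, sigma)
            track.append((lat + d_lat, lon + d_lon))
        tracks.append(track)
    return tracks

# Hypothetical base path; these are not real NHC forecast coordinates.
base = [(22.0, -78.0), (23.0, -79.5), (24.5, -80.8), (26.5, -81.5), (29.0, -82.0)]
tracks = noisy_trajectories(base)
```

Each resulting track is a list of (lat, lon) pairs, which is the form that folium's PolyLine accepts for drawing a line on a map.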

Here's a simple Type II spaghetti plot with 50 trajectories:

But these are possible trajectories generated by a single model. What if we had multiple models from different agencies? Well, we can plot 50 trajectories from each:

One of the really cool aspects of Type II spaghetti plots is that, if we plot enough of them, each trajectory becomes indistinct and we begin to see a heatmap of where the centre of the hurricane is likely to be. All this means is that the more blue in a given region, the more likely it is for the path to go through there. Zoom in to check it out.

Moreover, if we believe that one model is more likely than another (if, for example, the experts who produced that model have produced far more accurate models previously), we can weight these models accordingly via, for example, transparency of the trajectories, as we do below. Note that weighting these models is a task for an expert and an essential part of this process of aggregate modeling.
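One simple way to turn model weights into transparency is a linear mapping, sketched below. The function name, the opacity range and the example weights are all assumptions for illustration; in practice the weights would come from an expert assessment.

```python
def opacity_from_weights(weights, min_opacity=0.05, max_opacity=0.6):
    """Map model weights to line opacities, scaled linearly by weight.

    weights: dict of model name -> positive weight. The highest-weighted
    model gets max_opacity; a weight near zero approaches min_opacity.
    """
    top = max(weights.values())
    return {
        name: min_opacity + (max_opacity - min_opacity) * (w / top)
        for name, w in weights.items()
    }

# Made-up weights for three hypothetical models.
example_weights = {"model_a": 0.6, "model_b": 0.3, "model_c": 0.1}
opacities = opacity_from_weights(example_weights)
```

The trajectories of each model would then be drawn with that model's opacity, so that overlapping lines from trusted models build up colour faster than those from less trusted ones.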

What the above does is solve the tasks required by the first two properties that we want our visualizations to have. To achieve the 3rd, a reader being able to read off that it's, say 30-40% likely for the centre of a hurricane to pass through a particular location, there are two solutions:

  • To alter the heatmap so that it moves between, say, red and blue, and include a key that says, for example, red means a probability of greater than 90%;
  • To transform the heatmap into a contour map that shows regions in which the probability takes on certain values.
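Both options need a probability per map region. A minimal sketch of one way to compute it: divide the map into grid cells and, for each cell, take the weighted share of trajectories that pass through it. The grid size, function names and toy data below are assumptions for illustration.

```python
from collections import defaultdict

def cell_of(lat, lon, cell_deg=0.5):
    """Index of the grid cell containing (lat, lon)."""
    return (int(lat // cell_deg), int(lon // cell_deg))

def cell_probabilities(tracks_by_model, weights, cell_deg=0.5):
    """Weighted share of tracks whose sampled points fall in each grid cell.

    tracks_by_model: dict of model name -> list of tracks, each a list of (lat, lon).
    weights: dict of model name -> weight; normalised here so they sum to 1.
    """
    total = sum(weights.values())
    probs = defaultdict(float)
    for model, tracks in tracks_by_model.items():
        per_track = weights[model] / total / len(tracks)
        for track in tracks:
            # A set, so a track counts at most once per cell.
            for cell in {cell_of(lat, lon, cell_deg) for lat, lon in track}:
                probs[cell] += per_track
    return dict(probs)

# Toy data: two tracks from one hypothetical model, one from another.
toy = {
    "model_a": [[(25.1, -80.1), (25.6, -80.1)], [(25.1, -80.1), (25.6, -80.6)]],
    "model_b": [[(25.1, -79.9), (25.6, -80.1)]],
}
probs = cell_probabilities(toy, weights={"model_a": 3, "model_b": 1})
```

The per-cell values can then be mapped to a colour scale with a key, or thresholded (for example at 30%, 60% and 90%) to draw contour bands.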

Also, do note that this will tell somebody the probability that a given location will be hit by the hurricane's center. You could combine (well, convolve) this with information about the size of the hurricane to transform the heatmap into one of the probability of a location being hit by hurricane-force winds. If you'd like to do this, go and hack around the code that I wrote to generate the plots above (I plan to write a follow-up post doing this and walking through the code).
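A crude version of this idea, assuming the storm can be approximated by a circle of fixed radius around its centre: count the trajectories that come within that radius of a location, measured in kilometres with the haversine formula. The radius values and toy tracks are made up for illustration, and a real storm's wind field is neither circular nor constant.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def wind_hit_probability(tracks, location, wind_radius_km):
    """Share of tracks whose centre comes within wind_radius_km of location."""
    hits = sum(
        any(haversine_km(point, location) <= wind_radius_km for point in track)
        for track in tracks
    )
    return hits / len(tracks)

# Toy data: three tracks heading north along different longitudes.
toy_tracks = [[(20.0 + step, lon) for step in range(11)]
              for lon in (-80.0, -81.0, -82.0)]
# Hypothetical 75 km radius of hurricane-force winds.
p_hit = wind_hit_probability(toy_tracks, (25.0, -80.5), wind_radius_km=75.0)
```

In the toy data the location lies roughly 50 km from two of the three tracks, so two out of three tracks count as hits for a 75 km radius.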

Visualizing Uncertainty and Data Journalism

What can we take away from this? We have explored several types of visualization methods for predicted hurricane paths, discussed the pros and cons of each and suggested a way forward for more informative and less misleading plots of such paths, plots that communicate not only the results but also the uncertainty around the models.

This is part of a broader conversation that we need to be having about reporting uncertainty in visualizations and data journalism, in general. We need to actively participate in conversations about how experts report uncertainty to civilians via news media outlets. Here's a great piece from The Upshot demonstrating what the jobs report could look like due to statistical noise, even if jobs were steady. Here's another Upshot piece showing the role of noise and uncertainty in interpreting polls. I'm well aware that we need headlines to sell news and the role of click-bait in the modern news media landscape, but we need to be communicating not merely results, but uncertainty around those results so as not to mislead the general public and potentially ourselves. Perhaps more importantly, the education system needs to shift and equip all civilians with levels of data literacy and statistical literacy in order to deal with this movement into the data-driven age. We can all contribute to this.


KDE: Randa 2017! KDE Neon Snappy and more

Planet KDE - Thu, 2017-09-21 08:54

Another successful Randa meeting! I spent most of my days working on snappy packaging for KDE core applications, and I have most of them done!

Snappy Builds on KDE Neon

We need testers! Please see Using snappy to get started.

In the evenings I worked on getting all my appimage work moved into the KDE infrastructure so that the community can take over.

I learned a great deal about accessibility and have been formulating ways to improve KDE neon in this area.

Randa meetings are crucial to the KDE community for developer interaction, brainstorming, and bringing great new things to KDE.
I encourage all of you to please consider a donation at https://www.kde.org/fundraisers/randameetings2017/


OSTraining: Create a One Page Drupal Site with Views Infinite Scroll Module

Planet Drupal - Thu, 2017-09-21 08:08

You have most likely already come across sites, blogs or galleries that present their content in an infinite scroll mode.

Such scrolling can easily be implemented with the Views Infinite Scroll contributed module in Drupal 8. No additional libraries or plugins are required.

In this tutorial, we’re going to create a gallery of article teasers of all countries in the Americas. Let’s get started!


PyCharm: PyCharm 2017.3 EAP 2

Planet Python - Thu, 2017-09-21 07:44

After a strong start, we continue our early access program (EAP) with its second release. Download EAP 2 now!

Testing RESTful Applications

Many of us work on web applications which expose a RESTful API, or at least an API that pretends to be RESTful. To test these, some of us use cURL, a browser extension, or some other piece of software. There is a REST client in PyCharm, but we’ve decided it could use some improvement, so we’re making an all new one.

The new REST client is entirely editor based: you write your request in a file, and then run the request to get a response. Sounds easy enough, right?

To see how it works, we’ll take the sample application from Flask-Restplus, which as you might expect exposes a todo API.

We’ll start out by creating a new todo. This is done by POST-ing to the /todos/ endpoint. To use the new PyCharm REST client, we should start by creating a .http file. If we don’t intend to save this, we can create a scratch file. Press Ctrl+Alt+Shift+Insert (Shift+Command+N on macOS) to start creating a scratch file and choose ‘HTTP request’ as the type. Let’s type our request into the file:

### Post a todo
POST http://localhost:5000/todos/
Accept: application/json
Content-Type: application/json

{ "task": "Create my task!" }

Now click the green play button next to the first line, and you should see that the task was created:

You can see the response in the Run tool window, and you might also notice that PyCharm wrote a new line in our file with the name of a .json file. This file contains the response, so if we Ctrl+Click (Cmd+Click) the filename, or use Ctrl+B (Cmd+B) to go to definition we see the full response in a separate file.
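For comparison, the same POST request can be built in plain Python with the standard library. This is only a sketch for readers who want to script the call outside PyCharm; it is not how the REST client works internally, and the call that actually sends the request is left commented out because it needs the sample application running on localhost:5000.

```python
import json
import urllib.request

payload = {"task": "Create my task!"}
request = urllib.request.Request(
    "http://localhost:5000/todos/",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Accept": "application/json", "Content-Type": "application/json"},
    method="POST",
)

# Sending it requires the sample application to be running locally:
# with urllib.request.urlopen(request) as response:
#     print(response.status, json.load(response))
```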

Those files become really useful when we do the same request a couple times but get different results. If we use a GET request to get our todo, and then use a PUT to change it, and redo our GET, we’ll now have two files there. We can then use the blue icon with the arrows to see the difference between the responses:

Now it’s your turn! Get the EAP and try it yourself

Further Improvements
  • Code completion for CSS now supports more notations for colors (rgb, hsl), added completion for CSS transitions, and more
  • React and TypeScript code intelligence: TypeScript 2.4 mapped types, and more
  • Docker improvements: you can now pass --build-arg arguments to a Docker run configuration

To read about all improvements in this version, see the release notes

As always, we really appreciate your feedback! Please let us know on YouTrack about any issues you experience or suggestions for improvement. You can also reach us on Twitter, or by leaving a comment below.
