FLOSS Project Planets

Ritesh Raj Sarraf: Laptop Mode Tools 1.71

Planet Debian - Thu, 2017-01-12 03:54

I am pleased to announce the 1.71 release of Laptop Mode Tools. This release includes some new modules, some bug fixes, and there are some efficiency improvements too. Many thanks to our users; most changes in this release are contributions from our users.

A filtered list of changes is mentioned below. For the full log, please refer to the git repository.

Source tarball and Fedora/SUSE RPM packages are available at:
https://github.com/rickysarraf/laptop-mode-tools/releases

Debian packages will be available soon in Unstable.

Homepage: https://github.com/rickysarraf/laptop-mode-tools/wiki
Mailing List: https://groups.google.com/d/forum/laptop-mode-tools

 

1.71 - Thu Jan 12 13:30:50 IST 2017
  • Fix incorrect import of os.putenv
  • Merge pull request #74 from Coucouf/fix-os-putenv
  • Fix documentation on where we read battery capacity from
  • cpuhotplug: allow disabling specific cpus
  • Merge pull request #78 from aartamonau/cpuhotplug
  • runtime-pm: refactor listed_by_id()
  • wireless-power: Use iw and fall back to iwconfig if it is not available
  • Prefer available AC supply information over battery state to determine ON_AC
  • On startup, we want to force the full execution of LMT.
  • Device hotplugs need a forced execution for LMT to apply the proper settings
  • runtime-pm: Refactor list_by_type()
  • kbd-backlight: New module to control keyboard backlight brightness
  • Include Transmit power saving in wireless cards
  • Don't run in a subshell
  • Try harder to check battery charge
  • New module: vgaswitcheroo
  • Revive bluetooth module. Use rfkill primarily. Also don't unload (incomplete list of) kernel modules

 

What is Laptop Mode Tools

Description: Tools for power savings based on battery/AC status.

Laptop mode is a Linux kernel feature that allows your laptop to save considerable power, by allowing the hard drive to spin down for longer periods of time. This package contains the userland scripts that are needed to enable laptop mode.

It includes support for automatically enabling laptop mode when the computer is working on batteries. It also supports various other power management features, such as starting and stopping daemons depending on power mode, automatically hibernating if battery levels are too low, and adjusting terminal blanking and X11 screen blanking.

laptop-mode-tools uses the Linux kernel's Laptop Mode feature and thus is also used on desktops and servers to conserve power.
Categories: FLOSS Project Planets

Mike Crittenden: Exporting and importing big Drupal databases

Planet Drupal - Wed, 2017-01-11 23:15

Once your site's database dump file gets to be 1GB or more, phrases like "oh, just download and import a DB dump" can't really be taken for granted anymore. So here are some tips for dealing with large databases, especially those of the Drupal variety.

Exporting

Before we can import, we must export. With a big DB, you don't want to just do a regular old mysqldump > outfile.sql and call it a day. Here are some tips.

Find the size before exporting

It can sometimes be useful to see how big the export is going to be before you actually export anything. That way, you can know ahead of time if you need to be doing this or that to reduce the size, or if it won't matter since the whole thing won't be that big anyway.

Here's a query you can run to see the size per DB table:

SELECT TABLE_SCHEMA,
       TABLE_NAME,
       DATA_LENGTH / POWER(1024,1) Data_KB,
       DATA_LENGTH / POWER(1024,2) Data_MB,
       DATA_LENGTH / POWER(1024,3) Data_GB
FROM information_schema.tables
WHERE table_schema NOT IN ('information_schema','performance_schema','mysql')
ORDER BY DATA_LENGTH;

And here's another query you can run to see what the total size for the entire DB is: 

SELECT Data_BB / POWER(1024,1) Data_KB,
       Data_BB / POWER(1024,2) Data_MB,
       Data_BB / POWER(1024,3) Data_GB
FROM (SELECT SUM(data_length) Data_BB
      FROM information_schema.tables
      WHERE table_schema NOT IN ('information_schema','performance_schema','mysql')) AS totals;

Dump without unnecessary data

For those cases where you need the database structure for all of the tables, but you don't need the data for all of them, here's a technique you can use. This will grab the entire DB structure, but lets you exclude data for any tables that you want. For example, search_index, cache_*, or sessions tables will be good places to cut out some fat.

# First we export the table structure.
mysqldump --no-data database_name > export.sql

# Grab table data, excluding tables we don't need.
mysqldump --no-create-info \
  --ignore-table=database_name.table_name1 \
  --ignore-table=database_name.table_name2 \
  database_name >> export.sql

Just replace "table_name1" and "table_name2" with the tables that you want to skip, and you're golden. Also note that you can use the % character as a wildcard, so for example, you could ignore "cache%" for all cache tables.

After you do that, you'll have a single export.sql file that contains the DB structure for all tables and the DB data for all tables except the ones you excluded. Then, you'll probably want to compress it...

Compress all the things

This one may go without saying, but if you're not compressing your database dumps then either they're really tiny, or you're dumber than a dummy. 

drush sql-dump --gzip --result-file=db.sql

Compare that with the regular old:

drush sql-dump --result-file=db.sql

...and you're going to see a huge difference.

Or if you already have the SQL dump that you need to compress, you can compress the file directly using:

gzip -v db.sql

That will output a db.sql.gz file for you.

Importing

Now you have a nice clean compressed DB dump with everything you need and nothing you don't, and you're ready to import. Here are a few ways to ease the pain.

Import a compressed dump directly

Instead of having to decompress the dump before importing, you can do it inline:

gunzip -c db.sql.gz | drush sqlc

Exclude data when importing

If you receive a DB dump that has a lot of data you don't need (caches, sessions, search index, etc.), then you can just ignore that stuff when importing it as well. Here's a little one-liner for this:

gunzip -c db.sql.gz | grep -Ev "^INSERT INTO \`(cache_|search_index|sessions)" | drush sqlc

What this does is use "grep" as a middleman, saying "skip any lines that are insertion lines for these specific tables we don't care about". You can edit what's in the parentheses to add or remove tables as needed.

Monitor import progress

There's nothing worse than just sitting and waiting and having no idea how far along the import has made it. Monitoring progress makes a long import seem faster, because there's no wondering. 

If you have the ability to install it (from Homebrew or apt-get or whatever), the "pv" (Pipe Viewer) command is great here:

pv db.sql | drush sqlc

Or if your database is compressed:

pv db.sql.gz | gunzip | drush sqlc

Using "pv" will show you a progress bar and a completion percentage. It's pretty awesome.

If you don't have "pv" then you can settle for the poor man's version:

watch "mysql database_name -Be 'SHOW TABLES' | tail -n2"

That slick little guy will show you the table that is currently importing, and auto-updates as it runs, so you can at least see how far through the table list it has gone.

Tools and Resources

In this post I tried to focus on commands that everyone already has. If this just isn't cutting it for you, then look into these tools which could help even more:

  • SyncDB - a couple Drush commands that split DB dumps into separate files and import them in parallel, drastically speeding things up
  • Drush SQL Sync Pipe - an alternative to "drush sql-sync" that uses pipes where possible to speed things up
Categories: FLOSS Project Planets

Codementor: Customizing your Navigation Drawer in Kivy & KivyMD

Planet Python - Wed, 2017-01-11 22:16
Kivy & KivyMD: NavigationDrawer

Kivy is an open source, cross-platform Python framework for the development of applications that makes use of innovative, multi-touch user interfaces.

KivyMD is a collection of Material Design compliant widgets for use with Kivy.

Prerequisites

This tutorial is meant for those who have at least a little familiarity with Kivy but don’t know how to move forward with implementing their own widgets, or for those who don’t find Kivy visually attractive.

Some really cool resources for you:

Content
  • KivyMD’s Navigation Drawer.
  • Modify the Navigation Drawer by replacing the Title with a Circular image
Structure

Before you start, make sure that you have this file structure.
Download the files from here.

- navigationdrawer
    - __init__.py           # our modified navigation drawer
- kivymd
    - ...
    - navigationdrawer.py
    - ...
- images                    # contains the image
    - me.jpg
- main.py

Before we start, let’s see what our main.py looks like.

  • NavigateApp class
    • Navigator’s object
    • Theme class’s object
  • Navigator class
    • NavigationDrawerIconButton

And we also have:

main.py

from kivy.app import App
from kivy.lang import Builder
from kivy.properties import ObjectProperty, StringProperty
from kivymd.theming import ThemeManager
from kivymd.navigationdrawer import NavigationDrawer
# from navigationdrawer import NavigationDrawer

main_widget_kv = '''
#:import Toolbar kivymd.toolbar.Toolbar

BoxLayout:
    orientation: 'vertical'
    Toolbar:
        id: toolbar
        title: 'Welcome'
        background_color: app.theme_cls.primary_dark
        left_action_items: [['menu', lambda x: app.nav_drawer.toggle()]]
        right_action_items: [['more-vert', lambda x: app.raised_button.open(self.parent)]]
    Label:

<Navigator>:
    NavigationDrawerIconButton:
        icon: 'face'
        text: 'Kuldeep Singh'
    NavigationDrawerIconButton:
        icon: 'email'
        text: 'kuldeepbb.grewal@gmail.com'
        on_release: app.root.ids.scr_mngr.current = 'bottomsheet'
    NavigationDrawerIconButton:
        icon: 'phone'
        text: '+91-7727XXXXXX'
    NavigationDrawerIconButton:
        icon: 'cake'
        text: '26/11/1994'
    NavigationDrawerIconButton:
        icon: 'city-alt'
        text: 'Rohtak'
    NavigationDrawerIconButton:
        icon: 'settings'
        text: 'Settings'
'''


class Navigator(NavigationDrawer):
    image_source = StringProperty('images/me.png')


class NavigateApp(App):
    theme_cls = ThemeManager()
    nav_drawer = ObjectProperty()

    def build(self):
        main_widget = Builder.load_string(main_widget_kv)
        self.nav_drawer = Navigator()
        return main_widget


NavigateApp().run()

Now that we have seen what the Navigation Drawer looks like, let’s look at its source code.

Navigationdrawer.py from KivyMD. (Source)

kivymd/navigationdrawer.py

# -*- coding: utf-8 -*-
from kivy.lang import Builder
from kivymd.label import MDLabel
from kivy.animation import Animation
from kivymd.slidingpanel import SlidingPanel
from kivymd.icon_definitions import md_icons
from kivymd.theming import ThemableBehavior
from kivymd.elevationbehavior import ElevationBehavior
from kivy.properties import StringProperty, ObjectProperty
from kivymd.list import OneLineIconListItem, ILeftBody, BaseListItem

Builder.load_string('''
<NavDrawerToolbar@Toolbar>
    canvas:
        Color:
            rgba: root.theme_cls.divider_color
        Line:
            points: self.x, self.y, self.x+self.width, self.y

<NavigationDrawer>
    _list: list
    elevation: 0
    canvas:
        Color:
            rgba: root.theme_cls.bg_light
        Rectangle:
            size: root.size
            pos: root.pos
    NavDrawerToolbar:
        title: root.title
        opposite_colors: False
        title_theme_color: 'Secondary'
        background_color: root.theme_cls.bg_light
        elevation: 0
    ScrollView:
        do_scroll_x: False
        MDList:
            id: ml
            id: list

<NavigationDrawerIconButton>
    NDIconLabel:
        id: _icon
        font_style: 'Icon'
        theme_text_color: 'Secondary'
''')


class NavigationDrawer(SlidingPanel, ThemableBehavior, ElevationBehavior):
    title = StringProperty()
    _list = ObjectProperty()

    def add_widget(self, widget, index=0):
        if issubclass(widget.__class__, BaseListItem):
            self._list.add_widget(widget, index)
            widget.bind(on_release=lambda x: self.toggle())
        else:
            super(NavigationDrawer, self).add_widget(widget, index)

    def _get_main_animation(self, duration, t, x, is_closing):
        a = super(NavigationDrawer, self)._get_main_animation(duration, t, x,
                                                              is_closing)
        a &= Animation(elevation=0 if is_closing else 5, t=t, duration=duration)
        return a


class NDIconLabel(ILeftBody, MDLabel):
    pass


class NavigationDrawerIconButton(OneLineIconListItem):
    icon = StringProperty()

    def on_icon(self, instance, value):
        self.ids['_icon'].text = u"{}".format(md_icons[value])

Here we see that the NavigationDrawer class has a widget named NavDrawerToolbar, which contains the title property.
We want to add a circular image there.

How to do it? By modifying the NavigationDrawer class.

Modify the Navigation Drawer by replacing the title with a circular image

Modification in the kv lang.

Original:

<NavigationDrawer>
    ...
    NavDrawerToolbar:
        title: root.title
        opposite_colors: False
        title_theme_color: 'Secondary'
        background_color: root.theme_cls.bg_light
        elevation: 0
    ...

Modified:

<NavigationDrawer>
    ...
    BoxLayout:
        size_hint: (1, .4)
        NavDrawerToolbar:
            padding: 10, 10
            canvas.after:
                Color:
                    rgba: (1, 1, 1, 1)
                RoundedRectangle:
                    size: (self.size[1]-dp(14), self.size[1]-dp(14))
                    pos: (self.pos[0]+(self.size[0]-self.size[1])/2, self.pos[1]+dp(7))
                    source: root.image_source
                    radius: [self.size[1]-(self.size[1]/2)]
    ...

Modification on the Python side.

Original:

class NavigationDrawer(SlidingPanel, ThemableBehavior, ElevationBehavior):
    title = StringProperty()
    ...

Modified:

class NavigationDrawer(SlidingPanel, ThemableBehavior, ElevationBehavior):
    image_source = StringProperty()
    ...

Modified Navigationdrawer.py

navigationdrawer/__init__.py

# -*- coding: utf-8 -*-
from kivy.animation import Animation
from kivy.lang import Builder
from kivy.properties import StringProperty, ObjectProperty
from kivymd.elevationbehavior import ElevationBehavior
from kivymd.icon_definitions import md_icons
from kivymd.label import MDLabel
from kivymd.list import OneLineIconListItem, ILeftBody, BaseListItem
from kivymd.slidingpanel import SlidingPanel
from kivymd.theming import ThemableBehavior

Builder.load_string('''
<NavDrawerToolbar@Label>
    canvas:
        Color:
            rgba: root.parent.parent.theme_cls.divider_color
        Line:
            points: self.x, self.y, self.x+self.width, self.y

<NavigationDrawer>
    _list: list
    elevation: 0
    canvas:
        Color:
            rgba: root.theme_cls.bg_light
        Rectangle:
            size: root.size
            pos: root.pos
    BoxLayout:
        size_hint: (1, .4)
        NavDrawerToolbar:
            padding: 10, 10
            canvas.after:
                Color:
                    rgba: (1, 1, 1, 1)
                RoundedRectangle:
                    size: (self.size[1]-dp(14), self.size[1]-dp(14))
                    pos: (self.pos[0]+(self.size[0]-self.size[1])/2, self.pos[1]+dp(7))
                    source: root.image_source
                    radius: [self.size[1]-(self.size[1]/2)]
    ScrollView:
        do_scroll_x: False
        MDList:
            id: ml
            id: list

<NavigationDrawerIconButton>
    NDIconLabel:
        id: _icon
        font_style: 'Icon'
        theme_text_color: 'Secondary'
''')


class NavigationDrawer(SlidingPanel, ThemableBehavior, ElevationBehavior):
    image_source = StringProperty()
    _list = ObjectProperty()

    def add_widget(self, widget, index=0):
        if issubclass(widget.__class__, BaseListItem):
            self._list.add_widget(widget, index)
            widget.bind(on_release=lambda x: self.toggle())
        else:
            super(NavigationDrawer, self).add_widget(widget, index)

    def _get_main_animation(self, duration, t, x, is_closing):
        a = super(NavigationDrawer, self)._get_main_animation(duration, t, x,
                                                              is_closing)
        a &= Animation(elevation=0 if is_closing else 5, t=t, duration=duration)
        return a


class NDIconLabel(ILeftBody, MDLabel):
    pass


class NavigationDrawerIconButton(OneLineIconListItem):
    icon = StringProperty()

    def on_icon(self, instance, value):
        self.ids['_icon'].text = u"{}".format(md_icons[value])

Now that we have modified our Navigation Drawer let’s test it.

But before you do, make sure you uncomment the NavigationDrawer import from the navigationdrawer folder and comment out the NavigationDrawer import from kivymd in the main.py file.

# from kivymd.navigationdrawer import NavigationDrawer
from navigationdrawer import NavigationDrawer

And here it is. Our Navigation Drawer with a circular image.

About me

My online existence is mainly here at Codementor and at these places :)

You can send me a token of appreciation at http://paypal.me/kiok46. Thanks ;).

Categories: FLOSS Project Planets

Wingware Blog: Using Multiple Selections to Edit Code in Wing IDE

Planet Python - Wed, 2017-01-11 20:00
Wing IDE 6 improves and extends support for multiple selections on the editor, making it easier to select and then apply edits to a number of selections at once.
Categories: FLOSS Project Planets

Matthew Rocklin: Distributed Pandas on a Cluster with Dask DataFrames

Planet Python - Wed, 2017-01-11 19:00

This work is supported by Continuum Analytics, the XDATA Program, and the Data Driven Discovery Initiative from the Moore Foundation.

Summary

Dask Dataframe extends the popular Pandas library to operate on big data-sets on a distributed cluster. We show its capabilities by running through common dataframe operations on a common dataset. We break up these computations into the following sections:

  1. Introduction: Pandas is intuitive and fast, but needs Dask to scale
  2. Read CSV and Basic operations
    1. Read CSV
    2. Basic Aggregations and Groupbys
    3. Joins and Correlations
  3. Shuffles and Time Series
  4. Parquet I/O
  5. Final thoughts
  6. What we could have done better
Accompanying Plots

Throughout this post we accompany computational examples with profiles of exactly what task ran where on our cluster and when. These profiles are interactive Bokeh plots that include every task that every worker in our cluster runs over time. For example, the following read_csv computation produces the following profile:

>>> df = dd.read_csv('s3://dask-data/nyc-taxi/2015/*.csv')

If you are reading this through a syndicated website like planet.python.org or through an RSS reader then these plots will not show up. You may want to visit http://matthewrocklin.com/blog/work/2017/01/12/dask-dataframes directly.

Dask.dataframe breaks up reading this data into many small tasks of different types. For example reading bytes and parsing those bytes into pandas dataframes. Each rectangle corresponds to one task. The y-axis enumerates each of the worker processes. We have 64 processes spread over 8 machines so there are 64 rows. You can hover over any rectangle to get more information about that task. You can also use the tools in the upper right to zoom around and focus on different regions in the computation. In this computation we can see that workers interleave reading bytes from S3 (light green) and parsing bytes to dataframes (dark green). The entire computation took about a minute and most of the workers were busy the entire time (little white space). Inter-worker communication is always depicted in red (which is absent in this relatively straightforward computation.)

Introduction

Pandas provides an intuitive, powerful, and fast data analysis experience on tabular data. However, because Pandas uses only one thread of execution and requires all data to be in memory at once, it doesn’t scale well to datasets much beyond the gigabyte scale. That component is missing. Generally people move to Spark DataFrames on HDFS or a proper relational database to resolve this scaling issue. Dask is a Python library for parallel and distributed computing that aims to fill this need for parallelism among the PyData projects (NumPy, Pandas, Scikit-Learn, etc.). Dask dataframes combine Dask and Pandas to deliver a faithful “big data” version of Pandas operating in parallel over a cluster.

I’ve written about this topic before. This blogpost is newer and will focus on performance and newer features like fast shuffles and the Parquet format.

CSV Data and Basic Operations

I have an eight node cluster on EC2 of m4.2xlarges (eight cores, 30GB RAM each). Dask is running on each node with one process per core.

We have the 2015 Yellow Cab NYC Taxi data as 12 CSV files on S3. We look at that data briefly with s3fs

>>> import s3fs
>>> s3 = s3fs.S3FileSystem()
>>> s3.ls('dask-data/nyc-taxi/2015/')
['dask-data/nyc-taxi/2015/yellow_tripdata_2015-01.csv',
 'dask-data/nyc-taxi/2015/yellow_tripdata_2015-02.csv',
 'dask-data/nyc-taxi/2015/yellow_tripdata_2015-03.csv',
 'dask-data/nyc-taxi/2015/yellow_tripdata_2015-04.csv',
 'dask-data/nyc-taxi/2015/yellow_tripdata_2015-05.csv',
 'dask-data/nyc-taxi/2015/yellow_tripdata_2015-06.csv',
 'dask-data/nyc-taxi/2015/yellow_tripdata_2015-07.csv',
 'dask-data/nyc-taxi/2015/yellow_tripdata_2015-08.csv',
 'dask-data/nyc-taxi/2015/yellow_tripdata_2015-09.csv',
 'dask-data/nyc-taxi/2015/yellow_tripdata_2015-10.csv',
 'dask-data/nyc-taxi/2015/yellow_tripdata_2015-11.csv',
 'dask-data/nyc-taxi/2015/yellow_tripdata_2015-12.csv']

This data is too large to fit into Pandas on a single computer. However, it can fit in memory if we break it up into many small pieces and load these pieces onto different computers across a cluster.

We connect a client to our Dask cluster, composed of one centralized dask-scheduler process and several dask-worker processes running on each of the machines in our cluster.

from dask.distributed import Client

client = Client('scheduler-address:8786')

And we load our CSV data using dask.dataframe which looks and feels just like Pandas, even though it’s actually coordinating hundreds of small Pandas dataframes. This takes about a minute to load and parse.

import dask.dataframe as dd

df = dd.read_csv('s3://dask-data/nyc-taxi/2015/*.csv',
                 parse_dates=['tpep_pickup_datetime', 'tpep_dropoff_datetime'],
                 storage_options={'anon': True})
df = client.persist(df)

This cuts up our 12 CSV files on S3 into a few hundred blocks of bytes, each 64MB large. On each of these 64MB blocks we then call pandas.read_csv to create a few hundred Pandas dataframes across our cluster, one for each block of bytes. Our single Dask Dataframe object, df, coordinates all of those Pandas dataframes. Because we’re just using Pandas calls it’s very easy for Dask dataframes to use all of the tricks from Pandas. For example we can use most of the keyword arguments from pd.read_csv in dd.read_csv without having to relearn anything.

This data is about 20GB on disk or 60GB in RAM. It’s not huge, but is also larger than we’d like to manage on a laptop, especially if we value interactivity. The interactive image above is a trace over time of what each of our 64 cores was doing at any given moment. By hovering your mouse over the rectangles you can see that cores switched between downloading byte ranges from S3 and parsing those bytes with pandas.read_csv.

Our dataset includes every cab ride in the city of New York in the year of 2015, including when and where it started and stopped, a breakdown of the fare, etc.

>>> df.head()
  VendorID tpep_pickup_datetime tpep_dropoff_datetime  passenger_count  trip_distance  pickup_longitude  pickup_latitude  RateCodeID store_and_fwd_flag  dropoff_longitude  dropoff_latitude  payment_type  fare_amount  extra  mta_tax  tip_amount  tolls_amount  improvement_surcharge  total_amount
0  2  2015-01-15 19:05:39  2015-01-15 19:23:42  1  1.59  -73.993896  40.750111  1  N  -73.974785  40.750618  1  12.0  1.0  0.5  3.25  0.0  0.3  17.05
1  1  2015-01-10 20:33:38  2015-01-10 20:53:28  1  3.30  -74.001648  40.724243  1  N  -73.994415  40.759109  1  14.5  0.5  0.5  2.00  0.0  0.3  17.80
2  1  2015-01-10 20:33:38  2015-01-10 20:43:41  1  1.80  -73.963341  40.802788  1  N  -73.951820  40.824413  2  9.5  0.5  0.5  0.00  0.0  0.3  10.80
3  1  2015-01-10 20:33:39  2015-01-10 20:35:31  1  0.50  -74.009087  40.713818  1  N  -74.004326  40.719986  2  3.5  0.5  0.5  0.00  0.0  0.3  4.80
4  1  2015-01-10 20:33:39  2015-01-10 20:52:58  1  3.00  -73.971176  40.762428  1  N  -74.004181  40.742653  2  15.0  0.5  0.5  0.00  0.0  0.3  16.30

Basic Aggregations and Groupbys

As a quick exercise, we compute the length of the dataframe. When we call len(df) Dask.dataframe translates this into many len calls on each of the constituent Pandas dataframes, followed by communication of the intermediate results to one node, followed by a sum of all of the intermediate lengths.

>>> len(df)
146112989

This takes around 400-500ms. You can see that a few hundred length computations happened quickly on the left, followed by some delay, then a bit of data transfer (the red bar in the plot), and a final summation call.
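The per-partition-then-combine pattern behind len(df) can also be written out explicitly. The following is only an illustrative sketch using the real map_partitions method, not the exact task graph Dask builds for len:

# Roughly equivalent to len(df): count rows in each Pandas chunk, then add the counts.
partition_lengths = df.map_partitions(len)   # one small task per partition
total = partition_lengths.compute().sum()    # gather the per-partition counts and sum them
print(total)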

More complex operations like simple groupbys look similar, although sometimes with more communication. Throughout this post we’re going to do more and more complex computations, and our profiles will become correspondingly richer with information. Here we compute the average trip distance, grouped by number of passengers. We find that single and double person rides go far longer distances on average. We achieve this one big-data groupby by performing many small Pandas groupbys and then cleverly combining their results.

>>> df.groupby(df.passenger_count).trip_distance.mean().compute()
passenger_count
0     2.279183
1    15.541413
2    11.815871
3     1.620052
4     7.481066
5     3.066019
6     2.977158
9     5.459763
7     3.303054
8     3.866298
Name: trip_distance, dtype: float64
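To make that "cleverly combining" step concrete, here is a toy, plain-Pandas sketch. The two chunks below are made-up stand-ins for the per-worker partitions; only the column names come from the dataset above:

import pandas as pd
from functools import reduce

# Two hypothetical partitions, as each worker might hold them.
chunks = [
    pd.DataFrame({'passenger_count': [1, 1, 2], 'trip_distance': [2.0, 4.0, 3.0]}),
    pd.DataFrame({'passenger_count': [1, 2, 2], 'trip_distance': [6.0, 1.0, 2.0]}),
]

# Each "worker" computes a partial sum and count for its own chunk...
partials = [chunk.groupby('passenger_count').trip_distance.agg(['sum', 'count'])
            for chunk in chunks]

# ...and the partials are combined into an exact global mean per group.
combined = reduce(lambda a, b: a.add(b, fill_value=0), partials)
print(combined['sum'] / combined['count'])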

As a more complex operation we see how well New Yorkers tip by hour of day and by day of week.

df2 = df[(df.tip_amount > 0) & (df.fare_amount > 0)]    # filter out bad rows
df2['tip_fraction'] = df2.tip_amount / df2.fare_amount  # make new column

dayofweek = (df2.groupby(df2.tpep_pickup_datetime.dt.dayofweek)
                .tip_fraction
                .mean())
hour = (df2.groupby(df2.tpep_pickup_datetime.dt.hour)
           .tip_fraction
           .mean())

We see that New Yorkers are generally pretty generous, tipping around 20%-25% on average. We also notice that they become very generous at 4am, tipping an average of 38%.

This more complex operation uses more of the Dask dataframe API (which mimics the Pandas API). Pandas users should find the code above fairly familiar. We remove rows with zero fare or zero tip (not every tip gets recorded), make a new column which is the ratio of the tip amount to the fare amount, and then groupby the day of week and hour of day, computing the average tip fraction for each hour/day.

Dask evaluates this computation with thousands of small Pandas calls across the cluster (try clicking the wheel zoom icon in the upper right of the image above and zooming in). The answer comes back in about 3 seconds.

Joins and Correlations

To show off more basic functionality we’ll join this Dask dataframe against a smaller Pandas dataframe that includes names of some of the more cryptic columns. Then we’ll correlate two derived columns to determine if there is a relationship between paying Cash and the recorded tip.

>>> payments = pd.Series({1: 'Credit Card',
                          2: 'Cash',
                          3: 'No Charge',
                          4: 'Dispute',
                          5: 'Unknown',
                          6: 'Voided trip'})

>>> df2 = df.merge(payments, left_on='payment_type', right_index=True)
>>> df2.groupby(df2.payment_name).tip_amount.mean().compute()
payment_name
Cash           0.000217
Credit Card    2.757708
Dispute       -0.011553
No charge      0.003902
Unknown        0.428571
Name: tip_amount, dtype: float64

We see that while the average tip for a credit card transaction is $2.75, the average tip for a cash transaction is very close to zero. At first glance it seems like cash tips aren’t being reported. To investigate this a bit further lets compute the Pearson correlation between paying cash and having zero tip. Again, this code should look very familiar to Pandas users.

zero_tip = df2.tip_amount == 0
cash = df2.payment_name == 'Cash'
dd.concat([zero_tip, cash], axis=1).corr().compute()

              tip_amount  payment_name
tip_amount      1.000000      0.943123
payment_name    0.943123      1.000000

So we see that standard operations like row filtering, column selection, groupby-aggregations, joining with a Pandas dataframe, correlations, etc. all look and feel like the Pandas interface. Additionally, we’ve seen through profile plots that most of the time is spent just running Pandas functions on our workers, so Dask.dataframe is, in most cases, adding relatively little overhead. These little functions represented by the rectangles in these plots are just pandas functions. For example the plot above has many rectangles labeled merge if you hover over them. This is just the standard pandas.merge function that we love and know to be very fast in memory.

Shuffles and Time Series

Distributed dataframe experts will know that none of the operations above require a shuffle. That is we can do most of our work with relatively little inter-node communication. However not all operations can avoid communication like this and sometimes we need to exchange most of the data between different workers.

For example if our dataset is sorted by customer ID but we want to sort it by time then we need to collect all the rows for January over to one Pandas dataframe, all the rows for February over to another, etc.. This operation is called a shuffle and is the base of computations like groupby-apply, distributed joins on columns that are not the index, etc..

You can do a lot with dask.dataframe without performing shuffles, but sometimes it’s necessary. In the following example we sort our data by pickup datetime. This will allow fast lookups, fast joins, and fast time series operations, all common cases. We do one shuffle ahead of time to make all future computations fast.

We set the index as the pickup datetime column. This takes anywhere from 25-40s and is largely network bound (60GB, some text, eight machines with eight cores each on AWS non-enhanced network). This also requires running something like 16000 tiny tasks on the cluster. It’s worth zooming in on the plot below.

>>> df = client.persist(df.set_index('tpep_pickup_datetime'))

This operation is expensive, far more expensive than it was with Pandas when all of the data was in the same memory space on the same computer. This is a good time to point out that you should only use distributed tools like Dask.dataframe and Spark after tools like Pandas break down. We should only move to distributed systems when absolutely necessary. However, when it does become necessary, it’s nice knowing that Dask.dataframe can faithfully execute Pandas operations, even if some of them take a bit longer.

As a result of this shuffle our data is now nicely sorted by time, which will keep future operations close to optimal. We can see how the dataset is sorted by pickup time by quickly looking at the first entries, last entries, and entries for a particular day.

>>> df.head()  # has the first entries of 2015
                      VendorID  tpep_dropoff_datetime  passenger_count  trip_distance  pickup_longitude  pickup_latitude  RateCodeID  store_and_fwd_flag  dropoff_longitude  dropoff_latitude  payment_type  fare_amount  extra  mta_tax  tip_amount  tolls_amount  improvement_surcharge  total_amount
tpep_pickup_datetime
2015-01-01 00:00:00  2  2015-01-01 00:00:00  3  1.56  -74.001320  40.729057  1  N  -74.010208  40.719662  1  7.5  0.5  0.5  0.0  0.0  0.3  8.8
2015-01-01 00:00:00  2  2015-01-01 00:00:00  1  1.68  -73.991547  40.750069  1  N  0.000000  0.000000  2  10.0  0.0  0.5  0.0  0.0  0.3  10.8
2015-01-01 00:00:00  1  2015-01-01 00:11:26  5  4.00  -73.971436  40.760201  1  N  -73.921181  40.768269  2  13.5  0.5  0.5  0.0  0.0  0.0  14.5

>>> df.tail()  # has the last entries of 2015
                      VendorID  tpep_dropoff_datetime  passenger_count  trip_distance  pickup_longitude  pickup_latitude  RateCodeID  store_and_fwd_flag  dropoff_longitude  dropoff_latitude  payment_type  fare_amount  extra  mta_tax  tip_amount  tolls_amount  improvement_surcharge  total_amount
tpep_pickup_datetime
2015-12-31 23:59:56  1  2016-01-01 00:09:25  1  1.00  -73.973900  40.742893  1  N  -73.989571  40.750549  1  8.0  0.5  0.5  1.85  0.0  0.3  11.15
2015-12-31 23:59:58  1  2016-01-01 00:05:19  2  2.00  -73.965271  40.760281  1  N  -73.939514  40.752388  2  7.5  0.5  0.5  0.00  0.0  0.3  8.80
2015-12-31 23:59:59  2  2016-01-01 00:10:26  1  1.96  -73.997559  40.725693  1  N  -74.017120  40.705322  2  8.5  0.5  0.5  0.00  0.0  0.3  9.80

>>> df.loc['2015-05-05'].head()  # has the entries for just May 5th
                      VendorID  tpep_dropoff_datetime  passenger_count  trip_distance  pickup_longitude  pickup_latitude  RateCodeID  store_and_fwd_flag  dropoff_longitude  dropoff_latitude  payment_type  fare_amount  extra  mta_tax  tip_amount  tolls_amount  improvement_surcharge  total_amount
tpep_pickup_datetime
2015-05-05  2  2015-05-05 00:00:00  1  1.20  -73.981941  40.766460  1  N  -73.972771  40.758007  2  6.5  1.0  0.5  0.00  0.00  0.3  8.30
2015-05-05  1  2015-05-05 00:10:12  1  1.70  -73.994675  40.750507  1  N  -73.980247  40.738560  1  9.0  0.5  0.5  2.57  0.00  0.3  12.87
2015-05-05  1  2015-05-05 00:07:50  1  2.50  -74.002930  40.733681  1  N  -74.013603  40.702362  2  9.5  0.5  0.5  0.00  0.00  0.3  10.80

Because we know exactly which Pandas dataframe holds which data we can execute row-local queries like this very quickly. The total round trip from pressing enter in the interpreter or notebook is about 40ms. For reference, 40ms is the delay between two frames in a movie running at 25 Hz. This means that it’s fast enough that human users perceive this query to be entirely fluid.

Time Series

Additionally, once we have a nice datetime index all of Pandas’ time series functionality becomes available to us.

For example we can resample by day:

>>> (df.passenger_count
       .resample('1d')
       .mean()
       .compute()
       .plot())

We observe a strong periodic signal here. The number of passengers is reliably higher on the weekends.

We can perform a rolling aggregation in about a second:

>>> s = client.persist(df.passenger_count.rolling(10).mean())

Because Dask.dataframe inherits the Pandas index all of these operations become very fast and intuitive.

Parquet

Pandas’ standard “fast” recommended storage solution has generally been the HDF5 data format. Unfortunately the HDF5 file format is not ideal for distributed computing, so most Dask dataframe users have had to switch down to CSV historically. This is unfortunate because CSV is slow, doesn’t support partial queries (you can’t read in just one column), and also isn’t supported well by the other standard distributed Dataframe solution, Spark. This makes it hard to move data back and forth.

Fortunately there are now two decent Python readers for Parquet, a fast columnar binary store that shards nicely on distributed data stores like the Hadoop File System (HDFS, not to be confused with HDF5) and Amazon’s S3. The already fast Parquet-cpp project has been growing Python and Pandas support through Arrow, and the Fastparquet project, which is an offshoot of the pure-Python parquet library, has been growing speed through use of NumPy and Numba.

Using Fastparquet under the hood, Dask.dataframe users can now happily read and write to Parquet files. This increases speed, decreases storage costs, and provides a shared format that both Dask dataframes and Spark dataframes can understand, improving the ability to use both computational systems in the same workflow.

Writing our Dask dataframe to S3 can be as simple as the following:

df.to_parquet('s3://dask-data/nyc-taxi/tmp/parquet')

However there are also a variety of options we can use to store our data more compactly through compression, encodings, etc.. Expert users will probably recognize some of the terms below.

df = df.astype({'VendorID': 'uint8',
                'passenger_count': 'uint8',
                'RateCodeID': 'uint8',
                'payment_type': 'uint8'})

df.to_parquet('s3://dask-data/nyc-taxi/tmp/parquet',
              compression='snappy',
              has_nulls=False,
              object_encoding='utf8',
              fixed_text={'store_and_fwd_flag': 1})

We can then read our nicely indexed dataframe back with the dd.read_parquet function:

>>> df2 = dd.read_parquet('s3://dask-data/nyc-taxi/tmp/parquet')

The main benefit here is that we can quickly compute on single columns. The following computation runs in around 6 seconds, even though we don’t have any data in memory to start (recall that we started this blogpost with a minute-long call to read_csv and client.persist):

>>> df2.passenger_count.value_counts().compute()
1    102991045
2     20901372
5      7939001
3      6135107
6      5123951
4      2981071
0        40853
7          239
8          181
9          169
Name: passenger_count, dtype: int64

Final Thoughts

With the recent addition of faster shuffles and Parquet support, Dask dataframes become significantly more attractive. This blogpost gave a few categories of common computations, along with precise profiles of their execution on a small cluster. Hopefully people find this combination of Pandas syntax and scalable computing useful.

Now would also be a good time to remind people that Dask dataframe is only one module among many within the Dask project. Dataframes are nice, certainly, but Dask’s main strength is its flexibility to move beyond just plain dataframe computations to handle even more complex problems.

Learn More

If you’d like to learn more about Dask dataframe, the Dask distributed system, or other components you should look at the following documentation:

  1. http://dask.pydata.org/en/latest/
  2. http://distributed.readthedocs.io/en/latest/

The workflows presented here are captured in the following notebooks (among other examples):

  1. NYC Taxi example, shuffling, others
  2. Parquet
What we could have done better

As always with computational posts we include a section on what went wrong, or what could have gone better.

  1. The 400ms computation of len(df) is a regression from previous versions where this was closer to 100ms. We’re getting bogged down somewhere in many small inter-worker communications.
  2. It would be nice to repeat this computation at a larger scale. Dask deployments in the wild are often closer to 1000 cores rather than the 64-core cluster we have here, and datasets are often in the terabyte scale rather than our 60 GB NYC Taxi dataset. Unfortunately, representative large open datasets are hard to find.
  3. The Parquet timings are nice, but there is still room for improvement. We seem to be making many small expensive queries of S3 when reading Thrift headers.
  4. It would be nice to support both Python Parquet readers: the Numba solution fastparquet and the C++ solution parquet-cpp
Categories: FLOSS Project Planets

Mike Driscoll: New in Python: Underscores in Numeric Literals

Planet Python - Wed, 2017-01-11 18:15

Python 3.6 added some interesting new features. The one that we will be looking at in this article comes from PEP 515: Underscores in Numeric Literals. As the name of the PEP implies, this basically gives you the ability to write long numbers with underscores where the comma normally would be. In other words, 1000000 can now be written as 1_000_000. Let’s take a look at some simple examples:

>>> 1_234_567
1234567
>>> '{:_}'.format(123456789)
'123_456_789'
>>> '{:_}'.format(1234567)
'1_234_567'

The first example just shows how Python interprets a large number with underscores in it. The second example demonstrates that we can now give Python a string formatter, the “_” (underscore), in place of a comma. The results speak for themselves.

The numeric literals that include underscores behave the same way as normal numeric literals when doing calculations:

>>> 120_000 + 30_000
150000
>>> 120_000 - 30_000
90000

The Python documentation and the PEP also mention that you can use the underscores after any base specifier. Here are a couple of examples taken from the PEP and the documentation:

>>> flags = 0b_0011_1111_0100_1110
>>> flags
16206
>>> 0x_FF_FF_FF_FF
4294967295
>>> flags = int('0b_1111_0000', 2)
>>> flags
240

There are some notes about the underscore that need to be mentioned:

  • You can only use a single underscore at a time, and it has to sit between digits or directly after a base specifier
  • Leading and trailing underscores are not allowed (a short sketch of these rules follows below)
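Here is a quick sketch of those rules in action, assuming a Python 3.6 interpreter; the literals below are invented examples chosen to show which placements parse and which do not:

# Each candidate literal is compiled on its own so the invalid ones surface as
# SyntaxError instead of stopping the script.
for literal in ("1_000", "0x_FF", "1__000", "1000_", "1_.0"):
    try:
        value = eval(compile(literal, "<literal>", "eval"))
        print(f"{literal!r:>10} -> {value!r}")
    except SyntaxError:
        print(f"{literal!r:>10} -> SyntaxError (underscore in a disallowed spot)")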

This is kind of a fun new feature in Python. While I personally don’t have any use cases for this in my current job, hopefully you will have one at yours.

Categories: FLOSS Project Planets

Palantir: New Years Resolution: Spend More Time With Family and Friends

Planet Drupal - Wed, 2017-01-11 17:11
Allison Manley | Jan 11, 2017

In this five-part series, every Monday in January we’ll explore a New Year’s resolution and how it can apply to your web project.


Surrounding oneself with a community of friends and family that offer needed support is important to us all. Palantir spent twenty years building our own culture and community right here at the office! But we’ve also been active members in the Drupal community for 12 years:

  • We’ve made contributions to every facet of the Drupal project: Core development, contributed modules, themes, financial assistance, training, documentation, conference organizing, and one Palantiri is a member of the Drupal Board.
  • This means we have a long history of helping organizations level up so they can become Drupal contributors and participants as well.
  • The collaboration in the open source community is one of the reasons Palantiri love Drupal so much.
Upcoming Events

Are you looking to get involved in the Drupal community? Some ideas:

Besides the Drupal and Open Source communities, Palantir works in some specific verticals that have their own rich and robust communities. We’re still finalizing exactly where we’ll be in 2017, but we know for sure you’ll find us at the following conferences so we can connect with friends in those industries and offer them support as needed:

Next week’s resolution: get organized. 

We'd love to help you keep your 2017 resolution.

Let's chat.
Categories: FLOSS Project Planets

Caktus Consulting Group: New year, new Python: Python 3.6

Planet Python - Wed, 2017-01-11 14:44

Python 3.6 was released in the tail end of 2016. Read on for a few highlights from this release.

New module: secrets

Python 3.6 introduces a new module in the standard library called secrets. While the random module has long existed to provide us with pseudo-random numbers suitable for applications like modeling and simulation, these were not "cryptographically random" and not suitable for use in cryptography. secrets fills this gap, providing a cryptographically strong method to, for instance, create a new, random password or a secure token.
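For instance, here is a minimal sketch of the kind of thing the module is for; the token length and password alphabet below are arbitrary choices for illustration, not recommendations from the release notes:

import secrets
import string

# A URL-safe token suitable for password-reset links or session identifiers.
token = secrets.token_urlsafe(32)

# A random password drawn from a cryptographically strong source.
alphabet = string.ascii_letters + string.digits
password = ''.join(secrets.choice(alphabet) for _ in range(16))

print(token)
print(password)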

New method for string interpolation

Python previously had several methods for string interpolation, but the most commonly used was str.format(). Let’s look at how this used to be done. Assuming 2 existing variables, name and cookies_eaten, str.format() could look like this:

"{0} ate {1} cookies".format(name, cookies_eaten)

Or this:

"{name} ate {cookies_eaten} cookies".format(name=name, cookies_eaten=cookies_eaten)

Now, with the new f-strings, the variable names can be placed right into the string without the extra length of the format parameters:

f"{name} ate {cookies_eaten} cookies"

This provides a much more pythonic way of formatting strings, making the resulting code both simpler and more readable.

Underscores in numerals

While it doesn’t come up often, it has long been a pain point that long numbers could be difficult to read in the code, allowing bugs to creep in. For instance, suppose I need to multiply an input by 1 billion before I process the value. I might say:

bill_val = input_val * 1000000000

Can you tell at a glance if that number has the right number of zeroes? I can’t. Python 3.6 allows us to make this clearer:

bill_val = input_val * 1_000_000_000

It’s a small thing, but anything that reduces the chance I’ll introduce a new bug is great in my book!

Variable type annotations

One key characteristic of Python has always been its flexible variable typing, but that isn’t always a good thing. Sometimes, it can help you catch mistakes earlier if you know what type you are expecting to be passed as parameters, or returned as the results of a function. There have previously been ways to annotate types within comments, but the 3.6 release of Python is the first to bring these annotations into official Python syntax. This is a completely optional aspect of the language, since the annotations have no effect at runtime, but this feature makes it easier to inspect your code for variable type inconsistencies before finalizing it.
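Here is a small sketch of the new syntax (PEP 526). The names are invented for illustration; the annotations have no effect at runtime but can be checked ahead of time by tools such as mypy:

from typing import List

cookies_eaten: int = 3
names: List[str] = ["Ada", "Grace"]

def average_eaten(counts: List[int]) -> float:
    # The annotations document intent; Python itself does not enforce them.
    return sum(counts) / len(counts)

print(average_eaten([cookies_eaten, 5]))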

And much more…

In addition to the changes mentioned above, there have been improvements made to several modules in the standard library, as well as to the CPython implementation. To read about all of the updates this new release includes, take a look at the official notes.

Categories: FLOSS Project Planets

Jeff Geerling's Blog: Drupal VM Tips & Tricks - brief remote presentation for DrupalDC

Planet Drupal - Wed, 2017-01-11 14:31

Yesterday I presented Drupal VM Tips & Tricks at the DrupalDC meetup, remotely. I didn't have a lot of time to prepare anything for the presentation, but I thought it would be valuable to walk through some of the neat features of Drupal VM people might not know about.

Here's the video from the presentation:


Some relevant links mentioned during the presentation:

Categories: FLOSS Project Planets

Steinar H. Gunderson: 3G-SDI signal support

Planet Debian - Wed, 2017-01-11 14:03

I had to figure out what kinds of signal you can run over 3G-SDI today, and it's pretty confusing, so I thought I'd share it.

For reference, 3G-SDI is the same as 3G HD-SDI, an extension of HD-SDI, which is an extension of the venerable SDI standard (well, duh). They're all used for running uncompressed audio/video data over regular BNC coaxial cable, possibly for hundreds of meters, and are in wide use in professional and semiprofessional setups.

So here's the rundown on 3G-SDI capabilities:

  • 1080p60 supports 10-bit 4:2:2 Y'CbCr. Period.
  • 720p60/1080p30/1080i60 supports a much wider range of formats: 10-bit 4:4:4:4 RGBA (alpha optional), 10-bit 4:4:4:4 Y'CbCrA (alpha optional), 12-bit 4:4:4 RGB, 12-bit 4:4:4 Y'CbCr or finally 12-bit 4:2:2 Y'CbCr (seems rather redundant).
  • There's also a format exclusively for 1080p24 (actually 2048x1080) that supports 12-bit X'Y'Z. Digital cinema, hello. Apart from that, it supports pretty much what 1080p30 does. There's also a 2048x1080p30 (no interlaced version) mode for 12-bit 4:2:2:4 Y'CbCrA, but it seems rather obscure.

And then there's dual-link 3G-SDI, which uses two cables instead of one—and there's also Blackmagic's proprietary “6G-SDI”, which supports basically everything dual-link 3G-SDI does. But in 2015, seemingly there was also a real 6G-SDI and 12G-SDI, and it's unclear to me whether it's in any way compatible with Blackmagic's offering. It's all confusing. But at least, these are the differences from single-link to dual-link 3G-SDI:

  • 1080p60 supports essentially everything that 720p60 supports on single-link: 10-bit 4:4:4:4 RGBA (alpha optional), 10-bit 4:4:4:4 Y'CbCrA (alpha optional), 12-bit 4:4:4 RGB, 12-bit 4:4:4 Y'CbCr and the redundant 12-bit 4:2:2 Y'CbCr.
  • 2048x1080 4:4:4 X'Y'Z' now also supports 1080p25 and 1080p30.

4K? I don't know. 120fps? I believe that's also a proprietary extension of some sort.

And of course, having a device support 3G-SDI doesn't mean at all it's required to support all of this; in particular, I believe Blackmagic's systems don't support alpha at all except on their single “12G-SDI” card, and I'd also not be surprised if RGB support is rather limited in practice.

Categories: FLOSS Project Planets

Sven Hoexter: Failing with F5: using experimental mv feature on a pool causes tmm to segfault

Planet Debian - Wed, 2017-01-11 12:36

Just a short PSA for those around working with F5 devices:

TMOS 11.6 introduced an experimental "mv" command in tmsh. In the last few days we tried it for the first time on TMOS 12.1.1. It worked fine for a VirtualServer, but a mv for a pool caused a segfault in tmm. We're currently working with F5 support to sort it out; they think it's a known issue. The recommendation for now is to not use mv on pools. Just do it the old way: create a new pool, assign the new pool to the relevant VS, and delete the old pool.

Possible bug ID at F5 is ID562808. Since I can not find it in the TMOS 12.2 release notes I expect that this issue also applies to TMOS 12.2, but I did not verify that.

Categories: FLOSS Project Planets

Reproducible builds folks: Reproducible Builds: week 89 in Stretch cycle

Planet Debian - Wed, 2017-01-11 10:04

What happened in the Reproducible Builds effort between Sunday January 1 and Saturday January 7 2017:

GSoC and Outreachy updates Toolchain development
  • #849999 was filed: "dpkg-dev should not set SOURCE_DATE_EPOCH to the empty string"
Packages reviewed and fixed, and bugs filed

Chris Lamb:

Dhole:

Reviews of unreproducible packages

13 package reviews have been added, 4 have been updated and 6 have been removed in this week, adding to our knowledge about identified issues.

2 issue types have been added/updated:

Upstreaming of reproducibility fixes

Merged:

Opened:

Weekly QA work

During our reproducibility testing, the following FTBFS bugs have been detected and reported by:

  • Chris Lamb (4)
diffoscope development

diffoscope 67 was uploaded to unstable by Chris Lamb. It included contributions from:

[ Chris Lamb ]
* Optimisations:
  - Avoid multiple iterations over archive by unpacking once for an ~8X runtime optimisation.
  - Avoid unnecessary splitting and interpolating for a ~20X optimisation when writing --text output.
  - Avoid expensive diff regex parsing until we need it, speeding up diff parsing by 2X.
  - Alias expensive Config() in diff parsing lookup for a 10% optimisation.
* Progress bar:
  - Show filenames, ELF sections, etc. in progress bar.
  - Emit JSON on the status file descriptor output instead of a custom format.
* Logging:
  - Use more-Pythonic logging functions and output based on __name__, etc.
  - Use Debian-style "I:", "D:" log level format modifier.
  - Only print milliseconds in output, not microseconds.
  - Print version in debug output so that saved debug outputs can standalone as bug reports.
* Profiling:
  - Also report the total number of method calls, not just the total time.
  - Report on the total wall clock taken to execute diffoscope, including cleanup.
* Tidying:
  - Rename "NonExisting" -> "Missing".
  - Entirely rework diffoscope.comparators module, splitting as many separate concerns into a different utility package, tidying imports, etc.
  - Split diffoscope.difference into diffoscope.diff, etc.
  - Update file references in debian/copyright post module reorganisation.
  - Many other cleanups, etc.
* Misc:
  - Clarify comment regarding why we call python3(1) directly. Thanks to Jérémy Bobbio <lunar@debian.org>.
  - Raise a clearer error if trying to use --html-dir on a file.
  - Fix --output-empty when files are identical and no outputs specified.

[ Reiner Herrmann ]
* Extend .apk recognition regex to also match zip archives (Closes: #849638)

[ Mattia Rizzolo ]
* Follow the rename of the Debian package "python-jsbeautifier" to "jsbeautifier".

[ siamezzze ]
* Fixed no newline being classified as order-like difference.

reprotest development

reprotest 0.5 was uploaded to unstable by Chris Lamb. It included contributions from:

[ Ximin Luo ]
* Stop advertising variations that we're not actually varying.
  That is: domain_host, shell, user_group.
* Fix auto-presets in the case of a file in the current directory.
* Allow disabling build-path variations. (Closes: #833284)
* Add a faketime variation, with NO_FAKE_STAT=1 to avoid messing with various buildsystems.
  This is on by default; if it causes your builds to mess up please do file a bug report.
* Add a --store-dir option to save artifacts.

Other contributions (not yet uploaded):

reproducible-builds.org website development

tests.reproducible-builds.org
  • Debian arm64 architecture was fully tested in all three suites in just 15 days. Thanks again to Codethink.co.uk for their support!
  • Log diffoscope profiling info. (lamby)
  • Run pg_dump with -O --column-inserts to make easier to import our main database dump into a non-PostgreSQL database. (mapreri)
  • Debian armhf network: CPU frequency scaling was enabled for three Firefly boards, enabling the CPUs to run at full speed. (vagrant)
  • Arch Linux and Fedora tests have been disabled (h01ger)
  • Improve mail notifications about daily problems. (h01ger)
Misc.

This week's edition was written by Chris Lamb, Holger Levsen and Vagrant Cascadian, reviewed by a bunch of Reproducible Builds folks on IRC & the mailing lists.

Categories: FLOSS Project Planets

Evolving Web: Upcoming 2017 Drupal Events where we can meet in North America

Planet Drupal - Wed, 2017-01-11 09:41

And it is finally 2017! New year, new projects, new challenges and, of course, a lot of Drupal events.

In this short post, I'll go through a few Drupal events in North America that we'll be either attending or sponsoring in the first quarter of the year.

If you are planning to attend, feel free to get in touch with us in advance. We love hanging around and meeting with fellow community members, potential business partners, and people just interested in getting to know us.

Categories: FLOSS Project Planets

PyTennessee: PyTN Profiles: A. Jesse Jiryu Davis and Level12

Planet Python - Wed, 2017-01-11 09:24


Speaker Profile: A. Jesse Jiryu Davis (@jessejiryudavis)

Staff Engineer at MongoDB in New York City specializing in C, Python, and async. Lead developer of the MongoDB C Driver libraries libbson and libmongoc. Author of Motor, an async MongoDB driver for Tornado and asyncio. Contributor to Python, PyMongo, MongoDB, Tornado, and asyncio. Co-author with Guido van Rossum of “A Web Crawler With asyncio Coroutines”, a chapter in the “500 Lines or Less” book in the Architecture of Open Source Applications series.

Lives in Manhattan’s East Village with writer Jennifer Keishin Armstrong and two beautiful dwarf hamsters.

Jesse will be presenting “Python Coroutines: A Magic Show!” at 2:00PM Saturday (2/4) in the auditorium. First I’ll write an async framework before your eyes. But the show’s not over! I’ll build coroutines with Python 3’s “async” and “await”. This isn’t just a magic trick, you’ll learn what “async” really means, how an event loop works, and how the Python interpreter pauses and resumes coroutines. A mysterious set of technologies becomes simple and accessible once you see a concise implementation.

Sponsor Profile: Level12 (@Level12io)

Level 12 is a software craftsmanship firm specializing in custom web and data(base) applications. We help our clients leverage software to solve some of their most daunting business challenges.

We are passionate about:

  • Productivity – maximizing time, talent, and resources
  • Radical candor – we think you can handle the truth
  • Advocacy – partnering with our clients to achieve results
  • Craftsmanship – this goes beyond technical competence, we are passionate about building software the right way. There is an art to it and we are artists. We love beautiful and clean code.

Agile methodologies inform our software development and project management workflows. Combined with close collaboration and radical candor, we deliver functional software through an iterative process that provides flexibility and promotes ROI.

If a tank and milk truck walk into a bar and get into a fight, who wins?

Find out at http://level12.io/who-wins?

Categories: FLOSS Project Planets

PyTennessee: PyTN Profiles: Kevin Najimi and Wingware

Planet Python - Wed, 2017-01-11 09:15


Speaker Profile: Kevin Najimi (@kevin_najimi)

Kevin is a huge fan of coding in Python and finance. He wants everyone to enjoy programming in Python at least as much as he does. He cares deeply about educating people about investing. Oh…and he really doesn’t like writing about himself…so here are some bullets.

Background:
  • Works as a Solution Architect in the investment management space
  • Has 15 years of experience building quant research and algorithmic trading systems
  • Active member of the Chicago and New York algorithmic trading community

Kevin will be presenting “Algorithmic Trading with Python” at 1:00PM Saturday (2/4) in Room 300. Ever wonder what it takes to get started using Python to manage your investments and automate your trading? This demo-rich talk will walk through how to get started the right way and avoid some of the dangerous pitfalls.

Sponsor Profile: Wingware (@pythonide)

Wing IDE is designed specifically for Python. Its powerful code intelligence, editing, refactoring, debugging, and testing features boost productivity, reduce coding errors, and simplify working with unfamiliar code. Wing IDE runs on Windows, Linux, and OS X and works with Django, Flask, Plone, GAE, wxPython, PyQt, matplotlib, Maya, NUKE, pygame, and many others. For details please visit wingware.com.

Categories: FLOSS Project Planets

Experienced Django: Extending Templates

Planet Python - Wed, 2017-01-11 09:07

Recently at work, I had quite a long code review from a very young, very rushed co-worker.  Like many such reviews (my own included), there were several instances of copy-paste coding, where the same block of code is copied to several locations with only slight modifications.

This was fresh in my mind when I started in on this series of tutorials (which are quite good):  Building Data Products with Python

In the middle of the first page, there is a great description of how I have been doing copy-paste coding in the KidTasks app – navigation links in the templates.

Fortunately for me, they also give a good example of how to fix this by using the extends tag to reference shared template code, similar to using a base class.

The Problem

Two of my templates end with the following four lines of code:

         <br/> <a href="{% url 'task_new' %}?from={{ request.path|urlencode }}">Add New Task</a>
         <br/> <a href="{% url 'rep_task_new' %}?from={{ request.path|urlencode }}">Add New Repeating Task</a>
         <br/> <a href="{% url 'kid_new' %}?from={{ request.path|urlencode }}">Add New Kid</a>
         <br/> <a href="{% url 'today' %}">View Today's Tasks</a>

While there are currently only two instances of this, there are likely to be more in the future (or in a real app).

The Fix

Fortunately, the solution was quite simple.  First you create the base template which includes the shared code and some blocks for parts of the template content.  In my case, for such simple pages, there was really only a title block and a content block.  The base template looks like this:

<div>
   <h1>{% block title %}(no title){% endblock %}</h1>
   {% block content %}(no content){% endblock %}
   <nav>
      <div id="navbar">
         <br/> <a href="{% url 'task_new' %}?from={{ request.path|urlencode }}">Add New Task</a>
         <br/> <a href="{% url 'rep_task_new' %}?from={{ request.path|urlencode }}">Add New Repeating Task</a>
         <br/> <a href="{% url 'kid_new' %}?from={{ request.path|urlencode }}">Add New Kid</a>
         <br/> <a href="{% url 'today' %}">View Today's Tasks</a>
      </div>
   </nav>
</div>

Note that this puts the navigation links at the bottom of the page (where they were previously).  It also adds a div with an id to the nav bar, which, I suspect, is going to be handy when I start diving into CSS and making the site look a bit more professional.

Another thing to note is that this will also move the formatting of the title for these pages to the base template rather than having it in each individual template.  This is helpful for maintaining a consistent look throughout your app.

The change to the sub-templates was quite small.  First I had to move the title from inside the content block into its own block:

{% block title %} Today's Tasks {% endblock %}

…and then all I had to do was use the extends tag to reference the base template:

{% extends 'tasks/base.html' %}

NOTE: the extends tag is required to be the first template tag in the template.  Django will complain loudly if it is not.

One of the full sub-templates ended up looking like this:

{% extends 'tasks/base.html' %}

{% block title %} Today's Tasks {% endblock %}

{% block content %}
{% if kids %}
    {% for name, tasks in kids.items %}
        <h1>{{ name }} on {{ day }}</h1>
        {% if tasks %}
            <ul>
            {% for task in tasks %}
                <li><a href="{% url 'update' task.id %}">
                {% if task.completed %} <strike> {% endif %}
                    {{ task.name }}
                {% if task.completed %} </strike> {% endif %}
                </a></li>
            {% endfor %}
            </ul>
        {% else %}
            <p>No tasks for {{ name }}.</p>
        {% endif %}
    {% endfor %}
{% else %}
    <p>No kids are present.</p>
{% endif %}
{% endblock %}

That’s all there is to it!  Thanks to Jose A Dianes for the well-written tutorial referenced above.

Categories: FLOSS Project Planets

Eli Bendersky: A brief tutorial on parsing reStructuredText (reST)

Planet Python - Wed, 2017-01-11 09:05

Docutils, the canonical library for processing and munging reStructuredText, is mostly used in an end-to-end mode where HTML or other user-consumable formats are produced from input reST files. However, sometimes it's useful to develop tooling that works on reST input directly and does something non-standard. In this case, one has to dig only a little deeper in Docutils to find useful modules to help with the task.

In this short tutorial I'm going to show how to write a tool that consumes reST files and does something other than generating HTML from them. As a simple but useful example, I'll demonstrate a link checker - a tool that checks that all web links within a reST document are valid. As a bonus, I'll show another tool that uses internal table-parsing libraries within Docutils that let us write pretty-looking ASCII tables and parse them.

Parsing reST text into a Document

This tutorial is a code walk-through for the complete code sample available online. I'll only show a couple of the most important code snippets from the full sample.

Docutils represents a reST file internally as your typical document tree (similarly to many XML and HTML parsers), where every node is of a type derived from docutils.nodes.Node. The top-level document is parsed into an object of type document [1].

We start by creating a new document with some default settings and populating it with the output of a Parser:

# ... here 'fileobj' is a file-like object holding the contents of the input
# reST file.

# Parse the file into a document with the rst parser.
default_settings = docutils.frontend.OptionParser(
    components=(docutils.parsers.rst.Parser,)).get_default_values()
document = docutils.utils.new_document(fileobj.name, default_settings)
parser = docutils.parsers.rst.Parser()
parser.parse(fileobj.read(), document)

Processing a reST document with a visitor

Once we have the document, we can go through it and find the data we want. Docutils helps by defining a hierarchy of Visitor types, and a walk method on every Node that will recursively visit the subtree starting with this node. This is a very typical pattern for Python code; the standard library has a number of similar objects - for example ast.NodeVisitor.

Here's our visitor class that handles reference nodes specially:

class LinkCheckerVisitor(docutils.nodes.GenericNodeVisitor):
    def visit_reference(self, node):
        # Catch reference nodes for link-checking.
        check_link(node['refuri'])

    def default_visit(self, node):
        # Pass all other nodes through.
        pass

How did I know it's reference nodes I need and not something else? Just experimentation :) Once we parse a reST document, we can print the tree and see which nodes contain what. Coupled with reading the source code of Docutils (particularly the docutils/nodes.py module), it's fairly easy to figure out which nodes one needs to catch.
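
If you want to do the same kind of exploration, a quick way (a minimal sketch reusing the document object parsed above) is to dump the tree with pformat, which returns an indented pseudo-XML view of the nodes:

# Dump the parsed tree to see which node types the document contains.
# 'document' is the parsed document object from the earlier snippet.
print(document.pformat())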

With this visitor class in hand, we simply call walk on the parsed document:

# Visit the parsed document with our link-checking visitor.
visitor = LinkCheckerVisitor(document)
document.walk(visitor)

That's it! To see what check_link does, check out the code sample.
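
If you just want something to paste in and try before reading the full sample, here is a minimal stand-in for check_link (my sketch, not the version from the linked sample): it skips non-HTTP URIs and issues a HEAD request so the response body isn't downloaded. Real servers sometimes reject HEAD requests, so the linked sample remains the better reference.

import urllib.error
import urllib.request

def check_link(uri):
    # Skip internal references, mailto links, etc. - only check web links.
    if not uri.startswith(('http://', 'https://')):
        return
    # Issue a HEAD request so we don't download the whole resource.
    request = urllib.request.Request(uri, method='HEAD')
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            print('OK  {} [{}]'.format(uri, response.status))
    except (urllib.error.URLError, ValueError) as err:
        print('BAD {} ({})'.format(uri, err))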

Bonus: parsing ASCII grid tables with Docutils

Docutils supports defining tables in ASCII in a couple of ways; one I like in particular is "grid tables", done like this:

+------------------------+------------+----------+----------+
| Header row, column 1   | Header 2   | Header 3 | Header 4 |
+========================+============+==========+==========+
| body row 1, column 1   | column 2   | column 3 | column 4 |
+------------------------+------------+----------+----------+
| body row 2             | Cells may span columns.          |
+------------------------+------------+---------------------+
| body row 3             | Cells may  | - Table cells       |
+------------------------+ span rows. | - contain           |
| body row 4             |            | - body elements.    |
+------------------------+------------+---------------------+

Even if we don't really care about reST but just want to be able to parse tables like the one above, Docutils can help. We can use its tableparser module. Here's a short snippet from another code sample:

def parse_grid_table(text):
    # Clean up the input: get rid of empty lines and strip all leading and
    # trailing whitespace.
    lines = filter(bool, (line.strip() for line in text.splitlines()))
    parser = docutils.parsers.rst.tableparser.GridTableParser()
    return parser.parse(docutils.statemachine.StringList(list(lines)))

The parser returns an internal representation of the table that can be easily used to analyze it or to munge & emit something else (by default Docutils can emit HTML tables from it).

One small caveat in this code to pay attention to: we need to represent the table as a list of lines (strings) and then wrap it in a docutils.statemachine.StringList object, which is a Docutils helper that provides useful analysis methods on lists of strings.
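
To get a feel for what the parser produces, you can feed the helper a small table and pretty-print whatever comes back (a quick sketch; the table string here is made up for illustration, and the exact nested structure of column widths, header rows and body rows is a Docutils internal that is easiest to learn by inspecting it interactively):

import pprint

# A made-up grid table, just for illustration.
table_text = """
+--------------+----------+
| Header 1     | Header 2 |
+==============+==========+
| row 1, col 1 | col 2    |
+--------------+----------+
"""

# parse_grid_table is the helper defined above.
pprint.pprint(parse_grid_table(table_text))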

[1] Inconsistent naming, I know. In general, Docutils is not the most Pythonic library out there; this is totally forgivable given how old it is. Its roots go all the way back to the early days of Python, and both the language and the coding style have changed quite a bit since then.

On the positive side, Docutils is well designed and is reasonably magic-free. I always feel at home when rummaging through its source code - it's readable and intuitive.

Categories: FLOSS Project Planets

DrupalEasy: Florida DrupalCamp: Now a 3-day, networking and learning open-source extravaganza!

Planet Drupal - Wed, 2017-01-11 08:16

We’re taking it up a couple of notches this year down in Orlando for the 9th Florida DrupalCamp! We're expanding in every dimension we can find - highlighted by an opening day (Friday) of full-day workshops, followed by two days of sessions (all day Saturday and a half day on Sunday)!

Amazing Trainings

In previous years, we’ve had concurrent trainings on Saturdays and some sprinting on Sunday. This year, we changed it up: Friday will be a full training day including workshops on:

  • Beginning React JS - taught by John Tucker
  • Docker for Development with Drupal - taught by Lisa Ridley  
  • Introduction to Drupal 8 - taught by DrupalEasy's own Michael Anello  
  • Introduction to Drupal 8 Theming - taught by Suzanne Dergacheva
  • Introduction to DrupalVM - taught by Ben Hosmer

The best part is that trainings are included with the price of the ticket (now $35). You sign up for the training when registering. Space is limited, however, so register soon!

Three Phenomenal Featured Speakers

We have three amazing featured speakers this year!

Many More Extraordinary Sessions and Speakers

We have over 30 sessions already submitted with several weeks to go until the deadline. We still need more. Check them out and submit your session soon.

Kick-A** Weather

Orlando in February. Sunny and warm :)

The Absolute Best Sponsors Ever

It's true, we have the best group of sponsors money can't buy! We're crazy-happy about having Achieve Agency as our top-level Platinum sponsor. This newly formed South Florida shop is looking to make a big splash in the community, and we're happy that we can help introduce them to everyone. 

Combined with Johnson & Johnson, devPanel, and Digital Echidna as Gold-level sponsors as well as all of our other amazing supporters (including DrupalEasy!), we're excited to bring you the biggest and best Drupal event you've ever seen (not to mention a few fun surprises!)

Lots More Sensational Stuff!

From a new logo, great catering, t-shirt, giveaways, and even an integrated video game easter-egg on our website, there’s lots to be had. Don't miss it!  Register today or regret it for the rest of 2017!

Categories: FLOSS Project Planets

PyCon: PyCon Startup Row 2017 Applications Are Now Open!

Planet Python - Wed, 2017-01-11 07:04

Starting at the 2011 conference in Atlanta, the PyCon Expo Hall has offered a special event for startups: “Startup Row,” a row of booths that features interesting startups built with Python.

We’re happy to announce that applications to Startup Row at PyCon 2017 in Portland, Oregon, are now open!

You may have questions about Startup Row, so here we provide some basic answers.

How do I apply?

There is information about applying at the end of this post, but if you’re the “do first, ask questions later” type, go to our application form.

What do Startup Row companies get?

We give founders a unique opportunity to connect with the vibrant, diverse community of engineers, data scientists, speakers, investors and enthusiasts who come to the world’s largest Python programming conference.

Startup Row companies get:

  • Free booth space
  • Admission to PyCon for two startup team members
  • Coverage here on the PyCon blog and elsewhere
  • A couple of fun events exclusively for Startup Row companies and the community.

And in a first for Startup Row, this year we’ll be giving our companies access to the Jobs Fair at PyCon, so they can recruit from the same pool of quality engineering talent that the likes of Google, Facebook, Dropbox, and other big companies have recruited from at PyCon for years.

All in, if selected, your company receives a few thousand dollars' worth of access to the best PyCon has to offer, all for free because you're doing cool stuff with Python.

What are the rules?

  1. Your startup has to be 2.5 years old or less.
  2. Including founders, there have to be fewer than 15 people on the team at the time you apply.
  3. Obviously, you have to use Python somewhere in your stack. (Open source, proprietary, front end, back end, data analysis, devops — it all counts.)

How does the selection committee pick companies?

  • We strongly favor engineer-founders, people who can build both valuable software and valuable businesses.
  • The technology or product has to be interesting. Are you solving a tough engineering problem? Building a version control system to replace git? Using a new technology in a unique way? Something that scratches your own itch as a domain expert in some field? Great!
  • Traction. Is your company reaching a lot of people, either now or in the near future? Do you have a good sales pipeline? Lots of signups? MAU stats that would make Facebook jealous? Be sure to tell us about it in your application.

Which companies have been on Startup Row before?

In the past six years, Startup Row has featured over 75 companies, some of which you’ve probably heard of or even used.

Pandas, the popular data science library, was created by Lambda Foundry. 

DotCloud (which would become Docker), ZeroVM, X.ai, Mailgun, Mixpanel, AppThwack, and many others were all featured on Startup Row back when they were early stage startups.

I’ve heard something about local pitch events. Tell me more!

Yes, we’re hosting pitch events in Seattle, San Francisco, Chicago, and New York. If you’re interested in pitching or hosting your own local Startup Row pitch event, email one of Startup Row’s organizers at don [at] sheu [dot] com, or jason [at] jdr [dot] fyi for more information.

Currently, we've scheduled events in Chicago, San Francisco, and Seattle, and we're adding more dates. The Chicago event is on January 26 at Braintree HQ, in collaboration with the Braintree team and ChiPy, the local user group. The San Francisco event is on March 8, and as of the time of publishing the venue is TBD. Finally, Avvo has offered to host the Seattle event in collaboration with PuPPy, the Seattle and Puget Sound Python user group.

We’ll be announcing the local events schedule and additional dates on the Startup Row page.

Where can I learn more about Startup Row?

Startup Row has its own page on the PyCon 2017 site, where you can learn more about the history of Startup Row at PyCon (fun fact: it started as a collaboration between Y Combinator and the PSF) and just how well Startup Row alumni have performed (another fun fact: nearly 20% have had successful exits so far).

If you have any quick questions up front for the organizing team, you can find us @ulysseas and @jason_rowley on Twitter, or at the email addresses listed above.

Okay, I’ve read all this. Now, how do I apply?

First off, we commend you for sticking it through to the end! You can click here to go to the application form for Startup Row.

We’re looking forward to learning a little more about what you’re working on!

Categories: FLOSS Project Planets

Dirk Eddelbuettel: R / Finance 2017 Call for Papers

Planet Debian - Wed, 2017-01-11 06:44

Last week, Josh sent the call for papers to the R-SIG-Finance list, making everyone aware that we will have our ninth annual R/Finance conference in Chicago in May. Please see the call for papers (at the link below, or at the website) and consider submitting a paper.

We are once again very excited about our conference, thrilled about the upcoming keynotes, and hope that many R / Finance users will not only join us in Chicago in May 2017 but also submit an exciting proposal.

We also overhauled the website, so please see R/Finance. It should render well and fast on devices of all sizes: phones, tablets, desktops with browsers in different resolutions. The program and registration details still correspond to last year's conference and will be updated in due course.

So read on below, and see you in Chicago in May!

Call for Papers

R/Finance 2017: Applied Finance with R
May 19 and 20, 2017
University of Illinois at Chicago, IL, USA

The ninth annual R/Finance conference for applied finance using R will be held on May 19 and 20, 2017 in Chicago, IL, USA at the University of Illinois at Chicago. The conference will cover topics including portfolio management, time series analysis, advanced risk tools, high-performance computing, market microstructure, and econometrics. All will be discussed within the context of using R as a primary tool for financial risk management, portfolio construction, and trading.

Over the past eight years, R/Finance has included attendees from around the world. It has featured presentations from prominent academics and practitioners, and we anticipate another exciting line-up for 2017.

We invite you to submit complete papers in pdf format for consideration. We will also consider one-page abstracts (in txt or pdf format) although more complete papers are preferred. We welcome submissions for both full talks and abbreviated "lightning talks." Both academic and practitioner proposals related to R are encouraged.

All slides will be made publicly available at conference time. Presenters are strongly encouraged to provide working R code to accompany the slides. Data sets should also be made public for the purposes of reproducibility (though we realize this may be limited due to contracts with data vendors). Preference may be given to presenters who have released R packages.

Financial assistance for travel and accommodation may be available to presenters, however requests must be made at the time of submission. Assistance will be granted at the discretion of the conference committee.

Please submit proposals online at http://go.uic.edu/rfinsubmit.

Submissions will be reviewed and accepted on a rolling basis with a final deadline of February 28, 2017. Submitters will be notified via email by March 31, 2017 of acceptance, presentation length, and financial assistance (if requested).

Additional details will be announced via the conference website as they become available. Information on previous years' presenters and their presentations are also at the conference website. We will make a separate announcement when registration opens.

For the program committee:

Gib Bassett, Peter Carl, Dirk Eddelbuettel, Brian Peterson,
Dale Rosenthal, Jeffrey Ryan, Joshua Ulrich

Categories: FLOSS Project Planets