FLOSS Project Planets

KnackForge: How to update Drupal 8 core?

Planet Drupal - Sat, 2018-03-24 00:01
How to update Drupal 8 core?

Let's see how to update your Drupal site between 8.x.x minor and patch versions. For example, from 8.1.2 to 8.1.3, or from 8.3.5 to 8.4.0. I hope this will help you.

  • If you are upgrading to Drupal version x.y.z

           x -> is known as the major version number

           y -> is known as the minor version number

           z -> is known as the patch version number.

Sat, 03/24/2018 - 10:31
Categories: FLOSS Project Planets

foss-gbg on Wednesday

Planet KDE - Mon, 2017-02-20 01:08

If you happen to be in Gothenburg on Wednesday you are most welcome to visit foss-gbg. It is a free event (you still have to register so that we can arrange some light food) starting at 17.00.

The topics are Yocto Linux on FPGA-based hardware, risk and license management in open source projects and a product release by the local start-up Zifra (an encryptable SD-card).

More information and free tickets are available at the foss-gbg site.

Welcome!

Categories: FLOSS Project Planets

Russ Allbery: Haul via parents

Planet Debian - Sun, 2017-02-19 21:39

My parents were cleaning out a bunch of books they didn't want, so I grabbed some of the ones that looked interesting. A rather wide variety of random stuff. Also, a few more snap purchases on the Kindle even though I've not been actually finishing books recently. (I do have two finished and waiting for me to write reviews, at least.) Who knows when, if ever, I'll read these.

Mark Ames — Going Postal (nonfiction)
Catherine Asaro — The Misted Cliffs (sff)
Ambrose Bierce — The Complete Short Stories of Ambrose Bierce (collection)
E. William Brown — Perilous Waif (sff)
Joseph Campbell — The Hero with a Thousand Faces (nonfiction)
Jacqueline Carey — Miranda and Caliban (sff)
Noam Chomsky — 9-11 (nonfiction)
Noam Chomsky — The Common Good (nonfiction)
Robert X. Cringely — Accidental Empires (nonfiction)
Neil Gaiman — American Gods (sff)
Neil Gaiman — Norse Mythology (sff)
Stephen Gillet — World Building (nonfiction)
Donald Harstad — Eleven Days (mystery)
Donald Harstad — Known Dead (mystery)
Donald Harstad — The Big Thaw (mystery)
James Hilton — Lost Horizon (mainstream)
Spencer Johnson — The Precious Present (nonfiction)
Michael Lerner — The Politics of Meaning (nonfiction)
C.S. Lewis — The Joyful Christian (nonfiction)
Grigori Medvedev — The Truth about Chernobyl (nonfiction)
Tom Nadeu — Seven Lean Years (nonfiction)
Barack Obama — The Audacity of Hope (nonfiction)
Ed Regis — Great Mambo Chicken and the Transhuman Condition (nonfiction)
Fred Saberhagen — Berserker: Blue Death (sff)
Al Sarrantonio (ed.) — Redshift (sff anthology)
John Scalzi — Fuzzy Nation (sff)
John Scalzi — The End of All Things (sff)
Kristine Smith — Rules of Conflict (sff)
Henry David Thoreau — Civil Disobedience and Other Essays (nonfiction)
Alan W. Watts — The Book (nonfiction)
Peter Whybrow — A Mood Apart (nonfiction)

I've already read (and reviewed) American Gods, but didn't own a copy of it, and that seemed like a good book to have a copy of.

The Carey and Brown were snap purchases, and I picked up a couple more Scalzi books in a recent sale.

Categories: FLOSS Project Planets

Norbert Preining: Ryu Murakami – Tokyo Decadence

Planet Debian - Sun, 2017-02-19 21:08

The other Murakami, Ryu Murakami (村上 龍), is hard to compare to the more famous Haruki. His collection of stories reflects the dark sides of Tokyo, far removed from the happy world of AKB48 and the like: criminals, prostitutes, depression, loss. A bleak image of a bleak society.

This collection of short stories is a systematic deconstruction of happiness, love, and everything else we believe makes our lives worthwhile. The protagonists are idealistic students losing their faith, office ladies going astray, drunkards, movie directors, the usual mixture. But the topic remains constant – the unfulfilled search for happiness and love.

I felt I was beginning to understand what happiness is about. It isn’t about guzzling ten or twenty energy drinks a day, barreling down the highway for hours at a time, turning over your paycheck to your wife without even opening the envelope, and trying to force your family to respect you. Happiness is based on secrets and lies.
– Ryu Murakami, It all started just about a year and a half ago

A deeply pessimistic undertone echoes through these stories, and the atmosphere and writing are reminiscent of Charles Bukowski. The pessimism resonates with the melancholy of the stories' running theme, Cuban music. Murakami was active in disseminating Cuban music in Japan, which included founding his own label. Javier Olmo's pieces often provide the connecting thread, as well as lending the short stories their titles: Historia de un amor, Se fué.

The belief – that what’s missing now used to be available to us – is just an illusion, if you ask me. But the social pressure of “You’ve got everything you need, what’s your problem?” is more powerful than you might ever think, and it’s hard to defend yourself against it. In this country it’s taboo even to think about looking for something more in life.
– Ryu Murakami, Historia de un amor

It is interesting to see that, on the surface, the women in the stories are the broken characters, which has led some feminists to incredible rants about the book; see the rant^Wreview by Blake Fraina at Goodreads:

I’ll start by saying that, as a feminist, I’m deeply suspicious of male writers who obsess over the sex lives of women and, further, have the audacity to write from a female viewpoint…
…female characters are pretty much all pathetic victims of the male characters…
I wish there was absolutely no market for stuff like this and I particularly discourage women readers from buying it…
– Blake Fraina, Goodreads review

At first sight it might look as if the female characters are pretty much all pathetic victims of the male characters, but in fact it is the other way round: the desperate characters, the slaves of their own desperation, are the men in these stories, not the women. It mirrors the situation in Hitomi Kanehara’s Snakes and Earrings, where at first sight the tattooist and his outlaw friends are the broken characters, but the really cracked one is the sweet Tokyo girly.

Male-female relationships are always in transition. If there’s no forward progress, things tend to slip backwards.
– Ryu Murakami, Se fué

Final verdict: Great reading, hard to put down, very much readable and enjoyable, if one is in the mood for dark and depressing stories. And last but not least, don’t trust feminist book reviews.

Categories: FLOSS Project Planets

Carl Trachte: Filling in Missing Grouping Columns of MSSQL SSRS Report Dumped to Excel

Planet Python - Sun, 2017-02-19 19:34
This is another simple but common problem in certain business environments:

1) Data are presented via a Microsoft SQL Server Reporting Services report, BUT

2) The user wants the data in Excel, and, further, wants to play with it (pivot, etc.) there.  The problem is that the grouping column labels are not in every record, only in the one row that begins the list of records for that group (sanitized screenshot below):

But I don't WANT to copy and paste all those groupings for 30,000 records :*-(

I had this assignment recently from a remote request.  It took about four rounds of an e-mail exchange to figure out that it really wasn't a data problem, but a formatting one that needed solving.

It is possible to do the whole thing in Python.  I did the Excel part by hand in order to get a handle on the data:

1) In Excel, delete the extra rows on top of the report leaving just the headers and the data.

2) In Excel, select everything on the data page, format the cells correctly by unselecting the Merge Cells and Wraparound options.

3) In Excel, at this point you should be able to see if there are extra empty columns as space fillers; delete them.  Save the worksheet as a csv file.

4) In a text editor, open your csv file, identify any empty rows, and delete them.  Change column header names as desired.

Now the Python part:

#!python36

"""
Doctor csv dump from unmerged cell
dump of SSRS dump from MSSQL database.

Fill in cell gaps where merged
cells had only one grouping value
so that all rows are complete records.
"""

COMMA = ','
EMPTY = ''

INFILE = 'rawdata.csv'
OUTFILE = 'canneddumpfixed.csv'

ERRORFLAG = 'ERROR!'

f = open(INFILE, 'r')
headerline = next(f)
numbercolumns = len(headerline.split(COMMA))

f2 = open(OUTFILE, 'w')
# Keep the header row in the output file.
f2.write(headerline)

# Assume at least one data column on far right.
# missingvalues holds the most recently seen value
# for each grouping column.
missingvalues = (numbercolumns - 1) * [ERRORFLAG]

for linex in f:
    print('Processing line {:s} . . .'.format(linex.rstrip()))
    splitrecord = linex.split(COMMA)
    for slotx in range(0, numbercolumns - 1):
        if splitrecord[slotx] != EMPTY:
            # New grouping value - remember it for the rows below it.
            missingvalues[slotx] = splitrecord[slotx]
        else:
            # Blank cell left over from the merged cell - fill it in.
            splitrecord[slotx] = missingvalues[slotx]
    f2.write(COMMA.join(splitrecord))

f.close()
f2.close()

print('Finished')

At this point you've got your data in csv format - you can open it in Excel and go to work.
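
One caveat about the script above: a plain split on COMMA will mis-split any field that happens to contain a quoted comma. If your report has fields like that, the same fill-forward logic can be written with the standard library's csv module, which understands quoting. A minimal sketch, under the same assumptions (header row first, grouping columns on the left, at least one data column on the far right):

import csv

INFILE = 'rawdata.csv'
OUTFILE = 'canneddumpfixed.csv'

ERRORFLAG = 'ERROR!'
EMPTY = ''

with open(INFILE, 'r', newline='') as f, open(OUTFILE, 'w', newline='') as f2:
    reader = csv.reader(f)
    writer = csv.writer(f2)

    header = next(reader)
    writer.writerow(header)

    # Most recently seen value for each grouping column (all but the last column).
    missingvalues = (len(header) - 1) * [ERRORFLAG]

    for row in reader:
        for slotx in range(len(header) - 1):
            if row[slotx] != EMPTY:
                missingvalues[slotx] = row[slotx]
            else:
                row[slotx] = missingvalues[slotx]
        writer.writerow(row)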

There may be a free or COTS (commercial off the shelf) utility that does all this somewhere in the Microsoft "ecosystem" (I think that's their fancy enviro-friendly word for vendor-user community) but I don't know of one.


Thanks for stopping by.





Categories: FLOSS Project Planets

Savas Labs: Docker and the Drupal Pattern Lab Starter Theme

Planet Drupal - Sun, 2017-02-19 19:00

How to build a Docker Pattern Lab image for local Drupal development with the Pattern Lab Starter theme and/or with other common front-end applications such as npm, Gulp, and Bower. Continue reading…

Categories: FLOSS Project Planets

Gregor Herrmann: RC bugs 2016/52-2017/07

Planet Debian - Sun, 2017-02-19 17:19

debian is in deep freeze for the upcoming stretch release. still, I haven't dived into fixing "general" release-critical bugs yet; so far I mostly kept to working on bugs in the debian perl group:

  • #834912 – src:libfile-tee-perl: "libfile-tee-perl: FTBFS randomly (Failed 1/2 test programs)"
    add patch from ntyni (pkg-perl)
  • #845167 – src:lemonldap-ng: "lemonldap-ng: FTBFS randomly (failing tests)"
    upload package prepared by xavier with disabled tests (pkg-perl)
  • #849362 – libstring-diff-perl: "libstring-diff-perl: FTBFS: test failures with new libyaml-perl"
    add patch from ntyni (pkg-perl)
  • #851033 – src:jabref: "jabref: FTBFS: Could not find org.postgresql:postgresql:9.4.1210."
    update maven.rules
  • #851347 – libjson-validator-perl: "libjson-validator-perl: uses deprecated Mojo::Util::slurp, makes libswagger2-perl FTBFS"
    upload new upstream release (pkg-perl)
  • #852853 – src:libwww-curl-perl: "libwww-curl-perl: FTBFS (Cannot find curl.h)"
    add patch for multiarch curl (pkg-perl)
  • #852879 – src:license-reconcile: "license-reconcile: FTBFS: dh_auto_test: perl Build test --verbose 1 returned exit code 255"
    update tests (pkg-perl)
  • #852889 – src:liblatex-driver-perl: "liblatex-driver-perl: FTBFS: Test failures"
    add missing build dependency (pkg-perl)
  • #854859 – lemonldap-ng-doc: "lemonldap-ng-doc: unhandled symlink to directory conversion: /usr/share/doc/lemonldap-ng-doc/pages/documentation/current"
    help with dpkg-maintscript-helper, upload on xavier's behalf (pkg-perl)

thanks to the release team for pro-actively unblocking the packages with fixes which were uploaded after the begin of the freeze!

Categories: FLOSS Project Planets

Bhishan Bhandari: Raising and Handling Exceptions in Python – Python Programming Essentials

Planet Python - Sun, 2017-02-19 08:31
Brief Introduction

Any unexpected event that occurs during the execution of a program is known as an exception. Like everything else in Python, exceptions are objects: each is either an instance of the Exception class or an instance of a class derived from the base class Exception. Exceptions may occur due to logical errors in […]
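
To make the idea concrete, here is a small sketch (not part of the excerpted article; the class and function names are invented for illustration). It shows that an exception is simply an instance of a class derived from Exception, raised with raise and handled with try/except:

class InsufficientFunds(Exception):
    """Raised when a withdrawal exceeds the available balance."""

def withdraw(balance, amount):
    if amount > balance:
        # Raise an instance of our Exception subclass.
        raise InsufficientFunds(
            'tried to withdraw {} with only {} available'.format(amount, balance))
    return balance - amount

try:
    withdraw(100, 250)
except InsufficientFunds as exc:
    # exc is an object: an instance of a class derived from Exception.
    print('Handled:', exc)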
Categories: FLOSS Project Planets

Import Python: Import Python Weekly Issue 112 - Python Programming Videos By MIT, mypy static type checker and more

Planet Python - Sun, 2017-02-19 06:44
Worthy Read
Introduction to Computer Science and Programming in Python. Video Series from MIT Introduction to Computer Science and Programming in Python is intended for students with little or no programming experience. It aims to provide students with an understanding of the role computation can play in solving problems and to help students, regardless of their major, feel justifiably confident of their ability to write small programs that allow them to accomplish useful goals. The class uses the Python 3.5 programming language.
video
Whitepaper 3 Ways Our Dev Teams Create Velocity with Multi-System Integrations
sponsor
Python repository moves to GitHub Python core developer Brett talks about the history of the decision to move Python to GitHub
core-python
memoryview memoryview is a special type that can be used to work with data stored in other data-structures.
core python
A Python-esque Type System for Python: Duck Typing Statically I think the mypy static type checker is a fantastic initiative, and absolutely love it. My one complaint is that it relies a little too much on subclassing for determining compatibility. This post discusses nominal vs. structural subtyping, duck typing and how it relates to structural subtyping, subtyping in mypy, and using abstract base classes in lieu of a structural subtyping system.
mypy
Regular Expressions Are Nothing to Fear regex
Extreme IO performance with parallel Apache Parquet in Python In this post, I show how Parquet can encode very large datasets in a small file footprint, and how we can achieve data throughput significantly exceeding disk IO bandwidth by exploiting parallelism (multithreading).
parquet, IO
Hire Development Experts Toptal hand-matches leading companies with experts in software, web, and mobile app development. Let us match you with on-demand developers for your next project.
sponsor
Two Easy Ways to Use Scikit Learn and Dask This post describes two simple ways to use Dask to parallelize Scikit-Learn operations either on a single computer or across a cluster.
scikit-learn
Learning AI if You Suck at Math — P4 — Tensors Illustrated (with Cats!) This is the 4th part in the series.
tensorflow
Getting Started With Kafka Basically in this guide we will configure a basic Kafka instance in an Ubuntu environment & write a very very basic python producer & consumer.
kafka
Predict gender with voice and speech data A beginner’s guide to implementing classification algorithms in Python
machine learning, classification
Ergonomica Ergonomica is a Python-based console language, integrating modules such as os, shutil, and subprocess into a fast, easy-to-use environment. It allows for functional programming tools and operations as well as data types that would otherwise require obscure grep or sed commands.
shell
Deep Learning with Keras on Google Compute Engine Inception, a model developed by Google, is a deep CNN. Against the ImageNet dataset (a common dataset for measuring image recognition performance) it achieved a top-5 error of 3.47%. In this tutorial, you’ll use the pre-trained Inception model to provide predictions on images uploaded to a web server.
deep learning, keras
Django Weekly Issue 26 Django round up for this week.
django

Projects
cpython - 5349 Stars, 303 Forks The default implementation of the Python programming language is now on GitHub.
Bella - 494 Stars, 53 Forks A pure Python, post-exploitation, data mining and remote administration tool for macOS.
PyTorch-Mini-Tutorials - 112 Stars, 12 Forks Minimal tutorials for PyTorch
mog - 38 Stars, 2 Forks A different take on the UNIX tool cat
pyowl - 31 Stars, 6 Forks Ordered Weighted L1 regularization for classification and regression in Python
QuoraDQBaseline - 10 Stars, 5 Forks Baseline solution to the Quora Duplicate Question dataset.
http_heartbeat_proxy - 2 Stars, 0 Forks A simple proxy to make a service heartbeat-able.
Categories: FLOSS Project Planets

libsigsegv @ Savannah: libsigsegv 2.11 is released

GNU Planet! - Sat, 2017-02-18 17:33

libsigsegv version 2.11 is released.

New in this release:

  • Added support for catching stack overflow on Linux/SPARC.
  • Provide a correct value for SIGSTKSZ on 64-bit AIX and on HP-UX. The one defined by these systems is too small.
  • Updated build infrastructure.
  • Compilation now requires the <stdint.h> include file. Platforms which don't have this include file (such as IRIX) are no longer supported.
  • NOTE: Support for Cygwin and native Windows is currently not up-to-date.

Download: https://haible.de/bruno/gnu/libsigsegv-2.11.tar.gz

Categories: FLOSS Project Planets

Steve Kemp: Apologies for the blog-churn.

Planet Debian - Sat, 2017-02-18 17:00

I've been tweaking my blog a little over the past few days, getting ready for a new release of the chronicle blog compiler (github).

During the course of that I rewrote all the posts to have 100% lower-case file-paths. Redirection-pages have been auto-generated for each page which was previously mixed-case, but unfortunately that will have meant that the RSS feed updated unnecessarily:

  • If it used to contain:
    • https://example.com/Some_Page.html
  • It would have been updated to contain
    • https://example.com/some_page.html

That triggered a lot of spamming, as the URLs would have shown up as being new/unread/distinct.

Categories: FLOSS Project Planets

Jamal Moir: Become a Lord of the Cells and Speed up Your Jupyter Notebook Workflow

Planet Python - Sat, 2017-02-18 12:04

Everyone loves a good Jupyter Notebook. Jupyter Notebooks are an insanely convenient environment in which to rapidly prototype Python scripts and delve into Data Science. They speed up the time from writing code to actually executing it, and you can visually see the output for each section you write. I make heavy use of Jupyter Notebooks in my […]

The post Become a Lord of the Cells and Speed up Your Jupyter Notebook Workflow appeared first on Data Dependence.

Categories: FLOSS Project Planets

agoradesign: Drupal's great little helpers: Random utility class

Planet Drupal - Sat, 2017-02-18 07:50
Drupal's API has a huge number of very useful utility classes and functions, especially in Drupal 8. Although the API docs are great, it's rather impossible to always find every little feature. Today I want to show you the Random utility class, which I had nearly overlooked and only found rather by accident.
Categories: FLOSS Project Planets

Nicola Iarocci: Python Workload pulled off Visual Studio 2017 RC3

Planet Python - Sat, 2017-02-18 04:48

So how do you install the awesome Python Development Tools on the latest Visual Studio 2017 RC? That might seem a stupid question considering that the Data Science and Python Development workload has been available with every Release Candidate so far. You simply select the workload during the installation and you’re done, right? Not quite.

I found out the hard way this morning as I wanted to install VS 2017 RC3 on my development machine and, to my surprise, I could not find Python Development anywhere on the workloads window (which itself is a huge improvement over the VS 2015 install experience, by the way). Easy, I thought, they moved it to some secondary “optional workloads” tab, but a quick scan did not reveal any of that.

Concerned now, I turned to the Oracle of All Things only to find that the Python Workload had been pulled from Visual Studio 2017 RC3 (January 2017). It was actually reported in the release notes:

Removed the Data Science and Python Development workloads as some of the components weren’t meeting the release requirements, such as translation to non-English languages. They will be available soon as separate downloads.

When I glanced over them I (and probably you too) did not notice this little paragraph. But wait, it’s even worse than you would expect:

Upgrading to current version will remove any previously installed Python and Data Science workloads/components.

That’s right. If you upgrade to RC3 you win a wipe-out of your Python environment. Further research revealed an open ticket on GitHub. Apparently they are working on a way to install the Python and Data Science workloads on top of an existing VS 2017 install, but I would not hold my breath on it:

Thanks everyone for the support and understanding. It’s still not clear to us how we’re going to be releasing Python support, but the plan is definitely to have something when VS 2017 releases next month.

Since the official VS 2017 release is planned for early next month, it is very likely that we will just have to wait until then. In the meantime, you had better keep a VS 2015 installation sitting side by side with your brand new, mutilated Visual Studio 2017. Or you can switch to Visual Studio Code, which offers fantastic support for Python.

Or you fall back to good ole trusted Vim, like I did.


Categories: FLOSS Project Planets

Full Stack Python: The Full Stack Python Blog

Planet Python - Sat, 2017-02-18 00:00

Full Stack Python began way back in December 2012 when I started writing the initial deployment, server, operating system, web server and WSGI server pages. Since then, the pages have expanded out into a boatload of other areas including subjects outside the deployment topics I originally started the site to explain.

Frequently though I wanted to write a Python walkthrough that was not a good fit for the page format I use for each topic. Many of those walkthroughs became Twilio blog posts but not all of them were quite the right fit on there. I'll still be writing plenty more Twilio tutorials, but this Full Stack Python blog is the spot for technical posts that fall outside the Twilio domain.

Let me know what you think and what tutorials you'd like to see in the future. Hit me up on Twitter @fullstackpython or @mattmakai.

Categories: FLOSS Project Planets

Philip Semanchuk: Pandas Surprise

Planet Python - Fri, 2017-02-17 22:20
Summary

Part of learning how to use any tool is exploring its strengths and weaknesses. I’m just starting to use the Python library Pandas, and my naïve use of it exposed a weakness that surprised me.

Background

Thanks to bradleypjohnson for sharing this Lucky Charms photo under CC BY 2.0.

I have a long list of objects, each with the properties “color” and “shape”. I want to count the frequency of each color/shape combination. A sample of what I’m trying to achieve could be represented in a grid like this –

        circle  square  star
blue         8      41    18
orange       5      33    25
red         53      64    58

At first I implemented this with a dictionary of collections.Counter instances where the top level dictionary is keyed by shape, like so –

import collections

SHAPES = ('square', 'circle', 'star', )

frequencies = {shape: collections.Counter() for shape in SHAPES}

Then I counted my frequencies using the code below. (For simplicity, assume that my objects are simple 2-tuples of (shape, color)).

for shape, color in all_my_objects:
    frequencies[shape][color] += 1

So far, so good.

Enter the Pandas

This looked to me like a perfect opportunity to use a Pandas DataFrame which would nicely support the operations I wanted to do after tallying the frequencies, like adding a column to represent the total number (sum) of instances of each color.

It was especially easy to try out a DataFrame because my counting loop ( for...all_my_objects) wouldn’t change, only the definition of frequencies. (Note that the code below requires I know in advance all the possible colors I can expect to see, which the Dict + Counter version does not. This isn’t a problem for me in my real-world application.)

import pandas as pd

frequencies = pd.DataFrame(columns=SHAPES, index=COLORS, data=0, dtype='int')

for shape, color in all_my_objects:
    frequencies[shape][color] += 1

It Works, But…

Both versions of the code get the job done, but using the DataFrame as a frequency counter turned out to be astonishingly slow. A DataFrame is simply not optimized for repeatedly accessing individual cells as I do above.

How Slow is it?

To isolate the effect pandas was having on performance, I used Python’s timeit module to benchmark some simpler variations on this code. In the version of Python I’m using (3.6), the default number of iterations for each timeit test is 1 million.

First, I timed how long it takes to increment a simple variable, just to get a baseline.

Second, I timed how long it takes to increment a variable stored inside a collections.Counter inside a dict. This mimics the first version of my code (above) for a frequency counter. It’s more complex than the simple variable version because Python has to resolve two hash table references (one inside the dict, and one inside the Counter). I expected this to be slower, and it was.

Third, I timed how long it takes to increment one cell inside a 2×2 NumPy array. Since Pandas is built atop NumPy, this gives an idea of how the DataFrame’s backing store performs without Pandas involved.

Fourth, I timed how long it takes to increment one cell inside a 2×2 Pandas DataStore. This is what I had used in my real code.

Raw Benchmark Results

Here’s what timeit showed me. Sorry for the cramped formatting.

$ python
Python 3.6.0 (v3.6.0:41df79263a11, Dec 22 2016, 17:23:13)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit
>>> timeit.timeit('data += 1', setup='data=0')
0.09242476700455882
>>> timeit.timeit('data[0][0]+=1',setup='from collections import Counter;data={0:Counter()}')
0.6838196019816678
>>> timeit.timeit('data[0][0]+=1',setup='import numpy as np;data=np.zeros((2,2))')
0.8909121589967981
>>> timeit.timeit('data[0][0]+=1',setup='import pandas as pd;data=pd.DataFrame(data=[[0,0],[0,0]],dtype="int")')
157.56428507200326
>>>

Benchmark Results Summary

Here’s a summary of the results from above (decimals truncated at 3 digits). The rightmost column shows the results normalized so the fastest method (incrementing a simple variable) equals 1.

                    Actual (seconds)   Normalized
Simple variable                0.092        1
Dict + Counter                 0.683        7.398
Numpy 2D array                 0.890        9.639
Pandas DataFrame             157.564     1704.784

As you can see, resolving the index references in the middle two cases (Dict + Counter in one case, NumPy array indices in the other) slows things down, which should come as no surprise. The NumPy array is a little slower than the Dict + Counter.

The DataFrame, however, is roughly 175 – 230 times slower than either of those two methods. Ouch!

I can’t really even give you a graph of all four of these methods together because the time consumed by the DataFrame throws the chart scale out of whack.

Here’s a bar chart of the first three methods –

Here’s a bar chart of all four –

Why Is My DataFrame Access So Slow?

One of the nice features of DataFrames is that they support dictionary-like labels for rows and columns. For instance, if I define my frequencies to look like this –

>>> SHAPES = ('square', 'circle', 'star', )
>>> COLORS = ('red', 'blue', 'orange')
>>> pd.DataFrame(columns=SHAPES, index=COLORS, data=0, dtype='int')
        square  circle  star
red          0       0     0
blue         0       0     0
orange       0       0     0
>>>

Then frequencies['square']['orange'] is a valid reference.

Not only that, DataFrames support a variety of indexing and slicing options including –

  • A single label, e.g. 5 or 'a'
  • A list or array of labels ['a', 'b', 'c']
  • A slice object with labels 'a':'f'
  • A boolean array
  • A callable function with one argument

Here are those techniques applied in order to the frequencies DataFrame so you can see how they work –

>>> frequencies['star']
red       0
blue      0
orange    0
Name: star, dtype: int64
>>> frequencies[['square', 'star']]
        square  star
red          0     0
blue         0     0
orange       0     0
>>> frequencies['red':'blue']
      square  circle  star
red        0       0     0
blue       0       0     0
>>> frequencies[[True, False, True]]
        square  circle  star
red          0       0     0
orange       0       0     0
>>> frequencies[lambda x: 'star']
red       0
blue      0
orange    0
Name: star, dtype: int64

This flexibility has a price. Slicing (which is what is invoked by the square brackets) calls an object’s __getitem__() method. The parameter to __getitem__() is whatever was inside the square brackets. A DataFrame’s __getitem__() has to figure out what the passed parameter represents. Determining whether the parameter is a label reference, a callable, a boolean array, or something else takes time.

If you look at the DataFrame’s __getitem__() implementation, you can see all the code that has to execute to resolve a reference. (I linked to the version of the code that was current when I wrote this in February of 2017. By the time you read this, the actual implementation may differ.) Not only does __getitem__() have a lot to do, but because I’m accessing a cell (rather than a whole row or column), there’s two slice operations, so __getitem__() gets invoked twice each time I increment my counter.

This explains why the DataFrame is so much slower than the other methods. The dictionary and Counter both only support key lookup in a hash table, and a NumPy array has far fewer slicing options than a DataFrame, so its __getitem__() implementation can be much simpler.

Better DataFrame Indexing?

DataFrames support a few accessors that exist explicitly to support “fast” getting and setting of scalars: .at (for label lookups) and .iat (for integer-based index lookups). A DataFrame also provides get_value() and set_value(), but those methods are deprecated in the version I have (0.19.2).

“Fast” is how the pandas documentation describes these accessors. Let’s use timeit to get some hard data. I’ll try .at and .iat; I’ll also try get_value()/set_value() even though they’re deprecated.

>>> timeit.timeit("data.at['red','square']+=1",setup="import pandas as pd;data=pd.DataFrame(columns=('square','circle','star'),index=('red','blue','orange'),data=0,dtype='int')")
36.33179204000044
>>> timeit.timeit('data.iat[0,0]+=1',setup='import pandas as pd;data=pd.DataFrame(data=[[0,0],[0,0]],dtype="int")')
42.01523362501757
>>> timeit.timeit('data.set_value(0,0,data.get_value(0,0)+1)',setup='import pandas as pd;data=pd.DataFrame(data=[[0,0],[0,0]],dtype="int")')
15.050199927005451
>>>

These methods are better, but they’re still pretty bad. Let’s put those numbers in context by comparing them to other techniques. This time, for normalized results, I’m going to use my Dict + Counter method as the baseline of 1 and compare all other methods to that. The row “DataFrame (naïve)” refers to naïve slicing, like frequencies[0][0].

                      Actual (seconds)   Normalized
Dict + Counter                   0.683        1
Numpy 2D array                   0.890        1.302
DataFrame (get/set)             15.050       22.009
DataFrame (at)                  36.331       53.130
DataFrame (iat)                 42.015       61.441
DataFrame (naïve)              157.564      230.417

The best I can do with a DataFrame uses deprecated methods, and is still over 20 times slower than the Dict + Counter. If I use non-deprecated methods, it’s over 50 times slower.

Workaround

I like label-based access to my frequency counters, I like the way I can manipulate data in a DataFrame (not shown here, but it’s useful in my real-world code), and I like speed. I don’t necessarily need blazing fast speed, I just don’t want slow.

I can have my cake and eat it too by combining methods. I do my counting with the Dict + Counter method, and use the result as initialization data to a DataFrame constructor.

SHAPES = ('square', 'circle', 'star', )

frequencies = {shape: collections.Counter() for shape in SHAPES}

for shape, color in all_my_objects:
    frequencies[shape][color] += 1

frequencies = pd.DataFrame(data=frequencies)

The frequencies DataFrame now looks something like this –

        circle  square  star
blue         8      41    18
orange       5      33    25
red         53      64    58

The rows and columns appear in essentially random order; they’re ordered by whatever order Python returns the dict keys during DataFrame initialization. Getting them in a specific order is left as an exercise for the reader.

There’s one more detail to be aware of. If a particular (shape, color) combination doesn’t appear in my data, it will be represented by NaN in the DataFrame. They’re easy to set to 0 with frequencies.fillna(0).
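
For what it's worth, both details can be handled in one chained call on the DataFrame built above. A small sketch, assuming SHAPES and COLORS are the tuples defined earlier:

frequencies = (
    frequencies.reindex(index=COLORS, columns=SHAPES)  # impose a fixed row/column order
               .fillna(0)                              # absent (shape, color) pairs become 0
               .astype(int)                            # NaN forces float columns; go back to int
)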

Conclusion

What I was trying to do with Pandas – unfortunately, the very first thing I ever tried to do with it – didn’t play to its strengths. It didn’t break my code, but it slowed it down by a factor of ~1700. Since I had thousands of items to process, the difference was hard to overlook!

Pandas looks great for some things, and I expect I’ll continue using it. This was just a bump in the road, albeit an interesting one.

Categories: FLOSS Project Planets

Dirk Eddelbuettel: RPushbullet 0.3.1

Planet Debian - Fri, 2017-02-17 21:17

A new release 0.3.1 of the RPushbullet package, following the recent 0.3.0 release, is now on CRAN. RPushbullet interfaces the neat Pushbullet service for inter-device messaging, communication, and more. It lets you easily send alerts to your browser, phone, tablet, ... -- or all at once.

This release owes once again a lot to Seth Wenchel who helped to update and extend a number of features. We fixed one more small bug stemming from the RJSONIO to jsonlite transition, and added a few more helpers. We also enabled Travis testing and with it covr-based coverage analysis using pretty much the same setup I described in this recent blog post.

Changes in version 0.3.1 (2017-02-17)
  • The target device designation was corrected (#39).

  • Three new (unexported) helper functions test the validity of the api key, device and channel (Seth in #41).

  • The summary method for the pbDevices class was corrected (Seth in #43).

  • New helper functions pbValidateConf, pbGetUser, pbGetChannelInfo were added (Seth in #44 closing #40).

  • New classes pbUser and pbChannelInfo were added (Seth in #44).

  • Travis CI tests (and covr coverage analysis) are now enabled via an encrypted config file (#45).

Courtesy of CRANberries, there is also a diffstat report for this release.

More details about the package are at the RPushbullet webpage and the RPushbullet GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

Categories: FLOSS Project Planets

Ingo Juergensmann: Migrating from Owncloud 7 on Debian to Nextcloud 11

Planet Debian - Fri, 2017-02-17 18:19

These days I got a mail from my hosting provider stating that my ownCloud instance is insecure, because the online scan from scan.nextcloud.com had mailed them. However, the scan seemed quite bogus: it reported some issues that were listed as already solved in Debian's changelog file. But unfortunately the last entry in that changelog was from January 5th, 2016. So there has been more than a whole year without security updates for ownCloud in Debian stable.

In a discussion with the Nextcloud team I complained a little that the scan/check is not appropriate. The Nextcloud team replied very helpfully with additional information, such as two bug reports in Debian clarifying that the owncloud package will most likely be removed in the next release: #816376 and #822681.

So, as there is no nextcloud package in Debian unstable as of now, there was no way around manually upgrading & migrating to Nextcloud. This went fairly well:

ownCloud 7 -> ownCloud 8.0 -> ownCloud 8.1 -> ownCloud 8.2 -> ownCloud 9.0 -> ownCloud 9.1 -> Nextcloud 10 -> Nextcloud 11

There were some smaller caveats:

  1. When migrating from OC 9.0 to OC 9.1 you need to migrate your addressbooks and calendars as described in the OC 9.0 Release Notes
  2. When migrating from OC 9.1 to Nextcloud 10, the OC 9.1 version number is higher than the Nextcloud upgrade script expects, so it warns that you can't downgrade your installation. The fix was simply to change the OC version in config.php
  3. The Documents app of OC 7 is no longer available in Nextcloud 11 and is replaced by the Collabora app, which is way more complex to set up

The installation and setup of the Docker image for collabora/code was the main issue, because I wanted to be able to edit documents in my cloud. For some reason Nextcloud couldn't connect to my docker installation. After some web searches I found "Can't connect to Collabora Online" which led me to the next entry in the Nextcloud support forum. But in the end it was this posting that finally made it work for me. So, in short I needed to add...

DOCKER_OPTS="--storage-driver=devicemapper"

to /etc/default/docker.

So, in the end everything worked out well and my cloud instance is secure again. :-)

UPDATE 2017-02-18 10:52:
Sadly, with that working Collabora Online container from Docker, I now face this issue of zombie loolforkit processes inside of that container.

Category: Debian | Tags: Debian, Software, Cloud, Server
Categories: FLOSS Project Planets

Caktus Consulting Group: Caktus Attends Wagtail CMS Sprint in Reykjavik

Planet Python - Fri, 2017-02-17 18:15

Caktus CEO Tobias McNulty and Sales Engineer David Ray recently had the opportunity to attend a development sprint for the Wagtail Content Management System (CMS) in Reykjavik, Iceland. The two-day software development sprint attracted 15 attendees hailing from a total of 5 countries across North America and Europe.

Wagtail was originally built for the Royal College of Art by UK firm Torchbox and is now one of the fastest-growing open source CMSs available. Being longtime champions of the Django framework, we’re also thrilled that Wagtail is Django-based. This makes Wagtail a natural fit for content-heavy sites that might still benefit from the customization made possible through the CMS’ Django roots.

The team worked on a wide variety of projects, including caching optimizations, an improved content model, a new React-based page explorer, the integration of a new rich-text editor (Draft.js), performance enhancements, other new features, and bug fixes.

Team Wagtail Bakery stole the show with a brand-new demo site that’s visually appealing and better demonstrates the level of customization afforded by the Wagtail CMS. The new demo site, which is still in development as of the time of this post, can be found at wagtail/bakerydemo on GitHub.

After the sprint was over, our hosts at Overcast Software were kind enough to take us on a personalized tour of the countryside around Reykjavik. We left Iceland with significant progress on a number of pull requests on Wagtail, new friends, and a new appreciation for the country's magical landscapes.

We were thrilled to attend and are delighted to be a part of the growing Wagtail community. If you're interested in participating in the next Wagtail sprint, it is not far away. Wagtail Space is taking place in Arnhem, The Netherlands March 21st-25th and is being organized to accommodate both local and remote sprinters. We hope to connect with you then!

Categories: FLOSS Project Planets
Syndicate content