Chris Lamb: Free software activities in November 2018

Planet Debian - Fri, 2018-11-30 15:23

Here is my monthly update covering what I have been doing in the free software world during November 2018 (previous month):

Reproducible builds

Whilst anyone can inspect the source code of free software for malicious flaws, almost all software is distributed pre-compiled to end users.

The motivation behind the Reproducible Builds effort is to ensure no flaws have been introduced during this compilation process by promising identical results are always generated from a given source, thus allowing multiple third-parties to come to a consensus on whether a build was compromised.

This month I:

Debian

Debian LTS

This month I have worked 18 hours on Debian Long Term Support (LTS) and 12 hours on its sister Extended LTS project.

  • Investigated and triaged golang-go.net-dev, libsdl2-image, lighttpd, nginx, pdns, poppler, rustc & xml-security-c amongst many others.

  • "Frontdesk" duties, responding to user queries, etc.

  • Issued DLA 1572-1 for nginx to fix a denial of service (DoS) vulnerability — as there was no validation for the size of a 64-bit atom in an .mp4 file this led to CPU exhaustion when the size was zero.

  • Issued DLA 1576-1 correcting a SSH passphrase disclosure in ansible's User module leaking data in the global process list.

  • Issued DLA 1584-1 for ruby-i18n to fix a remote denial-of-service vulnerability.

  • Issued DLA 1585-1 to prevent an XSS vulnerability in ruby-rack where a malicious request could forge the HTTP scheme being returned to the underlying application.

  • Issued DLA 1591-1 to fix two vulnerabilities in libphp-phpmailer where arbitrary local files could be disclosed via relative-path HTML transformations, as well as an object injection attack.

  • Uploaded libsdl2-image (2.0.3+dfsg1-3) and sdl-image1.2 (1.2.12-10) to the unstable distribution to fix buffer overflows on corrupt or maliciously-crafted XCF files. (#912617 & #912618)

  • Uploaded ruby-i18n (0.7.0-3) to unstable [...] and prepared a stable proposed update for a potential 0.7.0-2+deb9u1 in stretch (#914187).

  • Uploaded ruby-rack (1.6.4-6) to unstable [...] and (2.0.5-2) to experimental [...]. I also prepared a proposed update for a 1.6.4-4+deb9u1 in the stable distribution (#914184).

  • python-django (2:2.1.3-1) — New upstream bugfix release.

  • redis:

    • 5.0.1-1 — New upstream release; ensure that Debian-supplied Lua libraries are available during scripting (#913185); refer to /run directly in .service files, etc.
    • 5.0.1-2 — Ensure that lack of IPv6 support does not prevent startup on Debian, where we bind to the ::1 interface by default. (#900284 & #914354)
    • 5.0.2-1 — New upstream release.
  • redisearch (1.2.1-1) — Upload the last AGPLv3 (ie. non-Commons Clause) package from my GoodFORM project.

  • hiredis (0.14.0-3) — Adopt and tidy package (#911732).

  • python-redis (3.0.1-1) — New upstream release.

  • adminer (4.7.0-1) — New upstream release & ensure all documentation is under /usr/share/doc.

I also sponsored uploads of elpy (1.26.0-1) & muttrc-mode-el (1.2+git20180915.aa1601a-1).

Debian bugs filed
  • molly-guard: Breaks conversion with usrmerge. (#914716)

  • git-buildpackage: Please add gbp-dch --stable flag. (#914186)

  • git-buildpackage: gbp pq -Pq suffixes are not actually optional. (#914281)

  • python-redis: Autopkgtests fail. (#914800)

  • git-buildpackage: Correct "saving" typo. (#914280)

  • python-astropy: Please drop unnecessary dh_strip_nondeterminism override. (#914612)

  • shared-mime-info: Don't assume every *.key file is an Apple Keynote file. (#913550, with patch)

FTP Team

As a Debian FTP assistant this month I ACCEPTed 37 packages: android-platform-system-core, arm-trusted-firmware, boost-defaults, dtl, elogind, fonts-ibm-plex, gnome-remote-desktop, gnome-shell-extension-desktop-icons, google-i18n-address, haskell-haskell-gi-base, haskell-rio, lepton-eda, libatteanx-serializer-rdfa-perl, librdf-trine-serializer-rdfa-perl, librdf-trinex-compatibility-attean-perl, libre-engine-re2-perl, libtest-regexp-pattern-perl, linux, lua-lxc, lxc-templates, ndctl, openssh, osmo-bsc, osmo-sgsn, othman, pg-rational, qtdatavis3d-everywhere-src, ruby-grape-path-helpers, ruby-grape-route-helpers, ruby-graphiql-rails, ruby-js-regex, ruby-regexp-parser, shellia, simple-revision-control, theme-d, ulfius & vim-julia.

Categories: FLOSS Project Planets

Gregor Herrmann: RC bugs 2018/01-48

Planet Debian - Fri, 2018-11-30 14:12

I just arrived at the Bug Squashing Party in Bern – a good opportunity to report the RC bugs I've touched so far this year (not that many …):

  • #750732 – src:libanyevent-perl: "libanyevent-perl: Intermittent build failures on various architectures"
    disable a test (pkg-perl)
  • #862678 – src:pidgin: "Switch from network-manager-dev to libnm-dev"
    propose patch, later uploaded by maintainer
  • #878550 – src:liblog-dispatch-filerotate-perl: "liblog-dispatch-filerotate-perl: missing (build) dependency on libparams-validate-perl"
    add missing (build) dependency, upload to DELAYED/5
  • #882618 – libdbix-class-schema-loader-perl: "libdbix-class-schema-loader-perl: Test failures"
    apply patch from ntyni (pkg-perl)
  • #884626 – src:liblinux-dvb-perl: "liblinux-dvb-perl FTBFS with linux-libc-dev 4.14.2-1"
    upload with fix from knowledgejunkie (pkg-perl)
  • #886044 – src:syncmaildir: "syncmaildir: Depends on gconf"
    propose a patch
  • #886355 – src:libpar-packer-perl: "libpar-packer-perl: frequent parallel FTBFS"
    disable parallel building (pkg-perl)
  • #890905 – src:jabref: "jabref: doesn't build/run with default-jdk/-jre"
    try to come up with a patch (pkg-java)
  • #892275 – redshift: "redshift: Unable to connect to GeoClue."
    investigate and downgrade
  • #892392 – src:aqemu: "aqemu: build-depends on GCC 6"
    propose a patch
  • #893251 – jabref: "jabref: doesn't start with liblog4j2-java 2.10.0-1"
    use versioned (build) dependency (pkg-java)
  • #894626 – libsnmp-perl: "libsnmp-perl: undefined symbol: netsnmp_ds_toggle_boolean"
    propose a patch
  • #894727 – libgit-repository-perl: "libgit-repository-perl: FTBFS: t/10-new_fail.t broke with new git"
    add patch from upstream pull request (pkg-perl)
  • #895697 – src:libconfig-model-tester-perl: "libconfig-model-tester-perl FTBFS: Can't locate Module/Build.pm in @INC"
    add missing build dependency (pkg-perl)
  • #896502 – libxml-structured-perl: "libxml-structured-perl: missing dependency on libxml-parser-perl"
    add missing (build) dependency (pkg-perl)
  • #896534 – libnetapp-perl: "libnetapp-perl: missing dependency on libnet-telnet-perl"
    add missing dependency (pkg-perl)
  • #896537 – libmoosex-mungehas-perl: "libmoosex-mungehas-perl: missing dependency on libtype-tiny-perl | libeval-closure-perl"
    add missing dependency (pkg-perl)
  • #896538 – libmonitoring-livestatus-class-perl: "libmonitoring-livestatus-class-perl: missing dependency on libmodule-find-perl"
    add missing dependency, upload to DELAYED/5
  • #896539 – libmodule-install-trustmetayml-perl: "libmodule-install-trustmetayml-perl: missing dependency on libmodule-install-perl"
    add missing (build) dependency (pkg-perl)
  • #896540 – libmodule-install-extratests-perl: "libmodule-install-extratests-perl: missing dependency on libmodule-install-perl"
    add missing (build) dependency (pkg-perl)
  • #896541 – libmodule-install-automanifest-perl: "libmodule-install-automanifest-perl: missing dependency on libmodule-install-perl"
    add missing (build) dependency (pkg-perl)
  • #896543 – liblwp-authen-negotiate-perl: "liblwp-authen-negotiate-perl: missing dependency on libwww-perl"
    add missing dependency, upload to DELAYED/5
  • #896549 – libhtml-popuptreeselect-perl: "libhtml-popuptreeselect-perl: missing dependency on libhtml-template-perl"
    add missing dependency, upload to DELAYED/5
  • #896551 – libgstreamer1-perl: "libgstreamer1-perl: Typelib file for namespace 'Gst', version '1.0' not found"
    add missing (build) dependencies (pkg-perl)
  • #897724 – src:collectd: "collectd: ftbfs with GCC-8"
    pass a compiler flag, upload to DELAYED/5
  • #898198 – src:libnet-oauth-perl: "FTBFS (test failures, also seen in autopkgtests) with libcrypt-openssl-rsa-perl >= 0.30-1"
    add patch (pkg-perl)
  • #898561 – src:libmarc-transform-perl: "libmarc-transform-perl: FTBFS with libyaml-perl >= 1.25-1 (test failures)"
    apply patch provided by YAML upstream (pkg-perl)
  • #898977 – libnet-dns-zonefile-fast-perl: "libnet-dns-zonefile-fast-perl: FTBFS: You are missing required modules for NSEC3 support"
    add missing (build) dependency (pkg-perl)
  • #900232 – src:collectd: "collectd: FTBFS: sed: can't read /usr/lib/pkgconfig/OpenIPMIpthread.pc: No such file or directory"
    propose a patch, later upload to DELAYED/2
  • #901087 – src:libcatalyst-plugin-session-store-dbi-perl: "libcatalyst-plugin-session-store-dbi-perl: FTBFS: Base class package "Class::Data::Inheritable" is empty."
    add missing (build) dependency (pkg-perl)
  • #901807 – src:libmath-gsl-perl: "libmath-gsl-perl: incompatible with GSL >= 2.5"
    apply patches from ntyni and tweak build (pkg-perl)
  • #902192 – src:libpdl-ccs-perl: "libpdl-ccs-perl FTBFS on architectures where char is unsigned"
    new upstream release (pkg-perl)
  • #902625 – libmath-gsl-perl: "libmath-gsl-perl: needs a versioned dependency on libgsl23 (>= 2.5) or so"
    make build dependency versioned (pkg-perl)
  • #903173 – src:get-flash-videos: "get-flash-videos: FTBFS in buster/sid (dh_installdocs: Cannot find "README")"
    fix name in .docs (pkg-perl)
  • #903178 – src:libclass-insideout-perl: "libclass-insideout-perl: FTBFS in buster/sid (dh_installdocs: Cannot find "CONTRIBUTING")"
    fix name in .docs (pkg-perl)
  • #903456 – libbio-tools-phylo-paml-perl: "libbio-tools-phylo-paml-perl: fails to upgrade from 'stable' to 'sid' - trying to overwrite /usr/share/man/man3/Bio::Tools::Phylo::PAML.3pm.gz"
    upload package fixed by carandraug (pkg-perl)
  • #904737 – src:uwsgi: "uwsgi: FTBFS: unable to build gccgo plugin"
    update build dependencies, upload to DELAYED/5
  • #904740 – src:libtext-bidi-perl: "libtext-bidi-perl: FTBFS: 'fribidi_uint32' undeclared"
    apply patch from CPAN RT (pkg-perl)
  • #904858 – src:libtickit-widget-tabbed-perl: "libtickit-widget-tabbed-perl: Incomplete debian/copyright?"
    fix d/copyright (pkg-perl)
  • #905614 – src:license-reconcile: "FTBFS: Failed test 'no warnings' with libsoftware-license-perl 0.103013-2"
    apply patch from Felix Lechner (pkg-perl)
  • #906482 – src:libgit-raw-perl: "libgit-raw-perl: FTBFS in buster/sid (failing tests)"
    patch test (pkg-perl)
  • #908323 – src:libgtk3-perl: "libgtk3-perl: FTBFS: t/overrides.t failure"
    add patch and versioned (build) dependency (pkg-perl)
  • #909343 – src:libcatalyst-perl: "libcatalyst-perl: fails to build with libmoosex-getopt-perl 0.73-1"
    upload new upstream release (pkg-perl)
  • #910943 – libhtml-tidy-perl: "libhtml-tidy-perl: FTBFS (test failures) with tidy-html5 5.7"
    add patch (pkg-perl)
  • #912039 – src:libpetal-utils-perl: "libpetail-utils-perl: FTBFS: Test failures"
    add missing build dependency (pkg-perl)
  • #912045 – src:mb2md: "mb2md: FTBFS: Test failures"
    add missing build dependency (pkg-perl)
  • #914288 – src:libpgplot-perl: "libpgplot-perl: FTBFS and autopkgtest fail with new giza-dev: test waits for input"
    disable interactive tests (pkg-perl)
  • #915096 – src:libperl-apireference-perl: "libperl-apireference-perl: Missing support for perl 5.28.1"
    add support for perl 5.28.1 (pkg-perl)

let's see how the weekend goes.

Categories: FLOSS Project Planets

Dries Buytaert: How NBC Sports supports the biggest media events online

Planet Drupal - Fri, 2018-11-30 10:08

Many of Acquia's customers have hundreds or even thousands of sites, which vary in terms of scale, functionality, longevity and complexity.

One thing that sets Acquia apart is that we can help organizations scale from small to extremely large, from one site to many, and from coupled to decoupled. This scalability and flexibility allows organizations to standardize on a single web platform. Standardizing on a single web platform not only removes the complexity of managing dozens of different technology stacks and teams, but also enables organizations to innovate faster.

A great example is NBC Sports Digital. Not only does NBC Sports Digital have to manage dozens of sites across 30,000 sporting events each year, but it also has some of the most trafficked sites in the world.

In 2018, Acquia supported NBC Sports Digital as it provided fans with unparalleled coverage of Super Bowl LII, the Pyeongchang Winter Games and the 2018 World Cup. As quoted in NBC Sports' press release, NBC Sports Digital streamed more than 4.37 billion live minutes of video, served 93 million unique users, and delivered 721 million minutes of desktop video. These are some of the highest-trafficked events in the history of the web, and I'm very proud that they are powered by Drupal and Acquia.

To learn more about how Acquia helps NBC Sports Digital deliver more than 10,000 sporting events every year, watch my conversation with Eric Black, CTO of NBC Sports Digital, in the video below:

Not every organization gets to entertain 100 million viewers around the world, but every business has its own World Cup. Whether it's Black Friday, Mother's Day, a new product launch or breaking news, we offer our customers the tools and services necessary to optimize efficiency and provide flexibility at any scale.

Categories: FLOSS Project Planets

Stack Abuse: Handling Unix Signals in Python

Planet Python - Fri, 2018-11-30 09:53

UNIX/Linux systems offer special mechanisms to communicate between individual processes. One of these mechanisms is signals, which belong to the different methods of communication between processes (Inter Process Communication, abbreviated as IPC).

In short, signals are software interrupts that are sent to the program (or the process) to notify the program of significant events or requests to the program in order to run a special code sequence. A program that receives a signal either stops or continues the execution of its instructions, terminates either with or without a memory dump, or even simply ignores the signal.

Although it is defined in the POSIX standard, the reaction actually depends on how the developer wrote the script and implemented the handling of signals.

In this article we explain what signals are, show you how to send a signal to another process from the command line, and demonstrate how to process received signals in Python. Among other modules, the program code is mainly based on the signal module. This module connects the corresponding C headers of your operating system with the Python world.

An Introduction to Signals

On UNIX-based systems, there are three categories of signals:

  • System signals (hardware and system errors): SIGILL, SIGTRAP, SIGBUS, SIGFPE, SIGKILL, SIGSEGV, SIGXCPU, SIGXFSZ, SIGIO


  • User-defined signals: SIGQUIT, SIGABRT, SIGUSR1, SIGUSR2, SIGTERM

Each signal is represented by an integer value, and the list of signals that are available is comparably long and not consistent between the different UNIX/Linux variants. On a Debian GNU/Linux system, the command kill -l displays the list of signals as follows:


The signals 1 to 15 are roughly standardized, and have the following meaning on most of the Linux systems:

  • 1 (SIGHUP): terminate a connection, or reload the configuration for daemons
  • 2 (SIGINT): interrupt the session from the dialogue station
  • 3 (SIGQUIT): terminate the session from the dialogue station
  • 4 (SIGILL): illegal instruction was executed
  • 5 (SIGTRAP): do a single instruction (trap)
  • 6 (SIGABRT): abnormal termination
  • 7 (SIGBUS): error on the system bus
  • 8 (SIGFPE): floating point error
  • 9 (SIGKILL): immediately terminate the process
  • 10 (SIGUSR1): user-defined signal
  • 11 (SIGSEGV): segmentation fault due to illegal access of a memory segment
  • 12 (SIGUSR2): user-defined signal
  • 13 (SIGPIPE): writing into a pipe, and nobody is reading from it
  • 14 (SIGALRM): the timer terminated (alarm)
  • 15 (SIGTERM): terminate the process in a soft way

In order to send a signal to a process in a Linux terminal you invoke the kill command with both the signal number (or signal name) from the list above and the id of the process (pid). The following example command sends the signal 15 (SIGTERM) to the process that has the pid 12345:

$ kill -15 12345

An equivalent way is to use the signal name instead of its number:

$ kill -SIGTERM 12345

Which way you choose depends on what is more convenient for you. Both ways have the same effect. As a result the process receives the signal SIGTERM, and terminates immediately.
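The same thing can be done from Python itself. As a minimal self-contained sketch (sending a signal to our own process on a UNIX system, so we can also install a handler to observe its arrival):

```python
import os
import signal

received = []

# Install a handler so the signal does not terminate the process.
def handler(signum, frame):
    received.append(signum)

signal.signal(signal.SIGUSR1, handler)

# os.kill() is the Python equivalent of the kill command:
# it sends the given signal to the process with the given pid.
os.kill(os.getpid(), signal.SIGUSR1)

# By the time os.kill() returns control here, the handler has run.
print(received)
```

Despite the name, os.kill() sends any signal, not just the terminating ones.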

Using the Python signal Library

Since Python 1.4, the signal library is a regular component of every Python release. In order to use the signal library, import the library into your Python program as follows, first:

import signal

Capturing and reacting properly on a received signal is done by a callback function - a so-called signal handler. A rather simple signal handler named receiveSignal() can be written as follows:

def receiveSignal(signalNumber, frame):
    print('Received:', signalNumber)
    return

This signal handler does nothing more than report the number of the received signal. The next step is to register the signals that are caught by the signal handler. For Python programs, all the signals (except 9, SIGKILL) can be caught in your script:

if __name__ == '__main__':
    # register the signals to be caught
    signal.signal(signal.SIGHUP, receiveSignal)
    signal.signal(signal.SIGINT, receiveSignal)
    signal.signal(signal.SIGQUIT, receiveSignal)
    signal.signal(signal.SIGILL, receiveSignal)
    signal.signal(signal.SIGTRAP, receiveSignal)
    signal.signal(signal.SIGABRT, receiveSignal)
    signal.signal(signal.SIGBUS, receiveSignal)
    signal.signal(signal.SIGFPE, receiveSignal)
    #signal.signal(signal.SIGKILL, receiveSignal)
    signal.signal(signal.SIGUSR1, receiveSignal)
    signal.signal(signal.SIGSEGV, receiveSignal)
    signal.signal(signal.SIGUSR2, receiveSignal)
    signal.signal(signal.SIGPIPE, receiveSignal)
    signal.signal(signal.SIGALRM, receiveSignal)
    signal.signal(signal.SIGTERM, receiveSignal)

Next, we add the process information for the current process, and detect the process id using the method getpid() from the os module. In an endless while loop we wait for incoming signals. We implement this using two more Python modules, os and time, which we import at the beginning of our Python script, too:

import os
import time

In the while loop of our main program the print statement outputs "Waiting...". The time.sleep() function call makes the program wait for three seconds.

# output current process id
print('My PID is:', os.getpid())

# wait in an endless loop for signals
while True:
    print('Waiting...')
    time.sleep(3)

Finally, we have to test our script. Having saved the script as signal-handling.py we can invoke it in a terminal as follows:

$ python3 signal-handling.py
My PID is: 5746
Waiting...
...

In a second terminal window we send a signal to the process. We identify our first process - the Python script - by the process id as printed on screen, above.

$ kill -1 5746

The signal event handler in our Python program receives the signal we have sent to the process. It reacts accordingly, and simply confirms the received signal:

...
Received: 1
...

Ignoring Signals

The signal module defines ways to ignore received signals. In order to do that, the signal has to be connected with the predefined handler signal.SIG_IGN. The example below demonstrates that, and as a result the Python program cannot be interrupted by CTRL+C anymore. To stop the Python script, an alternative way has been implemented in the example: the signal SIGUSR1 terminates the script. Furthermore, instead of an endless loop we use the function signal.pause(), which just waits for a signal to be received.

import signal
import os
import time

def receiveSignal(signalNumber, frame):
    print('Received:', signalNumber)
    raise SystemExit('Exiting')
    return

if __name__ == '__main__':
    # register the signal to be caught
    signal.signal(signal.SIGUSR1, receiveSignal)

    # register the signal to be ignored
    signal.signal(signal.SIGINT, signal.SIG_IGN)

    # output current process id
    print('My PID is:', os.getpid())

    signal.pause()

Handling Signals Properly

The signal handler we have used up to now is rather simple, and just reports a received signal. This shows us that the interface of our Python script is working fine. Let's improve it.

Catching the signal is already a good basis, but some improvement is required to comply with the rules of the POSIX standard. Each signal needs a proper reaction (see the list above), which means the signal handler in our Python script needs to be extended with a specific routine per signal. This works best if we understand what a signal does, and what a common reaction is. A process that receives signal 1, 2, 9 or 15 terminates; for several other signals it is expected to write a core dump, too.

Up to now we have implemented a single routine that covers all the signals, and handles them in the same way. The next step is to implement an individual routine per signal. The following example code demonstrates this for the signals 1 (SIGHUP) and 15 (SIGTERM).

def readConfiguration(signalNumber, frame):
    print('(SIGHUP) reading configuration')
    return

def terminateProcess(signalNumber, frame):
    print('(SIGTERM) terminating the process')
    sys.exit()

The two functions above are connected with the signals as follows:

signal.signal(signal.SIGHUP, readConfiguration)
signal.signal(signal.SIGTERM, terminateProcess)

Running the Python script, and sending the signal 1 (SIGHUP) followed by a signal 15 (SIGTERM) by the UNIX commands kill -1 16640 and kill -15 16640 results in the following output:

$ python3 daemon.py
My PID is: 16640
Waiting...
Waiting...
(SIGHUP) reading configuration
Waiting...
Waiting...
(SIGTERM) terminating the process

The script receives the signals, and handles them properly. For clarity, this is the entire script:

import signal
import os
import time
import sys

def readConfiguration(signalNumber, frame):
    print('(SIGHUP) reading configuration')
    return

def terminateProcess(signalNumber, frame):
    print('(SIGTERM) terminating the process')
    sys.exit()

def receiveSignal(signalNumber, frame):
    print('Received:', signalNumber)
    return

if __name__ == '__main__':
    # register the signals to be caught
    signal.signal(signal.SIGHUP, readConfiguration)
    signal.signal(signal.SIGINT, receiveSignal)
    signal.signal(signal.SIGQUIT, receiveSignal)
    signal.signal(signal.SIGILL, receiveSignal)
    signal.signal(signal.SIGTRAP, receiveSignal)
    signal.signal(signal.SIGABRT, receiveSignal)
    signal.signal(signal.SIGBUS, receiveSignal)
    signal.signal(signal.SIGFPE, receiveSignal)
    #signal.signal(signal.SIGKILL, receiveSignal)
    signal.signal(signal.SIGUSR1, receiveSignal)
    signal.signal(signal.SIGSEGV, receiveSignal)
    signal.signal(signal.SIGUSR2, receiveSignal)
    signal.signal(signal.SIGPIPE, receiveSignal)
    signal.signal(signal.SIGALRM, receiveSignal)
    signal.signal(signal.SIGTERM, terminateProcess)

    # output current process id
    print('My PID is:', os.getpid())

    # wait in an endless loop for signals
    while True:
        print('Waiting...')
        time.sleep(3)

Further Reading

Using the signal module and a corresponding event handler, it is relatively easy to catch signals. The next step is knowing the meaning of the different signals, and reacting properly as defined in the POSIX standard. This requires that the event handler distinguish between the different signals, and have a separate routine for each of them.

Categories: FLOSS Project Planets

Michal Čihař: Weblate 3.3

Planet Debian - Fri, 2018-11-30 09:00

Weblate 3.3 has been released today. The most visible new feature are component alerts, but there are several other improvements as well.

Full list of changes:

  • Added support for component and project removal.
  • Improved performance for some monolingual translations.
  • Added translation component alerts to highlight problems with a translation.
  • Expose XLIFF unit resname as context when available.
  • Added support for XLIFF states.
  • Added check for non writable files in DATA_DIR.
  • Improved CSV export for changes.

If you are upgrading from older version, please follow our upgrading instructions.

You can find more information about Weblate on https://weblate.org, the code is hosted on Github. If you are curious how it looks, you can try it out on demo server. Weblate is also being used on https://hosted.weblate.org/ as official translating service for phpMyAdmin, OsmAnd, Turris, FreedomBox, Weblate itself and many other projects.

Should you be looking for hosting of translations for your project, I'm happy to host them for you or help with setting it up on your infrastructure.

Further development of Weblate would not be possible without people providing donations, thanks to everybody who have helped so far! The roadmap for next release is just being prepared, you can influence this by expressing support for individual issues either by comments or by providing bounty for them.

Filed under: Debian English SUSE Weblate

Categories: FLOSS Project Planets

Shannon -jj Behrens: JJ's Mostly Adequate Summary of the Django Meetup: When *Not* To Use the ORM & Goodbye REST: Building GraphQL APIs with Django

Planet Python - Fri, 2018-11-30 08:48

The Django meetup was at Prezi. They have a great space. They are big Django users.

Goodbye REST: Building APIs with Django and GraphQL

Jaden Windle, @jaydenwindle, lead engineer at Jetpack.


They moved from Django REST Framework to GraphQL.

It sounds like a small app.

They're using Django, React, and React Native.

I think he said they used Reason and moved away from it, but I could be wrong.

They had to do a rebuild anyway, so they used GraphQL.

He said not to switch to GraphQL just because it's cool, and in general, avoid hype-driven development.

GraphQL is a query language, spec, and collection of tools, designed to operate over a single endpoint via HTTP, optimizing for performance and flexibility.

Key features:

  • Query for only the data you need.
  • Easily query for multiple resources in a single request.
  • Great front end tooling for handling caching, loading / error states, and updates.

(I wonder if he's talking about Apollo, not just GraphQL. I found out later, that they are using Apollo.)

GraphQL Schema:

  • Types
  • Queries
  • Mutations
  • Subscriptions (not yet fully implemented in Graphene)

Graphene is the goto framework for GraphQL in Django.

He showed an example app.

You create a type and connect it to a model. It's like a serializer in DRF.

(He's using VS Code.)

It knows how to understand relationships between types based on the relationships in the models.

He showed the query syntax.

He showed how Graphene connects to the Django model. You're returning raw Django model objects, and it takes care of serialization.

There's a really nice UI where you can type in a query, and it queries the server. It has autocomplete. I can't tell if this is from Apollo, Graphene, or some other GraphQL tool.

You only pass what you need across the wire.

When you do a mutation, you can also specify what data it should give back to you.

There is some Mutation class that you subclass.

The code looks a lot like DRF.

Subscriptions aren't fully implemented in Graphene. His company is working on implementing and open sourcing something. There are a bunch of other possible real-time options; graphql-python/graphql-ws is one.

There's a way to do file uploads.

Important: There's this thing called graphene-django-extras. There's even something to connect to DRF automatically.

Pros:
  • Dramatically improves front end DX
  • Flexible types allow for quick iteration
  • Always up-to-date documentation
  • Only send needed data over the wire

Cons:
  • Graphene isn't as mature as some other GraphQL implementations (for instance, in JS and Ruby)
  • Logging is different when using a single GraphQL endpoint
  • REST is currently better at server-side caching (E-Tag, etc.)

Graphene 3 is coming.

In the Q&A, they said they do use Apollo.

They're not yet at a scale where they have to worry about performance.

He's not entirely sure whether it's prone to the N+1 queries problem, but there are GitHub issues related to that.

You can do raw ORM or SQL queries if you need to. Otherwise, he's not sure what it's doing behind the scenes.

You can add permissions to the models. There's also a way to tie into Django's auth model.

Their API isn't a public API. It's only for use by their own client.

The ORM, and When Not To Use It

Christophe Pettus from PostgreSQL Experts.

He thinks the ORM is great.

The first ORM he wrote was written before a lot of the audience was born.

Sometimes, writing code for the ORM is hard.

Database agnosticism isn't as important as you think it is. For instance, you don't make the paint on your house color-agnostic.

Libraries have to be DB agnostic. Your app probably doesn't need to be.

Reasons you might want to avoid the query language:

  • Queries that generate sub-optimal SQL
  • Queries more easily expressed in SQL
  • SQL features not available via the ORM

Django's ORM's SQL is much better than it used to be.

Don't use __in with very large lists. 100 is about the longest list you should use.
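One common workaround is to split a huge id list into batches and issue one filter(pk__in=batch) query per batch. The helper below is my own illustration (not from the talk), with the Django usage sketched in a comment since it needs a real model:

```python
def batched(ids, size=100):
    """Yield successive batches of at most `size` ids."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]

# With Django this would look roughly like:
#   results = []
#   for batch in batched(big_id_list):
#       results.extend(SomeModel.objects.filter(pk__in=batch))
big_id_list = list(range(250))
batches = list(batched(big_id_list))
print([len(b) for b in batches])  # [100, 100, 50]
```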

Avoid very deep joins.

It's not hard to chain together a bunch of stuff that ends up generating SQL that's horrible.

The query generator doesn't do a separate optimization pass that makes the query better.

It's better to express .filter().exclude().filter() in SQL.

There are so many powerful functions and operators in PostgreSQL!

SomeModel.objects.raw() is your friend!

You can write raw SQL, and yet still have it integrate with Django models.

You can even get stuff back from the database that isn't in the model definition.

There's some WITH RECURSIVE thing in PostgreSQL that would be prohibitively hard to do with the Django ORM. It's not really recursive--it's iterative.
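To illustrate the shape of such a query, here is a sketch using the stdlib sqlite3 module, which also supports recursive CTEs (in PostgreSQL you would run the same SQL through your database driver); the node table is invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER);
    INSERT INTO node VALUES (1, NULL), (2, 1), (3, 1), (4, 2);
""")

# Find every descendant of node 1, iteratively expanding one level at a time.
rows = conn.execute("""
    WITH RECURSIVE descendants(id) AS (
        SELECT id FROM node WHERE parent_id = 1
        UNION ALL
        SELECT node.id FROM node
        JOIN descendants ON node.parent_id = descendants.id
    )
    SELECT id FROM descendants ORDER BY id
""").fetchall()

print(rows)  # [(2,), (3,), (4,)]
```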

You can also do queries without using the model framework.

The model framework is very powerful, but it's not cheap.

Interesting: The data has to be converted into Python data and then again into model data. If you're just going to serialize it into JSON, why create the model objects? You can even create the JSON directly in the database and hand it back directly with PostgreSQL. But make sure the database driver doesn't convert the JSON back to Python ;) Return it as raw text.
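As a sketch of that idea, here the database serializes the rows itself and Python just passes the string through (using the stdlib sqlite3 module and SQLite's JSON functions to stand in for PostgreSQL's json_agg/json_build_object; the author table is invented for the example):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO author VALUES (1, 'Ada'), (2, 'Grace');
""")

# The database builds one JSON string for all rows; no Python
# model objects are ever created.
(payload,) = conn.execute("""
    SELECT json_group_array(json_object('id', id, 'name', name))
    FROM author
""").fetchone()

print(payload)  # a single JSON array as raw text
```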

There are also tables that Django can't treat as model tables. For instance, there are logging tables that lack a primary key. Sometimes, you have weird tables with non-Django-able primary keys.

The ORM is great, though. For instance, it's great for basic CRUD.

Interfaces that require building queries in steps are better done with SQL--for instance, an advanced search function.


  • Don't be afraid to step outside the ORM.
  • SQL isn't a bug. It's a feature. It's code like everything else.
  • Do use the ORM for operations that it makes easier.
  • Don't hesitate to use the full power of SQL.


Whether you put your SQL in model methods or managers is a matter of taste. Having all the code for a particular model in one place (i.e. the model or manager) is useful.

Write tests. Use CI.

Use parameter substitution in order to avoid SQL injection attacks. Remember, all external input is hostile. You can't use parameter substitution if the table name is dynamic--just be careful with what data you allow.
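A minimal sketch of parameter substitution (shown with the stdlib sqlite3 module, which uses ? placeholders; psycopg2 for PostgreSQL uses %s placeholders, but the principle is identical; the account table is invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('alice', 100)")

# Hostile input: formatted directly into the SQL string this would
# change the query's meaning; as a bound parameter it is just data.
hostile = "alice' OR '1'='1"

rows = conn.execute(
    "SELECT balance FROM account WHERE name = ?",  # placeholder, never an f-string
    (hostile,),
).fetchall()

print(rows)  # no rows match the literal hostile string
```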

If you're using PostGIS, write your own queries for sure. It's really easy to shoot yourself in the foot with the GIS package.



Categories: FLOSS Project Planets

Codementor: Subtleties of Python

Planet Python - Fri, 2018-11-30 06:13
A good software engineer understands how crucial attention to detail is; minute details, if overlooked, can make a world of difference between a working unit and a disaster. That’s why writing...

Shannon -jj Behrens: PyCon Notes: PostgreSQL Proficiency for Python People

Planet Python - Fri, 2018-11-30 06:12
In summary, this tutorial was fantastic! I learned more in three hours than I would have learned if I had read a whole book!

Here's the video. Here are the slides. Here are my notes:

Christophe Pettus was the speaker. He's from PostgreSQL Experts.

PostgreSQL is a rich environment.

It's fully ACID compliant.

It has the richest set of features of any modern, production RDBMS.

PostgreSQL focuses on quality, security, and spec compliance.

It's capable of very high performance: tens of thousands of transactions per second, petabyte-sized data sets, etc.

To install it, just use your package management system (apt, yum, etc.). Those systems will usually take care of initialization.

There are many options for OS X. Heroku even built a Postgres.app that runs more like a foreground app.

A "cluster" is a single PostgreSQL server (which can manage multiple databases).

initdb creates the basic file structure for a cluster; it runs before the server is started. (The server has to be up and running before you can create databases.)

To create a database:

sudo su - postgres
psql

create database this_new_database;

To drop a database:

drop database this_new_database;

Debian runs initdb for you. Red Hat does not.

Debian has a cluster management system. Use it. See, for instance, pg_createcluster.

Always create databases as UTF-8. Once you've created it, you can't change it.

Don't use SQLASCII. It's a nightmare. Don't use "C locale".

pg_ctl is a built-in command to start and stop PostgreSQL:

pg_ctl -D . start

Usually, pg_ctl is wrapped by something provided by your platform.

On Ubuntu, start PostgreSQL via:

service postgresql start

Always use "-m fast" when stopping.

Postgres puts its own data in a top-level directory. Let's call it $PGDATA.

Don't monkey around with that data.

pg_clog and pg_xlog are important. Don't mess with them.

On most systems, configuration lives in $PGDATA.

postgresql.conf contains server configuration.

pg_hba.conf contains authentication settings.

postgresql.conf can feel very overwhelming.

Avoid making a lot of changes to postgresql.conf. Instead, add the following to it:

include "postgresql.conf.include"

Then, mess with "postgresql.conf.include".

The important parameters fall into these categories: logging, memory, checkpoints, and the planner.


Be generous with logging. It has a very low impact on the system. It's your best source of info for diagnosing problems.

You can log to syslog or log CSV to files. He showed his typical logging configuration.

He showed his guidelines / heuristics for all the settings, including how to fine-tune things. They're really good! See his slides.

As of version 9.3, you don't need to tweak Linux kernel parameters anymore.

Do not mess with fsync or synchronous_commit.

Most settings require a server reload to take effect. Some things require a server restart. Some can be set on a per-session basis. Here's how to do that. This is also an example of how to use a transaction:

set local random_page_cost = 2.5;
show random_page_cost;

pg_hba.conf contains users and roles. Roles are like groups. They form a hierarchy.

A user is just a role with login privs.

Don't use the "postgres" superuser for anything application-related.

Sadly, you probably will have to grant schema-modification privs to your app user if you use migrations, but if you don't have to, don't.

By default, DB traffic is not encrypted. Turn on SSL if you are running in a cloud provider.

In pg_hba.conf, "trust" means that anyone who can log into the server can access Postgres too. "peer" means the OS username must match a Postgres user. "md5" means password authentication using an MD5 hash.

It's a good idea to restrict the IP addresses allowed to talk to the server fairly tightly.
The WAL

The Write-Ahead Log is key to many Postgres operations. It's the basis for replication, crash recovery, etc.

When each transaction is committed, it is logged to the write-ahead log.

Changes in the transaction are flushed to disk.

If the system crashes, the WAL is "replayed" to bring the DB to a consistent state.

It's a continuous record of changes since the last checkpoint.

The WAL is stored in 16MB segments in the pg_xlog directory.

Never delete anything from pg_xlog.

archive_command is a way to move the WAL segments to someplace safe (like a
different system).

By default, synchronous_commit is on, which means that commits do not return until the WAL flush is done. If you turn it off, they'll return when the WAL flush is queued. You might lose transactions in the case of a crash, but there's no risk of database corruption.
Backup and Recovery

Experience has shown that 20% of the time, your EBS volumes will not reattach when you reboot in AWS.

pg_dump is a built-in dump/restore tool.

It takes a logical snapshot of the database.

It doesn't lock the database or prevent writes to disk.

pg_restore restores the database. It's not fast.

It's great for simple backups but not suitable for fast recovery from major failures.

pg_bench is the built in benchmarking tool.

pg_dump -Fc --verbose example > example.dump

Without the -Fc, it dumps SQL commands instead of its custom format.

pg_restore --dbname=example_restored --verbose example.dump

pg_restore takes a long time because it has to recreate indexes.

pg_dumpall --globals-only

Back up each database with pg_dump using --format=custom.

To do a parallel restore, use --jobs=.

If you have a large database, pg_dump may not be appropriate.

A disk snapshot + every WAL segment is enough to recreate the database.

To start a PITR (point in time recovery) backup:

select pg_start_backup(...);

Copy the disk image and any WAL files that are created.

select pg_stop_backup();

Make sure you have all the WAL segments.

The disk image + all the WAL segments are enough to create the DB.

See also github.com/wal-e/wal-e. It's highly recommended.

It automates backups to S3.

He explained how to do a PITR.

With PITR, you can rollback to a particular point in time. You don't have to replay everything.

This is super handy for application failures.

RDS is something that scripts all this stuff for you.
Replication

Send the WAL to another server.

Keep the server up to date with the primary server.

That's how PostgreSQL replication works.

The old way was called "WAL Archiving". Each 16MB segment was sent to the secondary when complete. Use rsync, WAL-E, etc., not scp.

The new way is Streaming Replication.

The secondary gets changes as they happen.

It's all setup via recovery.conf in your $PGDATA.

He showed a recovery.conf for a secondary machine, and showed how to let it become the master.

Always have a disaster recovery strategy.

pg_basebackup is a utility for doing a snapshot of a running server. It's the easiest way to take a snapshot to start a new secondary. It's also useful for archival backups. It's not the fastest thing, but it's pretty foolproof.


The good:

Easy to setup.

Schema changes are replicated.

Secondaries can handle read-only queries for load balancing.

It either works or it complains loudly.

The bad:

You get the entire DB cluster or none of it.

No writes of any kind to the secondary, not even temporary tables.

Some things aren't replicated like temporary tables and unlogged tables.

His advice is to start with WAL-E. The README tells you everything. It fixes a ton of problems.

The biggest problem with WAL-E is that writing to S3 can be slow.

Another way to do funky things is trigger-based replication. There's a bunch of third-party packages to do this.

Bucardo is one that lets you do multi-master setups.

However, they're fiddly and complex to set up. They can also fail quietly.
Transactions, MVCC, and Vacuum

BEGIN;

By the way, no bank works this way ;)

Everything runs inside of a transaction.

If there is no explicit transaction, each statement is wrapped in one for you.

Everything that modifies the database is transactional, even schema changes.

\d shows you all your tables.

With a transaction, you can even rollback a table drop.

South (the Django migration tool) runs the whole migration in a single transaction.

Many resources are held until the end of a transaction. Keep your transactions brief and to the point.

Beware of "IDLE IN TRANSACTION" sessions. This is a problem for Django apps.

A tuple in Postgres is the same thing as a row.

Postgres uses Multi-Version Concurrency Control. Each transaction sees its own version of the database.

Writers only block writers to the same tuple. Nothing else causes blocking.

Postgres will not allow two snapshots to "fork" the database. If two people try to write to the same tuple, Postgres will block one of them.

There are higher isolation modes. His description of them was really interesting.

He suggested that new apps use SERIALIZABLE. This will help you find the concurrency errors in your app.

Deleted tuples are not usually immediately freed.

Vacuum's primary job is to scavenge tuples that are no longer visible to any transaction.

autovacuum generally handles this problem for you without intervention (since version 8).

Run analyze after a major database change to help the planner out.

If someone tells you "vacuum's not working", they're probably wrong.

The DB generally stabilizes at 20% to 50% bloat. That's acceptable.

The problem might be that there are long-running transactions or idle-in-transaction sessions. They'll block vacuuming. So will manual table locking.

He talked about vacuum issues for rare situations.
Schema Design

Normalization is important, but don't obsess about it.

Pick "entities". Make sure that no entity-level info gets pushed into the subsidiary items.

Pick a naming scheme and stick with it.

Plural or singular? DB people tend to like plural. ORMs tend to like singular.

You probably want lower_case to avoid quoting.

Calculated denormalization can sometimes be useful; copied denormalization is almost never useful.

Joins are good.

PostgreSQL executes joins very efficiently. Don't be afraid of them.

Don't worry about large tables joined with small tables.

Use the typing system. It has a rich set of types.

Use domains to create custom types.

A domain is a core type + a constraint.

Don't use polymorphic fields (fields whose interpretation is dependent on another field).

Don't use strings to store multiple types.

Use constraints. They're cheap and fast.

You can create constraints across multiple columns.

Avoid Entity-Attribute-Value schemas. They cause great pain. They're very inefficient. They make reports very difficult.

Consider using UUIDs instead of serials as synthetic keys.

The problem with serials for keys is that merging tables can be hard.
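A tiny stdlib illustration of why merging serial-keyed tables hurts, and why UUIDs avoid the problem (the dicts stand in for two tables; names are made up):

```python
import uuid

# Two tables that each minted serial keys independently collide on merge:
serial_a = {1: "alice", 2: "bob"}
serial_b = {1: "carol", 2: "dave"}
print(sorted(serial_a.keys() & serial_b.keys()))  # [1, 2]

# UUIDs are minted without coordination, so merging is collision-free
# (up to the astronomically small uuid4 collision probability):
uuid_a = {uuid.uuid4(): name for name in ("alice", "bob")}
uuid_b = {uuid.uuid4(): name for name in ("carol", "dave")}
print(len(uuid_a.keys() & uuid_b.keys()))  # 0
```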

Don't have generic "Thing" or "Object" tables.

If a table has a few frequently-updated fields and a few slowly-updated fields, consider splitting the table. Split the fast-moving stuff out into a separate 1-to-1 table.

Arrays are a first-class type in PostgreSQL. It's a good substitute for using a subsidiary table.

A list of tags is a good fit for arrays.

He talked about hstore. It's much better than Entity-Attribute-Value. It's great for optional, variable attributes. It's like a hash. It can be indexed, searched, etc. It lets you add attributes to tables for users. Don't use it as a way to avoid all table modifications.

json is now a built in type.

There's also jsonb.

Avoid indexes on big things, like 10k character strings.

NULL is a total pain in the neck.

Only use it to mean "missing value".

Never use it to represent a meaningful value.

Let's call anything 1MB or more a "very large object". Store them in files. Store the metadata in the database. The database API is just not a good fit for this.

Many-to-many tables can get extremely large. Consider replacing them with array fields (either one way or both directions). You can use a trigger to maintain integrity.

You don't want more than about 250k entries in an array.

Use UTF-8. Period.

Always use TIMESTAMPTZ (which Django uses by default). Don't use TIMESTAMP. TIMESTAMPTZ is a timestamp converted to UTC.
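A small stdlib sketch of the difference: a naive timestamp carries no zone and is ambiguous, while a zone-aware UTC timestamp (the TIMESTAMPTZ model) is an absolute point in time. Values here are arbitrary:

```python
from datetime import datetime, timezone

naive = datetime(2018, 11, 30, 15, 23)                      # no zone info
aware = datetime(2018, 11, 30, 15, 23, tzinfo=timezone.utc)  # absolute instant

print(naive.tzinfo)       # None
print(aware.isoformat())  # 2018-11-30T15:23:00+00:00
```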

Index types:

B-Tree: use a B-Tree on a column if you frequently query on that column, use one of the comparison operators, only get back 10-15% of the rows, and run that query frequently.

It won't use the index if you're going to get back more than 15% of the rows, because it's faster to scan the table than to scan the index.

Use a partial index if you can ignore most of the rows.

The entire tuple has to be copied into the index.

GiST: a framework to create indexes. KNN (K-nearest neighbors) indexes are one example.

GIN: generalized inverted index. Used for full-text search.

The others either are not good or very specific.

Why isn't it using my index?

Use explain analyze to look at the query.

If it thinks it's going to require most of the rows, it'll do a table scan.

If it's wrong, use analyze to update the planner stats.

Sometimes, it can't use the index.

Two ways to create an index:

create index

create index concurrently

reindex rebuilds an index from scratch.

pg_stat_user_indexes tells you about how your indexes are being used.

What do you do if a query is slow:

Use explain or explain analyze.

explain doesn't actually run the query.

"Cost" is measured in arbitrary units. Traditionally, they have been "disk fetches". Costs are inclusive of subnodes.

explain analyze, on the other hand, actually runs the query.

Things that are bad:

Joins between 2 large tables.

Cross joins (cartesian products). These often happen by accident.

Sequential scans on large tables.

select count(*) is slow because it results in a full table scan since you
have to see if the tuples are alive or dead.

offset / limit. These actually run the query and then throw away that many
rows. Beware that GoogleBot is relentless. Use other keys.

If the database is slow:

Look at pg_stat_activity:

select * from pg_stat_activity;

tail -f the logs.

Too much I/O? iostat 5.

If the database isn't responding:

Try connecting with it using psql.


Python Particulars

psycopg2 is the only real option in Python 2.

The result set of a query is loaded into client memory when the query completes. If there are a ton of rows, you could run out of memory. If you want to scroll through the results, use a "named" cursor. Be sure to dispose of it properly.
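The consuming pattern can be sketched with stdlib sqlite3: pull rows in batches with fetchmany() instead of fetchall(). psycopg2's named cursors do the batching server-side, but the client loop looks the same (table and sizes are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big (n INTEGER)")
conn.executemany("INSERT INTO big VALUES (?)", [(i,) for i in range(10_000)])

# fetchall() would hold all 10,000 rows in client memory at once;
# fetchmany() keeps at most one batch resident at a time.
cursor = conn.execute("SELECT n FROM big")
total = 0
while True:
    batch = cursor.fetchmany(1000)
    if not batch:
        break
    total += sum(n for (n,) in batch)
print(total)  # 49995000
```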

The Python 3 situation is not so great. There's py-postgresql. It's pure Python.

If you are using Django 1.6+, use the @atomic decorator.

Cluster all your writes into small transactions. Leave read operations outside.

Do all your writes at the very end of the view function.
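The shape of that advice -- one short transaction around the writes, committed on success and rolled back on error, as Django's @atomic provides -- can be sketched with stdlib sqlite3 (table and failure are contrived):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audit (msg TEXT)")

# "with conn:" scopes one transaction: commit on success, rollback on error.
try:
    with conn:
        conn.execute("INSERT INTO audit VALUES ('step 1')")
        raise RuntimeError("boom")  # simulated failure mid-transaction
except RuntimeError:
    pass  # 'step 1' was rolled back

with conn:
    conn.execute("INSERT INTO audit VALUES ('step 2')")

audit_rows = conn.execute("SELECT msg FROM audit").fetchall()
print(audit_rows)  # [('step 2',)]
```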

Multi-database works very nicely with hot standby.

Point the writes at the primary, and the reads at the secondary.

For Django 1.5, use the @xact decorator.

Sloppy transaction management can cause the dreaded Django idle-in-transaction problem.

Use South for database migration. South is getting merged into Django in version 1.7 of Django.

You can use manual migrations for stuff the Django ORM can't specify.
Special Situations

Upgrade to 9.3.4. Upgrade minor versions promptly.

Major version upgrades require more planning. pg_upgrade has to be run when the database is not running.

A full pg_dump / pg_restore is always the safest, although not the most practical.

Always read the release notes.

All parts of a replication set must be upgraded at once (for major versions).

Use copy, not insert, for bulk loading data. psycopg2 has a nice interface. Do a vacuum afterwards.
AWS

Instances can disappear and come back up without instance storage.

EBS can fail to reattach after reboot.

PIOPS are useful (but pricey) if you are using EBS.

Script everything, instance creation, PostgreSQL, etc. Use Salt. Use a VPC.

Scale up and down as required to meet load. If you're just using them to rent a server, it's really expensive.

PostgreSQL RDS is a managed database instance. Big plus: automatic failover! Big minus: you can't read from the secondary. It's expensive. It's a good place to start.
Sharding

Eventually, you'll run out of write capacity on your master.

postgres-xc is an open source fork of PostgreSQL.

Bucardo provides multi-master write capability.

He talked about custom sharding.

Instagram wrote a nice article about it.
Pooling

Opening a connection is expensive. Use a pooler.

pgbouncer is a pooler.

pgPool II can even do query analysis. However, it has higher overhead and is more complex to configure.
Tools

Monitor everything.

check_postgres.pl is a plugin to monitor PostgreSQL.

pgAdmin III and Navicat are nice clients.

pgbadger is for log analysis. So is pg_stat_statements.
Closing

MVCC works by each tuple having a range of transaction IDs that can see that tuple.

Failover is annoying to do in the real world. People use HAProxy, some pooler, etc. with some scripting, or they have a human do the failover.

HandyRep is a server-based tool designed to allow you to manage a PostgreSQL "replication cluster", defined as a master and one or more replicas on the same network.

PyBites: 3 Cool Things You Can do With the dateutil Module

Planet Python - Fri, 2018-11-30 06:00

In this short article I will show you how to use dateutil's parse, relativedelta and rrule to make it easier to work with datetimes in Python.

First some necessary imports:

>>> from datetime import date
>>> from dateutil.parser import parse
>>> from dateutil.relativedelta import relativedelta
>>> from dateutil.rrule import rrule, WEEKLY, WE

1. Parse a datetime from a string

This is actually what made me look into dateutil to start with. Camaz shared this technique in the forum for Bite 7, "Parsing dates from logs".

Imagine you have this log line:

>>> log_line = 'INFO 2014-07-03T23:27:51 supybot Shutdown complete.'

Up until recently I used datetime's strptime like so:

>>> date_str = '%Y-%m-%dT%H:%M:%S'
>>> datetime.strptime(log_line.split()[1], date_str)
datetime.datetime(2014, 7, 3, 23, 27, 51)

More string manipulation and you have to know the format string syntax. dateutil's parse takes this complexity away:

>>> timestamp = parse(log_line, fuzzy=True)
>>> print(timestamp)
2014-07-03 23:27:51
>>> print(type(timestamp))
<class 'datetime.datetime'>

2. Get a timedelta in months

A limitation of datetime's timedelta is that it does not show the number of months:

>>> today = date.today()
>>> pybites_born = date(year=2016, month=12, day=19)
>>> (today-pybites_born).days
711

So far so good. However this does not work:

>>> (today-pybites_born).years
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'datetime.timedelta' object has no attribute 'years'

Nor this:

>>> (today-pybites_born).months
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'datetime.timedelta' object has no attribute 'months'

relativedelta to the rescue:

>>> diff = relativedelta(today, pybites_born)
>>> diff.years
1
>>> diff.months
11

When you need months, use relativedelta. And yes, we can almost celebrate two years of PyBites!

Another use case of this we saw in my previous article, How to Test Your Django App with Selenium and pytest, where I used it to get the last 3 months for our new Platform Coding Streak feature:

>>> def _make_3char_monthname(dt):
...     return dt.strftime('%b').upper()
...
>>> this_month = _make_3char_monthname(today)
>>> last_month = _make_3char_monthname(today-relativedelta(months=+1))
>>> two_months_ago = _make_3char_monthname(today-relativedelta(months=+2))
>>> for month in (this_month, last_month, two_months_ago):
...     print(f'{month} {today.year}')
...
NOV 2018
OCT 2018
SEP 2018

Let's get next Wednesday for the next example:

>>> next_wednesday = today+relativedelta(weekday=WE(+1))
>>> next_wednesday
datetime.date(2018, 12, 5)

3. Make a range of dates

Say I want to schedule my next batch of Italian lessons, each Wednesday for the coming 10 weeks. Easy:

>>> rrule(WEEKLY, count=10, dtstart=next_wednesday)
<dateutil.rrule.rrule object at 0x1033ef898>

As this will return an iterator and it does not show up vertically, let's materialize it in a list and pass it to pprint:

>>> from pprint import pprint as pp
>>> pp(list(rrule(WEEKLY, count=10, dtstart=next_wednesday)))
[datetime.datetime(2018, 12, 5, 0, 0),
 datetime.datetime(2018, 12, 12, 0, 0),
 datetime.datetime(2018, 12, 19, 0, 0),
 datetime.datetime(2018, 12, 26, 0, 0),
 datetime.datetime(2019, 1, 2, 0, 0),
 datetime.datetime(2019, 1, 9, 0, 0),
 datetime.datetime(2019, 1, 16, 0, 0),
 datetime.datetime(2019, 1, 23, 0, 0),
 datetime.datetime(2019, 1, 30, 0, 0),
 datetime.datetime(2019, 2, 6, 0, 0)]

Double-check with Unix cal

$ cal 12 2018
   December 2018
Su Mo Tu We Th Fr Sa
                   1
 2  3  4  5  6  7  8
 9 10 11 12 13 14 15
16 17 18 19 20 21 22
23 24 25 26 27 28 29
30 31
$ cal 1 2019
    January 2019
Su Mo Tu We Th Fr Sa
       1  2  3  4  5
 6  7  8  9 10 11 12
13 14 15 16 17 18 19
20 21 22 23 24 25 26
27 28 29 30 31
$ cal 2 2019
   February 2019
Su Mo Tu We Th Fr Sa
                1  2
 3  4  5  6  7  8  9
...

We added an exercise to our platform to create a #100DaysOfCode planning, skipping weekend days. rrule made this relatively easy.
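The weekend-skipping idea can be sketched with the stdlib alone (dateutil's rrule can do the same with byweekday=(MO, TU, WE, TH, FR)); the function name and dates here are my own:

```python
from datetime import date, timedelta

def weekday_schedule(start, count):
    """Return the next `count` weekdays from `start`, skipping Sat/Sun."""
    days = []
    current = start
    while len(days) < count:
        if current.weekday() < 5:  # Mon=0 .. Fri=4
            days.append(current)
        current += timedelta(days=1)
    return days

plan = weekday_schedule(date(2018, 12, 3), 10)  # starting on a Monday
print(plan[0], plan[-1])  # 2018-12-03 2018-12-14
```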

And that's it, my favorite use cases of dateutil so far. There is some timezone functionality in dateutil as well, but I have mostly used pytz for that.

Learn more? Check out this nice dateutil examples page and feel free to share your favorite snippets in the comments below.

Don't forget this is an external library (pip install python-dateutil), for most basic operations datetime would suffice. Another nice stdlib module worth checking out is calendar.

Keep Calm and Code in Python!

-- Bob


Jonathan Dowland: glBSP

Planet Debian - Fri, 2018-11-30 05:59

Continuing a series of blog posts about Debian packages I have adopted (Previously: smartmontools; duc), in January this year I also adopted glBSP.

I was surprised to see glBSP come up for adoption; I found out when I was installing something entirely unrelated, thanks to the how-can-i-help package. (This package is a great idea: it tells you about packages you have installed which are in danger of being removed from Debian, or have other interesting bugs filed against them. Give it a go!) glBSP is a dependency on another of my packages, WadC, so I adopted it fairly urgently.

glBSP is a node-building tool for Doom maps. A Map in Doom is defined in a handful of different lumps of data. The top-level, canonical data structures are relatively simple: THINGS is a list of things (type, coordinates, angle facing); VERTEXES is a list of points for geometry (X/Y coordinates); SECTORS define regions (light level, floor height and texture,…), etc. Map authoring tools can build these lumps of data relatively easily. (I've done it myself: I generate them all in liquorice, that I should write more about one day.)

During gameplay, Doom needs to answer questions such as: the player is at location (X,Y) and has made a noise. Can Monster Z hear that noise? Or: the player is at location (X,Y) at facing Z°, what walls need to be drawn? These decisions needed to be made very quickly on the target hardware of 1993 (a 486 CPU) in order to maintain the desired frame-rate (35fps). To facilitate this, various additional data structures are derived from the canonical lumps. glBSP is one of a class of tools called node builders that calculate these extra lumps. The name "node-builder" comes from one of the lumps (NODES), which encodes a binary-space partition of the map geometry (and that's where "BSP" comes from).

If you would be interested to know more about these algorithms (and they are fascinating, honest!), I recommend picking up Fabien Sanglard's forthcoming book "Game Engine Black Book: DOOM". You can pre-order an ebook from Google Play here. It will be available as a physical book (and ebook) via Amazon on publication date, which will be December 10, marking Doom's 25th anniversary.

The glBSP package could do with some work to bring it up to the modern standards and conventions of Debian packages. I haven't bothered to do that, because I'm planning to replace it with another node-builder. glBSP is effectively abandoned upstream. There are loads of other node builders that could be included: glBSP and Eureka author Andrew Apted started a new one called AJBSP, and my long-time friend Kim Roar Foldøy Hauge has one called zokumbsp. The best candidate as an all-round useful node-builder is probably ZDBSP, which was originally developed as an internal node-builder for the ZDoom engine, and was designed for speed. It also copes well with some torture-test maps, such as WadC's "choz.wl", which brought glBSP to its knees. I've submitted a package of ZDBSP to Debian and I'm waiting to see if it is accepted by the FTP masters. After that, we could consider removing glBSP.


How to use Qt Lite with Yocto

Planet KDE - Fri, 2018-11-30 02:00

During the development cycle of 5.12 we’ve been working on making it easier to use Qt Lite in Yocto builds, especially those using our meta-boot2qt layer.

Qt Configuration Tool now has the ability to export the set of selected Qt Lite features into files that can be used by the Yocto build to build a slimmer Qt.

Qt Configuration Tool is commercial-only and allows you to visually explore and toggle different optional parts of Qt after opening an initially configured source tree. It can be used directly to configure that source tree for builds that don't include features you don't need. That configuration can also be exported into a single file that can be imported later, which makes sharing the configuration between people easier.

Previously, however, there wasn't a clear path to using that in a Yocto-driven build, since meta-qt5 recipes build Qt module by module. Version 1.2 of the tool and the new qt5-features class in the meta-boot2qt layer provide that path. Both are available in Qt for Device Creation 5.12.

  1. Run ./configure for a qt5 source tree with the necessary options for your
    platform and choosing the commercial license.
  2. Open Qt Configuration Tool and select the build directory from the previous step.
  3. Select/unselect the features you want with the tool.
  4. When the configuration is ready, select “File” > “Export Features for
    Boot2Qt” and pick a folder. A feature file for each module that you’ve made
    changes to is created into that folder.
  5. In your yocto build, for each module that you have a feature file for:
    1. Create a new recipe extension for the module, e.g. “qtdeclarative_git.bbappend”.
    2. Add the line “inherit qt5-features” into that bbappend file.
    3. Put the feature file for the module (e.g. “qtdeclarative.opt”) into a
      features/ directory next to your bbappend file.
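Steps 5.1-5.3 boil down to a very small recipe extension; a sketch for qtdeclarative might look like the following (your layer path is up to you):

```
# qtdeclarative_git.bbappend -- lives in your own layer.
# The exported qtdeclarative.opt goes into a features/ directory
# next to this file.
inherit qt5-features
```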

Then run your build (bitbake b2qt-embedded-qt5-image if you are basing your build on our example images) and enjoy the slimmer Qt made just for you!

The post How to use Qt Lite with Yocto appeared first on Qt Blog.


OpenSense Labs: Extract the power of Predictive UX with Drupal

Planet Drupal - Fri, 2018-11-30 01:54
Extract the power of Predictive UX with Drupal Shankar Fri, 11/30/2018 - 12:24

Perhaps it is not very surprising that in an age of austerity and a climate of fear about child abuse, social workers are turning to new technology for help. The Guardian revealed that local authorities in the UK have been using machine learning and predictive modelling to intervene before children are referred to social services. For instance, local councils are building 'predictive analytics' systems that leverage a cornucopia of data on hundreds of people to construct computer models in an effort to predict child abuse and intervene before it can happen.

The power of predictive analytics can be harnessed not only for social issues like child abuse but for a plethora of areas across industries. One such area is the web user experience, where predictive analytics can be invaluable. Predictive user experience (UX) can usher in a wealth of improvements. But how did predictive analytics come into being?

Destination 1689

Contemporary Analysis states that the history of predictive analytics takes us back to 1689. While the rise of predictive analytics has been attributed to technological advances like Hadoop and MapReduce, it has been in use for centuries.

One of the first applications of predictive analytics can be witnessed in the times when shipping and trade were prominent. Lloyd's of London, one of the first insurance and reinsurance markets, was a catalyst for the distribution of important information required for underwriting. The term "underwriting" itself was born in the London insurance market: in exchange for a premium, bankers would accept the risk on a given sea voyage and write their names underneath the risk information written on a Lloyd's slip created for this purpose.

Lloyd’s coffee house was established in 1689 by Edward Lloyd. He was well-known among the sailors, merchants and ship owners as he shared reliable shipping news which helped in discussing deals including insurance.

Technological advancements in the 20th century and 21st century have given impetus to predictive analytics as can be seen through the following compilation by FICO.

Source: FICO

Predictive Analytics and User Experience: A Detailed Look

IBM states that predictive analytics brings together advanced analytics capabilities comprising of ad-hoc analysis, predictive modelling, data mining, text analytics, optimisation, real-time scoring and machine learning. Enterprises can utilise these tools in order to discover patterns in data and forecast events.

Predictive Analytics is a form of advanced analytics which examines data or content to answer the question “What is going to happen?” or more precisely, “What is likely to happen?”, and is characterized by techniques such as regression analysis, forecasting, multivariate statistics, pattern matching, predictive modelling, and forecasting. - Gartner

A statistical representation of data compiled by Statista delineates that predictive analytics is only going to grow and its market share will keep expanding in the coming years.

Predictive analytics revenues/market size worldwide, from 2016 to 2022 (in billion U.S. dollars) | Statista

A Personalisation Pulse Check report from Accenture found that 65% of customers were more likely to shop at a store or online business that sent relevant and personalized promotions. So, instead of resulting in alterations to the user interface, applying a predictive analytics algorithm to UX design presents the users with relevant information. For instance, a user who has recently bought a costly mobile phone from an e-commerce site might be willing to buy a cover to protect it from dust and scratches. Hence, that user would receive a recommendation to purchase a cover. The e-commerce site might also recommend other accessories like headphones, memory cards or antivirus software.

How does Predictive Analytics Work?

Following are the capabilities of predictive analytics according to a compilation by IBM:

  • Statistical analysis and visualisation: It addresses the overall analytical process including planning, data collection, analysis, reporting and deployment.
  • Predictive modelling: It leverages the power of model-building, evaluation and automation capabilities.
  • Linear regression: Linear regression analysis helps predict the value of a variable based on the value of another variable.
  • Logistic regression: Also known as the logit model, it is used for predictive modelling and is widely applied in machine learning.
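The linear regression capability above can be sketched in a few lines of plain Python: an ordinary least-squares fit of one variable against another. The data points are purely illustrative:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Illustrative data: a predictor variable (x) vs. an outcome (y)
slope, intercept = fit_line([1, 2, 3, 4], [2, 4, 6, 8])

def predict(x):
    """Predict the outcome for a new value of the predictor."""
    return slope * x + intercept
```

In practice a library such as scikit-learn would handle the fitting, but the underlying idea is exactly this: learn the relationship from historical data, then score new inputs.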
Leveraging Predictive Models in UX Design

Data will drive the UX of the future. Patterns derived from data make for a terrific predictive engine, helping to forecast a user's intent by combining numerous predictors that together influence conversions.


With the help of predictive analytics in UX design, conversion rates can be improved. For instance, recommendation systems leverage data such as consumer interest and purchasing behaviour, which is then fed into a predictive model to generate a list of recommended items.

Amazon, the e-commerce giant, uses an item-item collaborative filtering algorithm to suggest products. This helps display books to a bookworm and baby toys to a new mother. Quantum Interface, a startup in Austin, Texas, has built a predictive user interface based on natural user interface (NUI) principles. It uses directional vectors (speed, time and angle change) to forecast a user's intent.
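A toy sketch of item-item collaborative filtering of the kind described above: items are compared by the cosine similarity of their purchase vectors, so items bought by the same users score as similar. The purchase data and item names are invented for illustration; this is the general technique, not Amazon's actual implementation:

```python
from math import sqrt

# Toy purchase matrix (invented data): which users bought which items.
purchases = {
    "alice": {"phone", "cover", "headphones"},
    "bob":   {"phone", "cover"},
    "carol": {"phone", "memory_card"},
}

def item_vector(item):
    """One 0/1 entry per user: did this user buy the item?"""
    return [1 if item in bought else 0 for bought in purchases.values()]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def recommend(item, candidates):
    """Return the candidate item most similar to the given item."""
    scores = {c: cosine(item_vector(item), item_vector(c))
              for c in candidates if c != item}
    return max(scores, key=scores.get)
```

Here a user who just bought a phone would be shown the cover, since phone and cover were bought together by the most users.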

Implementing Predictive UX with Drupal

Predictive UX adapts content based on a user's previous choices, just as web personalisation does. But predictive UX harnesses the power of machine learning and statistical techniques to make informed decisions on the user's behalf.

As modern technology shifts from mobile-first to AI-first, predictive UX is poised to be the next big trend-setter. It reduces users' cognitive load: forcing users to make too many decisions pushes them to take the easy way out.

Drupal provides different ways of implementing predictive UX:

Acquia Lift

Acquia Lift Connector, a Drupal module, offers integration with the Acquia Lift service and an improved user experience for web personalisation, testing and targeting directly on the front end of your website.

It leverages machine learning to automatically recommend content based on what a user is currently looking at or has looked at in the past. It offers a drag-and-drop feature for developing, previewing and launching personalisations, and a customer data repository that provides a single view of customers.

Its real-time adaptive targeting refines segments, while A/B testing helps keep users engrossed with content that resonates.

Apache PredictionIO

Bay Area Drupal Camp 2018 had a session demonstrating how predictive UX helps users decide. It was geared towards both UX designers and developers, and covered how machine learning powers predictive UX and the ways of implementing it with Drupal.

It showed a Drupal 8 site listing restaurants that could be sorted by proximity, so you can find the nearest restaurant and order food. When users log in to the site, they see top recommendations customised for them.

There are some interesting things happening behind the scenes to produce the recommendations. An API query is sent to the machine learning server, which returns a ranked list of recommendations. When users go to a restaurant and order food, that data is sent to the event server through the API, which is how data is collected. Here, the Apache PredictionIO server, an open source machine learning stack, offers simple commands to train and deploy an engine.
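The query half of that flow can be sketched with nothing but the standard library. The endpoint path and field names below follow the common PredictionIO convention (an engine server listening on port 8000, accepting a JSON body at `/queries.json`), but treat them as assumptions to be checked against your particular engine template:

```python
import json
import urllib.request

# Hypothetical engine endpoint; PredictionIO engine servers conventionally
# accept JSON queries at /queries.json on port 8000.
ENGINE_URL = "http://localhost:8000/queries.json"

def build_query(user_id, num=5):
    """JSON body asking the engine for the top `num` items for a user.
    The field names ("user", "num") are assumptions taken from the
    common recommendation engine template."""
    return json.dumps({"user": user_id, "num": num}).encode("utf-8")

def fetch_recommendations(user_id, num=5, url=ENGINE_URL):
    """POST the query and return the engine's ranked response."""
    req = urllib.request.Request(
        url,
        data=build_query(user_id, num),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # The engine replies with a ranked list, e.g. {"itemScores": [...]}
        return json.loads(resp.read())
```

The Drupal site would issue such a query on login and render the returned ranked list, while each order event is POSTed to the event server to keep the training data fresh.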

Gazing into the future of UX

UX Collective says the future of UX is bright. Demand for pixel-perfect, usable and delightful UX is sky-high, especially with digital transformation endeavours underway globally. The following graph shows the top design-driven organisations against the Standard and Poor's (S&P) index.

Source: Job Trends Report: The Job Market for UX/UI Designers

It further states that UX design will consist of more formal studies:

  • Study of cognitive neuroscience and human behaviour
  • Study of ethics
  • Artificial Intelligence advances, generated and unsupervised machine learning-based system interactions, predictive UX, personalised robotic services and so on

User experience will always be an integral component of any industry. While web personalisation is a sure way of improving the digital experience, disruptive technologies like machine learning take it to another level. Leveraging machine learning algorithms, predictive UX can forecast user choices and help users decide. Implementing predictive UX is a praiseworthy way to offer users an unprecedented digital experience.

When it comes to Drupal development, OpenSense Labs has been steadfast in its objective of embracing innovative technologies that can be implemented with Drupal's robust framework.

Contact us at hello@opensenselabs.com to implement predictive UX with Drupal.

Categories: FLOSS Project Planets

Reinout van Rees: Amsterdam Python meetup, november 2018

Planet Python - Fri, 2018-11-30 01:46

My summary of the 28 November Python meetup at the Byte office. I also gave a talk (about cookiecutter), but I obviously haven't made a summary of that one; I'll try to summarize it later :-)

Project Auger - Chris Laffra

One of Chris' pet projects is auger, automated unittest generation. He wrote it when lying in bed with a broken ankle and thought about what he hated most: writing tests.

Auger: Automated Unittest GEneRator. It works by running a tracer.

The project's idea is:

  • Write code as always
  • Don't worry about tests
  • Run the auger tracer to record function parameter values and function results.
  • After recording, you can generate mocks and assertions.

"But this breaks test driven development"!!! Actually, not quite. It can be useful if you have to start working on an existing code base without any tests: you can generate a basic set of tests to start from.

So: it records what you did once and uses that as a starting point for your tests. It makes sure that what once worked keeps on working.

It works with a "context manager". A context manager normally has __enter__() and __exit__(). But you can add more interesting things. If in __enter__() you call sys.settrace(self.trace), you can add a def trace(self, frame, event, args) method, which is then fired for everything that happens within the context manager. You can use it for coverage tracking, logging or visualization of what happens in your code. He used the latter for algorithm visualizations on http://pyalgoviz.appspot.com/
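A minimal sketch of such a tracing context manager (recording only function names, whereas auger also records parameter values and results):

```python
import sys

class Tracer:
    """Context manager that records the name of every function called
    inside its `with` block, via sys.settrace."""

    def __init__(self):
        self.calls = []

    def trace(self, frame, event, arg):
        # Fired by the interpreter for call/line/return events.
        if event == "call":
            self.calls.append(frame.f_code.co_name)
        return None  # no per-line tracing needed for this sketch

    def __enter__(self):
        sys.settrace(self.trace)
        return self

    def __exit__(self, *exc):
        sys.settrace(None)

def add(a, b):
    return a + b

with Tracer() as t:
    add(1, 2)
# t.calls now contains "add"
```

From a record like this it is a short step to auger's idea: remember which arguments each function received and what it returned, then emit assertions that replay them.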

So... this sys.settrace() magic is used to figure out which functions get called with which parameters.

Functions and classes in the modules you want to check are tested, classes from other modules are partially mocked.

Python LED animation system BiblioPixel - Tom Ritchford

Bibliopixel (https://github.com/ManiacalLabs/BiblioPixel) is his pet project. It is a Python 3 program that runs on basically everything (Raspberry Pi, Linux, OS X, Windows). What does it do? It controls large numbers of lights in real time without programming.

There are lots of output drivers, from LED strips and Philips Hue to an OpenGL in-browser renderer. There are also lots of different ways to steer it. Here is the documentation.

He actually started on a lot of programs having to do with audio and lights, starting with a PDP-11 (which only produced beeps), then Amiga, Macintosh (something that actually worked and was used for real), Java, JavaScript, Python + C++. And now Python.

The long-term goal is to programmatically control lights and other hardware in real time. And... he wants to define the project in text files. The actual light "program" should not be in code. Ideally, bits of projects ought to be reusable. And any input ought to be connectable to any output.

Bibliopixel started with the AllPixel LED controller, which had a successful Kickstarter campaign (he got involved two years later).

An "animation" talks to a "layout" and the layout talks to one or more drivers (one could be a debug visualization on your laptop and the other the real physical installation). Animations can be nested.

Above it all is the "Project". A YAML (or json) file that defines the project and configures everything.

Bibliopixel is quite forgiving about inputs. It accepts all sorts of colors (red, #ff0000, etc). Capitalization, missing spaces, extraneous spaces: all fine. Likewise about "controls": a control receives a "raw" message and then tries to convert it into something it understands.

Bibliopixel is very reliable. Lots of automated tests. Hardware test boards to test the code with the eight most common types of hardware. Solid error handling and readable error messages help a lot.

There are some weak points. The biggest is lack of developers. Another problem is that it only supports three colors (RGB), so you can't handle RGBW (RGB plus white) and other such newer combinations. He hopes to move the code over completely to numpy, which would help a lot. Numpy is already supported, but the existing legacy implementation currently still has to keep working as well.

He showed some nice demos at the end.

Categories: FLOSS Project Planets

Google Code-in in KDE

Planet KDE - Thu, 2018-11-29 21:58

So far, so good! We're having quite a variety of students and I'm happy to see new ones still joining. And surprising to me, we're still getting beginner tasks being done, which keep us busy.

New this round were some tutorials written by Pranam Lashkari. I hope we can expand on this next year, because I think a lot of students who are willing to do these tutorials one by one are learning a lot. His thought is that they can be added to our documentation after the contest is over. I think we can re-use some stuff that we've already written, for next year. What do you think?

In addition, I'm seeing loads of work -- yes, small jobs, but keep in mind these kids are 13 to 18 years old* -- for the teams who were willing to write up tasks and tutor. It is a lot of work, so I really appreciate all those mentors who have stepped forward.

I'm very glad we are participating this year. It isn't as wild and crazy as it was in the beginning, because there are now lots more orgs involved, so the kids have lots of options. Happy that the kids we have are choosing us!


* Rules state: "13 to 17 years old and enrolled in a pre-university educational program" before enrolling. https://developers.google.com/open-source/gci/faq

Categories: FLOSS Project Planets

wget @ Savannah: GNU Wget 1.20 Released

GNU Planet! - Thu, 2018-11-29 19:14

Noteworthy Changes in this release:

  • Add new option `--retry-on-host-error` to treat local errors as transient, so that Wget will retry the download after a brief waiting period.
  • Fixed multiple potential resource leaks as found by static analysis
  • Wget will now not create an empty wget-log file when running with -q and -b switches together
  • When compiled using GnuTLS >= 3.6.3, Wget now has support for TLSv1.3
  • Now there is support for using libpcre2 for regex pattern matching
  • When downloading over FTP recursively, one can now use the --{accept,reject}-regex switches to fine-tune the downloaded files

  • Building Wget from the git sources now requires autoconf 2.63 or above. Building from the Tarballs works as it used to.
Categories: FLOSS Project Planets

Molly de Blanc: Free software activities (November, 2018)

Planet Debian - Thu, 2018-11-29 17:54

Welcome to what is the first and may or may not be the last monthly summary of my free software activities.

November was a good month for me, heavily laden with travel. Conferences and meetings took me to Seattle, WA (USA) and Milano and Bolzano in Italy. I think of my activities as generally focusing on “my” projects — that is to say, representing my own thoughts and ideas, rather than those of my employer or associated projects.

In addition to using my free time to work on free and open source software and related issues, my day job is at the Free Software Foundation. I included highlights from my past month at the FSF. This feels a little bit like cheating.

November Activities (personal)
  • I keynoted the Seattle GNU/Linux festival (SeaGL), delivering a talk entitled “Insecure connections: Love and mental health in our digital lives.” Slides are available on GitLab.
  • Attended an Open Source Initiative board meeting in Milan, Italy.
  • Spoke at SFScon in Bolzano, Italy, giving a talk entitled “User freedom: A love Story.” Slides forthcoming. For this talk, I created a few original slides, but largely repurposed images from “Insecure connections.”
  • I made my first quantitative Debian contribution, adding the Open Source Initiative to the list of organizations of which Debian is a member.
  • Submitted sessions to the Community and the Legal and Policy devrooms at FOSDEM. #speakerlife
  • Reviewed session proposals for CopyLeft Conf, for which I am on the papers committee.
  • I helped organize a $15,000 match donation for the Software Freedom Conservancy.
Some highlights from my day job
Categories: FLOSS Project Planets

Daniel Pocock: Connecting software freedom and human rights

Planet Debian - Thu, 2018-11-29 17:04

2018 is the 70th anniversary of the Universal Declaration of Human Rights.

Over the last few days, while attending the UN Forum on Business and Human Rights, I've had various discussions with people about the relationship between software freedom, business and human rights.

In the information age, control of the software, source code and data translates into power and may contribute to inequality. Free software principles are not simply about the cost of the software, they lead to transparency and give people infinitely more choices.

Many people in the free software community have taken a particular interest in privacy, which is Article 12 in the declaration. The modern Internet challenges this right, while projects like TAILS and Tor Browser help to protect it. The UN's 70th anniversary slogan Stand up 4 human rights is a call to help those around us understand these problems and make effective use of the solutions.

We live in a time when human rights face serious challenges. Consider censorship: Saudi Arabia is accused of complicity in the disappearance of columnist Jamal Khashoggi and the White House is accused of using fake allegations to try and banish CNN journalist Jim Acosta. Arjen Kamphuis, co-author of Information Security for Journalists, vanished in mysterious circumstances. The last time I saw Arjen was at OSCAL'18 in Tirana.

For many of us, events like these may leave us feeling powerless. Nothing could be further from the truth. Standing up for human rights starts with looking at our own failures, both as individuals and organizations. For example, have we ever taken offense at something, judged somebody or rushed to make accusations without taking time to check facts and consider all sides of the story? Have we seen somebody we know treated unfairly and remained silent? Sometimes it may be desirable to speak out publicly, sometimes a difficult situation can be resolved by speaking to the person directly or having a meeting with them.

Being at the United Nations provided an acute reminder of these principles. In parallel to the event, the UN were hosting a conference on the mine ban treaty and the conference on Afghanistan, the Afghan president arriving as I walked up the corridor. These events reflect a legacy of hostilities and sincere efforts to come back from the brink.

A wide range of discussions and meetings

There were many opportunities to have discussions with people from all the groups present. Several sessions raised issues that made me reflect on the relationship between corporations and the free software community and the risks for volunteers. At the end of the forum I had a brief discussion with Dante Pesce, Chair of the UN's Business and Human Rights working group.

Best free software resources for human rights?

Many people at the forum asked me how to get started with free software and I promised to keep adding to my blog. What would you regard as the best online resources, including videos and guides, for people with an interest in human rights to get started with free software, solving problems with privacy and equality? Please share them on the Libre Planet mailing list.

Let's not forget animal rights too

Are dogs entitled to danger pay when protecting heads of state?

Categories: FLOSS Project Planets

Python Engineering at Microsoft: Python in Visual Studio Code – November 2018 Release

Planet Python - Thu, 2018-11-29 15:31

We are pleased to announce that the November 2018 release of the Python Extension for Visual Studio Code is now available. You can download the Python extension from the Marketplace, or install it directly from the extension gallery in Visual Studio Code. You can learn more about Python support in Visual Studio Code in the documentation.

This was a quality-focused release: we closed a total of 28 issues, improving startup performance and fixing various bugs related to interpreter detection and Jupyter support. Keep reading to learn more!

Improved Python Extension Load Time

We have started using webpack to bundle the TypeScript files in the extension for faster load times. This has significantly reduced the extension's download size, installation time and load time. You can see the startup time of the extension by running the Developer: Startup Performance command. Below are before and after extension load times (measured in milliseconds):

One downside to this approach is that reporting & troubleshooting issues with the extension is harder, as the call stacks output by the Python extension are minified. To address this we have added the Python: Enable source map support for extension debugging command. This command will load source maps for better error log output. It slows down the load time of the extension, so we show a helpful reminder to disable it every time the extension loads with source maps enabled:

These download, install, and startup performance improvements will help you get to writing your Python code faster, and we have even more improvements planned for future releases.

Other Changes and Enhancements

We have also added small enhancements and fixed issues requested by users that should improve your experience working with Python in Visual Studio Code. The full list of improvements is listed in our changelog; some notable changes include:

  • Update Jedi to 0.13.1 and parso 0.3.1. (#2667)
  • Make diagnostic message actionable when opening a workspace with no currently selected Python interpreter. (#2983)
  • Fix problems with virtual environments not matching the loaded python when running cells. (#3294)
  • Make nbconvert in an installation not prevent notebooks from starting. (#3343)

Be sure to download the Python extension for Visual Studio Code now to try out the above improvements. If you run into any problems, please file an issue on the Python VS Code GitHub page.

Categories: FLOSS Project Planets