FLOSS Project Planets

Python Insider: Python 2.7.17 release candidate 1 available

Planet Python - Tue, 2019-10-08 17:24
A release candidate for the upcoming 2.7.17 bug fix release is now available for download.
Categories: FLOSS Project Planets

PyPy Development: PyPy's new JSON parser

Planet Python - Tue, 2019-10-08 15:45
Introduction In the last year or two I have worked on and off on making PyPy's JSON parsing faster, particularly for large JSON files. In this post I am going to document those techniques and measure their performance impact. Note that I am quite a lot more constrained in which optimizations I can apply here than some of the much more advanced approaches like Mison, Sparser or SimdJSON, because I don't want to change the json.loads API that Python programs expect, and because I don't want to support only CPUs with wide SIMD extensions. With a more expressive API, more optimizations would be possible.
Working with huge JSON files poses two problems: deserialization takes a long time, and the resulting data structures often take a lot of memory (usually many times more than the size of the file they originated from). Of course these problems are related, because allocating and initializing a big data structure takes longer than allocating a smaller one. Therefore I always tried to attack both of these problems at the same time.
One common theme of the techniques I am describing is that of optimizing the parser for how JSON files are typically used, not how they could theoretically be used. This is a similar approach to the way dynamic languages are optimized more generally: most JITs will optimize for typical patterns of usage, at the cost of less common usage patterns, which might even become slower as a result of the optimizations.
Maps The first technique I investigated is to use maps in the JSON parser. Maps, also called hidden classes or shapes, are a fairly common way (in general, not just in the context of JSON parsing) to optimize instances of classes in dynamic language VMs. Maps exploit the fact that while it is in theory possible to add arbitrary fields to an instance, in practice most instances of a class are going to have the same set of fields (or one of a small number of different sets). Since JSON dictionaries or objects often come from serialized instances of some kind, this property often holds in JSON files as well: within a JSON file, dictionaries often have the same fields in the same order.
This property can be exploited in two ways. On the one hand, it can be used to store the deserialized dictionaries in a more memory-efficient way: instead of using a hashmap in most cases, the dictionary is split into a shared description of the set of keys (the map) and an array of storage with the values. This makes the deserialized dictionaries smaller if the same set of keys is repeated a lot. It is completely transparent to the Python programmer: the dictionary looks completely normal to the Python program, but its internal representation is different.
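The splitting just described can be sketched in plain Python. This is only an illustration of the idea, not PyPy's actual RPython implementation; all class and function names here are made up:

```python
class Map:
    """A shared, immutable description of a set of keys (a 'hidden class')."""
    def __init__(self, keys=()):
        self.keys = keys
        self.index = {k: i for i, k in enumerate(keys)}  # key -> storage slot
        self.transitions = {}  # key -> child Map (forms the map tree)

    def with_key(self, key):
        # Return (and memoize) the map describing this key set plus `key`.
        if key not in self.transitions:
            self.transitions[key] = Map(self.keys + (key,))
        return self.transitions[key]


class MapDict:
    """A dictionary split into a shared Map and a per-instance value array."""
    def __init__(self, map_, values):
        self.map = map_
        self.values = values

    def __getitem__(self, key):
        return self.values[self.map.index[key]]


EMPTY_MAP = Map()

def make_mapdict(pairs):
    # Walk the map tree from the root, adding one key per edge,
    # while the values go into a flat array.
    m = EMPTY_MAP
    values = []
    for key, value in pairs:
        m = m.with_key(key)
        values.append(value)
    return MapDict(m, values)
```

Two dictionaries built from the same keys in the same order end up sharing a single Map object, so the per-dictionary cost is just the value array.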
One downside of using maps is that sometimes files will contain many dictionaries that have unique key sets. Since maps themselves are quite large data structures, and since dictionaries that use maps contain an extra level of indirection, we want to fall back to normal hashmaps to represent the dictionaries in that case. To detect it, we collect statistics at runtime about how often every map (i.e. set of keys) is used in the file. For uncommonly used maps, the map is discarded and the dictionaries that used it are converted to regular hashmaps.
Using Maps to Speed up Parsing Another benefit of using maps to store deserialized dictionaries is that we can use them to speed up the parsing process itself. To see how this works, we need to understand maps a bit better. All the maps produced as a side-effect of parsing JSON form a tree. The tree root is a map that describes the object without any attributes. From every tree node we have a number of edges going to other nodes, each edge for a specific new attribute added:

This map tree is the result of parsing a file that has dictionaries with the keys a, b, c many times, the keys a, b, f less often, and also some objects with the keys x, y.
When parsing a dictionary we traverse this tree from the root, according to the keys that we see in the input file. While doing this, we potentially add new nodes, if we get key combinations that we have never seen before. The set of keys of a dictionary parsed so far are represented by the current tree node, while we can store the values into an array. We can use the tree of nodes to speed up parsing. A lot of the nodes only have one child, because after reading the first few keys of an object, the remaining ones are often uniquely determined in a given file. If we have only one child map node, we can speculatively parse the next key by doing a memcmp between the key that the map tree says is likely to come next and the characters that follow the ',' that started the next entry in the dictionary. If the memcmp returns true this means that the speculation paid off, and we can transition to the new map that the edge points to, and parse the corresponding value. If not, we fall back to general code that parses the string, handles escaping rules etc. This trick was explained to me by some V8 engineers, the same trick is supposedly used as part of the V8 JSON parser.
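In pure Python the speculation step might look like the following sketch. The real parser does a memcmp over the raw input buffer; parse_key_speculative and its calling convention are invented here for illustration:

```python
def parse_key_speculative(input_bytes, pos, expected_key):
    """Try to match the likely next key at `pos` without full string parsing.

    `pos` points just after the opening '"' of the key; `expected_key` is
    the key predicted by the current map tree node (as bytes, no escapes).
    Returns the position after the closing '"' on success, or -1 if the
    speculation failed and the general string parser must take over.
    """
    end = pos + len(expected_key)
    # The "memcmp": compare the predicted key's bytes against the input.
    if input_bytes[pos:end] == expected_key and input_bytes[end:end + 1] == b'"':
        return end + 1
    return -1
```

On success the parser transitions to the child map node and goes on to parse the value; on failure it falls back to the general code that handles escaping and unknown keys.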
This scheme doesn't immediately work for map tree nodes that have more than one child. However, since we keep statistics anyway about how often each map is used as the map of a parsed dictionary, we can speculate that the most common map transition is taken more often than the others in the future, and use that as the speculated next node.
So for the example transition tree shown in the figure above the key speculation would succeed for objects with keys a, b, c. For objects with keys a, b, f the speculation would succeed for the first two keys, but not for the third key f. For objects with the keys x, y the speculation would fail for the first key x but succeed for the second key y.
For real-world datasets these transition trees can become a lot more complicated, for example here is a visualization of a part of the transition tree generated for parsing a New York Times dataset:

Caching Strings A rather obvious observation we can use to improve the performance of the parser is the fact that string values repeat a lot in most JSON files. For strings that are used as dictionary keys this is pretty obvious, but it also happens for strings that are used as values in dictionaries (or are stored in lists). We can use this fact to intern/memoize strings and save memory. This is an approach that many JSON parsers use, including CPython's. To do this, I keep a dictionary of strings that we have seen so far during parsing and look up newly deserialized strings in it. If we have seen the string before, we can re-use the previously deserialized one. Right now I only consider utf-8 strings for caching that do not contain any escapes (such as \", \n or escaped unicode chars).
This simple approach works extremely well for dictionary keys, but needs a number of improvements to be a win in general. The first observation is that computing the hash to look up the string in the dictionary of strings we've seen so far is basically free: we can compute it while scanning the input for the end of the string we are currently deserializing, which barely increases the time spent scanning. This is not a new idea; I am sure many other parsers do the same thing (but CPython doesn't seem to).
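The combined scan could look something like this toy version. The hash function shown is chosen purely for illustration, not what PyPy actually uses:

```python
def find_end_of_string(input_bytes, pos):
    """Scan from `pos` to the closing '"', computing a hash on the fly.

    Returns (end_index, saw_escapes, hash). A single pass over the input
    both finds the string's end and produces the hash used for the cache
    lookup, so hashing adds almost nothing to the scanning cost.
    Returns end_index -1 if the string is unterminated.
    """
    h = 5381  # djb2-style rolling hash, just for illustration
    escapes = False
    i = pos
    while i < len(input_bytes):
        c = input_bytes[i]
        if c == ord('"'):
            return i, escapes, h
        if c == ord('\\'):
            # Note the escape and skip the escaped character; the caller
            # falls back to the escape-aware slow path anyway.
            escapes = True
            i += 2
            continue
        h = ((h * 33) ^ c) & 0xFFFFFFFF
        i += 1
    return -1, escapes, h
```

Equal string contents at different positions produce the same hash, which is exactly what the memoization lookup needs.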
Another improvement follows from the observation that inserting every single deserialized non-key string into a hashmap is too expensive. Instead, we insert strings into the cache more conservatively, by keeping a small ring buffer of hashes of recently deserialized strings. A new string's hash is looked up in the ring buffer, and only if it is present do we insert the string into the memoization hashmap. This has the effect of only inserting strings that re-occur a second time not too far into the file. This seems to give a good trade-off between still re-using a lot of strings and keeping both the update cost and the size of the memoization hashmap low.
Another twist is that in a lot of situations caching strings is not useful at all, because it will almost never succeed. Examples of this are UUIDs (which are unique), or the content of a tweet in a JSON file with many tweets (which is usually unique). However, in the same file it might be useful to cache e.g. the user name of the Twitter user, because many tweets from the same person could be in such a file. So the usefulness of the string cache depends on which field of an object we are deserializing the value of. Therefore we keep statistics per map field and disable string memoization for an individual field if its cache hit rate falls below a certain threshold. This gives the best of both worlds: in the cases where string values repeat a lot in certain fields we use the cache to save time and memory, but for those fields that mostly contain unique strings we don't waste time looking up and adding strings in the memoization table. Strings outside of dictionaries are quite rare anyway, so we just always try to use the cache for them.
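The per-field heuristic just described could be sketched as follows. The threshold and minimum sample count are assumed tuning knobs, not PyPy's actual values, and the class name is made up:

```python
CACHE_HIT_THRESHOLD = 0.1   # assumed: give up below a 10% hit rate
MIN_SAMPLES = 100           # assumed: don't judge a field too early

class FieldCacheStats:
    """Per-map-field statistics that switch string caching off for fields
    whose values almost never repeat (UUIDs, tweet bodies, ...)."""
    def __init__(self):
        self.hits = 0
        self.misses = 0
        self.disabled = False

    def record(self, hit):
        # Called once per string deserialized for this field.
        if hit:
            self.hits += 1
        else:
            self.misses += 1
        total = self.hits + self.misses
        if total >= MIN_SAMPLES and self.hits / total < CACHE_HIT_THRESHOLD:
            self.disabled = True  # stays off for the rest of the file

    def cache_disabled(self):
        return self.disabled
```

A field full of unique strings trips the threshold after a while and stops paying for cache lookups, while a field with repetitive values keeps the cache on.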
The following pseudocode sketches the code to deserialize a string in the input at a given position. The function also takes a map, which is the point in the map tree of the field that we are currently deserializing the value of (if we are deserializing a string in another context, some kind of dummy map can be used there).
def deserialize_string(pos, input, map):
    # input is the input string, pos is the position of the starting " of
    # the string

    # find end of string, check whether it contains escape codes,
    # compute hash, all at the same time
    end, escapes, hash = find_end_of_string(pos + 1, input)
    if end == -1:
        raise ParseError
    if escapes:
        # need to be much more careful with escaping
        return deserialize_string_escapes(pos, input)

    # should we cache at all?
    if map.cache_disabled():
        return input[pos + 1:end]

    # if string is in cache, return it
    if hash in cache:
        map.cache_hit += 1
        return cache[hash]

    result = input[pos + 1:end]
    map.cache_miss += 1
    # if hash is in the ring buffer of recently seen hashes,
    # add the string to the cache
    if hash in ring_buffer:
        cache[hash] = result
    else:
        ring_buffer.write(hash)
    return result

Evaluation To find out how much the various techniques help, I implemented a number of JSON parsers in PyPy with different combinations of the techniques enabled. I compared the numbers with the JSON parser of CPython 3.7.3 (simplejson), with ujson, with the JSON parser of Node 12.11.1 (V8) and with RapidJSON (in DOM mode).
I collected a number of medium-to-large JSON files to try the JSON parsers on:
  • Censys: A subset of the Censys port and protocol scan data for websites in the Alexa top million domains
  • Gharchive: Github activity from January 15-23, 2015 from Github Archive
  • Reddit: Reddit comments from May 2009
  • Rosie: The nested matches produced using the Rosie pattern language all.things pattern on a log file
  • Nytimes: Metadata of a collection of New York Times articles
  • Tpch: The TPC-H database benchmark's deals table as a JSON file
  • Twitter: A JSON export of the @pypyproject Twitter account data
  • Wikidata: A file storing a subset of the Wikidata fact dump from Nov 11, 2014
  • Yelp: A file of yelp businesses
Here are the file sizes of the benchmarks:
Benchmark   File Size [MiB]
Censys      898.45
Gharchive   276.34
NYTimes     12.98
Reddit      931.65
Rosie       388.88
TPCH        173.86
Wikidata    119.75
Yelp        167.61

I measured the times of each benchmark with a number of variations of the improved PyPy algorithms:
  • PyPyBaseline: The PyPy JSON parser as it was before my work with JSON parsing started (PyPy version 5.8)
  • PyPyKeyStringCaching: Memoizing the key strings of dictionaries, but not the other strings in a json file, and not using maps to represent dictionaries (this is the JSON parser that PyPy has been shipping since version 5.9; in the benchmarks I used 7.1).
  • PyPyMapNoCache: Like PyPyKeyStringCaching, but using maps to represent dictionaries. This includes speculatively parsing the next key using memcmp, but does not use string caching of non-key strings.
  • PyPyFull: Like PyPyMapNoCache but uses a string cache for all strings, not just keys. This is equivalent to what will be released soon as part of PyPy 7.2.
In addition to wall clock time of parsing, I also measured the increase in memory use of each implementation after the input string has been deserialized, i.e. the size of the in-memory representation of every JSON file.
Contributions of Individual Optimizations Let's first look at the contributions of the individual optimizations to the overall performance and memory usage.

All the benchmarks were run 30 times in new processes, all the numbers are normalized to PyPyFull.
The biggest individual improvement to both parsing time and memory used comes from caching just the keys in parsed dictionaries. This is the optimization in PyPy's JSON parser that has been implemented for a while already. To understand why this optimization is so useful, let's look at some numbers about each benchmark, namely the number of total keys across all dictionaries in each file, as well as the number of unique keys. As we can see, for all benchmarks the number of unique keys is significantly smaller than the number of keys in total.
Benchmark   Number of keys   Number of unique keys
Censys      14 404 234       163
Gharchive   6 637 881        169
NYTimes     417 337          60
Reddit      25 226 397       21
Rosie       28 500 101       5
TPCH        6 700 000        45
Wikidata    6 235 088        1 602
Yelp        5 133 914        61

The next big jump in deserialization time and memory comes from introducing maps to represent deserialized dictionaries. With PyPyMapNoCache deserialization time goes down because it's much cheaper to walk the tree of maps and store all deserialized objects into an array of values than to build hashmaps with the same keys again and again. Memory use goes down for the same reason: it takes a lot less memory to store the shared structure of each set of keys in the map once, as opposed to repeating it in every hashmap.
We can look at some numbers about every benchmark again. The table shows how many map-based dictionaries are deserialized for every benchmark, and how many hashmap-backed dictionaries. We see that the number of hashmap-backed dictionaries is often zero, or at most a small percentage of all dictionaries in each benchmark. Yelp has the biggest number of hashmap-backed dictionaries. The reason for this is that the input file contains hashmaps that store combinations of various features of Yelp businesses, and a lot of these combinations are totally unique to a business. Therefore the heuristics determine that it's better to store these using hashmaps.
Benchmark   Map Dicts   Regular Dicts   % Regular Dicts
Censys      4 049 235   1 042           0.03
Gharchive   955 301     0               0.00
NYTimes     80 393      0               0.00
Reddit      1 201 257   0               0.00
Rosie       6 248 966   0               0.00
TPCH        1 000 000   0               0.00
Wikidata    1 923 460   46 905          2.38
Yelp        443 140     52 051          10.51

We can also look at numbers about how often the memcmp-based speculative parsing of the next key of a given map succeeds. Looking at statistics about each benchmark, we can see that the speculation of which key we expect next pays off in a significant percentage of cases: between 63% for Wikidata, where the dictionary structures are quite irregular, and nearly 100% for Reddit, where all the dictionaries have the same set of keys.
Benchmark   Number of Keys   Map Transitions   % Successful Speculation
Censys      14 404 234       14 403 243        65.79
Gharchive   6 637 881        6 637 881         86.71
NYTimes     417 337          417 337           79.85
Reddit      25 226 397       25 226 397        100.00
Rosie       28 500 101       28 500 101        90.37
TPCH        6 700 000        6 700 000         86.57
Wikidata    6 235 088        5 267 744         63.68
Yelp        5 133 914        4 593 980         90.43
geomean                                        82.04

General string caching is the optimization with the most unclear benefit. On the one hand its impact on memory usage is quite substantial, leading to a 20% reduction for Gharchive and Reddit, and up to a 2× improvement for Yelp. On the other hand, the effect on performance is less clear: it even leads to a slowdown in Gharchive and Reddit, and generally gives only a small improvement. Choosing the right heuristic for when to disable the cache also has somewhat unclear effects and is definitely a topic worthy of further investigation.
Comparison against other JSON Decoders To get a more general feeling of the performance and memory usage of the improved PyPy parser, we compare it against CPython's built-in json parser, ujson for CPython, Node's (V8) JSON parser and RapidJSON. For better context for the memory usage I also show the file size of the input files.
These benchmarks are not really an apples-to-apples comparison. All of the implementations use different in-memory representations of strings in the deserialized data structure (Node uses two bytes per character in a string; in CPython it depends, but it's 4 bytes per character on my machine; PyPyBaseline uses four bytes; PyPy and RapidJSON use utf-8). But it's still interesting to get some ballpark numbers. The results are as follows:

As we can see, PyPyFull handily beats CPython and ujson, with a geometric mean of the improvement of about 2.5×. The memory improvement can be even more extreme, with an improvement of over 4× against CPython/ujson in some cases (CPython gives better memory sizes, because its parser caches the keys of dictionaries as well). Node is often more than 50% slower, whereas RapidJSON beats us easily, by a factor of 2× on average.
Conclusions While the speedup I managed to achieve over the course of this project is nice and I am certainly happy to beat both CPython and Node, I am ultimately still annoyed that RapidJSON manages to maintain such a clear lead over PyPyFull, and would like to get closer to it. One problem that PyPy suffers compared to RapidJSON is the overhead of garbage collection. Deserializing large JSON files is pretty much the worst case for the generational GC that PyPy uses, since none of the deserialized objects die young (and the GC expects that most objects do). That means that a lot of the deserialization time of PyPy is wasted allocating the resulting objects in the nursery, and then copying them into the old generation. Somehow, this should be done in better ways, but all my attempts to not have to do the copy did not seem to help much. So maybe more improvements are possible, if I can come up with more ideas.
On the memory side of things, Node/V8 is beating PyPy clearly which might indicate more general problems in how we represent Python objects in memory. On the other hand, I think it's cool that we are competitive with RapidJSON in terms of memory and often within 2× of the file size.
An effect that I didn't consider at all in this blog post is the fact that accessing the deserialized objects with constant strings is also faster than with regular dictionaries, due to them being represented with maps. More benchmarking work to do in the future!
If you have your own programs that run on PyPy and use the json parser a lot, please measure them on the new code and let me know whether you see any difference!
Categories: FLOSS Project Planets

PyCoder’s Weekly: Issue #389 (Oct. 8, 2019)

Planet Python - Tue, 2019-10-08 15:30

#389 – OCTOBER 8, 2019
View in Browser »

Get Started With Django: Build a Portfolio App

In this course, you’ll learn the basics of creating powerful web applications with Django, a Python web framework. You’ll build a portfolio website to showcase your web development projects, complete with a fully functioning blog.
REAL PYTHON video

Auto Formatters for Python

An auto formatter is a tool that formats your code so that it complies with the tool's own style or another standard you configure. But which auto formatter should you use with Python code?
KEVIN PETERS • Shared by Kevin Peters

Monitor Your Python Applications With Datadog’s Distributed Tracing and APM

Debug and optimize your code by tracing requests across web servers, databases, and services in your environment—and seamlessly correlate those traces with metrics and logs to troubleshoot issues. Try Datadog in your environment with a free 14-day trial →
DATADOG sponsor

Timsort: The Fastest Sorting Algorithm You’ve Never Heard Of

CPython uses Timsort for sorting containers. Timsort is a fast O(n log n) stable sorting algorithm built for the real world — not constructed in academia.
BRANDON SKERRITT

PyPy’s New JSON Parser

PyPy has a new and faster JSON parser implementation. This post covers the design decisions that were made to develop the new and improved parser.
PYPY STATUS BLOG

Automatically Reloading Python Modules With %autoreload

Tired of having to reload a module each time you change it? IPython’s %autoreload to the rescue!
SEBASTIAN WITOWSKI

Six Django Template Tags Not Often Used in Tutorials

{% for ... %} {% empty %} {% endfor %} anyone?
MEDIUM.COM/@HIGHCENBURG

(Floating Point) Numbers, They Lie

When and why 2 + 2 = 4.00000000000000000001…
GLYPH LEFKOWITZ

Python 2.7 Retirement Countdown: ~2.5 Months

PYTHONCLOCK.ORG

Discussions What’s Your Favorite Python Library?

Twitter discussion about everyone’s best-loved Python libraries. What’s your personal favorite?
MIKE DRISCOLL

Python Jobs Full Stack Developer (Toronto, ON, Canada)

Beanfield Metroconnect

Backend Developer (Kfar Saba, Israel)

3DSignals

More Python Jobs >>>

Articles & Tutorials How Dictionaries Are Implemented in CPython

Learn what hash tables are, why you would use them, and how they’re used to implement dictionaries in the most popular Python interpreter: CPython.
DATA-STRUCTURES-IN-PRACTICE.COM

Building a Python C Extension Module

Learn how to write Python interfaces in C. Find out how to invoke C functions from within Python and build Python C extension modules. You’ll learn how to parse arguments, return values, and raise custom exceptions using the Python API.
REAL PYTHON

Automated Python Code Reviews, Directly From Your Git Workflow

Codacy lets developers spend more time shipping code and less time fixing it. Set custom standards and automatically track quality measures like coverage, duplication, complexity and errors. Integrates with GitHub, GitLab and Bitbucket, and works with 28 different languages. Get started today for free →
CODACY sponsor

Is Rectified Adam Actually Better Than Adam?

“Is the Rectified Adam (RAdam) optimizer actually better than the standard Adam optimizer? According to my 24 experiments, the answer is no, typically not (but there are cases where you do want to use it instead of Adam).”
ADRIAN ROSEBROCK

Using the Python zip() Function for Parallel Iteration

How to use Python’s built-in zip() function to solve common programming problems. You’ll learn how to traverse multiple iterables in parallel and create dictionaries with just a few lines of code.
REAL PYTHON

How Is Python 2 Supported in RHEL After 2020?

What the Py 2.x end-of-life deadline means in practice, e.g. “Just because the PSF consider Python 2 unsupported does not mean that Python 2 is unsupported within RHEL.” Also see the related discussion on Hacker News.
REDHAT.COM

Principal Component Analysis (PCA) With Python, From Scratch

Derive PCA from first principles and implement a working version in Python by writing all the linear algebra code from scratch. Nice tutorial!
ORAN LOONEY

Winning the Python Software Interview

Tips on acing your Python coding interview, PSF has a new Code of Conduct, and regex testing tools.
PYTHON BYTES FM podcast

Analyzing the Stack Overflow Survey With Python and Pandas

Do your own data science exploration and analysis on the annual developer survey’s dataset.
MOSHE ZADKA

Python 2 EOL: Are You Prepared? Take Our Survey for a Chance to Win a Drone!

Python 2 End of Life is coming soon. Please take our 5-minute survey to let us know how you’re preparing for the change. You’ll get the final results, plus the chance to win a camera drone. Thanks for your time!
ACTIVESTATE sponsor

How to Add Maps to Django Web App Projects With Mapbox

Learn how to add maps and location-based data to your web applications using Mapbox.
MATT MAKAI

Coding a Simplex Solver From Scratch With Python

OLE KRÖGER

Write Your Own DNS Server in Python Hosted on Kubernetes

NEERAN GUL

Turn Python Scripts Into Beautiful ML Tools With Streamlit

ADRIEN TREUILLE

Python and Fast HTTP Clients

JULIEN DANJOU

Hiding Items From the Legend in Matplotlib

ROBIN WILSON

Projects & Code re-assert: Show Where Your Regex Match Assertion Failed

GITHUB.COM/ASOTTILE

PyWaffle: Make Waffle Charts in Python

GITHUB.COM/GYLI • Shared by Guangyang Li

imagededup: Finding Duplicate Images Made Easy

GITHUB.COM/IDEALO

django-stubs: PEP-484 Type Checking Stubs for Django

GITHUB.COM/TYPEDDJANGO

flask-swagger-types: Swagger API Spec Generator and Type Checker for Flask

GITHUB.COM/PLAINAS

pybind11: Seamless Operability Between C++11 and Python

PYBIND11.READTHEDOCS.IO

pyflow: Dependency and Version Management System for Python

GITHUB.COM/DAVID-OCONNOR

pycld3: Python 3 Bindings for the Compact Language Detector V3 (CLD3)

BRAD SOLOMON

Events SciPy Latam

October 8 to October 11, 2019
SCIPYLA.ORG

PyCon ZA 2019

October 9 to October 14, 2019
PYCON.ORG

PyConDE & PyData Berlin 2019

October 9 to October 12, 2019
PYCON.ORG

Python Miami

October 12 to October 13, 2019
PYTHONDEVELOPERSMIAMI.COM

PyCon Pakistan 2019

October 12 to October 13, 2019
PYCON.PK

PyCon India 2019

October 12 to October 16, 2019
PYCON.ORG

PyCode Conference 2019

October 14 to October 17, 2019
PYCODE-CONFERENCE.ORG

Happy Pythoning!
This was PyCoder’s Weekly Issue #389.
View in Browser »

[ Subscribe to 🐍 PyCoder’s Weekly 💌 – Get the best Python news, articles, and tutorials delivered to your inbox once a week >> Click here to learn more ]

Categories: FLOSS Project Planets

Hook 42: Come for Education, Stay for Community - BADCamp 2019

Planet Drupal - Tue, 2019-10-08 15:04
Come for Education, Stay for Community - BADCamp 2019 Lindsey Gemmill Tue, 10/08/2019 - 20:25
Categories: FLOSS Project Planets

Python Circle: 5 lesser used Django template tags

Planet Python - Tue, 2019-10-08 12:45
rarely used Django template tags, lesser-known Django template tags, 5 awesome Django template tags, Fun with Django template tags,
Categories: FLOSS Project Planets

Python Circle: Solving Python Error- KeyError: 'key_name'

Planet Python - Tue, 2019-10-08 12:45
Solving KeyError in python, How to handle KeyError in python dictionary, Safely accessing and deleting keys from python dictionary, try except Key error in Python
Categories: FLOSS Project Planets

Anarcat: Tip of the day: batch PDF conversion with LibreOffice

Planet Python - Tue, 2019-10-08 12:28

Someone asked me today why they couldn't write on the DOCX document they received from a student using the pen in their Onyx Note Pro reader. The answer, of course, is that while the Onyx can read those files, it can't annotate them: that only works with PDFs.

Next question then, is of course: do I really need to open each file separately and save them as PDF? That's going to take forever, I have 30 students per class!

Fear not, shell scripting and headless mode flies in to the rescue!

As it turns out, one of LibreOffice's parameters allows you to run batch operations on files. By calling:

libreoffice --headless --convert-to pdf *.docx

LibreOffice will happily convert all the *.docx files in the current directory to PDF. But because navigating the commandline can be hard, I figured I could push this a tiny little bit further and wrote the following script:

#!/bin/sh
exec libreoffice --headless --convert-to pdf "$@"

Drop this in ~/.local/share/nautilus/scripts/libreoffice-pdf, mark it executable, and voilà! You can batch-convert basically any text file (or anything supported by LibreOffice, really) into PDF.

Now I wonder if this would be a useful addition to the Debian package, anyone?

Categories: FLOSS Project Planets

Antoine Beaupré: Tip of the day: batch PDF conversion with LibreOffice

Planet Debian - Tue, 2019-10-08 12:28

(Same post as "Anarcat: Tip of the day: batch PDF conversion with LibreOffice" above.)

Categories: FLOSS Project Planets

Andy Wingo: thoughts on rms and gnu

GNU Planet! - Tue, 2019-10-08 11:34

Yesterday, a collective of GNU maintainers publicly posted a statement advocating collective decision-making in the GNU project. I would like to expand on what that statement means to me and why I signed on.

For many years now, I have not considered Richard Stallman (RMS) to be the head of the GNU project. Yes, he created GNU, speaking it into existence via prophetic narrative and via code; yes, he inspired many people, myself included, to make the vision of a GNU system into a reality; and yes, he should be recognized for these things. But accomplishing difficult and important tasks for GNU in the past does not grant RMS perpetual sovereignty over GNU in the future.

ontological considerations

More on the motivations for the non serviam in a minute. But first, a meta-point: the GNU project does not exist, at least not in the sense that many people think it does. It is not a legal entity. It is not a charity. You cannot give money to the GNU project. Besides the manifesto, GNU has no by-laws or constitution or founding document.

One could describe GNU as a set of software packages that have been designated by RMS as forming part, in some way, of GNU. But this artifact-centered description does not capture movement: software does not, by itself, change the world; it lacks agency. It is the people that maintain, grow, adapt, and build the software that are the heart of the GNU project -- the maintainers of and contributors to the GNU packages. They are the GNU of whom I speak and of whom I form a part.

wasted youth

Richard Stallman describes himself as the leader of the GNU project -- the "chief GNUisance", he calls it -- but this position only exists in any real sense by consent of the people that make GNU. So what is he doing with this role? Does he deserve it? Should we consent?

To me it has been clear for many years that, to a first approximation, the answer is that RMS does nothing for GNU. RMS does not write software. He does not design software, or systems. He does hold a role of accepting new projects into GNU; there, his primary criterion is not "does this make a better GNU system" but rather "does the new project meet the minimum requirements".

By itself, this seems to me to be a failure of leadership for a software project like GNU. But unfortunately, when RMS's role in GNU isn't neglect, more often than not it's negative. RMS's interventions are generally conservative -- to assert authority over the workings of the GNU project, to preserve ways of operating that he sees as important. See for example the whole glibc abortion joke debacle as an example of how RMS acts, when he chooses to do so.

Which, fair enough, right? I can hear you saying it. RMS started GNU so RMS decides what it is and what it can be. But I don't accept that. GNU is about practical software freedom, not about RMS. GNU has long outgrown any individual contributor. I don't think RMS has the legitimacy to tell this group of largely volunteers what we should build or how we should organize ourselves. Or rather, he can say what he thinks, but he has no dominion over GNU; he does not have majority sweat equity in the project. If RMS actually wants the project to outlive him -- something that by his actions is not clear -- the best thing that he could do for GNU is to stop pretending to run things, to instead declare victory and retire to an emeritus role.

Note, however, that my personal perspective here is not a consensus position of the GNU project. There are many (most?) GNU developers that still consider RMS to be GNU's rightful leader. I think they are mistaken, but I do not repudiate them for this reason; we can work together while differing on this and other matters. I simply state that I, personally, do not serve RMS.

selective attrition

Though the "voluntary servitude" questions are at the heart of the recent joint statement, I think we all recognize that attempts at self-organization in GNU face a grave difficulty, even if RMS decided to retire tomorrow, in the way that GNU maintainers have selected themselves.

The great tragedy of RMS's tenure in the supposedly universalist FSF and GNU projects is that he behaves in a way that is particularly alienating to women. It doesn't take a genius to conclude that if you're personally driving away potential collaborators, that's a bad thing for the organization, and actively harmful to the organization's goals: software freedom is a cause that is explicitly for everyone.

We already know that software development in people's free time skews towards privilege: not everyone has the ability to devote many hours per week to what is for many people a hobby, and it follows of course that those that have more privilege in society will be more able to establish a position in the movement. And then on top of these limitations on contributors coming in, we additionally have this negative effect of a toxic culture pushing people out.

The result, sadly, is that a significant proportion of those that have stuck with GNU don't see any problems with RMS. The cause of software freedom has always run against the grain of capitalism so GNU people are used to being a bit contrarian, but it has also had the unfortunate effect of creating a cult of personality and a with-us-or-against-us mentality. For some, only a traitor would criticise the GNU project. It's laughable but it's a thing; I prefer to ignore these perspectives.

Finally, it must be said that there are a few GNU people for whom it's important to check if the microphone is on before making a joke about rape culture. (Incidentally, RMS had nothing to say on that issue; how useless.)

So I honestly am not sure if GNU as a whole effectively has the demos to make good decisions. Neglect and selective attrition have gravely weakened the project. But I stand by the principles and practice of software freedom, and by my fellow GNU maintainers who are unwilling to accept the status quo, and I consider attempts to reduce GNU to founder-loyalty to be mistaken and without legitimacy.

where we're at

Given this divided state regarding RMS, the only conclusion I can make is that for the foreseeable future, GNU is not likely to have a formal leadership. There will be affinity groups working in different ways. It's not ideal, but the differences are real and cannot be papered over. Perhaps in the medium term, GNU maintainers can reach enough consensus to establish a formal collective decision-making process; here's hoping.

In the meantime, as always, happy hacking, and: no gods! No masters! No chief!!!

Categories: FLOSS Project Planets

Codementor: Python-compatible IDEs: What is It and Why Do You Need It?

Planet Python - Tue, 2019-10-08 11:14
There is no better way to build in Python than by using an IDE (Integrated Development Environment). IDEs not only make your work much easier and more logical; they also enhance the coding...
Categories: FLOSS Project Planets

Steve Kemp: A blog overhaul

Planet Debian - Tue, 2019-10-08 11:00

When this post becomes public I'll have successfully redeployed my blog!

My blog originally started in 2005 as a Wordpress installation; at some point I used Mephisto, and then I wrote my own solution.

My project was pretty cool; I'd parse a directory of text-files, one file for each post, and insert them into an SQLite database. From there I'd initiate a series of plugins, each one to generate something specific:

  • One plugin would output an archive page.
  • Another would generate a tag cloud.
  • Yet another would generate the actual search-results for a particular month/year, or tag-name.

All in all the solution was flexible and it wasn't too slow because finding posts via the SQLite database was pretty good.
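Steve doesn't show his code, but the pipeline he describes -- one text file per post, loaded into an SQLite database, then queried -- fits in a few lines. A rough sketch with hypothetical file names and schema:

```python
import sqlite3
import tempfile
from pathlib import Path

def index_posts(post_dir, db):
    """Insert one row per text-file post into an SQLite table."""
    db.execute("CREATE TABLE IF NOT EXISTS posts (slug TEXT, body TEXT)")
    for path in sorted(Path(post_dir).glob("*.txt")):
        db.execute("INSERT INTO posts VALUES (?, ?)", (path.stem, path.read_text()))
    db.commit()

# Demo: two throwaway posts in a temporary directory.
tmp = tempfile.mkdtemp()
Path(tmp, "hello.txt").write_text("First post")
Path(tmp, "world.txt").write_text("Second post")

db = sqlite3.connect(":memory:")
index_posts(tmp, db)
print(db.execute("SELECT COUNT(*) FROM posts").fetchone()[0])  # 2
```

From here, each "plugin" is just another query against the `posts` table.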

Anyway I've come to realize that all that freedom and architecture were overkill. I don't need to do fancy presentation, I don't need a loosely-coupled set of plugins.

So now I have a simpler solution which uses my existing template, uses my existing posts - with only a few cleanups - and generates the site from scratch, including all the comments, in less than 2 seconds.

After running make clean, a complete rebuild via make upload (which deploys the generated site to the remote host via rsync) takes 6 seconds.

I've lost the ability to be flexible in some areas, but I've gained all the speed. The old project took somewhere between 20-60 seconds to build, depending on what had changed.

In terms of simplifying my life I've dropped the remote installation of a site-search which means I can now host this site on a static site with only a single handler to receive any post-comments. (I was 50/50 on keeping comments. I didn't want to lose those I'd already received, and I do often find valuable and interesting contributions from readers, but being 100% static had its appeal too. I guess they stay for the next few years!)

Categories: FLOSS Project Planets

Jonas Meurer: debian lts report 2019.09

Planet Debian - Tue, 2019-10-08 10:34
Debian LTS report for September 2019

This month I was allocated 10 hours and carried over 9.5 hours from August. Unfortunately, again I didn't find much time to work on LTS issues, partially because I was travelling. I spent 5 hours on the task listed below. That means that I carry over 14.5 hours to October.

Links
Categories: FLOSS Project Planets

PyCharm: Webinar Preview: Project Setup for React+TS+TDD

Planet Python - Tue, 2019-10-08 10:33

Next Wednesday (Oct 16) I’m giving a webinar on React+TypeScript+TDD in PyCharm. Let’s give a little background on the webinar and spotlight one of the first parts covered.

Background

Earlier this year we announced a twelve-part in-depth tutorial on React, TypeScript, and Test-Driven Development (TDD) in PyCharm. For the most part it highlights PyCharm Professional’s bundling of WebStorm, our professional IDE for web development.

The tutorial is unique in a few ways:

  • It’s React, but with TypeScript, a combination with spotty coverage
  • Concepts are introduced through the lens of test-driven development, where you explore in the IDE, not the browser
  • Each tutorial step has a narrated video plus an in-depth discussion with working code
  • The tutorial highlights places where the IDE does your janitorial work for you and helps you “fail faster” and stay in the “flow”

This webinar will cover sections of that tutorial in an "ask-me-anything" kind of format. There's no way we'll get through all the sections, but we'll get far enough to give attendees a good grounding on the topic. As a note, I'm also giving this tutorial later as a guest for both an IntelliJ IDEA webinar and a Rider webinar.

I really enjoy telling the story about combining TypeScript with TDD to “fail faster”, as well as letting the IDE do your janitorial work to help you stay in the “flow.” Hope you’ll join us!

Spotlight: Project Setup

The tutorial starts with Project Setup, which uses the IDE’s integration with Create React App to scaffold a React+TypeScript project. Note: The tutorial is, as is common in the world of JS, six months out of date. Things have changed, such as tslint -> eslint. I’ll re-record once PyCharm 2019.3 is released, but for the webinar, I’ll use the latest Create React App.

For this tutorial I started with an already-generated directory, but in the webinar I’ll show how to use the IDE’s New Project to do so.

This tutorial step orients us in the IDE: opening package.json to run scripts in tool windows, generating a production build, and running a test then re-running it after an edit.

The second tutorial step does some project cleanup before getting into testing.

Categories: FLOSS Project Planets

Drupal Association blog: Colombia joins the growing community of local Drupal associations

Planet Drupal - Tue, 2019-10-08 10:17

The Drupal project is global. There are people using, implementing, and contributing to the Drupal project in nearly every country of the world.

Being able to encourage and support our global community to promote and grow the project must also be a global operation, and we are delighted to read that the Drupal Association of Colombia (ADC) has now been officially formed.

This local association will help to promote and stimulate the use of Drupal in Colombia and act as a focus to propel the Colombia community’s efforts and initiatives in accordance with the Drupal values and principles.

The founding members of the local association are partners and executives of two experienced Drupal Agencies in Colombia: Jairo Pinzón, Aldibier Morales, and William Vera from Seed EM and Jorge Alexander Salcedo and Carolina Poveda from Bits Americas; a senior Drupal developer and very active community member, Iván Chaquea; a marketer and very active Drupal adopter Jonathan Osorio from Grupo Éxito; and from the academic side, Socrates Rojas, dean of the faculty of computer science from the Instituto Técnico Central of Bogota, a prestigious public technical school.

Membership is now open to organizations and individuals who wish to join the Drupal Association of Colombia and who share the same interest. By joining, members will have access to all the local activities, training, official Drupal events, and the opportunity to contribute in a more cohesive way. More information will soon be available at www.asociaciondrupal.org​.

The ADC is now preparing for its first official event — Drupal Camp Medellin on June 5-6, 2020.

Please join us in congratulating all involved and wishing them a successful future!

File attachments:  colombia.png
Categories: FLOSS Project Planets

Real Python: Get Started With Django: Build a Portfolio App

Planet Python - Tue, 2019-10-08 10:00

Django is a fully featured Python web framework that can be used to build complex web applications. In this course, you’ll jump in and learn Django by example. You’ll follow the steps to create a fully functioning web application and, along the way, learn some of the most important features of the framework and how they work together.

By the end of this course, you will be able to:

  • Understand what Django is and why it’s a great web framework
  • Understand the architecture of a Django site and how it compares with other frameworks
  • Set up a new Django 2 project and app
  • Build a personal portfolio website with Django 2 and Python 3


Categories: FLOSS Project Planets

PyBites: Linting with Flake8

Planet Python - Tue, 2019-10-08 09:45

For so long the word "Linting" meant nothing to me. It sounded like some supercoder leet speak that was way out of my league. Then I discovered flake8 and realised I was a fool.

This article is a simple one. It covers what linting is; what Flake8 is and has an embarrassing example of it in use.

Before we get started, I need to get something off my chest. I don't know why but I really hate the word "linting". It's a hatred akin to how some people feel about the word "moist".

Linting. Linting. Linting. shudder. Let's move on!

What is Linting?

Just so I never have to type it again, let's quickly cover off what linting is.

It's actually pretty simple. Linting is the process of running a program that analyses code for programmatic errors such as bugs, actual errors, styling issues etc.

Put it in the same basket as the process running in your favourite text editor that keeps an eye out for typos and grammatical errors.

This brings us to Flake8.

What is Flake8?

It's one of these linting programs and is pretty damn simple to use. It also happens to analyse your code for PEP8 standard violations!

I love it for a few reasons:

  • I'm constantly learning something new. It picks away at my code, pointing out my failings, much like annoying friends.
  • It keeps my code looking schmick. It's easy to miss spacing and other tiny things while coding so running Flake8 against my code catches little annoyances before it's pushed to prod.
  • It's a much nicer word than "linting".
Flake8 in Action

To demonstrate my beloved Flake8 I thought I'd grab an old, and I mean old, script that's likely riddled with issues. Judge me not friends!

In an older (I'm really stressing old here) article I wrote a simple script to send emails. No functions or anything, just line by line code. Ignoring what the code actually does, take a look at the snippet below. Full code here.

λ cat generic_emailer.py
#!python3
#emailer.py is a simple script for sending emails using smtplib
#The idea is to assign a web-scraped file to the DATA_FILE constant.
#The data in the file is then read in and sent as the body of the email.
<snip>
DATA_FILE = 'scraped_data_file'
from_addr = 'your_email@gmail.com'
to_addr = 'your_email@gmail.com' #Or any generic email you want all recipients to see
bcc = EMAILS
<snip>

Now that we have my script, let's run flake8 against it.

  1. pip install the sucker:
(venv) λ pip install flake8
  2. Simply run flake8 and point it at your script. Given generic_emailer.py is in my current directory I'd run the following:
(venv) λ flake8 generic_emailer.py
  3. In traditional CLI fashion, if you don't receive any output at all, you have no issues. In my case, yeah, nope. The output I receive when running Flake8 against my script is as follows:
(venv) λ flake8 generic_emailer.py
generic_emailer.py:2:1: E265 block comment should start with '# '
generic_emailer.py:3:1: E265 block comment should start with '# '
generic_emailer.py:4:1: E265 block comment should start with '# '
generic_emailer.py:14:35: E262 inline comment should start with '# '
generic_emailer.py:14:80: E501 line too long (86 > 79 characters)
generic_emailer.py:27:50: E261 at least two spaces before inline comment
generic_emailer.py:27:51: E262 inline comment should start with '# '
generic_emailer.py:29:19: E261 at least two spaces before inline comment
generic_emailer.py:29:20: E262 inline comment should start with '# '
generic_emailer.py:31:23: E261 at least two spaces before inline comment
generic_emailer.py:31:24: E262 inline comment should start with '# '
generic_emailer.py:33:1: E265 block comment should start with '# '
generic_emailer.py:38:1: E265 block comment should start with '# '
generic_emailer.py:41:1: E265 block comment should start with '# '
generic_emailer.py:44:1: E265 block comment should start with '# '

Analysing the Output

Before we look into the actual issues, here's a quick breakdown of what the above means.

  • The first section is the name of the file we're... flaking... Yes, I'm making the word "flaking" a thing!
  • The next two numbers represent the line number and the character position in that line. ie: line 2, position 1.
  • Finally, we have the actual issue. The "E" number is the error/violation number. The rest is the detail of the problem.
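That file:line:column: CODE detail layout is regular enough to pick apart mechanically. A small helper of my own (not part of flake8) that splits one report line into its fields:

```python
def parse_flake8_line(line):
    """Split one flake8 report line into (filename, line, column, code, message)."""
    location, message = line.split(": ", 1)       # "file:line:col" / "CODE detail"
    filename, lineno, col = location.split(":")
    code, detail = message.split(" ", 1)
    return filename, int(lineno), int(col), code, detail

report = "generic_emailer.py:14:80: E501 line too long (86 > 79 characters)"
print(parse_flake8_line(report))
# ('generic_emailer.py', 14, 80, 'E501', 'line too long (86 > 79 characters)')
```

Handy if you ever want to group or count violations yourself rather than eyeball the raw output.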

Now what does it all mean?

Well, the majority of my violations here have to do with the spacing in front of my comments.

  • The E265 violations are simply telling me to add a space after my # to satisfy standards.
  • E501 is saying I have too many characters in my line, with the limit being 79.

You can read the rest!

Fixing the Violations

Let's quickly fix two of the violations:

  • generic_emailer.py:14:35: E262 inline comment should start with '# '
  • generic_emailer.py:14:80: E501 line too long (86 > 79 characters)

The code in question on line 14 is this:

to_addr = 'your_email@gmail.com' #Or any generic email you want all recipients to see

I can actually fix both issues by simply removing the comment. Doing this and running Flake8 again gets me the following output:

(venv) λ flake8 generic_emailer.py
generic_emailer.py:2:1: E265 block comment should start with '# '
generic_emailer.py:3:1: E265 block comment should start with '# '
generic_emailer.py:4:1: E265 block comment should start with '# '
generic_emailer.py:27:50: E261 at least two spaces before inline comment
generic_emailer.py:27:51: E262 inline comment should start with '# '
generic_emailer.py:29:19: E261 at least two spaces before inline comment
generic_emailer.py:29:20: E262 inline comment should start with '# '
generic_emailer.py:31:23: E261 at least two spaces before inline comment
generic_emailer.py:31:24: E262 inline comment should start with '# '
generic_emailer.py:33:1: E265 block comment should start with '# '
generic_emailer.py:38:1: E265 block comment should start with '# '
generic_emailer.py:41:1: E265 block comment should start with '# '
generic_emailer.py:44:1: E265 block comment should start with '# '

Note the two violations are gone.

Ignoring Violations

What if I don't care about the spacing of my comment #s?

Sometimes you'll want Flake8 to ignore specific issues. One of the most common use cases is to ignore line length.

You can do this by running flake8 --ignore=E<number>. Just specify which violations you want to ignore and Flake8 will overlook them.

To save yourself time you can also create a Flake8 config file and hardcode the violation codes into that. This method will save you specifying the code every time you run Flake8.

In my case I'm going to ignore those pesky E265 violations because I can.

I need to create a .flake8 file in my parent directory and add the following (with vim of course!):

(venv) λ touch .flake8
(venv) λ cat .flake8
[flake8]
ignore = E265

When I re-run Flake8 I now see the following:

(venv) λ flake8 generic_emailer.py
generic_emailer.py:27:50: E261 at least two spaces before inline comment
generic_emailer.py:27:51: E262 inline comment should start with '# '
generic_emailer.py:29:19: E261 at least two spaces before inline comment
generic_emailer.py:29:20: E262 inline comment should start with '# '
generic_emailer.py:31:23: E261 at least two spaces before inline comment
generic_emailer.py:31:24: E262 inline comment should start with '# '

The rest of the errors are an easy clean up so I'll leave it here.
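Since the .flake8 file is plain INI, you can sanity-check it with Python's standard configparser before relying on it. A minimal sketch (the temp-file dance just keeps the example self-contained; flake8 itself isn't involved):

```python
import configparser
import os
import tempfile

# Write the same config shown above to a throwaway directory.
config_text = "[flake8]\nignore = E265\n"
path = os.path.join(tempfile.mkdtemp(), ".flake8")
with open(path, "w") as fh:
    fh.write(config_text)

# Read it back the way any INI-style consumer would.
parser = configparser.ConfigParser()
parser.read(path)
print(parser["flake8"]["ignore"])  # E265
```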

Flake8 on PyBites CodeChallenges

As luck would have it, we've just implemented a new feature on the PyBites CodeChallenges platform that allows you to run flake8 against your browser based code!

Now you can have flake8 lint your code to perfection while you solve our Bites.

Check it out in all its glory:

Conclusion

Whether you like the word Linting or not, there's no denying the value it can provide, with Flake8 being a case in point.

While it can definitely grow a little tiresome at times, a well-crafted config file lets you customise it to your liking and pain threshold.

It really is a brilliant tool to add to your library so give it a try!

Keep Calm and Code in Python!

-- Julian

Categories: FLOSS Project Planets

Jamie McClelland: Editing video without a GUI? Really?

Planet Debian - Tue, 2019-10-08 09:19

It seems counter intuitive - if ever there was a program in need of a graphical user interface, it's a non-linear video editing program.

However, as part of the May First board elections, I discovered otherwise.

We asked each board candidate to submit a 1 - 2 minute video introduction about why they want to be on the board. My job was to connect them all into a single video.

I had an unrealistic thought that I could find some simple tool that could concatenate them all together (like mkvmerge) but I soon realized that this approach requires that everyone use the exact same format, codec, bit rate, sample rate and blah blah blah.

I soon realized that I needed to actually make a video, not compile one. I create videos so infrequently that I often forget the name of the video editing software I used last time, so it takes some searching. This time I found that I had openshot-qt installed but when I tried to run it, I got a back trace (which someone else has already reported).

I considered looking for another GUI editor, but I wasn't that interested in learning what might be a complicated user interface when what I need is so simple.

So I kept searching and found melt. Wow.

I ran:

melt originals/* -consumer avformat:all.webm acodec=libopus vcodec=libvpx

And a while later I had a video. Impressive. It handled people who submitted their videos in portrait mode on their cell phones in mp4 as well as web cam submissions using webm/vp9 on landscape mode.
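That one-liner generalises easily. As a sketch (the helper is my own, and the command is only assembled here, not executed), the same invocation can be built for any directory of source clips:

```python
import glob

def build_melt_command(source_dir, output="all.webm"):
    """Assemble the melt argv list: every input clip, then one avformat consumer."""
    inputs = sorted(glob.glob(f"{source_dir}/*"))
    return ["melt", *inputs,
            "-consumer", f"avformat:{output}",
            "acodec=libopus", "vcodec=libvpx"]

# With an "originals" directory present, this reproduces the command above.
print(build_melt_command("originals"))
```

Passing the list to subprocess.run() would then do the actual re-encode, assuming melt is installed.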

Thank you melt developers!

Categories: FLOSS Project Planets

Stack Abuse: Getting Started with Python PyAutoGUI

Planet Python - Tue, 2019-10-08 08:34
Introduction

In this tutorial, we're going to learn how to use the PyAutoGUI library in Python 3. The PyAutoGUI library provides cross-platform support for managing mouse and keyboard operations through code to enable automation of tasks. The pyautogui library is also available for Python 2; however, we will be using Python 3 throughout the course of this tutorial.

A tool like this has many applications, a few of which include taking screenshots, automating GUI testing (like Selenium), automating tasks that can only be done with a GUI, etc.

Before you go ahead with this tutorial, please note that there are a few prerequisites. You should have a basic understanding of Python's syntax, and/or have done at least beginner level programming in some other language. Other than that, the tutorial is quite simple and easy to follow for beginners.

Installation

The installation process for PyAutoGUI is fairly simple for all Operating Systems. However, there are a few dependencies for Mac and Linux that need to be installed before the PyAutoGUI library can be installed and used in programs.

Windows

For Windows, PyAutoGUI has no dependencies. Simply run the following command in your command prompt and the installation will be done.

$ pip install PyAutoGUI

Mac

For Mac, pyobjc-core and pyobjc modules are needed to be installed in sequence first. Below are the commands that you need to run in sequence in your terminal for successful installation:

$ pip3 install pyobjc-core
$ pip3 install pyobjc
$ pip3 install pyautogui

Linux

For Linux, the only dependency is python3-xlib (for Python 3). To install that, followed by pyautogui, run the two commands mentioned below in your terminal:

$ pip3 install python3-xlib
$ pip3 install pyautogui

Basic Code Examples

In this section, we are going to cover some of the most commonly used functions from the PyAutoGUI library.

Generic Functions

The position() Function

Before we can use PyAutoGUI functions, we need to import it into our program:

import pyautogui as pag

The position() function tells us the current position of the mouse on our screen:

pag.position()

Output:

Point(x=643, y=329)

The onScreen() Function

The onScreen() function tells us whether the point with coordinates x and y exists on the screen:

print(pag.onScreen(500, 600))
print(pag.onScreen(0, 10000))

Output:

True
False

Here we can see that the first point exists on the screen, but the second point falls beyond the screen's dimensions.

The size() Function

The size() function finds the height and width (resolution) of a screen.

pag.size()

Output:

Size(width=1440, height=900)

Your output may be different and will depend on your screen's size.

Common Mouse Operations

In this section, we are going to cover PyAutoGUI functions for mouse manipulation, which includes both moving the position of the cursor as well as clicking buttons automatically through code.

The moveTo() Function

The syntax of the moveTo() function is as follows:

pag.moveTo(x_coordinate, y_coordinate)

The value of x_coordinate increases from left to right on the screen, and the value of y_coordinate increases from top to bottom. The value of both x_coordinate and y_coordinate at the top left corner of the screen is 0.

Look at the following script:

pag.moveTo(0, 0)
pag.PAUSE = 2
pag.moveTo(100, 500)
# pag.PAUSE = 2
pag.moveTo(500, 500)

In the code above, the main focus is the moveTo() function that moves the mouse cursor on the screen based on the coordinates we provide as parameters. The first parameter is the x-coordinate and the second parameter is the y-coordinate. It is important to note that these coordinates represent the absolute position of the cursor.

One more thing that has been introduced in the code above is the PAUSE property; it basically pauses the execution of the script for the given amount of time. The PAUSE property has been added in the above code so that you can see the function execution; otherwise, the functions would execute in a split second and you wouldn't be able to actually see the cursor moving from one location to the other on the screen.

Another workaround for this would be to indicate the time for each moveTo() operation as the third parameter in the function, e.g. moveTo(x, y, time_in_seconds).

Executing the above script may result in the following error:

Note: Possible Error

Traceback (most recent call last):
  File "a.py", line 5, in <module>
    pag.moveTo(100, 500)
  File "/anaconda3/lib/python3.6/site-packages/pyautogui/__init__.py", line 811, in moveTo
    _failSafeCheck()
  File "/anaconda3/lib/python3.6/site-packages/pyautogui/__init__.py", line 1241, in _failSafeCheck
    raise FailSafeException('PyAutoGUI fail-safe triggered from mouse moving to a corner of the screen. To disable this fail-safe, set pyautogui.FAILSAFE to False. DISABLING FAIL-SAFE IS NOT RECOMMENDED.')
pyautogui.FailSafeException: PyAutoGUI fail-safe triggered from mouse moving to a corner of the screen. To disable this fail-safe, set pyautogui.FAILSAFE to False. DISABLING FAIL-SAFE IS NOT RECOMMENDED.

If the execution of the moveTo() function generates an error similar to the one shown above, it means that your computer's fail-safe is enabled. To disable the fail-safe, add the following line at the start of your code:

pag.FAILSAFE = False

This feature is enabled by default so that you can easily stop execution of your pyautogui program by manually moving the mouse to the upper left corner of the screen. Once the mouse is in this location, pyautogui will throw an exception and exit.

The moveRel() Function

The coordinates of the moveTo() function are absolute. However, if you want to move the mouse position relative to the current mouse position, you can use the moveRel() function.

What this means is that the reference point for this function, when moving the cursor, would not be the top left point on the screen (0, 0), but the current position of the mouse cursor. So, if your mouse cursor is currently at point (100, 100) on the screen and you call the moveRel() function with the parameters (100, 100, 2) the new position of your mouse cursor would be (200, 200).

You can use the moveRel() function as shown below:

pag.moveRel(100, 100, 2)

The above script will move the cursor 100 points to the right and 100 points down in 2 seconds, with respect to the current cursor position.
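The absolute-versus-relative distinction trips people up, so here is the underlying arithmetic in plain Python -- illustrative functions of my own, with no pyautogui involved:

```python
def move_to(pos, x, y):
    """Absolute move: the new position ignores where the cursor was."""
    return (x, y)

def move_rel(pos, dx, dy):
    """Relative move: the offsets are added to the current position."""
    return (pos[0] + dx, pos[1] + dy)

cursor = (100, 100)
print(move_to(cursor, 100, 100))   # (100, 100)
print(move_rel(cursor, 100, 100))  # (200, 200)
```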

The click() Function

The click() function is used to imitate mouse click operations. The syntax for the click() function is as follows:

pag.click(x, y, clicks, interval, button)

The parameters are explained as follows:

  • x: the x-coordinate of the point to reach
  • y: the y-coordinate of the point to reach
  • clicks: the number of clicks that you would like to do when the cursor gets to that point on screen
  • interval: the amount of time in seconds between each mouse click i.e. if you are doing multiple mouse clicks
  • button: specify which button on the mouse you would like to press when the cursor gets to that point on screen. The possible values are right, left, and middle.

Here is an example:

pag.click(100, 100, 5, 2, 'right')

You can also execute specific click functions as follows:

pag.rightClick(x, y)
pag.doubleClick(x, y)
pag.tripleClick(x, y)
pag.middleClick(x, y)

Here the x and y represent the x and y coordinates, just like in the previous functions.

You can also have more fine-grained control over mouse clicks by specifying when to press the mouse down, and when to release it up. This is done using the mouseDown and mouseUp functions, respectively.

Here is a short example:

pag.mouseDown(x=x, y=y, button='left')
pag.mouseUp(x=x, y=y, button='left')

The above code is equivalent to just doing a pag.click(x, y) call.

The scroll() Function

The last mouse function we are going to cover is scroll. As expected, it has two options: scroll up and scroll down. The syntax for the scroll() function is as follows:

pag.scroll(amount_to_scroll, x=x_movement, y=y_movement)

To scroll up, specify a positive value for amount_to_scroll parameter, and to scroll down, specify a negative value. Here is an example:

pag.scroll(100, 120, 120)

Alright, this was it for the mouse functions. By now, you should be able to control your mouse's buttons as well as movements through code. Let's now move to keyboard functions. There are plenty, but we will cover only those that are most frequently used.

Common Keyboard Operations

Before we move to the functions, it is important that we know which keys can be pressed through code in pyautogui, as well as their exact naming convention. To do so, run the following script:

print(pag.KEYBOARD_KEYS)

Output:

['\t', '\n', '\r', ' ', '!', '"', '#', '$', '%', '&', "'", '(', ')', '*', '+', ',', '-', '.', '/', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ':', ';', '<', '=', '>', '?', '@', '[', '\\', ']', '^', '_', '`', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '{', '|', '}', '~', 'accept', 'add', 'alt', 'altleft', 'altright', 'apps', 'backspace', 'browserback', 'browserfavorites', 'browserforward', 'browserhome', 'browserrefresh', 'browsersearch', 'browserstop', 'capslock', 'clear', 'convert', 'ctrl', 'ctrlleft', 'ctrlright', 'decimal', 'del', 'delete', 'divide', 'down', 'end', 'enter', 'esc', 'escape', 'execute', 'f1', 'f10', 'f11', 'f12', 'f13', 'f14', 'f15', 'f16', 'f17', 'f18', 'f19', 'f2', 'f20', 'f21', 'f22', 'f23', 'f24', 'f3', 'f4', 'f5', 'f6', 'f7', 'f8', 'f9', 'final', 'fn', 'hanguel', 'hangul', 'hanja', 'help', 'home', 'insert', 'junja', 'kana', 'kanji', 'launchapp1', 'launchapp2', 'launchmail', 'launchmediaselect', 'left', 'modechange', 'multiply', 'nexttrack', 'nonconvert', 'num0', 'num1', 'num2', 'num3', 'num4', 'num5', 'num6', 'num7', 'num8', 'num9', 'numlock', 'pagedown', 'pageup', 'pause', 'pgdn', 'pgup', 'playpause', 'prevtrack', 'print', 'printscreen', 'prntscrn', 'prtsc', 'prtscr', 'return', 'right', 'scrolllock', 'select', 'separator', 'shift', 'shiftleft', 'shiftright', 'sleep', 'space', 'stop', 'subtract', 'tab', 'up', 'volumedown', 'volumemute', 'volumeup', 'win', 'winleft', 'winright', 'yen', 'command', 'option', 'optionleft', 'optionright']

The typewrite() Function

The typewrite() function is used to type something in a text field. Syntax for the function is as follows:

pag.typewrite(text, interval)

Here, text is what you wish to type in the field and interval is the time in seconds between each keystroke. Here is an example:

pag.typewrite('Junaid Khalid', 1)

Executing the script above will enter the text "Junaid Khalid" in the field that is currently selected with a pause of 1 second between each key press.

Another way this function can be used is by passing in a list of keys that you'd like to press in a sequence. To do that through code, see the example below:

pag.typewrite(['j', 'u', 'n', 'a', 'i', 'd', 'e', 'backspace', 'enter'])

In the above example, the text junaide would be entered, followed by the removal of the trailing e. The input in the text field will be submitted by pressing the Enter key.

The hotkey() Function

If you haven't noticed this so far, the keys shown above have no mention of combined operations like Control + C for the copy command. In case you're thinking you could do that by passing the list ['ctrl', 'c'] to the typewrite() function, you'd be wrong: typewrite() would press those buttons in sequence, not simultaneously. And as you probably already know, to execute the copy command you need to press the C key while holding down the Ctrl key.
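This hold-and-press sequence can be reproduced manually with PyAutoGUI's keyDown() and keyUp() functions. Here is a small sketch; the manual_hotkey helper name is our own, not part of the library, and it takes the GUI module as a parameter:

```python
def manual_hotkey(gui, *keys):
    # Press each key in order without releasing it...
    for key in keys:
        gui.keyDown(key)
    # ...then release them in reverse order, which is what a
    # combination like Ctrl + C requires.
    for key in reversed(keys):
        gui.keyUp(key)

# usage, assuming pyautogui is imported as pag:
# manual_hotkey(pag, 'ctrl', 'c')
```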

To press two or more keys simultaneously, you can use the hotkey() function, as shown here:

pag.hotkey('shift', 'enter')
pag.hotkey('shift', '2')  # For the @ symbol (on a US keyboard layout)
pag.hotkey('ctrl', 'c')   # For the copy command

The screenshot() Function

If you would like to take a screenshot of the screen at any instance, the screenshot() function is the one you are looking for. Let's see how we can implement that using PyAutoGUI:

screen_shot = pag.screenshot()

This will store a PIL object containing the image in a variable.
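Since the returned object is a standard PIL image, you can inspect it or save it yourself afterwards. Below is a hedged sketch; the grab() helper name is our own, and it takes the GUI module as a parameter so the logic is easy to follow:

```python
def grab(gui, path=None):
    # Take a screenshot; gui.screenshot() returns a PIL Image object.
    img = gui.screenshot()
    if path is not None:
        # PIL's save() infers the image format from the file extension.
        img.save(path)
    return img

# usage, assuming pyautogui is imported as pag:
# img = grab(pag)            # keep the PIL object in memory
# img = grab(pag, 'ss.png')  # also write it to disk
```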

If, however, you want to store the screenshot directly to your computer, you can call the screenshot function like this instead:

pag.screenshot('ss.png')

This will save the screenshot in a file, with the filename given, on your computer.

The confirm(), alert(), and prompt() Functions

The last set of functions that we are going to cover in this tutorial are the message box functions. Here is a list of the message box functions available in PyAutoGUI:

  1. Confirmation Box: Displays a message and gives you two options, i.e. OK and Cancel
  2. Alert Box: Displays some information and asks you to acknowledge that you have read it; it has a single button, i.e. OK
  3. Prompt Box: Requests some information from the user; after entering it, the user clicks the OK button

Now that we have seen the types, let's see how we can display these buttons on the screen in the same sequence as above:

pag.confirm("Are you ready?")
pag.alert("The program has crashed!")
pag.prompt("Please enter your name: ")

In the output, you will see the following sequence of message boxes.

Confirm:

Alert:

Prompt:
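These functions also return values you can act on: confirm() returns the text of the button the user clicked ('OK' or 'Cancel'), and prompt() returns the entered string, or None if the user cancels. Here is a small sketch; the ask_name helper name is our own, and it takes the GUI module as a parameter:

```python
def ask_name(gui):
    # confirm() returns the text of the button the user clicked.
    if gui.confirm("Are you ready?") != 'OK':
        return None
    # prompt() returns the entered string, or None on Cancel.
    return gui.prompt("Please enter your name: ")

# usage, assuming pyautogui is imported as pag:
# name = ask_name(pag)
```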

Conclusion

In this tutorial, we learned how to use the PyAutoGUI automation library in Python. We started off by talking about the prerequisites for this tutorial and its installation process on different operating systems, followed by learning about some of its general functions. After that, we studied the functions specific to mouse movement, mouse control, and keyboard control.

After following this tutorial, you should be able to use PyAutoGUI to automate GUI operations for repetitive tasks in your own application.

Categories: FLOSS Project Planets

Qt Creator 4.10.1 released

Planet KDE - Tue, 2019-10-08 08:32

We are happy to announce the release of Qt Creator 4.10.1 !

Categories: FLOSS Project Planets

Consensus Enterprises: Drupal 8 hook_update() Tricks

Planet Drupal - Tue, 2019-10-08 07:00
In Drupal 7, hook_update()/hook_install() were well-established mechanisms for manipulating the database when installing a new site or updating an existing one. Most of these routines ended up directly running SQL against the database, where all kinds of state, configuration, and content data lived. This worked reasonably well if you were careful and had a good knowledge of how the database schema fit together, but things tended to get complicated.
Categories: FLOSS Project Planets
