Feeds

Neil Williams: New directions

Planet Debian - Mon, 2019-05-20 10:15

It's been a difficult time, the last few months, but I've finally got some short updates.

First, in two short weeks I will be gainfully employed again at UltraSoC as Senior Software Tester, developing test framework solutions for SoC debugging, including on RISC-V. Despite vast numbers of discussions with a long list of recruitment agencies, success came from a face-to-face encounter at a local Job Fair. Many thanks to Cambridge Network for hosting the event.

Second, I've finally accepted that https://www.codehelp.co.uk was too old to retain and I'm simply redirecting the index page to this blog. The old codehelp site hasn't kept up with new technology and the CSS handles modern screen resolutions particularly badly. I don't expect that many people were finding the PHP and XML content useful, let alone the now redundant WML content. In time, I'll add redirects to the other codehelp.co.uk pages.

Third, my job hunting has shown that the centralisation of decentralised version control is still a thing. As far as recruitment is concerned, if the code isn't visible on GitHub, it doesn't exist. (It's not the recruitment agencies asking for GitHub links, it is the company HR departments themselves.) So I had to add a bunch of projects to GitHub and there's a link now in the blog.

Time to pick up some Debian work again, well after I pay a visit or two to the Cambridge Beer Festival 2019, of course.

Categories: FLOSS Project Planets

Stack Abuse: Overview of Async IO in Python 3.7

Planet Python - Mon, 2019-05-20 10:10

Python 3's asyncio module provides fundamental tools for implementing asynchronous I/O in Python. It was introduced in Python 3.4, and with each subsequent minor release, the module has evolved significantly.

This tutorial contains a general overview of the asynchronous paradigm, and how it's implemented in Python 3.7.

Blocking vs Non-Blocking I/O

The problem that asynchrony seeks to resolve is blocking I/O.

By default, when your program accesses data from an I/O source, it waits for that operation to complete before continuing to execute the program.

with open('myfile.txt', 'r') as file:
    data = file.read()  # Until the data is read into memory, the program waits here

print(data)

The program is blocked from continuing its flow of execution while a physical device is accessed, and data is transferred.

Network operations are another common source of blocking:

# pip install --user requests
import requests

req = requests.get('https://www.stackabuse.com/')
#
# Blocking occurs here, waiting for completion of an HTTPS request
#
print(req.text)

In many cases, the delay caused by blocking is negligible. However, blocking I/O scales very poorly. If you need to wait for 10¹⁰ file reads or network transactions, performance will suffer.

Multiprocessing, Threading, and Asynchrony

Strategies for minimizing the delays of blocking I/O fall into three major categories: multiprocessing, threading, and asynchrony.

Multiprocessing

Multiprocessing is a form of parallel computing: instructions are executed in an overlapping time frame on multiple physical processors or cores. Each process spawned by the kernel incurs an overhead cost, including an independently-allocated chunk of memory (heap).

Python implements parallelism with the multiprocessing module.

The following is an example of a Python 3 program that spawns four child processes, each of which exhibits a random, independent delay. The output shows the process ID of each child, the system time before and after each delay, and the current and peak memory allocation at each step.

from multiprocessing import Process
import os, time, datetime, random, tracemalloc

tracemalloc.start()

children = 4    # number of child processes to spawn
maxdelay = 6    # maximum delay in seconds

def status():
    return ('Time: ' +
            str(datetime.datetime.now().time()) +
            '\t Malloc, Peak: ' +
            str(tracemalloc.get_traced_memory()))

def child(num):
    delay = random.randrange(maxdelay)
    print(f"{status()}\t\tProcess {num}, PID: {os.getpid()}, Delay: {delay} seconds...")
    time.sleep(delay)
    print(f"{status()}\t\tProcess {num}: Done.")

if __name__ == '__main__':
    print(f"Parent PID: {os.getpid()}")
    for i in range(children):
        proc = Process(target=child, args=(i,))
        proc.start()

Output:

Parent PID: 16048
Time: 09:52:47.014906   Malloc, Peak: (228400, 240036)   Process 0, PID: 16051, Delay: 1 seconds...
Time: 09:52:47.016517   Malloc, Peak: (231240, 240036)   Process 1, PID: 16052, Delay: 4 seconds...
Time: 09:52:47.018786   Malloc, Peak: (231616, 240036)   Process 2, PID: 16053, Delay: 3 seconds...
Time: 09:52:47.019398   Malloc, Peak: (232264, 240036)   Process 3, PID: 16054, Delay: 2 seconds...
Time: 09:52:48.017104   Malloc, Peak: (228434, 240036)   Process 0: Done.
Time: 09:52:49.021636   Malloc, Peak: (232298, 240036)   Process 3: Done.
Time: 09:52:50.022087   Malloc, Peak: (231650, 240036)   Process 2: Done.
Time: 09:52:51.020856   Malloc, Peak: (231274, 240036)   Process 1: Done.

Threading

Threading is an alternative to multiprocessing, with benefits and downsides.

Threads are independently scheduled, and their execution may occur within an overlapping time period. Unlike multiprocessing, however, threads exist entirely in a single kernel process, and share a single allocated heap.

Python threads are concurrent — multiple sequences of machine code are executed in overlapping time frames. But they are not parallel — execution does not occur simultaneously on multiple physical cores.

The primary downsides to Python threading are memory safety and race conditions. All child threads of a parent process operate in the same shared memory space. Without additional protections, one thread may overwrite a shared value in memory without other threads being aware of it. Such data corruption would be disastrous.

To enforce thread safety, the CPython implementation uses a global interpreter lock (GIL). The GIL is a mutex mechanism that prevents multiple threads from executing simultaneously on Python objects. Effectively, this means that only one thread executes Python bytecode at any given time.
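A quick way to see the GIL in action is to time a CPU-bound function run twice serially and then in two threads. The following sketch (an illustration added here, not from the original article) should show little or no speedup from threading, because the GIL serializes the pure-Python work:

import time
from threading import Thread

def countdown(n):
    # Pure-Python, CPU-bound work: the GIL serializes it
    while n > 0:
        n -= 1

N = 10_000_000

start = time.perf_counter()
countdown(N)
countdown(N)
print(f"Serial:   {time.perf_counter() - start:.2f}s")

start = time.perf_counter()
threads = [Thread(target=countdown, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"Threaded: {time.perf_counter() - start:.2f}s")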

Here's the threaded version of the multiprocessing example from the previous section. Notice that very little has changed: multiprocessing.Process is replaced with threading.Thread. As indicated in the output, everything happens in a single process, and the memory footprint is significantly smaller.

from threading import Thread
import os, time, datetime, random, tracemalloc

tracemalloc.start()

children = 4    # number of child threads to spawn
maxdelay = 6    # maximum delay in seconds

def status():
    return ('Time: ' +
            str(datetime.datetime.now().time()) +
            '\t Malloc, Peak: ' +
            str(tracemalloc.get_traced_memory()))

def child(num):
    delay = random.randrange(maxdelay)
    print(f"{status()}\t\tProcess {num}, PID: {os.getpid()}, Delay: {delay} seconds...")
    time.sleep(delay)
    print(f"{status()}\t\tProcess {num}: Done.")

if __name__ == '__main__':
    print(f"Parent PID: {os.getpid()}")
    for i in range(children):
        thr = Thread(target=child, args=(i,))
        thr.start()

Output:

Parent PID: 19770
Time: 10:44:40.942558   Malloc, Peak: (9150, 9264)     Process 0, PID: 19770, Delay: 3 seconds...
Time: 10:44:40.942937   Malloc, Peak: (13989, 14103)   Process 1, PID: 19770, Delay: 5 seconds...
Time: 10:44:40.943298   Malloc, Peak: (18734, 18848)   Process 2, PID: 19770, Delay: 3 seconds...
Time: 10:44:40.943746   Malloc, Peak: (23959, 24073)   Process 3, PID: 19770, Delay: 2 seconds...
Time: 10:44:42.945896   Malloc, Peak: (26599, 26713)   Process 3: Done.
Time: 10:44:43.945739   Malloc, Peak: (26741, 27223)   Process 0: Done.
Time: 10:44:43.945942   Malloc, Peak: (26851, 27333)   Process 2: Done.
Time: 10:44:45.948107   Malloc, Peak: (24639, 27475)   Process 1: Done.

Asynchrony

Asynchrony is an alternative to threading for writing concurrent applications. Asynchronous events occur on independent schedules, "out of sync" with one another, entirely within a single thread.

Unlike threading, in asynchronous programs the programmer controls when and how voluntary preemption occurs, making it easier to isolate and avoid race conditions.

Introduction to the Python 3.7 asyncio Module

In Python 3.7, asynchronous operations are provided by the asyncio module.

High-Level vs Low-Level asyncio API

Asyncio components are divided into high-level APIs (for writing programs), and low-level APIs (for writing libraries or frameworks based on asyncio).

Every asyncio program can be written using only the high-level APIs. If you're not writing a framework or library, you never need to touch the low-level stuff.

With that said, let's look at the core high-level APIs, and discuss the core concepts.

Coroutines

In general, a coroutine (short for cooperative subroutine) is a function designed for voluntary preemptive multitasking: it proactively yields to other routines and processes, rather than being forcefully preempted by the kernel. The term "coroutine" was coined in 1958 by Melvin Conway (of "Conway's Law" fame), to describe code that actively facilitates the needs of other parts of a system.

In asyncio, this voluntary preemption is called awaiting.

Awaitables, Async, and Await

Any object which can be awaited (voluntarily preempted by a coroutine) is called an awaitable.

The await keyword suspends the execution of the current coroutine, and calls the specified awaitable.

In Python 3.7, the three awaitable objects are coroutine, task, and future.

An asyncio coroutine is any Python function whose definition is prefixed with the async keyword.

async def my_coro():
    pass

An asyncio task is an object that wraps a coroutine, providing methods to control its execution, and query its status. A task may be created with asyncio.create_task(), or asyncio.gather().

An asyncio future is a low-level object that acts as a placeholder for data that hasn't yet been calculated or fetched. It can provide an empty structure to be filled with data later, and a callback mechanism that is triggered when the data is ready.

A task inherits all but two of the methods available to a future, so in Python 3.7 you never need to create a future object directly.
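Although you rarely need futures directly, a minimal sketch (an illustration added here, not from the original article) of the placeholder-and-callback idea might look like this, using only standard asyncio calls:

import asyncio

async def main():
    loop = asyncio.get_running_loop()
    fut = loop.create_future()  # an empty placeholder for a result
    # Arrange for the result to be filled in one second from now
    loop.call_later(1, fut.set_result, "data is ready")
    print(await fut)  # suspends here until set_result() fires

asyncio.run(main())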

Event Loops

In asyncio, an event loop controls the scheduling and communication of awaitable objects. An event loop is required to use awaitables. Every asyncio program has at least one event loop. It's possible to have multiple event loops, but doing so is strongly discouraged in Python 3.7.

A reference to the currently-running loop object is obtained by calling asyncio.get_running_loop().

Sleeping

The asyncio.sleep(delay) coroutine suspends the current coroutine for delay seconds, without blocking the event loop. It's useful for simulating blocking I/O.

import asyncio

async def main():
    print("Sleep now.")
    await asyncio.sleep(1.5)
    print("OK, wake up!")

asyncio.run(main())

Initiating the Main Event Loop

The canonical entrance point to an asyncio program is asyncio.run(main()), where main() is a top-level coroutine.

import asyncio

async def my_coro(arg):
    "A coroutine."
    print(arg)

async def main():
    "The top-level coroutine."
    await my_coro(42)

asyncio.run(main())

Calling asyncio.run() implicitly creates and runs an event loop. The loop object has many useful methods, including loop.time(), which returns a float representing the current time, as measured by the loop's internal clock.

Note: The asyncio.run() function cannot be called from within an existing event loop. Therefore, you may see errors if you're running the program within a supervising environment, such as Anaconda or Jupyter, which runs an event loop of its own. The example programs in this section and the following sections should be run directly from the command line by executing the Python file.

The following program prints lines of text, blocking for one second after each line until the last.

import asyncio

async def my_coro(delay):
    loop = asyncio.get_running_loop()
    end_time = loop.time() + delay
    while True:
        print("Blocking...")
        await asyncio.sleep(1)
        if loop.time() > end_time:
            print("Done.")
            break

async def main():
    await my_coro(3.0)

asyncio.run(main())

Output:

Blocking...
Blocking...
Blocking...
Done.

Tasks

A task is an awaitable object that wraps a coroutine. To create and immediately schedule a task, you can call the following:

asyncio.create_task(coro(args...))

This will return a task object. Creating a task tells the loop, "go ahead and run this coroutine as soon as you can."

If you await a task, execution of the current coroutine is blocked until that task is complete.

import asyncio

async def my_coro(n):
    print(f"The answer is {n}.")

async def main():
    # By creating the task, it's scheduled to run
    # concurrently, at the event loop's discretion.
    mytask = asyncio.create_task(my_coro(42))

    # If we later await the task, execution stops there
    # until the task is complete. If the task is already
    # complete before it is awaited, nothing is awaited.
    await mytask

asyncio.run(main())

Output:

The answer is 42.

Tasks have several useful methods for managing the wrapped coroutine. Notably, you can request that a task be canceled by calling the task's .cancel() method. The task will be scheduled for cancellation on the next cycle of the event loop. Cancellation is not guaranteed: the task may complete before that cycle, in which case the cancellation does not occur.
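Here is a minimal sketch of that behavior (an illustration added here, not from the original article). The wrapped coroutine sees cancellation as an asyncio.CancelledError raised at its await point:

import asyncio

async def sleeper():
    try:
        await asyncio.sleep(10)
    except asyncio.CancelledError:
        print("Sleeper was cancelled.")
        raise  # re-raise so the task is marked as cancelled

async def main():
    task = asyncio.create_task(sleeper())
    await asyncio.sleep(0.1)  # give the task a chance to start
    task.cancel()             # request cancellation
    try:
        await task
    except asyncio.CancelledError:
        print("The task is gone.")

asyncio.run(main())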

Gathering Awaitables

Awaitables can be gathered as a group by passing them as positional arguments to the built-in coroutine asyncio.gather() (unpacking a list with * if needed).

asyncio.gather() returns an awaitable representing the gathered awaitables, and therefore must be prefixed with await.

If any element of awaitables is a coroutine, it is immediately scheduled as a task.

Gathering is a convenient way to schedule multiple coroutines to run concurrently as tasks. It also associates the gathered tasks in some useful ways (illustrated in the sketch after this list):

  • When all gathered tasks are complete, their aggregate return values are returned as a list, ordered in accordance with the awaitables list order.
  • Any gathered task may be canceled, without canceling the other tasks.
  • The gather itself can be canceled, canceling all tasks.
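Here is a minimal sketch of those behaviors (an illustration added here, not from the original article), showing that gathered results come back in argument order, not completion order:

import asyncio
import random

async def square(x):
    await asyncio.sleep(random.random())  # simulated I/O delay
    return x * x

async def main():
    results = await asyncio.gather(square(2), square(3), square(4))
    print(results)  # Always [4, 9, 16], regardless of finish order

asyncio.run(main())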
Example: Async Web Requests with aiohttp

The following example illustrates how these high-level asyncio APIs can be put to use. It is a modified version, updated for Python 3.7, of Scott Robinson's nifty asyncio example. His program leverages the aiohttp module to grab the top posts on Reddit and output them to the console.

Make sure that you have the aiohttp module installed before you run the script below. You can install the module via the following pip command:

$ pip install --user aiohttp

import sys
import asyncio
import aiohttp
import json
import datetime

async def get_json(client, url):
    async with client.get(url) as response:
        assert response.status == 200
        return await response.read()

async def get_reddit_top(subreddit, client, numposts):
    data = await get_json(client, 'https://www.reddit.com/r/' +
        subreddit + '/top.json?sort=top&t=day&limit=' +
        str(numposts))
    print(f'\n/r/{subreddit}:')
    j = json.loads(data.decode('utf-8'))
    for i in j['data']['children']:
        score = i['data']['score']
        title = i['data']['title']
        link = i['data']['url']
        print('\t' + str(score) + ': ' + title + '\n\t\t(' + link + ')')

async def main():
    print(datetime.datetime.now().strftime("%A, %B %d, %I:%M %p"))
    print('---------------------------')
    loop = asyncio.get_running_loop()
    async with aiohttp.ClientSession(loop=loop) as client:
        await asyncio.gather(
            get_reddit_top('python', client, 3),
            get_reddit_top('programming', client, 4),
            get_reddit_top('asyncio', client, 2),
            get_reddit_top('dailyprogrammer', client, 1)
        )

asyncio.run(main())

If you run the program multiple times, you'll see that the order of the output changes. That's because the JSON requests are displayed as they're received, which is dependent on the server's response time and the intermediate network latency. On a Linux system, you can observe this in action by running the script prefixed with (e.g.) watch -n 5, which will refresh the output every 5 seconds.

Other High-level APIs

Hopefully, this overview gives you a solid foundation of how, when, and why to use asyncio. Other high-level asyncio APIs, not covered here, include:

  • stream, a set of high-level networking primitives for managing asynchronous TCP events.
  • lock, event, condition, async analogs of the synchronization primitives provided in the threading module.
  • subprocess, a set of tools for running async subprocesses, such as shell commands.
  • queue, an asynchronous analog of the queue module (see the sketch after this list).
  • exception, for handling exceptions in async code.
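As a taste of one of these, here is a minimal producer/consumer sketch using asyncio.Queue (an illustration added here, not from the original article):

import asyncio

async def producer(queue):
    for i in range(3):
        await queue.put(i)
    await queue.put(None)  # sentinel: no more items

async def consumer(queue):
    while True:
        item = await queue.get()
        if item is None:
            break
        print(f"Consumed {item}")

async def main():
    queue = asyncio.Queue()
    await asyncio.gather(producer(queue), consumer(queue))

asyncio.run(main())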
Conclusion

Keep in mind that even if your program doesn't require asynchrony for performance reasons, you can still use asyncio if you prefer writing within the asynchronous paradigm. I hope this overview gives you a solid understanding of how, when, and why to begin using asyncio.

Categories: FLOSS Project Planets

Real Python: Unicode & Character Encodings in Python: A Painless Guide

Planet Python - Mon, 2019-05-20 10:00

Handling character encodings in Python or any other language can at times seem painful. Places such as Stack Overflow have thousands of questions stemming from confusion over exceptions like UnicodeDecodeError and UnicodeEncodeError. This tutorial is designed to clear the Exception fog and illustrate that working with text and binary data in Python 3 can be a smooth experience. Python’s Unicode support is strong and robust, but it takes some time to master.

This tutorial is different because it’s not language-agnostic but instead deliberately Python-centric. You’ll still get a language-agnostic primer, but you’ll then dive into illustrations in Python, with text-heavy paragraphs kept to a minimum. You’ll see how to use concepts of character encodings in live Python code.

By the end of this tutorial, you’ll:

  • Get conceptual overviews on character encodings and numbering systems
  • Understand how encoding comes into play with Python’s str and bytes
  • Know about support in Python for numbering systems through its various forms of int literals
  • Be familiar with Python’s built-in functions related to character encodings and numbering systems

Character encoding and numbering systems are so closely connected that they need to be covered in the same tutorial or else the treatment of either would be totally inadequate.

Note: This article is Python 3-centric. Specifically, all code examples in this tutorial were generated from a CPython 3.7.2 shell, although all minor versions of Python 3 should behave (mostly) the same in their treatment of text.

If you’re still using Python 2 and are intimidated by the differences in how Python 2 and Python 3 treat text and binary data, then hopefully this tutorial will help you make the switch.

Free Bonus: Click here to get access to a chapter from Python Tricks: The Book that shows you Python's best practices with simple examples you can apply instantly to write more beautiful + Pythonic code.

What’s a Character Encoding?

There are tens if not hundreds of character encodings. The best way to start understanding what they are is to cover one of the simplest character encodings, ASCII.

Whether you’re self-taught or have a formal computer science background, chances are you’ve seen an ASCII table once or twice. ASCII is a good place to start learning about character encoding because it is a small and contained encoding. (Too small, as it turns out.)

It encompasses the following:

  • Lowercase English letters: a through z
  • Uppercase English letters: A through Z
  • Some punctuation and symbols: "$" and "!", to name a couple
  • Whitespace characters: an actual space (" "), as well as a newline, carriage return, horizontal tab, vertical tab, and a few others
  • Some non-printable characters: characters such as backspace, "\b", that can’t be printed literally in the way that the letter A can

So what is a more formal definition of a character encoding?

At a very high level, it’s a way of translating characters (such as letters, punctuation, symbols, whitespace, and control characters) to integers and ultimately to bits. Each character can be encoded to a unique sequence of bits. Don’t worry if you’re shaky on the concept of bits, because we’ll get to them shortly.

The various categories outlined represent groups of characters. Each single character has a corresponding code point, which you can think of as just an integer. Characters are segmented into different ranges within the ASCII table:

Code Point Range     Class
0 through 31         Control/non-printable characters
32 through 64        Punctuation, symbols, numbers, and space
65 through 90        Uppercase English alphabet letters
91 through 96        Additional graphemes, such as [ and \
97 through 122       Lowercase English alphabet letters
123 through 126      Additional graphemes, such as { and |
127                  Control/non-printable character (DEL)

The entire ASCII table contains 128 characters. This table captures the complete character set that ASCII permits. If you don’t see a character here, then you simply can’t express it as printed text under the ASCII encoding scheme.

ASCII Table

Code Point   Character (Name)                  Code Point   Character (Name)
0            NUL (Null)                        64           @
1            SOH (Start of Heading)            65           A
2            STX (Start of Text)               66           B
3            ETX (End of Text)                 67           C
4            EOT (End of Transmission)         68           D
5            ENQ (Enquiry)                     69           E
6            ACK (Acknowledgment)              70           F
7            BEL (Bell)                        71           G
8            BS (Backspace)                    72           H
9            HT (Horizontal Tab)               73           I
10           LF (Line Feed)                    74           J
11           VT (Vertical Tab)                 75           K
12           FF (Form Feed)                    76           L
13           CR (Carriage Return)              77           M
14           SO (Shift Out)                    78           N
15           SI (Shift In)                     79           O
16           DLE (Data Link Escape)            80           P
17           DC1 (Device Control 1)            81           Q
18           DC2 (Device Control 2)            82           R
19           DC3 (Device Control 3)            83           S
20           DC4 (Device Control 4)            84           T
21           NAK (Negative Acknowledgment)     85           U
22           SYN (Synchronous Idle)            86           V
23           ETB (End of Transmission Block)   87           W
24           CAN (Cancel)                      88           X
25           EM (End of Medium)                89           Y
26           SUB (Substitute)                  90           Z
27           ESC (Escape)                      91           [
28           FS (File Separator)               92           \
29           GS (Group Separator)              93           ]
30           RS (Record Separator)             94           ^
31           US (Unit Separator)               95           _
32           SP (Space)                        96           `
33           !                                 97           a
34           "                                 98           b
35           #                                 99           c
36           $                                 100          d
37           %                                 101          e
38           &                                 102          f
39           '                                 103          g
40           (                                 104          h
41           )                                 105          i
42           *                                 106          j
43           +                                 107          k
44           ,                                 108          l
45           -                                 109          m
46           .                                 110          n
47           /                                 111          o
48           0                                 112          p
49           1                                 113          q
50           2                                 114          r
51           3                                 115          s
52           4                                 116          t
53           5                                 117          u
54           6                                 118          v
55           7                                 119          w
56           8                                 120          x
57           9                                 121          y
58           :                                 122          z
59           ;                                 123          {
60           <                                 124          |
61           =                                 125          }
62           >                                 126          ~
63           ?                                 127          DEL (delete)

The string Module

Python’s string module is a convenient one-stop-shop for string constants that fall in ASCII’s character set.

Here’s the core of the module in all its glory:

# From lib/python3.7/string.py

whitespace = ' \t\n\r\v\f'
ascii_lowercase = 'abcdefghijklmnopqrstuvwxyz'
ascii_uppercase = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
ascii_letters = ascii_lowercase + ascii_uppercase
digits = '0123456789'
hexdigits = digits + 'abcdef' + 'ABCDEF'
octdigits = '01234567'
punctuation = r"""!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~"""
printable = digits + ascii_letters + punctuation + whitespace

Most of these constants should be self-documenting in their identifier name. We’ll cover what hexdigits and octdigits are shortly.

You can use these constants for everyday string manipulation:

>>> import string
>>> s = "What's wrong with ASCII?!?!?"
>>> s.rstrip(string.punctuation)
"What's wrong with ASCII"

Note: string.printable includes all of string.whitespace. This disagrees slightly with another method for testing whether a character is considered printable, namely str.isprintable(), which will tell you that none of {'\v', '\n', '\r', '\f', '\t'} are considered printable.

The subtle difference is because of definition: str.isprintable() considers something printable if “all of its characters are considered printable in repr().”
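You can see the disagreement for yourself with a quick check (a small illustration added here, not from the original article):

>>> import string
>>> "\t" in string.printable
True
>>> "\t".isprintable()
False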

A Bit of a Refresher

Now is a good time for a short refresher on the bit, the most fundamental unit of information that a computer knows.

A bit is a signal that has only two possible states. There are different ways of symbolically representing a bit that all mean the same thing:

  • 0 or 1
  • “yes” or “no”
  • True or False
  • “on” or “off”

Our ASCII table from the previous section uses what you and I would just call numbers (0 through 127), but what are more precisely called numbers in base 10 (decimal).

You can also express each of these base-10 numbers with a sequence of bits (base 2). Here are the binary versions of 0 through 10 in decimal:

Decimal   Binary (Compact)   Binary (Padded Form)
0         0                  00000000
1         1                  00000001
2         10                 00000010
3         11                 00000011
4         100                00000100
5         101                00000101
6         110                00000110
7         111                00000111
8         1000               00001000
9         1001               00001001
10        1010               00001010

Notice that as the decimal number n increases, you need more significant bits to represent the character set up to and including that number.
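Python exposes this count directly as int.bit_length(), which you can use to confirm the table above (a small illustration added here):

>>> (5).bit_length()   # 101 in binary: 3 significant bits
3
>>> (10).bit_length()  # 1010 in binary: 4 significant bits
4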

Here’s a handy way to represent ASCII strings as sequences of bits in Python. Each character from the ASCII string gets pseudo-encoded into 8 bits, with spaces in between the 8-bit sequences that each represent a single character:

>>> def make_bitseq(s: str) -> str:
...     if not s.isascii():
...         raise ValueError("ASCII only allowed")
...     return " ".join(f"{ord(i):08b}" for i in s)

>>> make_bitseq("bits")
'01100010 01101001 01110100 01110011'

>>> make_bitseq("CAPS")
'01000011 01000001 01010000 01010011'

>>> make_bitseq("$25.43")
'00100100 00110010 00110101 00101110 00110100 00110011'

>>> make_bitseq("~5")
'01111110 00110101'

Note: .isascii() was introduced in Python 3.7.

The f-string f"{ord(i):08b}" uses Python’s Format Specification Mini-Language, which is a way of specifying formatting for replacement fields in format strings:

  • The left side of the colon, ord(i), is the actual object whose value will be formatted and inserted into the output. Using ord() gives you the base-10 code point for a single str character.

  • The right-hand side of the colon is the format specifier. 08 means width 8, 0 padded, and the b tells Python to output the resulting number in base 2 (binary).

This trick is mainly just for fun, and it will fail very badly for any character that you don’t see present in the ASCII table. We’ll discuss how other encodings fix this problem later on.

We Need More Bits!

There’s a critically important formula that’s related to the definition of a bit. Given a number of bits, n, the number of distinct possible values that can be represented in n bits is 2ⁿ:

def n_possible_values(nbits: int) -> int:
    return 2 ** nbits

Here’s what that means:

  • 1 bit will let you express 2¹ == 2 possible values.
  • 8 bits will let you express 2⁸ == 256 possible values.
  • 64 bits will let you express 2⁶⁴ == 18,446,744,073,709,551,616 possible values.

There’s a corollary to this formula: given a range of distinct possible values, how can we find the number of bits, n, that is required for the range to be fully represented? What you’re trying to solve for is n in the equation 2ⁿ = x (where you already know x).

Here’s what that works out to:

>>> from math import ceil, log

>>> def n_bits_required(nvalues: int) -> int:
...     return ceil(log(nvalues) / log(2))

>>> n_bits_required(256)
8

The reason that you need to use a ceiling in n_bits_required() is to account for values that are not clean powers of 2. Say you need to store a character set of 110 characters total. Naively, this should take log(110) / log(2) == 6.781 bits, but there’s no such thing as 0.781 bits. 110 values will require 7 bits, not 6, with the final slots left unused:

>>> n_bits_required(110)
7

All of this serves to prove one concept: ASCII is, strictly speaking, a 7-bit code. The ASCII table that you saw above contains 128 code points and characters, 0 through 127 inclusive. This requires 7 bits:

>>> n_bits_required(128)  # 0 through 127
7
>>> n_possible_values(7)
128

The issue with this is that modern computers don’t store much of anything in 7-bit slots. They traffic in units of 8 bits, conventionally known as a byte.

Note: Throughout this tutorial, I assume that a byte refers to 8 bits, as it has since the 1960s, rather than some other unit of storage. You are free to call this an octet if you prefer.

This means that the storage space used by ASCII is half-empty. If it’s not clear why this is, think back to the decimal-to-binary table from above. You can express the numbers 0 and 1 with just 1 bit, or you can use 8 bits to express them as 00000000 and 00000001, respectively.

You can express the numbers 0 through 3 with just 2 bits, or 00 through 11, or you can use 8 bits to express them as 00000000, 00000001, 00000010, and 00000011, respectively. The highest ASCII code point, 127, requires only 7 significant bits.

Knowing this, you can see that make_bitseq() converts ASCII strings into a str representation of bytes, where every character consumes one byte:

>>>>>> make_bitseq("bits") '01100010 01101001 01110100 01110011'

ASCII’s underutilization of the 8-bit bytes offered by modern computers led to a family of conflicting, informalized encodings that each specified additional characters to be used with the remaining 128 available code points allowed in an 8-bit character encoding scheme.

Not only did these different encodings clash with each other, but each one of them was by itself still a grossly incomplete representation of the world’s characters, regardless of the fact that they made use of one additional bit.

Over the years, one character encoding mega-scheme came to rule them all. However, before we get there, let’s talk for a minute about numbering systems, which are a fundamental underpinning of character encoding schemes.

Covering All the Bases: Other Number Systems

In the discussion of ASCII above, you saw that each character maps to an integer in the range 0 through 127.

This range of numbers is expressed in decimal (base 10). It’s the way that you, me, and the rest of us humans are used to counting, for no reason more complicated than that we have 10 fingers.

But there are other numbering systems as well that are especially prevalent throughout the CPython source code. While the “underlying number” is the same, all numbering systems are just different ways of expressing the same number.

If I asked you what number the string "11" represents, you’d be right to give me a strange look before answering that it represents eleven.

However, this string representation can express different underlying numbers in different numbering systems. In addition to decimal, the alternatives include the following common numbering systems:

  • Binary: base 2
  • Octal: base 8
  • Hexadecimal (hex): base 16

But what does it mean for us to say that, in a certain numbering system, numbers are represented in base N?

Here is the best way that I know of to articulate what this means: it’s the number of fingers that you’d count on in that system.

If you want a much fuller but still gentle introduction to numbering systems, Charles Petzold’s Code is an incredibly cool book that explores the foundations of computer code in detail.

One way to demonstrate how different numbering systems interpret the same thing is with Python’s int() constructor. If you pass a str to int(), Python will assume by default that the string expresses a number in base 10 unless you tell it otherwise:

>>> int('11')
11
>>> int('11', base=10)  # 10 is already default
11
>>> int('11', base=2)  # Binary
3
>>> int('11', base=8)  # Octal
9
>>> int('11', base=16)  # Hex
17

There’s a more common way of telling Python that your integer is typed in a base other than 10. Python accepts literal forms of each of the 3 alternative numbering systems above:

Type of Literal    Prefix      Example
n/a                n/a         11
Binary literal     0b or 0B    0b11
Octal literal      0o or 0O    0o11
Hex literal        0x or 0X    0x11

All of these are sub-forms of integer literals. You can see that these produce the same results, respectively, as the calls to int() with non-default base values. They’re all just int to Python:

>>> 11
11
>>> 0b11  # Binary literal
3
>>> 0o11  # Octal literal
9
>>> 0x11  # Hex literal
17

Here’s how you could type the binary, octal, and hexadecimal equivalents of the decimal numbers 0 through 20. Any of these are perfectly valid in a Python interpreter shell or source code, and all work out to be of type int:

Decimal   Binary    Octal   Hex
0         0b0       0o0     0x0
1         0b1       0o1     0x1
2         0b10      0o2     0x2
3         0b11      0o3     0x3
4         0b100     0o4     0x4
5         0b101     0o5     0x5
6         0b110     0o6     0x6
7         0b111     0o7     0x7
8         0b1000    0o10    0x8
9         0b1001    0o11    0x9
10        0b1010    0o12    0xa
11        0b1011    0o13    0xb
12        0b1100    0o14    0xc
13        0b1101    0o15    0xd
14        0b1110    0o16    0xe
15        0b1111    0o17    0xf
16        0b10000   0o20    0x10
17        0b10001   0o21    0x11
18        0b10010   0o22    0x12
19        0b10011   0o23    0x13
20        0b10100   0o24    0x14

Integer Literals in CPython Source

It’s amazing just how prevalent these expressions are in the Python Standard Library. If you want to see for yourself, navigate to wherever your lib/python3.7/ directory sits, and check out the use of hex literals like this:

$ grep -nri --include "*\.py" -e "\b0x" lib/python3.7

This should work on any Unix system that has grep. You could use "\b0o" to search for octal literals or "\b0b" to search for binary literals.

What’s the argument for using these alternate int literal syntaxes? In short, it’s because 2, 8, and 16 are all powers of 2, while 10 is not. These three alternate number systems occasionally offer a way of expressing values in a computer-friendly manner. For example, the number 65536, or 2¹⁶, is just 10000 in hexadecimal, or 0x10000 as a Python hexadecimal literal.

Enter Unicode

As you saw, the problem with ASCII is that it’s not nearly a big enough set of characters to accommodate the world’s set of languages, dialects, symbols, and glyphs. (It’s not even big enough for English alone.)

Unicode fundamentally serves the same purpose as ASCII, but it just encompasses a way, way, way bigger set of code points. There are a handful of encodings that emerged chronologically between ASCII and Unicode, but they are not really worth mentioning just yet because Unicode and one of its encoding schemes, UTF-8, has become so predominantly used.

Think of Unicode as a massive version of the ASCII table—one that has 1,114,112 possible code points. That’s 0 through 1,114,111, or 0 through 17 * (2¹⁶) - 1, or 0x10ffff hexadecimal. In fact, ASCII is a perfect subset of Unicode. The first 128 characters in the Unicode table correspond precisely to the ASCII characters that you’d reasonably expect them to.

In the interest of being technically exacting, Unicode itself is not an encoding. Rather, Unicode is implemented by different character encodings, which you’ll see soon. Unicode is better thought of as a map (something like a dict) or a 2-column database table. It maps characters (like "a", "¢", or even "ቈ") to distinct, positive integers. A character encoding needs to offer a bit more.

Unicode contains virtually every character that you can imagine, including additional non-printable ones too. One of my favorites is the pesky right-to-left mark, which has code point 8207 and is used in text containing both left-to-right and right-to-left language scripts, such as an article containing both English and Arabic paragraphs.
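You can verify that claim straight from the shell (a small illustration added here):

>>> rtl_mark = chr(8207)  # code point 8207
>>> rtl_mark              # the repr escapes it, since it's non-printable
'\u200f'
>>> rtl_mark.isprintable()
False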

Note: The world of character encodings is full of fine-grained technical details that some people love to nitpick about. One such detail is that only 1,111,998 of the Unicode code points are actually usable, due to a couple of archaic reasons.

Unicode vs UTF-8

It didn’t take long for people to realize that all of the world’s characters could not be packed into one byte each. It’s evident from this that modern, more comprehensive encodings would need to use multiple bytes to encode some characters.

You also saw above that Unicode is not technically a full-blown character encoding. Why is that?

There is one thing that Unicode doesn’t tell you: it doesn’t tell you how to get actual bits from text—just code points. It doesn’t tell you enough about how to convert text to binary data and vice versa.

Unicode is an abstract encoding standard, not an encoding. That’s where UTF-8 and other encoding schemes come into play. The Unicode standard (a map of characters to code points) defines several different encodings from its single character set.

UTF-8 as well as its lesser-used cousins, UTF-16 and UTF-32, are encoding formats for representing Unicode characters as binary data of one or more bytes per character. We’ll discuss UTF-16 and UTF-32 in a moment, but UTF-8 has taken the largest share of the pie by far.

That brings us to a definition that is long overdue. What does it mean, formally, to encode and decode?

Encoding and Decoding in Python 3

Python 3’s str type is meant to represent human-readable text and can contain any Unicode character.

The bytes type, conversely, represents binary data, or sequences of raw bytes, that do not intrinsically have an encoding attached to it.

Encoding and decoding is the process of going from one to the other:

Encoding vs decoding (Image: Real Python)

In .encode() and .decode(), the encoding parameter is "utf-8" by default, though it’s generally safer and more unambiguous to specify it:

>>>>>> "résumé".encode("utf-8") b'r\xc3\xa9sum\xc3\xa9' >>> "El Niño".encode("utf-8") b'El Ni\xc3\xb1o' >>> b"r\xc3\xa9sum\xc3\xa9".decode("utf-8") 'résumé' >>> b"El Ni\xc3\xb1o".decode("utf-8") 'El Niño'

The result of str.encode() is a bytes object. Both bytes literals (such as b"r\xc3\xa9sum\xc3\xa9") and the representations of bytes permit only ASCII characters.

This is why, when calling "El Niño".encode("utf-8"), the ASCII-compatible "El" is allowed to be represented as it is, but the n with tilde is escaped to "\xc3\xb1". That messy-looking sequence represents two bytes, 0xc3 and 0xb1 in hex:

>>>>>> " ".join(f"{i:08b}" for i in (0xc3, 0xb1)) '11000011 10110001'

That is, the character ñ requires two bytes for its binary representation under UTF-8.

Note: If you type help(str.encode), you’ll probably see a default of encoding='utf-8'. Be careful about excluding this and just using "résumé".encode(), because the default may be different in Windows prior to Python 3.6.

Python 3: All-In on Unicode

Python 3 is all-in on Unicode and UTF-8 specifically. Here’s what that means:

  • Python 3 source code is assumed to be UTF-8 by default. This means that you don’t need # -*- coding: UTF-8 -*- at the top of .py files in Python 3.

  • All text (str) is Unicode by default. Encoded Unicode text is represented as binary data (bytes). The str type can contain any literal Unicode character, such as "Δv / Δt", all of which will be stored as Unicode.

  • Anything from the Unicode character set is kosher in identifiers, meaning résumé = "~/Documents/resume.pdf" is valid if this strikes your fancy.

  • Python’s re module defaults to the re.UNICODE flag rather than re.ASCII. This means, for instance, that r"\w" matches Unicode word characters, not just ASCII letters (see the example after this list).

  • The default encoding in str.encode() and bytes.decode() is UTF-8.
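The re point is easy to demonstrate (a small illustration added here). With the default Unicode behavior, r"\w+" keeps accented words intact; opting back into re.ASCII splits them apart:

>>> import re
>>> re.findall(r"\w+", "résumé and naïveté")
['résumé', 'and', 'naïveté']
>>> re.findall(r"\w+", "résumé and naïveté", flags=re.ASCII)
['r', 'sum', 'and', 'na', 'vet']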

There is one other property that is more nuanced, which is that the default encoding to the built-in open() is platform-dependent and depends on the value of locale.getpreferredencoding():

>>> # Mac OS X High Sierra
>>> import locale
>>> locale.getpreferredencoding()
'UTF-8'

>>> # Windows Server 2012; other Windows builds may use UTF-16
>>> import locale
>>> locale.getpreferredencoding()
'cp1252'

Again, the lesson here is to be careful about making assumptions when it comes to the universality of UTF-8, even if it is the predominant encoding. It never hurts to be explicit in your code.
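Being explicit is as simple as always passing the encoding parameter to open(). A minimal sketch (the file name is hypothetical):

# Write and read the same file with an explicit, matching encoding
with open('recipe.txt', 'w', encoding='utf-8') as f:
    f.write('¼ cup of flour')

with open('recipe.txt', 'r', encoding='utf-8') as f:
    print(f.read())  # ¼ cup of flour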

One Byte, Two Bytes, Three Bytes, Four

A crucial feature is that UTF-8 is a variable-length encoding. It’s tempting to gloss over what this means, but it’s worth delving into.

Think back to the section on ASCII. Everything in extended-ASCII-land demands at most one byte of space. You can quickly prove this with the following generator expression:

>>> all(len(chr(i).encode("ascii")) == 1 for i in range(128))
True

UTF-8 is quite different. A given Unicode character can occupy anywhere from one to four bytes. Here’s an example of a single Unicode character taking up four bytes:

>>>>>> ibrow = "🤨" >>> len(ibrow) 1 >>> ibrow.encode("utf-8") b'\xf0\x9f\xa4\xa8' >>> len(ibrow.encode("utf-8")) 4 >>> # Calling list() on a bytes object gives you >>> # the decimal value for each byte >>> list(b'\xf0\x9f\xa4\xa8') [240, 159, 164, 168]

This is a subtle but important feature of len():

  • The length of a single Unicode character as a Python str will always be 1, no matter how many bytes it occupies.
  • The length of the same character encoded to bytes will be anywhere between 1 and 4.

The table below summarizes what general types of characters fit into each byte-length bucket:

Decimal Range      Hex Range                       What’s Included                                       Examples
0 to 127           "\u0000" to "\u007F"            U.S. ASCII                                            "A", "\n", "7", "&"
128 to 2047        "\u0080" to "\u07FF"            Most Latinic alphabets*                               "ę", "±", "ƌ", "ñ"
2048 to 65535      "\u0800" to "\uFFFF"            Additional parts of the Basic Multilingual Plane (BMP)**   "ത", "ᄇ", "ᮈ", "‰"
65536 to 1114111   "\U00010000" to "\U0010FFFF"    Other***                                              "𝕂", "𐀀", "😓", "🂲"

*Such as English, Arabic, Greek, and Irish
**A huge array of languages and symbols—mostly Chinese, Japanese, and Korean by volume (also ASCII and Latin alphabets)
***Additional Chinese, Japanese, Korean, and Vietnamese characters, plus more symbols and emojis

Note: In the interest of not losing sight of the big picture, there is an additional set of technical features of UTF-8 that aren’t covered here because they are rarely visible to a Python user.

For instance, UTF-8 actually uses prefix codes that indicate the number of bytes in a sequence. This enables a decoder to tell what bytes belong together in a variable-length encoding, and lets the first byte serve as an indicator of the number of bytes in the coming sequence.
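You can peek at those prefix codes yourself using the 4-byte emoji from earlier (a small illustration added here). The first byte starts with 11110, marking a 4-byte sequence, and each continuation byte starts with 10:

>>> [f"{b:08b}" for b in "🤨".encode("utf-8")]
['11110000', '10011111', '10100100', '10101000']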

Wikipedia’s UTF-8 article does not shy away from technical detail, and there is always the official Unicode Standard for your reading enjoyment as well.

What About UTF-16 and UTF-32?

Let’s get back to two other encoding variants, UTF-16 and UTF-32.

The difference between these and UTF-8 is substantial in practice. Here’s an example of how major the difference is with a round-trip conversion:

>>>>>> letters = "αβγδ" >>> rawdata = letters.encode("utf-8") >>> rawdata.decode("utf-8") 'αβγδ' >>> rawdata.decode("utf-16") # 😧 '뇎닎돎듎'

In this case, encoding four Greek letters with UTF-8 and then decoding back to text in UTF-16 would produce a text str that is in a completely different language (Korean).

Glaringly wrong results like this are possible when the same encoding isn’t used bidirectionally. Two variations of decoding the same bytes object may produce results that aren’t even in the same language.

This table summarizes the range or number of bytes under UTF-8, UTF-16, and UTF-32:

Encoding   Bytes Per Character (Inclusive)   Variable Length
UTF-8      1 to 4                            Yes
UTF-16     2 to 4                            Yes
UTF-32     4                                 No

One other curious aspect of the UTF family is that UTF-8 will not always take up less space than UTF-16. That may seem mathematically counterintuitive, but it’s quite possible:

>>>>>> text = "記者 鄭啟源 羅智堅" >>> len(text.encode("utf-8")) 26 >>> len(text.encode("utf-16")) 22

The reason for this is that the code points in the range U+0800 through U+FFFF (2048 through 65535 in decimal) take up three bytes in UTF-8 versus only two in UTF-16.
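You can check this with a single CJK character such as "記" (U+8A18), which sits in that range. (This check is added here for illustration; "utf-16-le" is used so the count isn't inflated by UTF-16's 2-byte byte-order mark.)

>>> len("記".encode("utf-8"))      # three bytes in UTF-8
3
>>> len("記".encode("utf-16-le"))  # only two in UTF-16
2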

I’m not by any means recommending that you jump aboard the UTF-16 train, regardless of whether or not you operate in a language whose characters are commonly in this range. Among other reasons, one of the strong arguments for using UTF-8 is that, in the world of encoding, it’s a great idea to blend in with the crowd.

Not to mention, it’s 2019: computer memory is cheap, so saving 4 bytes by going out of your way to use UTF-16 is arguably not worth it.

Python’s Built-In Functions

You’ve made it through the hard part. Time to use what you’ve seen thus far in Python.

Python has a group of built-in functions that relate in some way to numbering systems and character encoding.

These can be logically grouped together based on their purpose:

  • ascii(), bin(), hex(), and oct() are for obtaining a different representation of an input. Each one produces a str. The first, ascii(), produces an ASCII-only representation of an object, with non-ASCII characters escaped. The remaining three give binary, hexadecimal, and octal representations of an integer, respectively. These are only representations, not a fundamental change in the input.

  • bytes(), str(), and int() are class constructors for their respective types, bytes, str, and int. They each offer ways of coercing the input into the desired type. For instance, as you saw earlier, while int(11.0) is probably more common, you might also see int('11', base=16).

  • ord() and chr() are inverses of each other in that ord() converts a str character to its base-10 code point, while chr() does the opposite.

Here’s a more detailed look at each of these nine functions:

ascii()
    Signature: ascii(obj)
    Accepts: varies
    Return type: str
    Purpose: ASCII-only representation of an object, with non-ASCII characters escaped

bin()
    Signature: bin(number), where number is an int
    Return type: str
    Purpose: Binary representation of an integer, with the prefix "0b"

bytes()
    Signatures: bytes(iterable_of_ints), bytes(s, enc[, errors]), bytes(bytes_or_buffer), bytes([i])
    Accepts: varies
    Return type: bytes
    Purpose: Coerce (convert) the input to bytes, raw binary data

chr()
    Signature: chr(i), where i is an int with 0 <= i <= 1114111
    Return type: str
    Purpose: Convert an integer code point to a single Unicode character

hex()
    Signature: hex(number), where number is an int
    Return type: str
    Purpose: Hexadecimal representation of an integer, with the prefix "0x"

int()
    Signatures: int([x]), int(x, base=10)
    Accepts: varies
    Return type: int
    Purpose: Coerce (convert) the input to int

oct()
    Signature: oct(number), where number is an int
    Return type: str
    Purpose: Octal representation of an integer, with the prefix "0o"

ord()
    Signature: ord(c), where c is a str with len(c) == 1
    Return type: int
    Purpose: Convert a single Unicode character to its integer code point

str()
    Signatures: str(object=''), str(b[, enc[, errors]])
    Accepts: varies
    Return type: str
    Purpose: Coerce (convert) the input to str, text

The sections below show some examples of each function.

Examples: ascii()

ascii() gives you an ASCII-only representation of an object, with non-ASCII characters escaped:

>>>>>> ascii("abcdefg") "'abcdefg'" >>> ascii("jalepeño") "'jalepe\\xf1o'" >>> ascii((1, 2, 3)) '(1, 2, 3)' >>> ascii(0xc0ffee) # Hex literal (int) '12648430'

Examples: bin()

bin() gives you a binary representation of an integer, with the prefix "0b":

>>> bin(0)
'0b0'
>>> bin(400)
'0b110010000'
>>> bin(0xc0ffee)  # Hex literal (int)
'0b110000001111111111101110'
>>> [bin(i) for i in [1, 2, 4, 8, 16]]  # `int` + list comprehension
['0b1', '0b10', '0b100', '0b1000', '0b10000']

Examples: bytes()

bytes() coerces the input to bytes, representing raw binary data:

>>> # Iterable of ints
>>> bytes((104, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100))
b'hello world'
>>> bytes(range(97, 123))  # Iterable of ints
b'abcdefghijklmnopqrstuvwxyz'
>>> bytes("real 🐍", "utf-8")  # String + encoding
b'real \xf0\x9f\x90\x8d'
>>> bytes(10)
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
>>> bytes.fromhex('c0 ff ee')
b'\xc0\xff\xee'
>>> bytes.fromhex("72 65 61 6c 70 79 74 68 6f 6e")
b'realpython'

Examples: chr()

chr() converts an integer code point to a single Unicode character:

>>> chr(97)
'a'
>>> chr(7048)
'ᮈ'
>>> chr(1114111)
'\U0010ffff'
>>> chr(0x10FFFF)  # Hex literal (int)
'\U0010ffff'
>>> chr(0b01100100)  # Binary literal (int)
'd'

Examples: hex()

hex() gives the hexadecimal representation of an integer, with the prefix "0x":

>>> hex(100)
'0x64'
>>> [hex(i) for i in [1, 2, 4, 8, 16]]
['0x1', '0x2', '0x4', '0x8', '0x10']
>>> [hex(i) for i in range(16)]
['0x0', '0x1', '0x2', '0x3', '0x4', '0x5', '0x6', '0x7',
 '0x8', '0x9', '0xa', '0xb', '0xc', '0xd', '0xe', '0xf']

Examples: int()

int() coerces the input to int, optionally interpreting the input in a given base:

>>> int(11.0)
11
>>> int('11')
11
>>> int('11', base=2)
3
>>> int('11', base=8)
9
>>> int('11', base=16)
17
>>> int(0xc0ffee - 1.0)
12648429
>>> int.from_bytes(b"\x0f", "little")
15
>>> int.from_bytes(b'\xc0\xff\xee', "big")
12648430

Examples: ord()

ord() converts a single Unicode character to its integer code point:

>>>>>> ord("a") 97 >>> ord("ę") 281 >>> ord("ᮈ") 7048 >>> [ord(i) for i in "hello world"] [104, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100]

Examples: str()

str() coerces the input to str, representing text:

>>>>>> str("str of string") 'str of string' >>> str(5) '5' >>> str([1, 2, 3, 4]) # Like [1, 2, 3, 4].__str__(), but use str() '[1, 2, 3, 4]' >>> str(b"\xc2\xbc cup of flour", "utf-8") '¼ cup of flour' >>> str(0xc0ffee) '12648430' Python String Literals: Ways to Skin a Cat

Rather than using the str() constructor, it’s commonplace to type a str literally:

>>> meal = "shrimp and grits"

That may seem easy enough. But the interesting side of things is that, because Python 3 is Unicode-centric through and through, you can “type” Unicode characters that you probably won’t even find on your keyboard. You can copy and paste this right into a Python 3 interpreter shell:

>>> alphabet = 'αβγδεζηθικλμνξοπρςστυφχψ'
>>> print(alphabet)
αβγδεζηθικλμνξοπρςστυφχψ

Besides placing the actual, unescaped Unicode characters in the console, there are other ways to type Unicode strings as well.

One of the densest sections of Python’s documentation is the portion on lexical analysis, specifically the section on string and bytes literals. Personally, I had to read this section about one, two, or maybe nine times for it to really sink in.

Part of what it says is that there are up to six ways that Python will allow you to type the same Unicode character.

The first and most common way is to type the character itself literally, as you’ve already seen. The tough part with this method is finding the actual keystrokes. That’s where the other methods for getting and representing characters come into play. Here’s the full list:

Escape Sequence   Meaning                                             How To Express "a"
"\ooo"            Character with octal value ooo                      "\141"
"\xhh"            Character with hex value hh                         "\x61"
"\N{name}"        Character named name in the Unicode database        "\N{LATIN SMALL LETTER A}"
"\uxxxx"          Character with 16-bit (2-byte) hex value xxxx       "\u0061"
"\Uxxxxxxxx"      Character with 32-bit (4-byte) hex value xxxxxxxx   "\U00000061"

Here’s some proof and validation of the above:

>>> (
...     "a" ==
...     "\x61" ==
...     "\N{LATIN SMALL LETTER A}" ==
...     "\u0061" ==
...     "\U00000061"
... )
True

Now, there are two main caveats:

  1. Not all of these forms work for all characters. The hex representation of the integer 300 is 0x012c, which simply isn’t going to fit into the 2-hex-digit escape code "\xhh". The highest code point that you can squeeze into this escape sequence is "\xff" ("ÿ"). Similarly for "\ooo", it will only work up to "\777" ("ǿ").

  2. For \xhh, \uxxxx, and \Uxxxxxxxx, exactly as many digits are required as are shown in these examples. This can throw you for a loop because of the way that Unicode tables conventionally display the codes for characters, with a leading U+ and a variable number of hex characters. The key is that Unicode tables most often do not zero-pad these codes.

For instance, if you consult unicode-table.com for information on the Gothic letter faihu (or fehu), "𐍆", you’ll see that it is listed as having the code U+10346.

How do you put this into "\uxxxx" or "\Uxxxxxxxx"? Well, you can’t fit it in "\uxxxx" because it’s a 4-byte character, and to use "\Uxxxxxxxx" to represent this character, you’ll need to left-pad the sequence:

>>>>>> "\U00010346" '𐍆'

This also means that the "\Uxxxxxxxx" form is the only escape sequence that is capable of holding any Unicode character.

Note: Here’s a short function to convert strings that look like "U+10346" into something Python can work with. It uses str.zfill():

>>> def make_uchr(code: str):
...     return chr(int(code.lstrip("U+").zfill(8), 16))
>>> make_uchr("U+10346")
'𐍆'
>>> make_uchr("U+0026")
'&'

Other Encodings Available in Python

So far, you’ve seen four character encodings:

  1. ASCII
  2. UTF-8
  3. UTF-16
  4. UTF-32

There are a ton of other ones out there.

One example is Latin-1 (also called ISO-8859-1), which is technically the default for the Hypertext Transfer Protocol (HTTP), per RFC 2616. Windows has its own Latin-1 variant called cp1252.

Note: ISO-8859-1 is still very much present out in the wild. The requests library follows RFC 2616 “to the letter” in using it as the default encoding for the content of an HTTP/HTTPS response. If the word “text” is found in the Content-Type header, and no other encoding is specified, then requests will use ISO-8859-1.
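With requests, you can inspect the encoding it has settled on and override it before reading response.text. A minimal sketch, with a hypothetical URL:

import requests

resp = requests.get("https://example.com/recipe")  # hypothetical URL
print(resp.encoding)     # what requests assumed, e.g. 'ISO-8859-1'
resp.encoding = "utf-8"  # override it if you know better
print(resp.text[:50])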

The complete list of accepted encodings is buried way down in the documentation for the codecs module, which is part of Python’s Standard Library.

There’s one more useful recognized encoding to be aware of, which is "unicode-escape". If you have a decoded str and want to quickly get a representation of its escaped Unicode literal, then you can specify this encoding in .encode():

>>> alef = chr(1575)  # Or "\u0627"
>>> alef_hamza = chr(1571)  # Or "\u0623"
>>> alef, alef_hamza
('ا', 'أ')
>>> alef.encode("unicode-escape")
b'\\u0627'
>>> alef_hamza.encode("unicode-escape")
b'\\u0623'

You Know What They Say About Assumptions…

Just because Python makes the assumption of UTF-8 encoding for files and code that you generate doesn’t mean that you, the programmer, should operate with the same assumption for external data.

Let’s say that again because it’s a rule to live by: when you receive binary data (bytes) from a third party source, whether it be from a file or over a network, the best practice is to check that the data specifies an encoding. If it doesn’t, then it’s on you to ask.

All I/O happens in bytes, not text, and bytes are just ones and zeros to a computer until you tell it otherwise by informing it of an encoding.

Here’s an example of where things can go wrong. You’re subscribed to an API that sends you a recipe of the day, which you receive in bytes and have always decoded using .decode("utf-8") with no problem. On this particular day, part of the recipe looks like this:

>>> data = b"\xbc cup of flour"

It looks as if the recipe calls for some flour, but we don’t know how much:

>>>>>> data.decode("utf-8") Traceback (most recent call last): File "<stdin>", line 1, in <module> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbc in position 0: invalid start byte

Uh oh. There’s that pesky UnicodeDecodeError that can bite you when you make assumptions about encoding. You check with the API host. Lo and behold, the data is actually sent over encoded in Latin-1:

>>>>>> data.decode("latin-1") '¼ cup of flour'

There we go. In Latin-1, every character fits into a single byte, whereas the “¼” character takes up two bytes in UTF-8 ("\xc2\xbc").

The lesson here is that it can be dangerous to assume the encoding of any data that is handed off to you. It’s usually UTF-8 these days, but it’s the small percentage of cases where it’s not that will blow things up.

If you really do need to abandon ship and guess an encoding, then have a look at the chardet library, which uses methodology from Mozilla to make an educated guess about ambiguously encoded text. That said, a tool like chardet should be your last resort, not your first.
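A minimal sketch of using chardet as that last resort (the exact guess and confidence will vary):

import chardet  # pip install chardet

guess = chardet.detect(b"\xbc cup of flour")
print(guess)  # a dict with keys like 'encoding' and 'confidence'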

Odds and Ends: unicodedata

We would be remiss not to mention unicodedata from the Python Standard Library, which lets you interact with and do lookups on the Unicode Character Database (UCD):

>>> import unicodedata
>>> unicodedata.name("€")
'EURO SIGN'
>>> unicodedata.lookup("EURO SIGN")
'€'

Wrapping Up

In this article, you’ve decoded the wide and imposing subject of character encoding in Python.

You’ve covered a lot of ground here:

  • Fundamental concepts of character encodings and numbering systems
  • Integer, binary, octal, hex, str, and bytes literals in Python
  • Python’s built-in functions related to character encoding and numbering systems
  • Python 3’s treatment of text versus binary data

Now, go forth and encode!

Resources

For even more detail about the topics covered here, check out these resources:

The Python docs cover the subject in depth as well; the Unicode HOWTO in particular is worth reading.


Categories: FLOSS Project Planets

Help Test Plasma 5.16 Beta

Planet KDE - Mon, 2019-05-20 09:43

Plasma 5.16 beta was released last week and there’s now a further couple of weeks to test it to find and fix all the beasties. To help out download the Neon Testing image and install it in a virtual machine or on your raw hardware. You probably want to do a full-upgrade to make sure you have the latest builds. Then try out the new notifications system, or the new animated wallpaper settings or anything else mentioned in the release announcement. When you find a problem report it on bugs.kde.org and/or chat on the Plasma Matrix room. Thanks for your help!

Categories: FLOSS Project Planets

J On The Beach: a great event

Planet KDE - Mon, 2019-05-20 08:31

I have been at many software events and have helped or have been part of the organization in a few of them. Based on that experience and the fact that I have participated in the last two editions, let me tell you that J On The Beach is a great event.

The main factors that lead me to such a conclusion are:

  • It is all about content. I have seen many events that, over time, lose their focus on the quality of the content. It is a hard focus to keep, especially as you grow. @JOTB19 had great content: well-delivered talks and workshops, performed by bright people with something to say that was relevant to the audience.
    • I think the event has not reached its limit yet, especially when it comes to workshops.
    • Designing the content structure to target the right audience is as important as bringing speakers with great things to say. As any event matures, tough decisions will need to be taken in order to find its own space and identity among outstanding competitors.
      • When it comes to themes, will J On The Beach keep targeting several topics, or will it narrow them to one or two? Will they always be the same or will they rotate?
      • When it comes to size, will it grow or will it remain at the current numbers? Will the price increase or will it be kept in the current range?
      • When it comes to contents, will the event focus more energy and time on the “hands on” learning sessions, or will workshops be kept as relevant as the talks, as they are today? Will the talks' length be reduced? Will we see lightning talks?
  • J On The Beach was well organised. A good organization is not one that never runs into trouble, but one that handles it smoothly so there is little or no perceived impact. This event has a diligent team behind it, judging by the little to no impact I perceived.
  • Support from local companies. As Málaga matures as a software hub, more and more companies arrive in this area expecting to grow in size, so the need to attract local talent grows in parallel.
    • Some of these foreign companies understand how important it is to show up in local events to be known by as many local developers as possible. J On The Beach has captured the attention of several of these companies.
    • The organizers have understood this reality and support them in using the event to openly recruit people. This symbiotic relationship is a very productive one from what I have witnessed.
    • It is a hard relationship to sustain though, especially if the event does not grow in size, so in the future the current relation will probably need additional common interests to remain productive for both sides.
  • Global by default. Most events in Spain have traditionally been designed for Spaniards first, turning into more global events as they grow. J On The Beach is global by default, by design, since day 1. It is harder to succeed that way, but beyond the activation point it turns out to be easier to become sustainable. The organizers took the risk and have reached that point already, which provides the event a bright future in my opinion.
    • The fact that the event is able to attract developers from many countries, especially from eastern European ones, makes J On The Beach very attractive to foreign companies already located in Málaga, from the recruitment perspective. Málaga is a great place not just to work in English but also to live in English. There are well established communities from many different countries in the metropolitan area, due to how strong the tourist industry is here. These factors, together with others like logistics, affordable living costs, a good public health care system, sunny weather, availability of international and multilingual schools, etc., reduce the adaptation effort when relocating, especially for developers' families. J On The Beach brings tasty fish to the pond.

Let me name a couple of points that can make the event even better:

  • It is very hard to find a venue that fits an event during its consolidation phase and evolves with it. This edition's venue represents a significant improvement compared to last year's. There is room for improvement though.
    • It would be ideal to find a place in Málaga itself, closer to where the companies are located and to places to hang out after the event, which at the same time keeps the good things the current venue/location provides, which are plenty.
    • Finding the right venue is tough. There are decision-making factors that participants do not usually see but are essential like costs, how supportive the venue staff and owners are, accommodation availability in the surrounding area, availability on the selected dates, etc. It is one of the most difficult points to get right, in my experience.                      
  • Great events deserve great keynote speakers. They are hard to get, but they often make the difference between great events and must-attend ones.
    • Great keynote speakers do not necessarily mean popular ones. I already see celebrities at bigger and more expensive events. I would love to see old-time computer science cowboys in Málaga. I mean those first-class engineers who did something relevant some time ago and have witnessed the evolution of our industry and of their own inventions. They are able to bring a perspective that very few can provide, extremely valuable in these fast-changing times. Those gems are harder to see at big/popular events and might be a good target for a smaller, high-quality event. I think it would be a great sign of success if that kind of professional came to talk at J On The Beach.

I am very glad there is such a great event close to where I live. J On The Beach is worthwhile not just for local developers but also for those from abroad. Every year I attend several events in other countries with a bigger name but less value than J On The Beach. It will definitely be on my 2020 agenda. Thanks to every person involved in making it possible.

Pictures taken from the J On The Beach website.
Categories: FLOSS Project Planets

Dirk Eddelbuettel: digest 0.6.19

Planet Debian - Mon, 2019-05-20 07:48

Overnight, digest version 0.6.19 arrived on CRAN. It will get uploaded to Debian in due course.

digest creates hash digests of arbitrary R objects (using the md5, sha-1, sha-256, sha-512, crc32, xxhash32, xxhash64, murmur32, and spookyhash algorithms) permitting easy comparison of R language objects.

This version contains two new functions adding new digest functionality. First, Dmitriy Selivanov added a fast and vectorized digest2int to convert (arbitrary) strings into 32 bit integers using one-at-a-time hashing. Second, Kendon Bell, over a series of PRs, put together a nice implementation of spookyhash as a first streaming hash algorithm in digest. So big thanks to both Dmitriy and Kendon.

No other changes were made.

CRANberries provides the usual summary of changes to the previous version.

For questions or comments use the issue tracker off the GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

Categories: FLOSS Project Planets

Abhijeet Pal: Adding Pagination With Django

Planet Python - Mon, 2019-05-20 07:39

While working on a modern web application quite often you will need to paginate the app be it for better user experience or performance. Fortunately, Django comes with built-in pagination classes for managing paginating data of your application.

In this article, we will go through the pagination process with class-based views and function-based views in Django.

Prerequisite

For the sake of this tutorial I am using a blog application  – Github repo

The above project is made on Python 3.7, Django 2.1 and Bootstrap 4.3. This is a very basic blog application displaying a list of posts on the homepage, but when the number of posts increases we need to split them up.

Recommended Article:  Building A Blog Application With Django

Adding Pagination Using Class-Based-Views [ ListView ]

The Django ListView class comes with built-in support for pagination, so all we need to do is take advantage of it. Pagination is controlled by a GET query parameter that indicates which page to show.

First, open the views.py file of your app.

from django.views import generic
from .models import Post

class PostList(generic.ListView):
    queryset = Post.objects.filter(status=1).order_by('-created_on')
    template_name = 'index.html'

class PostDetail(generic.DetailView):
    model = Post
    template_name = 'post_detail.html'

Now in the PostList view we will introduce a new attribute, paginate_by, which takes an integer specifying how many objects should be displayed per page. If this is given, the view will paginate objects with paginate_by objects per page. The view will expect either a page query string parameter (via request.GET) or a page variable specified in the URLconf.

class PostList(generic.ListView):
    queryset = Post.objects.filter(status=1).order_by('-created_on')
    template_name = 'index.html'
    paginate_by = 3

Now our posts are paginated by 3 posts a page.

Next, to see the pagination in action, we need to edit the template, which for this application is the index.html file. Paste in the snippet below.

{% if is_paginated %}
  <nav aria-label="Page navigation container">
    <ul class="pagination justify-content-center">
      {% if page_obj.has_previous %}
        <li><a href="?page={{ page_obj.previous_page_number }}" class="page-link">&laquo; PREV</a></li>
      {% endif %}
      {% if page_obj.has_next %}
        <li><a href="?page={{ page_obj.next_page_number }}" class="page-link">NEXT &raquo;</a></li>
      {% endif %}
    </ul>
  </nav>
{% endif %}

Note that we are using Bootstrap 4.3 for this project, if you are using any other frontend framework you may change the classes.

Now run the server and visit http://127.0.0.1:8000/ you should see the page navigation buttons below the posts.

 

Adding Pagination Using Function-Based-Views

The equivalent function-based view for the above PostList class would be:

from django.core.paginator import Paginator, PageNotAnInteger, EmptyPage
from django.shortcuts import render

from .models import Post

def PostList(request):
    object_list = Post.objects.filter(status=1).order_by('-created_on')
    paginator = Paginator(object_list, 3)  # 3 posts in each page
    page = request.GET.get('page')
    try:
        post_list = paginator.page(page)
    except PageNotAnInteger:
        # If page is not an integer deliver the first page
        post_list = paginator.page(1)
    except EmptyPage:
        # If page is out of range deliver last page of results
        post_list = paginator.page(paginator.num_pages)
    return render(request, 'index.html', {'page': page, 'post_list': post_list})

So in the view, we instantiate the Paginator class with the number of objects to be displayed on each page, i.e. 3. Then request.GET.get('page') returns the current page number from the query string. The page() method is used to obtain the objects from the desired page number. Below that we have two exception handlers, for PageNotAnInteger and EmptyPage, both subclasses of InvalidPage, and finally at the end we render the HTML file.
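Incidentally, on Django 2.0 and newer (this project uses 2.1) Paginator also provides get_page(), which folds both exception branches into a single call. A minimal sketch of the same view using it:

from django.core.paginator import Paginator
from django.shortcuts import render

from .models import Post

def PostList(request):
    object_list = Post.objects.filter(status=1).order_by('-created_on')
    paginator = Paginator(object_list, 3)  # 3 posts in each page
    # get_page() returns the first page for non-integer values and the
    # last page for out-of-range values, so no try/except is needed
    post_list = paginator.get_page(request.GET.get('page'))
    return render(request, 'index.html', {'post_list': post_list})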

Now paste the snippet below into your template.

{% if post_list.has_other_pages %}
  <nav aria-label="Page navigation container">
    <ul class="pagination justify-content-center">
      {% if post_list.has_previous %}
        <li><a href="?page={{ post_list.previous_page_number }}" class="page-link">&laquo; PREV</a></li>
      {% endif %}
      {% if post_list.has_next %}
        <li><a href="?page={{ post_list.next_page_number }}" class="page-link">NEXT &raquo;</a></li>
      {% endif %}
    </ul>
  </nav>
{% endif %}

Save the files and run the server; you should see the NEXT button below the post list.

The post Adding Pagination With Django appeared first on Django Central.

Categories: FLOSS Project Planets

Catalin George Festila: Python 3.7.3 : Use the tweepy to deal with twitter api - part 002.

Planet Python - Mon, 2019-05-20 07:29
Today's tutorial is about the tweepy python module with Python 3.7.3. The development team comes with this intro: An easy-to-use Python library for accessing the Twitter API. The first step is to install this python module:

C:\Python373\Scripts>pip install tweepy
...
Successfully built PySocks
Installing collected packages: PySocks, tweepy
Successfully installed PySocks-1.6.8 tweepy-3.7.0

This is the source code I
Categories: FLOSS Project Planets

Candy Tsai: Outreachy 2019 March-August Internship – The Application Process

Planet Debian - Mon, 2019-05-20 05:57
Introduction

Really excited to be accepted for the project “Debian Continuous Integration: user experience improvements” (referred to as debci in this post) of the 2019 March-August round of the Outreachy internship! A huge thanks to my company and my manager Frank for letting me do this since I mentioned it out of the blue. Thanks to the Women Techmakers community for letting me know this program exists.

There are already blog posts that introduce the program, such as:

  1. How I got into Mozilla’s Outreachy open source internship program
  2. My Pathway to Outreachy with Mozilla
  3. Outreachy: What? How? Why?

To me, the biggest difference between Outreachy and Google Summer of Code (GSoC) is that you don’t have to be a student in order to apply.

This post won’t go into the details of “What is Outreachy” and will focus on the process, where everyone has a different story. This is my version, and I hope that you can find yours in the near future!

Goals: The Why

What I like about Outreachy’s application process is that it definitely lets you think about why you want to apply. For me, things were pretty simple and straightforward:

  • Experience what it is like to work remotely
  • Use my current knowledge to contribute to open source
  • Learn something different from my current job

Actually the most important reason that I kind of feel bad mentioning here is that I felt like leaving the male-dominated tech space for a bit. My colleagues are really nice and friendly, but… it’s hard to put into words.

Mindset: Start Right Away

The two main reasons I failed in the past:

  1. Hesitation
  2. Spent too much time browsing the project list
Hesitation

I have known about Outreachy since 2017, but because it requires you to make a few contributions in order to apply, any bit of hesitation results in a late start. It was a bit scary to approach the project mentors, and I thought my code had to be perfect in order to make a contribution. The truth is, without discussion you might not know the details of the issue, so you can't even start coding. Almost every accepted applicant mentions the importance of starting early. To be precise, start on the day applications open.

Spent too much time browsing the project list

Another reason that kept me from starting right away was that I had been browsing the project list for too long. Since the project list on the first day is not complete, projects that I might be more interested in can join the list as time passes. Past projects can be referenced to get a better picture of which organizations were involved, but it is never a 100% sure bet. Also, the organizations participating in the March-August round are different from the ones in the December-March round. To avoid the starting-too-late scenario, the strategy I used was to choose 2 projects to contribute to: one in the initial phase (first week or so), and another during the following weeks.

Strategy: Choose 2 Projects

Choosing how many projects to work on really depends on the time you have available. The main idea of this strategy was to eliminate the temptation of spending too much time browsing the project list. Since I already had a full-time job at the time, I really had to weigh my priorities. To be honest, I barely had time to work on the second project.

On the day the project list was announced, I quickly assessed my skills against the projects available and decided to try applying for Mozilla. Yep, you heard me right: my first choice wasn't Debian, because Mozilla seemed more familiar to me. Then I instantly realized that there was a flood of applicants also applying for Mozilla. All of the newcomer issues were claimed, and it all happened in just a matter of days!

I started to look for other projects that were also in line with my goals, which led me to debci. I had never used Ruby in a project, nor the Debian system. On the other hand, I'm familiar with the other skills listed in the requirements, so some of my knowledge could still be utilized.

The second project was announced at a later stage and came from Openstack. I have to admit it was a little too hard for me to set up the Ironic bare-metal environment, so I wasn't able to put in much.

Plan: Learn About the Community

An important aspect of the application process was to get in touch with the community. Although debci and Openstack Ironic both use IRC chat, they feel very different. From a wiki search, it seems Openstack is backed by a corporate entity while Debian is run by volunteers.

Despite the difference, both communities were pretty active, with Openstack involving more members. As long as the community was active and friendly, it fit the vision I was looking for.

Execution: Communication

Looking back at the contribution process, it actually took more time than I initially imagined. The whole application process consists of three stages:

  1. Initial application: has to wait a few days for it to be approved
  2. Record Contributions: the main part
  3. Final application: final dash

Except for the initial application, which can be done by myself, the rest involves communicating with others. Communication differs a lot compared to an office environment. My first merge request (a.k.a. pull request) had a ton of comments, and I couldn't understand what the comments were suggesting at first. Things began to clear up after some discussion, and I was really excited to have it merged. This was huge for me since it all happened online, with a bit of a time lag, whereas in an office environment a colleague would just come around for a face-to-face discussion.

TL;DR

I had no idea that so many words were written, so I guess I will stop for now. Up until now, I haven't mentioned much about writing code, and that's because you will feel for yourself whether you can get through the process. So the TL;DR version of this post is:

  • Do not hesitate, just do it
  • Start as soon as applications are open
  • Do not lurk around the project list
  • Get in touch with the community and mentors
  • Communicate about the issues

Really excited to begin this Outreachy journey with debci and grateful for this opportunity. Stay tuned for more articles about the project itself!

Categories: FLOSS Project Planets

Catalin George Festila: Python 3.7.3 : The google-cloud-vision python module - part 002.

Planet Python - Mon, 2019-05-20 04:50
I used Windows 8.1 and Python 3.7.3. The first step is to install the python module.

C:\Python373\Scripts>pip install --upgrade google-cloud-vision

You can see another tutorial about this python module here. Let's test the python module.

C:\Python373>python
Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 22:22:05) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or
Categories: FLOSS Project Planets

Bits from Debian: Lenovo Platinum Sponsor of DebConf19

Planet Debian - Mon, 2019-05-20 04:00

We are very pleased to announce that Lenovo has committed to supporting DebConf19 as a Platinum sponsor.

"Lenovo is proud to sponsor the 20th Annual Debian Conference." said Egbert Gracias, Senior Software Development Manager at Lenovo. "We’re excited to see, up close, the great work being done in the community and to meet the developers and volunteers that keep the Debian Project moving forward!”

Lenovo is a global technology leader manufacturing a wide portfolio of connected products, including smartphones, tablets, PCs and workstations as well as AR/VR devices, smart home/office solutions and data center solutions.

With this commitment as Platinum Sponsor, Lenovo is contributing to make possible our annual conference, and directly supporting the progress of Debian and Free Software, helping to strengthen the community that continues to collaborate on Debian projects throughout the rest of the year.

Thank you very much Lenovo, for your support of DebConf19!

Become a sponsor too!

DebConf19 is still accepting sponsors. Interested companies and organizations may contact the DebConf team through sponsors@debconf.org, and visit the DebConf19 website at https://debconf19.debconf.org.

Categories: FLOSS Project Planets

EuroPython Society: EuroPython 2019: Conference and training ticket sale opens today

Planet Python - Mon, 2019-05-20 03:47


We will be starting the EuroPython 2019 conference and training ticket sales today (Monday) at 12:00 CEST.

https://ep2019.europython.eu/registration/buy-tickets/


Only 300 training tickets available

After the rush to the early-bird tickets last week (we sold more than 290 tickets in 10 minutes), we expect a rush to the regular and training tickets this week as well.

We only have 300 training tickets available, so if you want to attend the training days, please consider getting your ticket soon.

Available ticket types

We will have the following ticket types available:

  • regular conference tickets - admission to the conference days (July 10-12) and sprints (July 13-14)
  • training tickets - admission to the training days (July 8-9)
  • combined tickets - admission to training, conference and sprint days (July 8-14)

Please see our registration page for full details on the available tickets.

As a reminder, here’s the conference layout:

  • Monday & Tuesday, July 8 & 9: Trainings, Beginners’ Day and other workshops
  • Wednesday–Friday, July 10–12: Conference talks, keynotes & exhibition
  • Saturday & Sunday, July 13 & 14: Sprints
Combined Tickets

These are a new ticket type we are introducing for EuroPython 2019, to simplify purchase and check-in at the conference for attendees who want to attend the complete EuroPython 2019 week with a single ticket.

To make the ticket more attractive, we are granting a small discount compared to purchasing training and conference tickets separately.


Enjoy,

EuroPython 2019 Team
https://ep2019.europython.eu/
https://www.europython-society.org/

Categories: FLOSS Project Planets

Keith Packard: itsybitsy-snek

Planet Debian - Mon, 2019-05-20 03:17
ItsyBitsy Snek — snek on the Adafruit ItsyBitsy

I got an ItsyBitsy board from Adafruit a few days ago. This board is about as minimal an Arduino-compatible device as I can imagine. All it's got is an Atmel ATmega 32U4 SoC, one LED, and a few passive components.

I'd done a bit of work with the 32u4 under AltOS a few years ago when Bdale and I built a 'companion' board called TeleScience for TeleMetrum to try and measure rocket airframe temperatures in flight. So, I already had some basic drivers for some of the peripherals, including a USB driver.

USB Adventures

The 32u4 USB hardware is simple, and actually fairly easy to use. The AltOS driver used a separate thread to manage the setup messages on endpoint 0. I didn't imagine I'd have space for threading on this device, so I modified that USB driver to manage setup processing from the interrupt handler. I'd done that on a bunch of other USB parts, so while it took longer than I'd hoped, I did manage to get it working.

Then I spent a whole bunch of time reducing the code size of this driver. It started at about 2kB and is now almost down to 1kB. It's a bit less robust now; hosts sending odd setup messages may get unexpected results.

The last thing I did was to add a FIFO for OUT data. That's because we want to be able to see ^C keystrokes even while Snek is executing code.

Reset as longjmp

On the ATmega 328P, to reset Snek, I just reset the whole chip. Nice and clean. With integrated USB, I can't reset the chip without losing the USB connection, and that would be pretty annoying. Resetting Snek's state back to startup would take a pile of code, so instead, I gathered all of the snek-related .data and .bss variables by changing the linker script. Then, I wrote a reset function that does pretty much what the libc startup code does and then jumps back to main:

snek_poly_t
snek_builtin_reset(void)
{
    /* reset data */
    memcpy_P(&__snek_data_start__,
             (&__text_end__ + (&__snek_data_start__ - &__data_start__)),
             &__snek_data_end__ - &__snek_data_start__);

    /* reset bss */
    memset(&__snek_bss_start__, '\0',
           &__snek_bss_end__ - &__snek_bss_start__);

    /* and off we go! */
    longjmp(snek_reset_buf, 1);

    return SNEK_NULL;
}

I still need to write code to reset the GPIO pins.

Development Environment

To flash firmware to the device, I stuck the board into a proto board and ran jumpers from my AVRISP cable to the board.

Next, I hooked up a FTDI USB to Serial converter to the 32u4 TX/RX pins. Serial is always easier than USB, and this was certainly the case here.

Finally, I dug out my trusty Beagle USB analyzer. This lets me see every USB packet going between the host and the device and is invaluable for debugging USB issues.

You can see all of these pieces in the picture above. They're sitting on top of a knitting colorwork pattern of snakes and pyramids, which I may have to make something out of.

Current Status

Code for this part is on the master branch, which is available on my home machine as well as github:

I think this is the last major task to finish before I release snek version 1.0. I really wanted to see if I could get snek running on this tiny target. It's nearly there; I want to squeeze a few more things onto this chip.

Categories: FLOSS Project Planets

Lullabot: Behind the Screens: Behind the Screens with Adam Bergstein

Planet Drupal - Mon, 2019-05-20 03:00

For years, SimplyTest.me has provided a once-and-done tool for testing Drupal, and Adam Bergstein has recently taken over maintainership. In this episode we find out why, how you can help, and coffee!

Categories: FLOSS Project Planets

Ramsalt Lab: The ultimate guide for faster Drupal: Part 2: Aggregation, CDN and Image Optimization

Planet Drupal - Mon, 2019-05-20 02:44
  • We are on our journey to master Drupal performance. After publishing the previous Part 1: Caching a couple of weeks ago, we've been lucky enough to get into Issue 386 of TheWeeklyDrop newsletter and Planet Drupal, and we got much love and good feedback on Twitter.

    If you haven't already read the first part of the series, the ultimate guide for faster Drupal: Part 1 Caching, please feel free to read that article too.


    Note: You don't necessarily have to do all of these; some items listed here can replace one another, so proceed with caution!

    Faster Drupal - Part 2: Aggregation and CDN
    • The one and the only holy grail: Advanced CSS/JS Aggregation
      Every Drupal optimization post you read will tell you to set up and configure the AdvAgg module, but you gotta do what you gotta do!
      AdvAgg's features and core benefits are listed in full on the module page, so go ahead and read them all, configure it the way that works best for you and move on
      Advanced CSS/JS Aggregation Drupal module

      Note: If you have mod_pagespeed you might not need the AdvAgg module; make sure that you don't duplicate work between the two

      But that's not all: if you are on Drupal 7, you should also consider the Speedy module, which in some areas might work a bit better, so make sure to check it out
      Speedy module

    • For good JavaScript minification in Drupal, you can use modules such as Minify, but we'd like to recommend MinifyJS instead; the differences and benefits are listed on the module page, so check it out
      Drupal MinifyJS module

    • CDNize the whole project, as much as you can! You may use CDN module too

    • Move JavaScript to the footer where possible; some JS files need to be rendered in the head, depending on the use case and what the JS does. In Drupal 8 it's quite easy to attach the necessary library (read: JS files) in the footer in twig template files

    • Consider if you can make your own scripts defer/async (a new challenge when it comes to Drupal js aggregation)

     

    Okay, this round was much easier, thanks to the AdvAgg module taking care of half of the work for us! Note that on the frontend side you can uglify, minify and compress everything you ship, whether it's CSS, JS or even images or SVG files. Now let's get to it: image optimization.

    Image optimization
    • Drupal 8: Use the Responsive Image module wherever possible and create the appropriate styles. It uses the <picture> tag which is what we really want

    • One might say we have one too many image optimization modules in Drupal, which is a good thing! We tested and experimented with some of them, and here's what we suggest: use blazy and lazyload_images (Drupal 8; it uses IntersectionObserver instead of scrolling events), and consider lazyloader and image_lazy_loader when using the picture module for responsive images in Drupal 7. There is also a lazy loading option that works well

    • Image optimization: optimize the main images/icons used in the design (yes, you can optimize SVG files as well). The best tool for that is not in Drupal: try the ImageOptim desktop app. There's also an API-based service available with a Drupal 7 module, take a look here; it might be worth setting this up for clients

      Also in the same context, we can use ReSmush.it which is free (But should donate some beer to them)
      Drupal 7 Module, Drupal 8 Module

    • Image formats like JPEG 2000, JPEG XR, and WebP often provide better compression than PNG or JPEG, which means faster downloads and less data consumption. There's a really good module that helps you serve WebP; it's called, you guessed it: WebP.

    • Serve WebP images with your web server with the MOD PageSpeed by Google for Apache and Nginx.
      Or conditionally serve WebP images with Nginx.
       

    Bonus tip: Even favicons should be optimized. Sometimes people ignore the weight of a favicon file. You shouldn’t! 

     

    Next week we will cover Drupal database and web server tweaks & improvements. Stay tuned.

     

    Written by Sohail Lajevardi
    Developer at Ramsalt Lab

     

    Categories: FLOSS Project Planets

    Petter Reinholdtsen: MIME type "text/vnd.sosi" for SOSI map data

    Planet Debian - Mon, 2019-05-20 02:35

    As part of my involvement in the work to standardise a REST based API for Noark 5, the Norwegian archiving standard, I spent some time the last few months to try to register a MIME type and PRONOM code for the SOSI file format. The background is that there is a set of formats approved for long term storage and archiving in Norway, and among these formats, SOSI is the only format missing a MIME type and PRONOM code.

    What is SOSI, you might ask? To quote Wikipedia: SOSI is short for Samordnet Opplegg for Stedfestet Informasjon (literally "Coordinated Approach for Spatial Information", but more commonly expanded in English to Systematic Organization of Spatial Information). It is a text based file format for geo-spatial vector information used in Norway. Information about the SOSI format can be found in English from Wikipedia. The specification is available in Norwegian from the Norwegian mapping authority. The SOSI standard, which originated in the beginning of nineteen eighties, was the inspiration and formed the basis for the XML based Geography Markup Language.

    I have so far written a pattern matching rule for the file(1) unix tool to recognize SOSI files, submitted a request to the PRONOM project to have a PRONOM ID assigned to the format (reference TNA1555078202S60), and today sent a request to IANA to register the "text/vnd.sosi" MIME type for this format (reference IANA #1143144). If all goes well, in a few months, anyone implementing the Noark 5 Tjenestegrensesnitt API specification should be able to use an official MIME type and PRONOM code for SOSI files. In addition, anyone using SOSI files on Linux should be able to automatically recognise the format, and web sites handing out SOSI files can begin providing a more specific MIME type. So far, SOSI files have been handed out from web sites using the "application/octet-stream" MIME type, which is just a nice way of stating "I do not know". Soon, we will know. :)
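    Once the registration is in place, applications that keep their own suffix-to-type mappings can be taught about the format too. As a small illustration in Python (the .sos suffix here is my assumption of the conventional SOSI file extension):

    import mimetypes

    # Teach this process about the new type (suffix .sos assumed)
    mimetypes.add_type("text/vnd.sosi", ".sos")

    print(mimetypes.guess_type("map.sos"))  # ('text/vnd.sosi', None)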

    As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.

    Categories: FLOSS Project Planets

    Mike Driscoll: PyDev of the Week: Adrienne Tacke

    Planet Python - Mon, 2019-05-20 01:05

    This week we welcome Adrienne Tacke (@AdrienneTacke) as our PyDev of the Week! Adrienne is the author of Coding for Kids: Python: Learn to Code with 50 Awesome Games and Activities and her book came out earlier this year. You can see what Adrienne is up to on Instagram or via her website. Let’s take some time to get to know her better!

    Can you tell us a little about yourself (hobbies, education, etc):

    I’m a software engineer in Las Vegas and have a degree in Management Information Systems from UNLV. I’ve worked in the education and healthcare industries and now focus on building awesome things in the fintech space. I love learning new languages (spoken and programming), eating every dessert imaginable, traveling the world with my husband, and finding ways to encourage more young girls and women to try out a career as a software engineer.

    Why did you start using Python?

    I truly began using the language when I started writing my book Coding for Kids: Python. It was a simple language that allowed me to focus on the programming concepts versus the syntax which is important for anyone new to coding.

    What other programming languages do you know and which is your favorite?

    My main languages are C# and JavaScript. I use these everyday for work. My first language was vb.net *blushes*. I don’t really have a favorite language, but I do strongly believe in some principles. These include separation of concerns, reusability, human-readable > machine-readable code, and clean code and design.

    What projects are you working on now?

    I have several projects that involve me teaching something related to software engineering. I can’t wait to share more in the coming months! As far as code, I just (FINALLY) updated my website which has been in dire need of a refresh. And the project I’m most excited about: I am starting the migration for our company’s client-side architecture to React!

    Which Python libraries are your favorite (core or 3rd party)?

    I don’t have any favorites yet, as I’ve barely scratched the surface of what’s available. That said, I have tried TensorFlow and Luminoth and absolutely love their potential. I hope to work with them more in a future weekend project!

    How did your book come about?

    I was approached by a publisher with this opportunity through my Instagram account! I share peeks into my career, serve as an example of a feminine software engineer, and more importantly, share educational posts and mini-tutorials. One of my series is called #DontBeAfraidOfTheTerminal where I explain common and widely used commands in an approachable way. I think they liked they way I taught technical topics, so they asked me to write this book!

    What sorts of things did you learn while writing your book?

    It is very difficult to reduce topics down in a manageable and kid-friendly way! Writing this book for this type of audience (kids or someone with absolutely no programming experience) was a welcome challenge, as it required me to rephrase and refine how I explain how things work in programming.

    Do you have advice for other aspiring authors?

    If you have to re-read something you wrote, start over or find a way to make the sentence flow better. More importantly, just start! The sooner you get your ideas into a draft, the faster you can refine and polish it to become a valuable piece of work.

    Is there anything else you’d like to say?

    I just want to remind everyone that there is no developer uniform. Knowledge is power and if you know your stuff, it shouldn’t matter what you look like!

    Thanks for doing the interview, Adrienne!

    The post PyDev of the Week: Adrienne Tacke appeared first on The Mouse Vs. The Python.

    Categories: FLOSS Project Planets

    Podcast.__init__: Hardware Hacking Made Easy With CircuitPython

    Planet Python - Sun, 2019-05-19 22:29
    Summary

    Learning to program can be a frustrating process, because even the simplest code relies on a complex stack of other moving pieces to function. When working with a microcontroller you are in full control of everything so there are fewer concepts that need to be understood in order to build a functioning project. CircuitPython is a platform for beginner developers that provides easy to use abstractions for working with hardware devices. In this episode Scott Shawcroft explains how the project got started, how it relates to MicroPython, some of the cool ways that it is being used, and how you can get started with it today. If you are interested in playing with low cost devices without having to learn and use C then give this a listen and start tinkering!

    Announcements
    • Hello and welcome to Podcast.__init__, the podcast about Python and the people who make it great.
    • When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With 200 Gbit/s private networking, scalable shared block storage, node balancers, and a 40 Gbit/s public network, all controlled by a brand new API you’ve got everything you need to scale up. And for your tasks that need fast computation, such as training machine learning models, they just launched dedicated CPU instances. Go to pythonpodcast.com/linode to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
    • You listen to this show to learn and stay up to date with the ways that Python is being used, including the latest in machine learning and data analysis. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, and the Open Data Science Conference. Go to pythonpodcast.com/conferences to learn more and take advantage of our partner discounts when you register.
    • Visit the site to subscribe to the show, sign up for the newsletter, and read the show notes. And if you have any questions, comments, or suggestions I would love to hear them. You can reach me on Twitter at @Podcast__init__ or email hosts@podcastinit.com
    • To help other people find the show please leave a review on iTunes and tell your friends and co-workers
    • Join the community in the new Zulip chat workspace at pythonpodcast.com/chat
    • Your host as usual is Tobias Macey and today I’m interviewing Scott Shawcroft about CircuitPython, the easiest way to program microcontrollers
    Interview
    • Introductions
    • How did you get introduced to Python?
    • Can you start by explaining what CircuitPython is and how the project got started?
      • I understand that you work at Adafruit and I know that a number of their products support CircuitPython. What other runtimes do you support?
    • Microcontrollers have typically been the domain of C because of the resource and performance constraints. What are the benefits of using Python to program hardware devices?
    • With the wide availability of powerful computing platforms, what are the benefits of experimenting with microcontrollers and their peripherals?
    • I understand that CircuitPython is a friendly fork of MicroPython. What have you changed in your version?
      • How do you structure your development to avoid conflicts with the upstream project?
      • What are some changes that you have contributed back to MicroPython?
    • What are some of the features of CircuitPython that make it easier for users to interact with sensors, motors, etc.?
    • CircuitPython provides an easy on-ramp for experimenting with hardware projects. Is there a point where a user will outgrow it and need to move to a different language or framework?
    • What are some of the most interesting/innovative/unexpected projects that you have seen people build using CircuitPython?
      • Are there any cases of someone building and shipping a production grade project in CircuitPython?
    • What have been some of the most interesting/challenging/unexpected aspects of building and maintaining CircuitPython?
    • What is in store for the future of the project?
    Keep In Touch

    Picks

    Links

    The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA

    Categories: FLOSS Project Planets

    Wingware Blog: Selecting Logical Units of Python Code in Wing

    Planet Python - Sun, 2019-05-19 21:00

    In this issue of Wing Tips we take a look at quickly selecting Python code in logical units, which can make some editing tasks easier.

    Select More and Select Less

    The easiest way to select code from the keyboard, starting from the current selection or caret position, is to repeatedly press Ctrl-Up (the Control key together with the up arrow key). Wing selects more and more code, working outward in logical units, as follows:

    Press Ctrl-Up repeatedly to select increasingly larger units of Python code

    If you select too much, pressing Ctrl-Down instead reduces the selection size again:

    Press Ctrl-Down repeatedly to return to selecting smaller units of Python code

    Select Statement, Block or Scope

    Wing also provides commands for selecting the current, previous, or next Statement (a single logical line of code that may span multiple physical lines), Block (a contiguous section of code at same indent level, without any blank lines), or Scope (an entire function, method, or class).

    Here's an example using several of these commands:

    Execute "Select Statement", then "Select Block", "Select Scope", and finally "Select Next Scope"

    Adding Key Bindings

    If you plan to use these commands, you will probably want to bind them to keys using the User Interface > Keyboard > Custom Key Bindings preference for the following commands:

    select-statement
    next-statement
    previous-statement
    select-block
    next-block
    previous-block
    select-scope
    next-scope
    previous-scope

    Since free key combinations are often in short supply, you may want to make use of a multi-key sequence in your bindings. For example, pressing Ctrl-\ followed by B results in the binding Ctrl-\ B:

    Add a key binding "Ctrl-\ B" for "select-block"

    This only works if Ctrl-\ is not itself already a binding. Any other free key combination (and not only Ctrl-\) can be used as the starting key in the sequence.



    That's it for now! We'll be back next week with more Wing Tips for Wing Python IDE.

    Categories: FLOSS Project Planets
