FLOSS Project Planets

Python Software Foundation: The Python Language Summit 2023

Planet Python - Mon, 2023-05-29 13:28
Every year, just before the start of PyCon US, core developers, triagers, and special guests gather for the Python Language Summit: an all-day event of talks where the future direction of Python is discussed. The Language Summit 2023 included three back-to-back talks on the C API, an update on work towards making the Global Interpreter Lock optional, and a discussion on how to tackle burnout in the community.
Around 45 people attended this year's summit, which was covered by Alex Waygood.

Attendees of the Python Language Summit

Categories: FLOSS Project Planets

Python Software Foundation: The Python Language Summit 2023: Pattern Matching, __match__, and View Patterns

Planet Python - Mon, 2023-05-29 13:22
One of the most exciting new features in Python 3.10 was the introduction of pattern matching (introduced in PEPs 634, 635 and 636). Pattern matching has a wide variety of uses, but really shines in situations where you need to perform complex destructuring of tree-like data structures.

That’s a lot of words which may or may not mean very much to you – but consider, for example, using the ast module to parse Python source code. If you’re unfamiliar with the ast module: the module provides tools that enable you to compile Python source code into an “abstract syntax tree” (AST) representing the code’s structure. The Python interpreter itself converts Python source code into an AST in order to understand how to run that code – but parsing Python source code using ASTs is also a common task for linters, such as plugins for flake8 or pylint. In the following example, ast.parse() is used to parse the source code x = 42 into an ast.Module node, and ast.dump() is then used to reveal the tree-like structure of that node in a human-readable form:

>>> import ast
>>> source = "x = 42"
>>> node = ast.parse(source)
>>> node
<ast.Module object at 0x000002A70F928D80>
>>> print(ast.dump(node, indent=2))
Module(
  body=[
    Assign(
      targets=[
        Name(id='x', ctx=Store())],
      value=Constant(value=42))],
  type_ignores=[])

How does working with ASTs relate to pattern-matching? Well, a function to determine whether (to a reasonable approximation) an arbitrary AST node represents the symbol collections.deque might have looked something like this, before pattern matching…

import ast

# This obviously won't work if the symbol is imported with an alias
# in the source code we're inspecting
# (e.g. "from collections import deque as d").
# But let's not worry about that here :-)

def node_represents_collections_dot_deque(node: ast.AST) -> bool:
    """Determine if *node* represents 'deque' or 'collections.deque'"""
    return (
        isinstance(node, ast.Name) and node.id == "deque"
    ) or (
        isinstance(node, ast.Attribute)
        and isinstance(node.value, ast.Name)
        and node.value.id == "collections"
        and node.attr == "deque"
    )

But in Python 3.10, pattern matching allows an elegant destructuring syntax:

import ast

def node_represents_collections_dot_deque(node: ast.AST) -> bool:
    """Determine if *node* represents 'deque' or 'collections.deque'"""
    match node:
        case ast.Name("deque"):
            return True
        case ast.Attribute(ast.Name("collections"), "deque"):
            return True
        case _:
            return False

I know which one I prefer.

For some, though, this still isn’t enough – and Michael “Sully” Sullivan is one of them. At the Python Language Summit 2023, Sullivan shared ideas for where pattern matching could go next.

Playing with matches (without getting burned)

Sullivan’s contention is that, while pattern matching provides elegant syntactic sugar in simple cases such as the one above, our ability to chain destructurings using pattern matching is currently fairly limited. For example, say we want to write a function inspecting Python AST that takes an ast.FunctionDef node and identifies whether the node represents a synchronous function with exactly two parameters, both of them annotated as accepting integers. The function would behave so that the following holds true:

>>> import ast
>>> source = "def add_2(number1: int, number2: int): pass"
>>> node = ast.parse(source).body[0]
>>> type(node)
<class 'ast.FunctionDef'>
>>> is_function_taking_two_ints(node)
True

With pre-pattern-matching syntax, we might have written such a function like this:

def is_int(node: ast.AST | None) -> bool:
    """Determine if *node* represents 'int' or 'builtins.int'"""
    return (
        isinstance(node, ast.Name) and node.id == "int"
    ) or (
        isinstance(node, ast.Attribute)
        and isinstance(node.value, ast.Name)
        and node.value.id == "builtins"
        and node.attr == "int"
    )

def is_function_taking_two_ints(node: ast.FunctionDef) -> bool:
    """Determine if *node* represents a function that accepts two ints"""
    args = node.args.posonlyargs + node.args.args
    return len(args) == 2 and all(is_int(node.annotation) for node in args)

If we wanted to rewrite this using pattern matching, we could possibly do something like this:

def is_int(node: ast.AST | None) -> bool:
    """Determine if *node* represents 'int' or 'builtins.int'"""
    match node:
        case ast.Name("int"):
            return True
        case ast.Attribute(ast.Name("builtins"), "int"):
            return True
        case _:
            return False

def is_function_taking_two_ints(node: ast.FunctionDef) -> bool:
    """Determine if *node* represents a function that accepts two ints"""
    match node.args.posonlyargs + node.args.args:
        case [ast.arg(), ast.arg()] as arglist:
            return all(is_int(arg.annotation) for arg in arglist)
        case _:
            return False

That leaves a lot to be desired, however! The is_int() helper function can be rewritten in a much cleaner way. But integrating it into the is_function_taking_two_ints() function is… somewhat icky! The code feels harder to understand than before, whereas the goal of pattern matching is to improve readability.

Something like this, (ab)using metaclasses, gets us a lot closer to what it feels like pattern matching should be. By using one of Python’s hooks for customising isinstance() logic, it’s possible to rewrite our is_int() helper function as a class, meaning we can seamlessly integrate it into our is_function_taking_two_ints() function in a very expressive way:

import abc
import ast

class PatternMeta(abc.ABCMeta):
    def __instancecheck__(cls, inst: object) -> bool:
        return cls.match(inst)

class Pattern(metaclass=PatternMeta):
    """Abstract base class for types representing 'abstract patterns'"""

    @staticmethod
    @abc.abstractmethod
    def match(node) -> bool:
        """Subclasses must override this method"""
        raise NotImplementedError

class int_node(Pattern):
    """Class representing AST patterns signifying `int` or `builtins.int`"""

    @staticmethod
    def match(node) -> bool:
        match node:
            case ast.Name("int"):
                return True
            case ast.Attribute(ast.Name("builtins"), "int"):
                return True
            case _:
                return False

def is_function_taking_two_ints(node: ast.FunctionDef) -> bool:
    """Determine if *node* represents a function that accepts two ints"""
    match node.args.posonlyargs + node.args.args:
        case [
            ast.arg(annotation=int_node()),
            ast.arg(annotation=int_node()),
        ]:
            return True
        case _:
            return False

This is still hardly ideal, however – that’s a lot of boilerplate we’ve had to introduce to our helper function for identifying int annotations! And who wants to muck about with metaclasses?

A slide from Sullivan's talk

A __match__ made in heaven?

Sullivan proposes that we make it easier to write helper functions for pattern matching, such as the example above, without having to resort to custom metaclasses. Two competing approaches were brought for discussion.

The first idea – a __match__ special method – is perhaps the easier of the two to immediately grasp, and appeared in early drafts of the pattern matching PEPs. (It was eventually removed from the PEPs in order to reduce the scope of the proposed changes to Python.) The proposal is that any class could define a __match__ method that could be used to customise how match statements apply to the class. Our is_function_taking_two_ints() case could be rewritten like so:

class int_node:
    """Class representing AST patterns signifying `int` or `builtins.int`"""

    # The __match__ method is understood by Python to be a static method,
    # even without the @staticmethod decorator,
    # similar to __new__ and __init_subclass__
    def __match__(node) -> ast.Name | ast.Attribute:
        match node:
            case ast.Name("int"):
                # Successful matches can return custom objects,
                # that can be bound to new variables by the caller
                return node
            case ast.Attribute(ast.Name("builtins"), "int"):
                return node
            case _:
                # Return `None` to indicate that there was no match
                return None

def is_function_taking_two_ints(node: ast.FunctionDef) -> bool:
    """Determine if *node* represents a function that accepts two ints"""
    match node.args.posonlyargs + node.args.args:
        case [
            ast.arg(annotation=int_node()),
            ast.arg(annotation=int_node()),
        ]:
            return True
        case _:
            return False

The second idea is more radical: the introduction of some kind of new syntax (perhaps reusing Python’s -> operator) that would allow Python coders to “apply” functions during pattern matching. With this proposal, we could rewrite is_function_taking_two_ints() like so:

def is_int(node: ast.AST | None) -> bool:
    """Determine if *node* represents 'int' or 'builtins.int'"""
    match node:
        case ast.Name("int"):
            return True
        case ast.Attribute(ast.Name("builtins"), "int"):
            return True
        case _:
            return False

def is_function_taking_two_ints(node: ast.FunctionDef) -> bool:
    """Determine if *node* represents a function that accepts two ints"""
    match node.args.posonlyargs + node.args.args:
        case [
            ast.arg(annotation=is_int -> True),
            ast.arg(annotation=is_int -> True),
        ]:
            return True
        case _:
            return False
Match-maker, match-maker, make me a __match__
A slide from Sullivan's talk

The reception in the room to Sullivan’s ideas was positive; the consensus seemed to be that there was clearly room for improvement in this area. Brandt Bucher, author of the original pattern matching implementation in Python 3.10, concurred that this kind of enhancement was needed. Łukasz Langa, meanwhile, said he’d received many queries from users of other programming languages such as C#, asking how to tackle this kind of problem.

The proposal for a __match__ special method follows a pattern common in Python’s data model, where double-underscore “dunder” methods are overridden to provide a class with special behaviour. As such, it will likely be less jarring, at first glance, to those new to the idea. Attendees of Sullivan’s talk seemed, broadly, to slightly prefer the __match__ proposal, and Sullivan himself said he thought it “looked prettier”.

Jelle Zijlstra argued that the __match__ dunder would provide an elegant symmetry between the construction and destruction of objects. Brandt Bucher, meanwhile, said he thought the usability improvements weren’t significant enough to merit new syntax.

Nonetheless, the alternative proposal for new syntax also has much to recommend it. Sullivan argued that having dedicated syntax to express the idea of “applying” a function during pattern matching was more explicit. Mark Shannon agreed, noting the similarity between this idea and features in the Haskell programming language. “This is functional programming,” Shannon argued. “It feels weird to apply OOP models to this.”

Addendum: pattern-matching resources and recipes

In the meantime, while we wait for a PEP, there are plenty of innovative uses of pattern matching springing up in the ecosystem. For further reading/watching/listening, I recommend:

Categories: FLOSS Project Planets

Python Software Foundation: The Python Language Summit 2023: What is the Standard Library for?

Planet Python - Mon, 2023-05-29 13:22

 Brett Cannon came to the Python Language Summit this year with a fundamental question for the assembled core developers: What is the standard library for?

According to a quick python -c "import sys; print(len(sys.stdlib_module_names))" call on my laptop, the standard library in Python 3.11 consists of 305 importable modules. Many of these are implementation details that, if you’re a good citizen, you really shouldn’t be importing – but the point stands that the Python standard library is perhaps now larger than it should be.

But the goal of his question, Cannon explained, wasn’t to decide which modules to get rid of. Instead, it was to create guidelines on when and why new modules should be accepted into the standard library.

"We need to audit the standard library, and not deprecate it, but decide which bits should probably not have been added if we had perfect hindsight."

-- Guido van Rossum, CPython Core Developer and former BDFL

Carol Willing agreed that the core dev team shouldn’t be looking to remove modules en masse, but should decide what kinds of modules they wanted to admit in the future. Łukasz Langa agreed, and pointed out that it was often hard to remove modules even when the team wanted to, due to the fact that “the standard library is a huge import cycle”.

Where do we go now?

Cannon himself put forward two possible answers to his question, before tossing it out to the audience:

  1. The standard library should contain everything required to bootstrap an installer.
  2. The standard library should make it easy for beginners to write scripts without installing anything.

The conversation was free-flowing, but a common point of consensus among the attendees was that the standard library should focus on tools and utilities that allow users to write better Python code. Hynek Schlawack cited dataclasses as an example of a module that made writing classes much less painful, and generally led to them writing better code as a result. (Schlawack is the author of the attrs library, the third-party inspiration for dataclasses, which itself is still going strong.) Filipe Laíns agreed, arguing that the core dev team should focus on building basic implementations for third-party libraries to build on top of.
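As a minimal illustration of the kind of ergonomic win Schlawack described (the Point class below is an invented example, not one discussed at the summit):

```python
from dataclasses import dataclass

@dataclass
class Point:
    """__init__, __repr__ and __eq__ are all generated automatically."""
    x: float
    y: float

# Without @dataclass, each of these behaviours would need hand-written code.
p = Point(1.0, 2.0)
print(p)
```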

“The default answer for ‘Should this be in the standard library?’ should be ‘No’, but we should bless smaller utilities that help people write better Python code” 

-- Antonio Cuni, HPy Core Developer

There was a certain amount of regret in the air about modules that perhaps should never have been added to the standard library, and had proved themselves to be significant maintenance burdens in the years since, but could now never be removed. tkinter, it was universally agreed, was the primary example here; possibly multiprocessing also.

Guido van Rossum pondered whether asyncio should ever have been added to the standard library, remarking that it had been difficult to evolve asyncio while it was in the standard library, and had possibly been added before it was “fully baked”. The ssl integration had probably been a mistake, he said, and should have been left to third parties.

Łukasz Langa noted that modules such as asyncio and typing, which had continued to evolve rapidly after being added to the standard library, had helped spur new syntax changes to Python that had been to the language’s betterment. Without asyncio in the standard library, Langa argued, we would probably never have adopted the async/await syntax that is now the foundation of asynchronous Python programming.

Zac Hatfield-Dods, maintainer of several prominent third-party packages, said that different standard-library packages had different impacts on the Python ecosystem. Pytest, one of the libraries he maintains, had managed to flourish and find success despite the existence of unittest in the standard library. But another of his libraries, the asynchronous Trio framework, had struggled to attract users while asyncio had been part of the standard library. “Nobody supports alternative async implementations,” he complained, despite Trio’s development often being years ahead of where asyncio is. (In the coffee break afterwards, Hatfield-Dods was keen to emphasise that he is, in fact, a fan of asyncio and the work of the asyncio maintainers.)

Zac Hatfield-Dods (left), speaking at the Language Summit
(Photo by Hugo van Kemenade)

Cannon brought up the question of whether a module like pathlib belonged. “It’s just sugar,” he remarked – i.e., hardly a “core utility” or a protocol that allowed people to write better code. But it has nonetheless been one of the more popular additions to the standard library in recent years. Langa again pushed back, arguing that without the addition of pathlib to the standard library, we would never have added os.PathLike, a protocol that had allowed a common interface for describing file-system paths in Python. “A third-party PyPI package wouldn’t have convinced us to make that change,” Langa argued.
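To see why os.PathLike mattered, consider a class implementing the protocol: any code that calls os.fspath() (or accepts path-like objects, as open() and the os functions do) can consume it without knowing about pathlib at all. RepoPath and the /srv/repo prefix below are invented for illustration:

```python
import os

class RepoPath:
    """A hypothetical path-like object: implementing __fspath__ is all
    the os.PathLike protocol requires."""

    def __init__(self, relative: str):
        self.relative = relative

    def __fspath__(self) -> str:
        return "/srv/repo/" + self.relative

# os.fspath() accepts str, bytes, or any object implementing __fspath__.
print(os.fspath(RepoPath("README.md")))
```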

Several attendees noted that adding a module to the standard library often made it hard for users to use features added to the module in newer versions of Python, due to CPython’s slow development cycle. One solution could be to provide third-party versions of standard-library modules on PyPI, backporting the latest features of a module to older versions of Python. Thomas Wouters argued that previous attempts at providing these backport modules had often been disastrous. However, Jelle Zijlstra noted that typing_extensions, which backports features from the latest version of the typing module, had been incredibly successful (though it was sometimes hard to maintain).

Overall, there was agreement that the original motivations for a large, “batteries-included” standard library no longer held up to scrutiny. “In the good old days,” Ned Deily reminisced, “We said ‘batteries-included’ because we didn’t have a good story for third-party installation.” But in 2023, installing third-party packages from PyPI is much easier.

Often, Thomas Wouters noted, people preferred using standard-library modules in a corporate setting due to the fact that the installation of any third-party package would require approval from their company’s IT department. But, he noted, this was hardly Python’s problem.

Categories: FLOSS Project Planets

Python Software Foundation: The Python Language Summit 2023: Lightning Talks

Planet Python - Mon, 2023-05-29 13:21
The Python Language Summit 2023 closed off with a trio of lightning talks from Dong-hee Na, Carl Meyer and Amethyst Reese.

Dong-hee Na: Let’s support LLVM-BOLT as an official feature

CPython Core Developer Dong-hee Na gave a short presentation on the LLVM-BOLT optimiser, arguing that we should support it as a standard feature of CPython.

LLVM-BOLT is a “post-link-time binary optimiser” that was adopted by the Pyston project, a performance-oriented fork of CPython 3.8. The Pyston team had reported that use of the optimiser resulted in performance gains of 2-4% on their benchmarks, although the Faster CPython team had reported smaller returns when they had applied the optimiser to Python 3.11.

Dong-hee Na showed benchmark results that showed significant speedups in some areas with LLVM-BOLT applied to Python 3.12, but noted that LLVM-BOLT also caused regressions in some other areas due to overly aggressive optimisations. He announced that he had added support for LLVM-BOLT to CPython as an optional compile-time flag, --enable-bolt, to allow experimentation with the feature.
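Building CPython with the new flag looks like a normal source build. A sketch: BOLT support additionally requires an LLVM toolchain with the llvm-bolt tools available on the build machine.

```shell
# Configure a CPython source checkout with BOLT optimisation enabled.
# PGO (--enable-optimizations) and LTO are commonly combined with it.
./configure --enable-optimizations --with-lto --enable-bolt
make -j"$(nproc)"
```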

A slide from Dong-hee Na's talk on LLVM-Bolt

Carl Meyer: Lazy Imports – the sequel

Carl Meyer instigated a short discussion on proposals to introduce a mechanism enabling lazy imports in Python. Following Meyer’s lightning talk on the same subject last year, Meyer – along with his colleague, Germán Méndez Bravo (who, according to Meyer, deserves “most of the credit”) – had written PEP 690, proposing a compile-time flag that would make imports in Python lazy by default. The PEP, however, was rejected by the Steering Council in December 2022, due to concern that the new flag would have created a split in the community between programmers who used Python with lazy imports enabled, and those who used Python with eager imports.

Meyer’s question to the audience was: where next for lazy imports? Was it worth modifying the proposal and trying again, or was the whole idea doomed? Meyer noted that the team at Instagram, where he worked, had seen start-up time improvements of 50-80%, and 40-90% reductions in memory usage, by adopting lazy imports in the fork of CPython they used for the Instagram web server.

Meyer floated a series of possible changes (some mutually exclusive) that could be made to the PEP. For each possible change, he asked if the change would make attendees more or less likely to support adding support for lazy imports to Python:

  1. Explicit opt-in syntax marking a specific import as lazy (e.g. lazy import inspect).
  2. A clear roadmap detailed in the PEP, outlining the timeframe in which it was expected that lazy-import behaviour would become the default in Python.
  3. A promise that the implementation of lazy imports would not lead to any changes being made to the dict data structure.
  4. Generalised support of “lazy names”, rather than just support for lazy imports specifically.

The room unanimously agreed that change (3) would make them more likely to support the PEP, and largely agreed that change (4) would make them less likely to support it. The room was (frustratingly, for Meyer) split on whether proposals (1) and (2) would make them more or less likely to give the PEP their support.

On the bright side, only one attendee said they thought they could never support a proposal for lazy imports in Python. Unfortunately for Meyer, the attendee in question was Thomas Wouters, currently serving on the Steering Council.
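While PEP 690-style lazy imports remain unavailable, the standard library already offers an opt-in approximation via importlib.util.LazyLoader, which defers executing a module's body until its first attribute access. This sketch follows the recipe in the importlib documentation:

```python
import importlib.util
import sys

def lazy_import(name: str):
    """Return a module object whose code only runs on first attribute access."""
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # does NOT execute the module body yet
    return module

json = lazy_import("json")
# The real import work happens here, on first attribute access:
print(json.dumps({"lazy": True}))
```

Unlike PEP 690, this is per-import and opt-in, which is roughly the shape of change (1) above.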

Amethyst Reese: Can we __call__ modules?

Amethyst Reese presented on an idea that has since become PEP 713: a proposal to add a mechanism allowing developers to easily create callable modules.

Amethyst Reese at the Python Language Summit
(Photo by Hugo van Kemenade)

Strictly speaking, it’s possible to create a callable module today, but it’s not exactly easy. The example given in the PEP looks something like the following:

import sys
import types

def fancy(...):
    ...

class FancyModule(types.ModuleType):
    def __call__(self, ...):
        return fancy(...)

sys.modules[__name__].__class__ = FancyModule

Reese proposes that we provide a simpler mechanism to create callable modules: simply provide special recognition for module-level __call__ functions, similar to the way that PEP 562 added special recognition of module-level __getattr__ and __dir__ functions. With the semantics specified in PEP 713, fancy.py could be rewritten as follows:

def fancy(...):
    ...

__call__ = fancy

With a module, fancy.py, defined like the above, users would simply be able to do the following:

import fancy

fancy()

This would allow users of Python to avoid constructs which often feel unnecessarily verbose and involve frustrating amounts of boilerplate, such as:

import datetime
import pprint
import dis

d = datetime.datetime()
pprint.pprint(...)
dis.dis(...)

It would also allow users to create callable modules in a way that would be easier for type checkers to support, as dynamically inserting custom objects into sys.modules can cause issues for these tools.

The proposal was met with curiosity by attendees of the Language Summit. Thomas Wouters said that he had originally opposed the addition of module-level __getattr__ and __dir__, introduced by PEP 562. However, now they had been introduced to Python, he was of the opinion that it might make sense to add support for module-level dunder methods including __call__, but also others such as __setattr__.
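The module-level __getattr__ from PEP 562 that Wouters referred to already works today, and gives a feel for how a module-level __call__ would slot into the same pattern: Python falls back to the function when ordinary attribute lookup on the module object fails. A sketch, with fancy_constant as an invented name for illustration:

```python
import sys

def __getattr__(name):
    """Called when normal attribute lookup on this module fails (PEP 562)."""
    if name == "fancy_constant":
        return 42
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")

# Attribute access *on the module object* routes through __getattr__:
this_module = sys.modules[__name__]
print(this_module.fancy_constant)  # → 42
```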

Categories: FLOSS Project Planets

Python Software Foundation: The Python Language Summit 2023: Python on Mobile

Planet Python - Mon, 2023-05-29 13:21

At the Python Language Summit 2023, Russell Keith-Magee presented on the ongoing efforts of BeeWare, a project that aims to make it easier to run Python on mobile platforms such as Android and iOS.

The BeeWare logo

Russell Keith-Magee is one busy bee

Improving Python’s story for running on mobile has been a labour of love for Keith-Magee for eight years at this point. Keith-Magee last presented at the Python Language Summit in 2020 (a year when the summit was conducted entirely virtually due to the Covid-19 pandemic). Since then, however, great progress has been made.

The biggest change since his last update, Keith-Magee reported, wasn’t technical – it was financial. For the last year, BeeWare has no longer been a hobby project for Keith-Magee. He is now paid by Anaconda to work on the project full time, along with a colleague, Malcolm Smith. “I now have the resources to do the work” required to make this happen, he announced.

Keith-Magee came to the Language Summit this year with a proposal: to add Android and iOS as platforms with “tier-3” support from CPython in Python 3.13.

What does “tier-3 support” mean? Tier-3 support, as defined by PEP 11, describes a level of support that the CPython core developers commit to giving a specific platform. The CPython test suite is run constantly on popular platforms such as Ubuntu, Windows and macOS, and test failures on these platforms can block releases until they are fixed. More esoteric platforms, meanwhile, are tested in CI less frequently. Test failures on those platforms will not necessarily block a release of CPython.

Tier-3 support is the current level of support Python provides to the emscripten, WASI and FreeBSD platforms, among others. If a platform has tier-3 support, the test suite will be run on the platform on a regular basis, but not on every pull request. Tier-3 support indicates that at least one core developer has committed to supporting CPython on that platform as best they can. However, test failures on that platform will not block a release of CPython.

The path to tier-3 support

Historically, a significant barrier standing in the way of mobile-platform support from CPython has been the difficulty of running tests on mobile platforms in CI. Keith-Magee announced, however, that it was now possible to run the CPython test suite on mobile platforms via Briefcase, BeeWare’s packaging and development tool. (Getting the test suite to pass is another issue, but Keith-Magee felt confident that it would be easy to make progress on that front.) As such, Keith-Magee reported, it should be feasible for CPython to integrate running tests on these platforms into the project’s testing infrastructure on GitHub.

One remaining issue is a fairly specific question, but an important one nonetheless: on a mobile platform, what should sys.platform be? The two major mobile platforms are currently inconsistent about this: on iOS, sys.platform == "ios", whereas on Android, sys.platform == "linux".

The advantage of the first approach is that it is easy for user code to detect whether the code is being run on iOS or not. The advantage of the second approach, meanwhile, is that most existing Python code won’t necessarily account for the possibility that it might be run on Android or iOS, so will run into difficulties with idioms such as the following:

if sys.platform == "linux":
    do_fancy_linux_only_feature()

The Android platform, Keith-Magee noted, is very similar to Linux, so by setting sys.platform to “linux”, a lot of code “just works” on Android even though the code hasn’t explicitly accounted for the possibility that it might be run on that platform.
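Until the question is settled, code that needs to single out Android can fall back on sys.getandroidapilevel(), a function CPython only defines in its Android builds. A sketch; this is a pragmatic heuristic rather than an official platform-detection API:

```python
import sys

def running_on_android() -> bool:
    # sys.getandroidapilevel() exists only in CPython's Android builds,
    # so its presence distinguishes Android even while sys.platform == "linux".
    return hasattr(sys, "getandroidapilevel")

if sys.platform == "linux" and not running_on_android():
    pass  # genuinely desktop/server Linux
```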

Abuzz with excitement

Keith-Magee in flight (photo by Hugo van Kemenade)

Keith-Magee’s talk was greeted enthusiastically by the core developers in the room; there was strong consensus that Python needed a better story on mobile platforms. Carol Willing expressed excitement about the ways in which support for mobile platforms could help Python spread globally, to countries where large numbers of people had no access to desktop computers (but had easy access to phones). Łukasz Langa agreed, noting that he had received many enquiries about Python on mobile after giving a talk on the subject about a year ago. “It’s interesting to a lot of people,” Langa commented. “We need it.”


On the sys.platform question, Core Developer Filipe Laíns said that he was working on a new API for the sysconfig standard-library module, which will provide a more granular way of distinguishing between platforms from user code. In the meantime, Brett Cannon wondered if BeeWare could use the same approach as CPython builds for WebAssembly: on WebAssembly builds, unusually, sys.platform has a different value to os.name (sys.platform is either "wasi" or "emscripten", but os.name is "posix").

Another outstanding question, however, is what the release process would look like for these new platforms. There was appreciation of the work Keith-Magee had already put into BeeWare, and nobody doubted that he would continue to be committed to the project. However, Keith-Magee is not currently a core developer, leading to a concern that CPython might be supporting a platform that nobody on the core team had expertise in.

Ned Deily, release manager for Python 3.6 and 3.7, worried that distributing CPython binaries for these platforms might not be feasible, as it would make the release process “even more arduous”. Keith-Magee responded that it could be possible to automate the build process for these platforms. If it wasn’t, he said, it also wouldn’t necessarily be essential for CPython to distribute official binaries for these platforms, at least at first.

Where next for BeeWare?

Keith-Magee’s next steps are to work towards upstreaming the patches to CPython that the BeeWare project has made, so that CPython on mobile platforms can “just work” without any changes being made. The alterations that have already been made to support CPython on WebAssembly have made this task much easier, Keith-Magee noted.

Categories: FLOSS Project Planets

Python Software Foundation: The Python Language Summit 2023: Three Talks on the C API

Planet Python - Mon, 2023-05-29 13:20
The Python Language Summit 2023 began with an extended discussion of Python’s C API – the interface through which it is possible to communicate between code written in Python and code written in low-level languages, like C. The fullness of this interface is a significant factor behind the vibrancy of Python’s ecosystem, enabling libraries such as NumPy and pandas that are foundational to Python’s widespread use in data science.

Three speakers had booked slots to discuss the C API this year: Mark Shannon, Guido van Rossum, and Antonio Cuni. The conversation evolved naturally from one talk to the next, so in this blog post, I’ll be discussing the three talks together.

All at sea on the C API
The C API at sea (illustration by DALL-E)

“I still don’t know what the C-API is, and I’ve been trying for years!” 

-- Mark Shannon, CPython Core Developer

Mark Shannon spoke on the problems (as he saw them) with Python’s C API. Shannon lamented that with every minor Python version, a slew of third-party C extensions seemed to break. He argued that the root cause was that the needs of C extensions were not adequately met by the formal C API, which had evolved in a haphazard and often-unplanned way over the past three decades of Python’s existence.

As a result of the C API’s flaws, Shannon said, extension authors were forced to reach beyond the formal API and into implementation details that had emerged as a kind of “implicit API”. The implementation details that constituted the new “implicit API” had become so widely depended upon that it was now impossible for CPython to change some parts of its code without breaking large parts of the Python ecosystem.

Shannon believes that the new “implicit API” should be formalised in Python 3.13. This, he argues, would put an end to the cycle of CPython releases inevitably leading to widespread breakages in C extensions.

Sam Gross, who (among other things) has contributed to pytorch, agreed with Shannon that the C API was lacking in many areas. Gross argued that there was a great deal of important functionality that wasn’t exposed to extension authors. “Projects just end up copying-and-pasting CPython C code,” Gross said, meaning the extensions broke with each new release of CPython.

Pablo Galindo Salgado, release manager for Python 3.10 and 3.11, said that the release process for those versions had felt like a “game of whack-a-mole” when it came to third-party C extensions breaking. Salgado argued that CPython needed to reach out to the authors of extensions such as pytorch to gather detailed feedback on what core functionality was missing from the API. Several attendees expressed frustration with a perceived tendency among C extension authors to immediately reach into CPython implementation details when something they needed was missing from the API. The result of this was that CPython core developers were often in the dark about which things the C API should be providing, but currently wasn’t. “We might not be able to give you a solution,” Salgado said, “But please come to us and tell us what your problem is, if you have a problem!”

Gross proposed that CPython should run third-party test suites with new versions of CPython as they were being developed, so that the Core Dev team would be able to spot third-party breakages early and gauge the impact of their changes. Pytorch operated a similar programme, Gross said, and it had been successful in helping to limit breakages of third-party pytorch models as the core pytorch team evolved their API.

Brandt Bucher noted, however, that the problem often wasn’t so much that CPython was unaware when they were breaking third-party code – the benchmarks run in the pyperformance suite often served as an early warning signal for breakages in C extensions. The problem was often that CPython would offer to help affected projects, only to have their help rejected. Several core developers had previously sent pull requests to help third-party projects become compatible with an upcoming version of CPython, only for their pull requests to remain unmerged for several months due to burned-out maintainers of these projects.

Let’s get specific

Guido van Rossum speaks to the Language Summit on the C API
(photo by Hugo van Kemenade)

Shannon was clear about what he thought the problem with the C API was. The problem was that the C API was insufficient for the authors of C extensions, leading these authors to reach into CPython implementation details and creating an unending cycle of projects breaking with each new release of CPython. Others, however, argued that this wasn’t so much a specific problem as a genre of problems. Each specific project might have a different notion about which things were imperfect with the C API, and which things were missing from the C API. Each imperfection or absence could be considered a concrete problem in its own way. “Things break for everybody, but things break in different ways for different people,” Carol Willing argued. “We need more granularity in our understanding of that.”

As Mark Shannon’s slot drew to an end, Guido van Rossum opted to continue the discussion that Shannon had started, but sought to draw attention to a more precise enumeration of the difficulties C API users were facing.

“There’s lots of ideas here, but I don’t know what the problem is!” 

-- Carol Willing, CPython Core Developer

Simon Cross, a contributor to the HPy project, reported that the HPy project had, in the early stages of the project, put together a long list of the problems, as they saw them, with the CPython C API. Cross offered to share the list with the Core Dev team. Thomas Wouters, a Google employee, also offered to provide a list of difficulties Google had experienced when upgrading to recent Python releases, something the company keeps detailed records of. There was agreement that putting together a comprehensive list of problems with the existing API was an important first step, before CPython could consider drawing up plans to fix the problem.

The C API discussions ended with an agreement that further discussion was required. Interested parties can follow the ongoing conversation at https://github.com/capi-workgroup/problems/issues. The plan is to work towards an informational PEP, with input from an array of stakeholders, outlining a consensus around the problems and pitfalls in the current C API. Once the problems with the status quo have been enumerated in detail, the community might be in a position to consider possible solutions.

HPy: A possible solution?
A slide from Antonio Cuni's talk on HPy

While the C API discussions ended with a detailed discussion of the problems in the current C API, the first talk of the day was in fact by Antonio Cuni, a core developer with the HPy project. HPy is an alternative C API for Python – an API that seeks to avoid many of the pitfalls of the current API. The contention of the HPy developers is that the current C API is bad for CPython, bad for alternative implementations of Python such as PyPy or GraalPython, and, ultimately, bad for end users.

HPy is a specification of a new API and ABI for extending Python that is Python implementation agnostic and designed to hide and abstract internal details 

-- The HPy GitHub README

Cuni began by describing the key goals of the HPy project:

  • An API that doesn’t leak CPython-specific implementation details
  • A 0% (or close to 0%) performance overhead when compared with CPython’s current C API
  • A “Universal ABI” that allows compiled extension modules to use the same interface to communicate with PyPy (for example) as they would do to communicate with CPython
  • An API that is garbage-collection friendly.

Cuni argued that if the Python ecosystem as a whole moved to using HPy, instead of the “official” C API, there would be dramatically fewer breakages of C extensions with each new Python release. Cuni’s proposal was that CPython should “officially bless” HPy as the recommended C API for Python.

HPy Hpy Hooray?

Simon Cross, HPy Core Developer
(photo by Hugo van Kemenade)

Cuni’s talk was exuberant, but it was at times somewhat unclear what exactly he was asking for from the room. “The investment from CPython,” Cuni argued “would be a political investment rather than a technical one”. Thomas Wouters, however, argued that the CPython team didn’t have the “moral authority” to coerce extension authors into using HPy. Hynek Schlawack agreed, arguing that it was perhaps unrealistic to migrate the entire Python ecosystem towards HPy.

Many were uncertain about what it would even mean to “officially bless” HPy – would CPython host HPy’s documentation on docs.python.org? Or would CPython simply add a note to the documentation of the C API that the “official” C API was no longer the recommended way to write a C extension? Guido van Rossum emphasised that a top-down approach from the Core Dev team to extension authors wouldn’t work: nobody wanted a repeat of the decade-long transition from Python 2 to Python 3. Carol Willing agreed that pushing C extension authors to use HPy could be counterproductive, arguing that it was important to remember the impact of our decisions on end users of Python.

Other core developers were sceptical about the fundamentals of HPy itself. HPy’s origin story lies in difficulties PyPy encountered when trying to use CPython’s existing C API. These difficulties led to attempts to create an API that could be easily used on either Python implementation. Larry Hastings argued that blessing an API that was friendlier to PyPy had clear benefits to PyPy (and other implementations), but that it was less clear where CPython stood to gain from this change.

Cuni’s response was that if the ecosystem migrated to an API that exposed fewer implementation details directly, CPython would be able to refactor its internal implementation far more seamlessly, as CPython would no longer have to worry about breaking a multitude of third-party C extensions with every internal reorganisation of code. Cuni mentioned efforts to run CPython using WebAssembly as a specific area where CPython could stand to gain, as the HPy API could interact far more easily with the JavaScript garbage collector. Cuni also noted that HPy made it easy for extension authors to test whether they were sticking to the API or not, something which is famously hard to know with the current C API. “We don’t know all the experimentation that this might enable,” Cuni exclaimed, “Because we haven’t implemented this change yet!”

Mark Shannon was another core developer expressing scepticism about HPy. Shannon argued that while HPy had strengths over the “official” C API, it was far from perfect. “We should try to fix CPython’s API” before CPython recommended users switch to HPy, Shannon argued. Simon Cross, also of the HPy project, said that the team welcomed feedback about where they could improve. It was still easy for HPy to make changes, Cross argued, given they had not yet achieved widespread adoption.

Further Reading on HPy
  1. HPy’s overview of changes needed to the C API.
  2. HPy’s explanation of why the changes are needed.
Categories: FLOSS Project Planets

Python Software Foundation: The Python Language Summit 2023: Making the Global Interpreter Lock Optional

Planet Python - Mon, 2023-05-29 13:20
The Global Interpreter Lock (“GIL”) is one of the most fundamental parts of how Python works today. It’s also one of the most controversial parts, as it prevents true concurrency between threads – another way of saying that it’s difficult to run two functions simultaneously while writing pure-Python code.

If there’s one blog that really “took off” after I wrote last year’s coverage on the Python Language Summit, it was my blog on Sam Gross’s proposal to make Python’s Global Interpreter Lock (the “GIL”) optional. One week following the publication of my articles, the blog had been viewed nearly 38,000 times; the blog in “second place” had only been viewed 5,300 times.

Interest in removing the GIL is clear, therefore – and this year, Gross returned to the Python Language Summit to discuss the development of his plans.

Dare to dream of a GIL-free world

Gross started off by giving an update on how work on nogil – Gross’s fork of CPython with the GIL removed – had progressed over the past year. Gross had been spending the last few months rebasing his fork onto CPython 3.12. As a result, he was now able to give an accurate estimate of how bad the performance costs to single-threaded code would be, if the GIL were removed.

Over the past year, Gross had also written a PEP – PEP 703 – which, following the Language Summit, was submitted to the Steering Council for their consideration on May 12. If the PEP is accepted by the Steering Council, a version of Python with the GIL disabled could be available as soon as Python 3.13. (To discuss the ideas in the PEP, head to the thread on discuss.python.org.)

Gross reported that the latest version of nogil was around 6% slower on single-threaded code than the CPython main branch, and that he was confident that the performance overhead could be reduced even further, possibly to nearly 0%. Most of the overhead was due to reference counting and changes that had been required to the operation of CPython’s new-in-3.11 specialising adaptive interpreter. With multiple threads, the performance overhead was around 8%.

A slide from Gross's talk

Having rebased onto Python 3.12, Gross was also able to give an estimate on the extent of the code changes that would be required to CPython, were nogil to be accepted. Excluding generated files, Gross reported that around 15,000 lines of code would need to be changed. There would also need to be some alterations to mimalloc, which nogil depends on – but Gross was hopeful that these would be accepted to mimalloc itself, reducing the need for CPython to maintain mimalloc patches in its codebase.

Some audience members expressed concern about the prospect of “#ifdef hell” across the code base, to account for the two possible CPython build options. However, Gross countered that most of the changes required could be made unconditionally without resorting to #ifdefs, since the changes were simply no-ops with the GIL enabled.

The plan for nogil remains that it would be enabled via a compile-time flag, named --disable-gil. Third-party C extensions would need to provide separate wheels for GIL-disabled Python.

GIL gotta go

Last year, it felt as though Gross’s proposal was met with a certain degree of excitement, but also a certain degree of scepticism. It was thrilling to see how far nogil had come, but there was a certain amount of frustration in the room at the lack of a concrete plan for where to go next.

This year, it felt like the proposal was greeted much more warmly. Gross had come to the summit with a far more clearly defined roadmap for how we might get to nogil, and had taken the time to put together detailed estimates of the performance impact and the extent of the code changes.

“Thank you for doing this… now we have something we can actually consider!” 

-- Brandt Bucher, CPython Core Developer

Attendees at the Language Summit were also impressed at how low the performance overhead of nogil was, although Mark Shannon commented that he thought the numbers might be “a slight underestimate”. Gross explained that he had managed to achieve a 2-3% speedup by optimising for the case where only a single thread was active. Even in multithreaded programs, he explained, this provided a performance boost, since even in multithreaded code, it was often the case that only a single thread would be attempting to access an object at any given point in time.

Larry Hastings expressed concern that nogil might make debugging code harder; Gross responded that there was some impact on debuggability, but that it wasn’t too bad, in his opinion. Pablo Galindo Salgado, release manager for Python 3.10 and 3.11, expressed concern that nogil could also make implementing debuggers trickier – but commented that “it might be worth the price” anyway.

Another point of discussion was the changes Gross had made to the specialising adaptive interpreter in the nogil fork. In order for the specialisations to work with the GIL disabled, Gross had had to guard the adaptive specialisations to the bytecode behind a lock. As well as this, each thread had been limited to a single specialisation of any given bytecode; with the GIL enabled, the adaptive interpreter can respecialise bytecode multiple times. Gross commented that he thought it would probably be possible to allow multiple specialisations of bytecode in multithreaded code, but that this would require further investigation. His current solution was the simplest one he had found, for now.

Categories: FLOSS Project Planets

Python Software Foundation: The Python Language Summit 2023: Towards Native Profiling for Python

Planet Python - Mon, 2023-05-29 13:20
Joannah Nanjekye came to the Python Language Summit 2023 to discuss innovations by Scalene, a sampling-based Python profiler that can distinguish between native code and Python code in its reports. After its initial release in late 2019, Scalene has become one of the most popular Python profiling tools. It has now been downloaded 500,000 times from PyPI.

The Scalene project logo

A profiler is a tool that can monitor a program as it is running. Once the program has run, the profiler can provide a report analysing which lines of code were visited most often, which were the most expensive in terms of time spent, and which were the most expensive in terms of memory usage. Profilers can therefore be hugely useful tools for addressing performance issues in code. If you’re unsure where your program is spending most of its time, it can be hard to optimise it.

Profilers can be split into two broad categories: trace-based profilers and sampling-based profilers. Trace-based profilers work by intercepting each function call as your program is running and logging information about the time spent, memory usage, etc. Sampling-based profilers, meanwhile, take snapshots of your program at periodic intervals to monitor these things. A trace-based profiler has the advantage that it can provide a granular and precise level of detail about which lines of code were executed and when each function call finishes; this makes it ideal for use as a tool to monitor test coverage, for example. However, injecting tracing hooks into each function call can sometimes slow down a program and distort the analysis of where most time was spent. As a result, sampling-based profilers are sometimes preferred for profiling performance.
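As a rough illustration of the trace-based approach, pure Python can install such a hook itself via sys.setprofile, which fires on every function call; the function names below are made up for the example:

```python
import sys
from collections import Counter

call_counts = Counter()

def tracer(frame, event, arg):
    # The interpreter invokes this hook for every call event, which is
    # precise but adds overhead to each and every function call.
    if event == "call":
        call_counts[frame.f_code.co_name] += 1

def work():
    return sum(helper(i) for i in range(100))

def helper(i):
    return i * i

sys.setprofile(tracer)
work()
sys.setprofile(None)  # uninstall the hook

print(call_counts["helper"])  # -> 100
```

A sampling-based profiler, by contrast, skips the per-call hook entirely and instead inspects the running program at fixed intervals, trading this exactness for much lower overhead.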

Scalene is a sampling-based profiler, and aims to address the shortcomings of previous sampling-based profilers for Python. One of the key challenges sampling-based profilers have faced in the past has been accurately measuring the time Python programs spend in “native code”.

Slide from Nanjekye’s talk, illustrating sampling-based profiling

Handling the problem of native code

“Native code”, also sometimes referred to as “machine code”, refers to code consisting of low-level instructions that can be interpreted directly by the hardware processor. Using extensions to Python written in C, C++ or Rust that will compile to native code – such as NumPy, scikit-learn, and TensorFlow – can lead to dramatic speedups for a program written in Python.

It also, however, makes life difficult for sampling-based profilers. Samplers often use Python’s signal module as a way of knowing when to take a periodic snapshot of a program as it is running. However, due to the way the signal module works, no signalling events will be delivered while a Python program is spending time in a function that has been compiled to native code via an extension module. The upshot of this is that sampling-based profilers are often “flying blind” for Python code that makes extensive use of C extensions, and will sometimes erroneously report that no time at all was spent executing native code, even if the program in fact spent the majority of its time there.

Scalene’s solution to this problem is to monitor delays in signal delivery. It uses this information to deduce the amount of time that the program spent outside CPython’s main interpreter loop (due to the use of native, compiled code from an extension module). Further details on Scalene’s methods, and comparisons with other leading Python profilers, can be found in a recent paper by Emery D. Berger, Sam Stern and Juan Altmayer Pizzorno, “Triangulating Python Performance Issues with Scalene”.
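The essence of that deduction can be shown with a small self-contained sketch (an illustration of the idea only, not Scalene’s actual code): signals are requested every interval seconds, and any extra delay before a signal’s handler actually runs is attributed to time spent outside the interpreter loop:

```python
def native_time_estimate(arrival_times, interval):
    """Estimate time spent outside the interpreter loop from late signals.

    arrival_times: monotonic timestamps at which the periodic signal
    handler actually ran; interval: the requested signal period.
    """
    native = 0.0
    for prev, now in zip(arrival_times, arrival_times[1:]):
        delay = (now - prev) - interval
        if delay > 0:
            # The signal could not be delivered on time, e.g. because the
            # program was inside compiled native code; attribute the
            # excess to native execution.
            native += delay
    return native

# Signals requested every 10 ms; the third one arrived 30 ms late.
print(round(native_time_estimate([0.00, 0.01, 0.05, 0.06], 0.01), 6))  # -> 0.03
```

In the real profiler the arrival times come from the signal handler itself, and the attribution is done per line of Python code rather than for the program as a whole.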

Nanjekye also detailed Scalene’s sophisticated approach to measuring performance in child threads. Signal-based profilers often struggle with multi-threaded code, as signals can only be delivered and received from the main thread in Python. Scalene’s solution is to monkey-patch functions that might block the main thread, and add timeouts to these functions. This allows signals to be delivered even in multithreaded code.


Nanjekye asked attendees at the Language Summit if they would be interested in integrating Scalene's ideas into the standard library's cProfile module, which was met with a somewhat muted response.

Pablo Galindo Salgado, a leading contributor to the Memray profiler, criticised Scalene’s signal-based approach, arguing it relied on inherently brittle monkey-patching of the standard library. It also reported unreliable timings, Salgado said: for example, if code in a C extension checks for signals to support CTRL-C, the resulting delays measured by Scalene will be distorted.

Salgado argued that integration with the perf profiler, which Python is introducing support for in Python 3.12, would be a better option for users. Mark Shannon, however, argued that perf distorted the execution time of Python programs; Salgado responded that Scalene did as well, as the use of signals came with its own overhead.

Nanjekye argued that the huge popularity of Scalene in the Python ecosystem was evidence that it had proved its worth. Carol Willing concurred, noting that Scalene was an especially useful tool with code that made heavy use of libraries such as NumPy, Scikit-Learn and PyTorch.

Categories: FLOSS Project Planets

John Goerzen: Recommendations for Tools for Backing Up and Archiving to Removable Media

Planet Debian - Mon, 2023-05-29 12:57

I have several TB worth of family photos, videos, and other data. This needs to be backed up — and archived.

Backups and archives are often thought of as similar. And indeed, they may be done with the same tools at the same time. But the goals differ somewhat:

Backups are designed to recover from a disaster that you can fairly rapidly detect.

Archives are designed to survive for many years, protecting against disaster not only impacting the original equipment but also the original person that created them.

Reflecting on this, it implies that while a nice ZFS snapshot-based scheme that supports twice-hourly backups may be fantastic for that purpose, if you think about things like family members being able to access it if you are incapacitated, or accessibility in a few decades’ time, it becomes much less appealing for archives. ZFS doesn’t have the wide software support that NTFS, FAT, UDF, ISO-9660, etc. do.

This post isn’t about the pros and cons of the different storage media, nor is it about the pros and cons of cloud storage for archiving; these conversations can readily be found elsewhere. Let’s assume, for the point of conversation, that we are considering BD-R optical discs as well as external HDDs, both of which are too small to hold the entire backup set.

What would you use for archiving in these circumstances?

Establishing goals

The goals I have are:

  • Archives can be restored using Linux or Windows (even though I don’t use Windows, this requirement will ensure the broadest compatibility in the future)
  • The archival system must be able to accommodate periodic updates consisting of new files, deleted files, moved files, and modified files, without requiring a rewrite of the entire archive dataset
  • Archives can ideally be mounted on any common OS and the component files directly copied off
  • Redundancy must be possible. In the worst case, one could manually copy one drive/disc to another. Ideally, the archiving system would automatically track making n copies of data.
  • While a full restore may be a goal, simply finding one file or one directory may also be a goal. Ideally, an archiving system would be able to quickly tell me which discs/drives contain a given file.
  • Ideally, preserves as much POSIX metadata as possible (hard links, symlinks, modification date, permissions, etc). However, for the archiving case, this is less important than for the backup case, with the possible exception of modification date.
  • Must be easy enough to do, and sufficiently automatable, to allow frequent updates without error-prone or time-consuming manual hassle

I would welcome your ideas for what to use. Below, I’ll highlight different approaches I’ve looked into and how they stack up.

Basic copies of directories

The initial approach might be one of simply copying directories across. This would work well if the data set to be archived is smaller than the archival media. In that case, you could just burn or rsync a new copy with every update and be done. Unfortunately, this is much less convenient with data of the size I’m dealing with: since no single disc or drive can hold the full tree, a simple rsync is unavailable. With some datasets, you could manually design some rsyncs to store individual directories on individual devices, but that gets unwieldy fast and isn’t scalable.

You could use something like my datapacker program to split the data across multiple discs/drives efficiently. However, updates will be a problem; you’d have to re-burn the entire set to get a consistent copy, or rely on external tools like mtree to reflect deletions. Not very convenient in any case.

So I won’t be using this.
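For context, splitting a fixed set of files across fixed-capacity media is a bin-packing problem, and a first-fit-decreasing heuristic like the sketch below is the general shape of what a tool in this space does (the code is illustrative, not datapacker’s actual algorithm):

```python
def first_fit_decreasing(file_sizes, disc_capacity):
    """Assign file sizes to fixed-capacity discs, largest files first."""
    discs = []  # each disc is [remaining_capacity, [assigned sizes]]
    for size in sorted(file_sizes, reverse=True):
        if size > disc_capacity:
            raise ValueError(f"file of size {size} exceeds disc capacity")
        for disc in discs:
            if disc[0] >= size:  # first disc with enough room wins
                disc[0] -= size
                disc[1].append(size)
                break
        else:
            discs.append([disc_capacity - size, [size]])
    return [contents for _, contents in discs]

# Six files totalling 50 GB packed onto 25 GB BD-R discs:
print(first_fit_decreasing([20, 15, 10, 3, 1, 1], 25))
# -> [[20, 3, 1, 1], [15, 10]]
```

Even with a good packing, though, the update problem above remains: adding, deleting, or modifying files can invalidate the layout of every disc.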

tar or zip

While you can split tar and zip files across multiple media, they have a lot of issues. GNU tar’s incremental mode is clunky and buggy; zip is even worse. tar files can’t be read randomly, making it extremely time-consuming to extract just certain files out of a tar file.

The only thing going for these formats (and especially zip) is the wide compatibility for restoration.

dar

Here we start to get into the more interesting tools. Dar is, in my opinion, one of the best Linux tools that few people know about. Since I first wrote about dar in 2008, it’s added some interesting new features; among them, binary deltas and cloud storage support. So, dar has quite a few interesting features that I make use of in other ways, and could also be quite helpful here:

  • Dar can both read and write files sequentially (streaming, like tar), or with random-access (quick seek to extract a subset without having to read the entire archive)
  • Dar can apply compression to individual files, rather than to the archive as a whole, facilitating both random access and resilience (corruption in one file doesn’t invalidate all subsequent files). Dar also supports numerous compression algorithms including gzip, bzip2, xz, lzo, etc., and can omit compressing already-compressed files.
  • The end of each dar file contains a central directory (dar calls this a catalog). The catalog contains everything necessary to extract individual files from the archive quickly, as well as everything necessary to make a future incremental archive based on this one. Additionally, dar can make and work with “isolated catalogs” — a file containing the catalog only, without data.
  • Dar can split the archive into multiple pieces called slices. This can best be done with fixed-size slices (--slice and --first-slice options), which let the catalog record the slice number and preserve random access capabilities. With the --execute option, dar can easily wait for a given slice to be burned, etc.
  • Dar normally stores an entire new copy of a modified file, but can optionally store an rdiff binary delta instead. This has the potential to be far smaller (think of a case of modifying metadata for a photo, for instance).

Additionally, dar comes with a dar_manager program. dar_manager makes a database out of dar catalogs (or archives). This can then be used to identify the precise archive containing a particular version of a particular file.

All this combines to make a useful system for archiving. Isolated catalogs are tiny, and it would be easy enough to include the isolated catalogs for the entire set of archives that came before (or even the dar_manager database file) with each new incremental archive. This would make restoration of a particular subset easy.

The main thing to address with dar is that you do need dar to extract the archive. Every dar release comes with source code and a win64 build. dar also supports building a statically-linked Linux binary. It would therefore be easy to include win64 binary, Linux binary, and source with every archive run. dar is also a part of multiple Linux and BSD distributions, which are archived around the Internet. I think this provides a reasonable future-proofing to make sure dar archives will still be readable in the future.

The other challenge is user ability. While dar is highly portable, it is fundamentally a CLI tool and will require CLI abilities on the part of users. I suspect, though, that I could write up a few pages of instructions to include and make that a reasonably easy process. Not everyone can use a CLI, but I would expect that a person who could follow those instructions could be found readily enough.

One other benefit of dar is that it could easily be used with tapes. The LTO series is liked by various hobbyists, though it could pose formidable obstacles to non-hobbyists trying to access data in future decades. Additionally, since the archive is a big file, it lends itself to working with par2 to provide redundancy for certain amounts of data corruption.


git-annex

git-annex is an interesting program that is designed to facilitate managing large sets of data and moving it between repositories. git-annex has particular support for offline archive drives and tracks which drives contain which files.

The idea would be to store the data to be archived in a git-annex repository. Then git-annex commands could generate filesystem trees on the external drives (or trees to be burned to read-only media).

In a post about using git-annex for blu-ray backups, an earlier thread about DVD-Rs was mentioned.

This has a few interesting properties. For one, with due care, the files can be stored on archival media as regular files. There are some different options for how to generate the archives; some of them would place the entire git-annex metadata on each drive/disc. With that arrangement, one could access the individual files without git-annex. With git-annex, one could reconstruct the final (or any intermediate) state of the archive appropriately, handling deletions, renames, etc. You would also easily be able to know where copies of your files are.

The practice is somewhat more challenging. Hundreds of thousands of files — what I would consider a medium-sized archive — can pose some challenges, running into hours-long execution if used in conjunction with the directory special remote (but only minutes-long with a standard git-annex repo).

Ruling out the directory special remote, I had thought I could maybe just work with my files in git-annex directly. However, I ran into some challenges with that approach as well. I am uncomfortable with git-annex mucking about with hard links in my source data. While it does try to preserve timestamps in the source data, these are lost on the clones. I wrote up my best effort to work around all this.

In a forum post, the author of git-annex comments that “I don’t think that CDs/DVDs are a particularly good fit for git-annex, but it seems a couple of users have gotten something working.” The page he references is Managing a large number of files archived on many pieces of read-only medium. Some of that discussion is a bit dated; for instance, the directory special remote now has the importtree feature that implements what was being asked for there.

git-annex supplies win64 binaries, and git-annex is included with many distributions as well. So it should be nearly as accessible as dar in the future. Since git-annex would be required to restore a consistent recovery image, similar caveats as with dar apply; CLI experience would be needed, along with some written instructions.

Bacula and BareOS

Although primarily tape-based archivers, these do also nominally support drives and optical media. However, they are much more tailored as backup tools, especially with the ability to pull from multiple machines. They require a database and extensive configuration, making them a poor fit for both the creation and future extractability of this project.

Conclusions

I’m going to spend some more time with dar and git-annex, testing them out, and hope to write some future posts about my experiences.

Categories: FLOSS Project Planets

Real Python: Choosing the Best Coding Font for Programming

Planet Python - Mon, 2023-05-29 10:00

When you’re coding, there’s always a font involved in displaying the text on the screen. Yet, the font that you use is an often-overlooked piece in your programming tool kit. Many operating systems and code editors come with their default monospace fonts that you may end up using.

But there are a bunch of technicalities and features to take into consideration when choosing the best font for your daily programming. That’s why it’s worth investigating the requirements that a programming font should fulfill to make it a perfect match for you.

In this tutorial, you’ll learn:

  • How to spot a high-quality coding font
  • What characters are important when coding in Python
  • Which features of a programming font matter
  • Where to download programming fonts
  • How to install a font on your operating system

Throughout the tutorial, you’ll consider twenty-seven hand-picked programming fonts that you can use right away. You’ll take a close look at all the fonts and investigate why their particular features are important for you as a programmer. In the end, you’ll be able to decide which coding fonts suit your needs best.

If you want to keep a list of the fonts for future reference, then you can get an overview PDF of all the fonts by clicking the link below:

Free Bonus: Click here to download your comprehensive guide to all the coding fonts in this tutorial.

Say Hello to Your Next Coding Font

In this tutorial, you’ll take a deep dive and learn about the forms and shapes that make a quality coding font. You’ll encounter significant technical details, handy features, and surprising quirks along the way.

Note: In this tutorial, you’ll focus on the characteristics of fonts that are suitable for Python programming. But the fonts will work similarly in any other programming language.

You’ll notice that all the images in this tutorial have a similar style. The font name is always on the left. On the right, you’ll see a sample string or specific characters worth noting. To kick things off, have a look at the fonts that you’ll examine in this tutorial:

It’s a good idea to open this image in another window and keep it open while you read the tutorial. For example, you can right-click the image above, select Open Image in New Window, and then drag the tab into a new window:

Having the tutorial and the fonts list side by side lets you conveniently compare certain fonts with others. You can even go a step further and print the image to annotate the fonts with your likes and dislikes.

If you’re already excited to try out new fonts in your own coding editor, then you can scroll down to the get your new coding font section. There you can download all the fonts in this tutorial and load them up in your editor. That way, you can get a head start and evaluate any font with your own editor theme, preferred font size, and some familiar code.

Note: Color schemes and font sizes are very personal settings that can easily skew your evaluation of a font. That’s why you won’t see any larger code samples set in any font in this tutorial.

Instead, you’ll see many examples that objectively focus on specific font characteristics, independently of any coding editors.

Once you try out a coding font, you’ll recognize if you like or dislike it. Sometimes it’s just a matter of taste. But sometimes there are reasons why one font might be a better fit for you while another isn’t.

Over the next few sections, you’ll get a tool set so that you can evaluate for yourself what makes a quality programming font.

Consider the Basics

Whether you just wrote your first Hello, World! script or maintain huge codebases, choosing the right programming font will be beneficial for you as a developer.

Before you dig into the characteristics of a font on a character level, there are some broader features to take into consideration. Some of them may filter out a font in your search from the get-go.

For example, a font might not be a good choice if the font doesn’t support your language or if it costs money that you’re not willing to invest. Another criterion could be that the font must be monospace.

In this section, you’ll explore important points of comparison. That way, you’ll know which fonts to include in your circle of coding fonts to consider more closely.

Does the Font File Format Matter? Read the full article at https://realpython.com/coding-font/ »

[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

Categories: FLOSS Project Planets

Python for Beginners: Tuple Comprehension in Python

Planet Python - Mon, 2023-05-29 09:00

You might have read about list comprehension, dictionary comprehension, set comprehension, etc in Python. However, there is no tuple comprehension in Python. In this article, we will discuss why there is no tuple comprehension in Python. We will also discuss how we can emulate tuple comprehension in Python.

Table of Contents
  1. Why Don’t We Have Tuple Comprehension in Python?
  2. Imitate Tuple Comprehension Using The Generator Comprehension
  3. Imitate Tuple Comprehension Using List Comprehension 
  4. Conclusion
Why Don’t We Have Tuple Comprehension in Python?

Comprehensions are used in Python to create iterable objects from other iterable objects. For instance, we define a list comprehension using the square brackets [] to create lists from other iterable objects using the following syntax.

newList=[element for element in iterable if condition]

You can observe this in the following example.

oldList=[1,2,3,4,5,6]
print("The old list is:")
print(oldList)
newList=[element**2 for element in oldList]
print("The new list is:")
print(newList)


The old list is:
[1, 2, 3, 4, 5, 6]
The new list is:
[1, 4, 9, 16, 25, 36]

Similarly, we define set comprehension using curly braces {} to create sets from other iterables using the following syntax.

newSet={element for element in iterable if condition}

You can observe this in the following example.

oldSet={1,2,3,4,5,6}
print("The old set is:")
print(oldSet)
newSet={element**2 for element in oldSet}
print("The new set is:")
print(newSet)


The old set is:
{1, 2, 3, 4, 5, 6}
The new set is:
{1, 4, 36, 9, 16, 25}

Now, you might think that we can define tuple comprehension using parentheses () to create tuples from other iterable objects using the following syntax.

newTuple= (element for element in iterable if condition)

Let’s try and create a tuple using the above syntax.

oldTuple=(1,2,3,4,5,6)
print("The old tuple is:")
print(oldTuple)
newTuple=(element**2 for element in oldTuple)
print("The new tuple is:")
print(newTuple)


The old tuple is:
(1, 2, 3, 4, 5, 6)
The new tuple is:
<generator object <genexpr> at 0x7fa988686f10>

In the output, you can observe that we get a generator object instead of a tuple. Thus, the syntax (element for element in iterable if condition) is used for generator comprehension and not tuple comprehension. 

Hence, we don’t have tuple comprehension in Python as the parentheses are used for generator comprehension. Another reason for the absence of tuple comprehension might be that tuples are immutable. 
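A small illustrative sketch makes it clear why the parenthesized form yields a generator rather than a tuple: the object it creates is lazy and can be consumed only once.

```python
# A parenthesized comprehension builds a generator: values are
# produced lazily, on demand, and can only be consumed once.
gen = (x**2 for x in (1, 2, 3))

print(next(gen))   # 1 (nothing else has been computed yet)
print(list(gen))   # [4, 9] (the remaining values)
print(list(gen))   # [] (the generator is now exhausted)
```

A tuple, by contrast, holds all of its elements in memory and can be iterated over any number of times.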

Imitate Tuple Comprehension Using The Generator Comprehension

Although Python has no tuple comprehension, you can imitate it in several ways. For instance, you can use the following steps to imitate tuple comprehension using generator comprehension.

  • First, you need to create a generator using generator comprehension to obtain the required elements in the output tuple. 
  • Then, you can use the tuple() function to obtain the output tuple. The tuple() function takes the generator as its input argument and returns a tuple.

You can observe this in the following example.

oldTuple=(1,2,3,4,5,6)
print("The old tuple is:")
print(oldTuple)
newTuple=tuple((element**2 for element in oldTuple))
print("The new tuple is:")
print(newTuple)


The old tuple is:
(1, 2, 3, 4, 5, 6)
The new tuple is:
(1, 4, 9, 16, 25, 36)

In the above output, you can observe that we have obtained a tuple using generator comprehension and the tuple() function.

Instead of the tuple() function, you can also use the unpacking operator to imitate tuple comprehension using the generator. For this, you just need to unpack the generator created from generator comprehension using the * operator. Then, you can place a comma “,” at the end of the unpacked elements to create a tuple as shown below.

oldTuple=(1,2,3,4,5,6)
print("The old tuple is:")
print(oldTuple)
newTuple=*(element**2 for element in oldTuple),
print("The new tuple is:")
print(newTuple)


The old tuple is:
(1, 2, 3, 4, 5, 6)
The new tuple is:
(1, 4, 9, 16, 25, 36)

In the above syntax, don’t forget the comma “,” at the end of the generator after unpacking it.

Imitate Tuple Comprehension Using List Comprehension 

Just like generator comprehension, you can also use list comprehension and the tuple() function to imitate tuple comprehension as shown below.

oldTuple=(1,2,3,4,5,6)
print("The old tuple is:")
print(oldTuple)
newTuple=tuple([element**2 for element in oldTuple])
print("The new tuple is:")
print(newTuple)


The old tuple is:
(1, 2, 3, 4, 5, 6)
The new tuple is:
(1, 4, 9, 16, 25, 36)

You can also use the unpacking operator with list comprehension to imitate tuple comprehension in Python as shown below.

oldTuple=(1,2,3,4,5,6)
print("The old tuple is:")
print(oldTuple)
newTuple=*[element**2 for element in oldTuple],
print("The new tuple is:")
print(newTuple)


The old tuple is:
(1, 2, 3, 4, 5, 6)
The new tuple is:
(1, 4, 9, 16, 25, 36)

Conclusion

In this article, we discussed why there is no tuple comprehension in Python and looked at four ways to imitate it using list and generator comprehensions. Out of these methods, the approach using the tuple() function and generator comprehension is the fastest, whereas the approach using list comprehension and the unpacking operator is the slowest.
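You can check the relative speeds of the four approaches on your own machine with the standard timeit module. This is a rough sketch; the absolute numbers, and possibly even the ranking, will vary with your Python version and hardware.

```python
import timeit

oldTuple = (1, 2, 3, 4, 5, 6)

# The four ways of imitating tuple comprehension discussed above
approaches = {
    "tuple() + generator": lambda: tuple(e**2 for e in oldTuple),
    "unpacked generator":  lambda: (*(e**2 for e in oldTuple),),
    "tuple() + list comp": lambda: tuple([e**2 for e in oldTuple]),
    "unpacked list comp":  lambda: (*[e**2 for e in oldTuple],),
}

for name, func in approaches.items():
    seconds = timeit.timeit(func, number=100_000)
    print(f"{name}: {seconds:.3f} s")
```

All four produce the same tuple; only the construction overhead differs.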

To learn more about Python programming, you can read this article on dictionary comprehension in Python. You might also like this article on Python continue vs break statements.

I hope you enjoyed reading this article. Stay tuned for more informative articles.

Happy Learning!

The post Tuple Comprehension in Python appeared first on PythonForBeginners.com.

Categories: FLOSS Project Planets

Jonathan Carter: MiniDebConf Germany 2023

Planet Debian - Mon, 2023-05-29 08:48

This year I attended Debian Reunion Hamburg (aka MiniDebConf Germany) for the second time. My goal for this MiniDebConf was just to talk to people and make the most of the time I have there. No other specific plans or goals. Despite this simple goal, it was a very productive and successful event for me.

Tuesday 23rd:

  • Arrived much later than planned after about 18h of travel, went to bed early.

Wednesday 24th:

  • Was in a discussion about individual package maintainership.
  • Was in a discussion about the nature of Technical Committee.
  • Co-signed a copy of The Debian System book along with the other DDs
  • Submitted a BoF request for people who are present to bring issues to the attention of the DPL (and to others who are around).
  • Noticed I still had a blog entry draft about this event last year, and posted it just to get it done.
  • Had a stand-up meeting, was nice to see what everyone was working on.
  • Had some event budgeting discussions with Holger.
  • Worked a bit on a talk I haven’t submitted yet called “Current events” (it’s slightly punny, get it?) – it’s still very raw but I’m passively working on it just in case we need a backup talk over the weekend.
  • Had a discussion over lunch with someone who runs their HPC on Debian and learned about Octopus and Pac.
  • TIL (from -python) about pyproject.toml (https://pip.pypa.io/en/stable/reference/build-system/pyproject-toml/)
  • Was in a discussion about amd64 build times on our buildds and referred them to DSA. I also e-mailed DSA to ask them if there’s anything we can do to improve build times (since it affects both productivity and motivation).
  • Did some premium cola tasting with andrewsh
  • Had a discussion with Ilu about installers (and luks2 issues in Calamares), accessibility and organisational stuff.

Thursday 25th:

  • Spent quite a chunk of the morning in a usrmerge BoF. I’m very impressed by the amount of reading and research the people in the BoF did and gathering all the facts/data, it seems that there is now a way forward that will fix usrmerge in Debian in a way that could work for everyone, an extensive summary/proposal will be posted to debian-devel as soon as possible.
  • Mind was in zombie mode. So I did something easy and upgraded the host running this blog and a few other hosts to bookworm to see what would break.
  • Cheese and wine party, which resulted in a mao party that ran waaaay too late.

Friday 26th:

Saturday 27th:

  • Attended talks:
    • HTTP all the things – The rocky path from the basement into the “cloud”
    • Running Debian on a Smartphone
    • debvm – Ephemeral Virtual Debian Machines
    • Network Configuration on Debian Systems
    • Discussing changes to the Debian key package definition
    • Meet the Release Team
    • Towards collective decision-making and maintenance in the Debian base system
  • Performed some PGP key signing.
  • Edited group photo.

Sunday 28th:

  • Had a BoF where we had an open discussion about things on our collective minds (Debian Therapy Session).
  • Had a session on upcoming legislature in the EU (like CRA).
  • Some web statistics with MrFai.
  • Talked to Marc Haber about a DebConf bid for Heidelberg for DebConf 25.
  • Closing session.

Monday 29th:

  • Started the morning with Helmut and Jörgen convincing me to switch from cowbuilder to sbuild (I’m tentatively sold; the huge new plus is that you don’t need schroot anymore, which trashed two of my systems in the past and effectively made sbuild a no-go for me until now).
  • Dealt with more laptop hardware failures, removing a stick of RAM seems to have solved that for now!

That is not good.

  • Dealt with some delegation issues for release team and publicity team.
  • Attended my last stand-up meeting.
  • Wrapped things up, blogged about the event. Probably forgot to list dozens of things in this blog entry. It is fine.

Tuesday 30th:

  • Didn’t attend the last day, basically a travel day for me.

Thank you to Holger for organising this event yet again!

Categories: FLOSS Project Planets

Mike Driscoll: PyDev of the Week: Draga Doncila Pop

Planet Python - Mon, 2023-05-29 08:33

This week we welcome Draga Doncila Pop as our PyDev of the Week! Draga is a core developer of the napari package, which is a multi-dimensional image viewer for Python. Draga also speaks at Python conferences about Python and data visualization.

You can see what else Draga is up to by visiting Draga’s GitHub profile.

Let’s spend some time getting to know Draga better!

Can you tell us a little about yourself (hobbies, education, etc):

I am a Romanian-born Kiwi living in Australia. I’m currently studying towards my PhD in computer science and working part time as a software engineer. In my spare time I like to read a lot of fantasy and science fiction – I’m currently re-reading the Red Rising saga while I wait for Brandon Sanderson’s next novel. I also love baking, mostly sweet treats – but I really want to conquer yeasted doughs next.

Why did you start using Python?

Python was the second programming language I learned. The first was Borland Delphi, when I was 14. I love Python because it’s so accessible. It’s so easy to spin up something quickly, but it has all the flexibility you need for more complex applications. It used to be that it was way too slow for many applications, but these days even getting C-esque speedups isn’t that bad.

What other programming languages do you know and which is your favorite?

I know Python best, but I’ve dabbled in Java, C#, C, javascript (and its many frameworks), typescript and even a little bit of Haskell. I think Python is my favourite, but I work so much with Python that it’s hard to make this a fair comparison. I love the ethos of Python, its community, and the difference it’s made to scientific computing. I do love Haskell a lot too, just because it’s a totally different paradigm – I wish I knew it better.

What projects are you working on now?

I mostly work on napari now, and its associated bits and pieces. I’m also working on some packages as part of my PhD, mostly utilities for cell tracking, and eventually a napari plugin. I contribute here and there to other projects as well, but napari is my main focus.

Which Python libraries are your favorite (core or 3rd party)?

I’m a sucker for the array-likes. Zarr, NumPy, dask.array, SciPy sparse… I love the (often…) seamless abstraction of the array interface despite the very different backends and use cases. I love the scientific Python stack in general.

How did you get involved with the napari project?

I worked with napari during my Honours year research project, analyzing high resolution satellite images. I really loved the community and the project itself, so I continued working on it while taking a break from study. Now I’m a core developer, and it’s part of my PhD work as well, and I think I’m very privileged to get to work on something so cool almost full time.

What are the top three things you love about napari?
    • Seeing people’s cool demos and interesting data. It’s just so neat! People have all sorts of fascinating applications, and the results are often absolutely beautiful.
    • The fact that it makes research easier for people. I love that software/computer science can help accelerate research in all sorts of fields, and working with something where I can see the real world impact is very satisfying.
    • It’s open source. This one’s pretty self explanatory, but I just think it’s fantastic that it’s free, anyone can use it, anyone can contribute to it – I think that’s how it should be. Especially for a project that’s meant to support research.

Thanks for doing the interview, Draga!

The post PyDev of the Week: Draga Doncila Pop appeared first on Mouse Vs Python.

Categories: FLOSS Project Planets

Stack Abuse: Simple NLP in Python with TextBlob: Pluralization and Singularization

Planet Python - Mon, 2023-05-29 08:10

In today's digital world, there is a vast amount of text data created and transferred in the form of news, tweets, and social media posts. Can you imagine the time and effort needed to process them manually? Fortunately, Natural Language Processing (NLP) techniques help us manipulate, analyze, and interpret text data quickly and efficiently. NLP is a branch of Artificial Intelligence, where we train machines to understand human language and perform tasks ranging from summarization and translation to sentiment analysis.

One of the common requirements in NLP is dealing with singular and plural forms of words and converting one to another. In this article, we'll discuss how to perform pluralization and singularization of words using the Python package TextBlob.

What do singularization and pluralization mean?

We know nouns can be singular or plural, such as book/books, cat/cats, and sweet/sweets. We need techniques to convert text to plural/singular forms to help in data preprocessing and to achieve better accuracy in translation, text analysis, and text generation. This task of conversion from singular to plural and vice versa is referred to as word inflection in NLP.

Let's look at the common patterns/rules for converting singular to plural in the English language:

  • Adding "-s" at the end: This is the most common pattern. For example, cot->cots, desk->desks, lake->lakes. Removing 's' is the rule followed for plural to singular conversion.

  • Adding "-es" at the end: This rule is usually followed for words ending in 's', 'x', 'z', 'ch', or 'sh' sounds. Examples: bus -> buses, church-> churches, brush-> brushes.

  • Replacing "-y" with "-ies": It is followed for nouns ending in a consonant along with a "y". Examples: puppy -> puppies, city -> cities.

  • Irregular patterns: Some nouns do not follow any of the patterns mentioned above. For example, child->children, foot->feet, mouse->mice. Handling these can be a bit tricky.
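The three regular patterns above fit in a few lines of plain Python, while the irregular cases need a lookup table. This toy sketch is for illustration only; real libraries such as TextBlob cover far more exceptions.

```python
# A tiny lookup table for a few irregular nouns (illustrative only)
IRREGULAR = {"child": "children", "foot": "feet", "mouse": "mice"}
VOWELS = set("aeiou")

def toy_pluralize(noun):
    # Irregular patterns: handled via the lookup table
    if noun in IRREGULAR:
        return IRREGULAR[noun]
    # Adding "-es" for words ending in 's', 'x', 'z', 'ch', or 'sh'
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    # Replacing "-y" with "-ies" when preceded by a consonant
    if len(noun) > 1 and noun.endswith("y") and noun[-2] not in VOWELS:
        return noun[:-1] + "ies"
    # Default: adding "-s"
    return noun + "s"

for word in ("cot", "bus", "city", "child"):
    print(word, "->", toy_pluralize(word))
# cot -> cots, bus -> buses, city -> cities, child -> children
```

Real-world inflection has many more edge cases (e.g. "roof" vs. "leaf"), which is exactly why a dedicated library is preferable.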

Introduction to TextBlob and Installation

The Python programming language provides a variety of packages that allow us to implement various tasks in NLP. One of the widely used packages is TextBlob, which offers functions to easily perform various NLP tasks, including singularization/pluralization. This library is built on top of NLTK (Natural Language Toolkit) and is easy to learn. You can check out the official documentation of TextBlob to learn about all the functions it offers.

Let's start by installing the library using the pip package manager by running the command below.

$ pip install textblob

Once the installation is complete, you can import the library into your Python notebook or script.

from textblob import Word

You can import the Word class from the module. When a text is passed in the form of a string to this class, a TextBlob Word object will be created, upon which various functions can be called to perform tokenization, word inflection, lemmatization, etc.

In the snippet below, we create a Word object named doc1 by passing a text string.

text = "I usually take a bus from my university to the park."
doc1 = Word(text)

In the next sections, we'll show how different functions can be used on the TextBlob class object to perform pluralization and singularization.

Pluralization with TextBlob

We can easily convert a noun from its singular to plural form using the pluralize() function in TextBlob. Let's look at how to get the plural form of 'puppy' in the example code below. Simply create a TextBlob object of the word, and call the function pluralize() on it.

from textblob import Word

blob = Word("puppy")
pluralized_word = blob.pluralize()
print(pluralized_word) # Output: puppies

TextBlob can handle most of the common patterns of pluralization, including irregular ones. Let's check this out with a few more examples:

plural_form = Word("box").pluralize()
print(plural_form) # Output: boxes

plural_form = Word("man").pluralize()
print(plural_form) # Output: men

plural_form = Word("tooth").pluralize()
print(plural_form) # Output: teeth

plural_form = Word("ox").pluralize()
print(plural_form) # Output: oxen

From the above output, you can observe that TextBlob handles irregular patterns like ox->oxen and tooth->teeth as well.

Now, let us consider the word "water." Can you convert it to plural? No! Words like water and furniture are referred to as uncountable nouns and remain unchanged between singular/plural forms. Fortunately, TextBlob is equipped to understand these cases and does not modify them.

# Pluralization of Uncountable Nouns
plural_form = Word('water').pluralize()
print(plural_form) # Output: water

plural_form = Word("information").pluralize()
print(plural_form) # Output: information

How to Pluralize Specific Words in a Sentence?

Often, while dealing with text documents, you may want to modify particular words. Let's say a text is "The container had multiple box filled with canned goods", and you wish to convert 'box' to 'boxes' to make it grammatically consistent. First, create a TextBlob object for the sentence. On this object, use the words property to get a list of the words in the sentence. Now, you can use the index of these words to convert any of them into their plural form. Note that you also need to import NLTK and download its 'punkt' tokenizer data, since TextBlob relies on NLTK for tokenization.

import nltk
nltk.download('punkt')
from textblob import TextBlob

sentence = "The container had multiple box filled with canned goods"

# Create a TextBlob object
blob = TextBlob(sentence)

# Get the list of words in the sentence
words = blob.words

# Pluralize the 5th word and rebuild the sentence
words[4] = words[4].pluralize()
modified_sentence = " ".join(words)
print(modified_sentence)

The modified sentence you get in the output would be:

# Output: The container had multiple boxes filled with canned goods

The output is as desired, with the changes made. This method can be used to singularize or pluralize all nouns in an entire text document too.

Singularization with TextBlob

Converting a noun from its plural form to its singular form is known as singularization and is commonly used to obtain base words. To avoid redundancy while working with large text corpora, plural forms are often reduced to their root or singular form. It helps us standardize the text data and reduce the dimensionality of the vocabulary. Similar to pluralization, TextBlob also provides a built-in method singularize() to handle singularization.

As we did in the previous section, create a TextBlob class object for your word and call the singularize() method on it. This method can handle different rules of singularization, including irregular cases. I have demonstrated this with a diverse set of examples below.

# Removing 's'
singular_form = Word("curtains").singularize()
print(singular_form) # Output: curtain

# Removing 'es'
singular_form = Word("mattresses").singularize()
print(singular_form) # Output: mattress

# Changing internal vowels ('ee' to 'oo')
singular_form = Word("geese").singularize()
print(singular_form) # Output: goose

# Completely irregular form
singular_form = Word("mice").singularize()
print(singular_form) # Output: mouse

How can you singularize all nouns in your document?

As we discussed at the beginning of this section, standardizing the nouns in the text is a common need in NLP. Let's say we have a paragraph in our document as shown below.

# Define the paragraph
paragraph = "The bookshop around the corner sells all genres of books, including fiction, non-fiction, autobiographies, and much more. The quality of the books is amazing, and they are all bestsellers. I recently bought the New York Times Bestseller Atomic Habits from there."

If you notice, the words 'book' and 'books', 'bestseller' and 'bestsellers' are present in this. For contextual understanding in NLP, only the root word is essential. Hence, we can convert them into standard singular forms and reduce the size of our vocabulary. I'll now show you how to do this easily with TextBlob.

# To extract nouns
nltk.download('averaged_perceptron_tagger')

# Create a TextBlob object
blob = TextBlob(paragraph)

# Extract nouns from the paragraph
nouns = [word.singularize() for word, tag in blob.tags if tag.startswith('NN')]

# Get unique singularized nouns
unique_nouns = set(nouns)
for noun in unique_nouns:
    print(noun)

# Output: ['fiction', 'corner', 'book', 'genre', 'quality', 'shop', 'habit', 'autobiography']

From the output, you can verify that every noun has been converted to its singular form (standardized)!

How to Define Custom Rules for Singularization & Pluralization?

While TextBlob's built-in singularize() and pluralize() methods provide accurate results for most English nouns, they aren't foolproof. In some situations, TextBlob may not be able to handle the inflection on its own.

Once you define custom rules for such words, they can be applied automatically whenever those specific words are encountered.

To help you understand better, I'll walk you through some common cases where setting custom rules may be necessary:

  • Non-English Words: TextBlob's default rules are primarily designed for English words. If you encounter non-English words that follow different pluralization or singularization patterns, you may need to set custom rules. For example: Cactus -> Cacti, Octopus -> Octopi.

  • Domain-specific Terms: Certain domain-specific terms may have unique pluralization or singularization forms that are not covered by the default rules. Example: Virus -> Viri, FAQ -> FAQs (acronyms)

What can we do in these cases?

You can use an alternative Python package called 'pattern' to easily define custom rules for word inflection. Import the singularize and pluralize functions from the pattern.en module. Next, we can define custom singularization and pluralization rules using dictionaries as shown below.

#!pip install pattern
from pattern.en import singularize, pluralize

custom_plural_rules = {'virus': 'viri', 'SAT': 'SATs'}

# Define a function for custom pluralization
def custom_plural(word):
    if word in custom_plural_rules:
        return custom_plural_rules[word]
    return pluralize(word)

# Example usage
print(custom_plural('virus')) # Output: viri

If the input word is present in the dictionary, our custom rules will be applied. Otherwise, the default rules will be used for conversion.
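The same dictionary-override pattern works for singularization. Below is a pure-Python sketch; the names custom_singular_rules, naive_singularize, and custom_singular are illustrative (not part of TextBlob or pattern), and in practice the fallback would be TextBlob's singularize() or pattern's singularize function rather than the crude rule shown here.

```python
# Override table for words the default rules get wrong (illustrative)
custom_singular_rules = {"viri": "virus", "SATs": "SAT"}

def naive_singularize(word):
    # Very rough stand-in for a real singularizer such as
    # Word(word).singularize() from TextBlob
    if word.endswith("es"):
        return word[:-2]
    if word.endswith("s"):
        return word[:-1]
    return word

def custom_singular(word):
    # Custom rules take precedence; otherwise fall back to the default
    return custom_singular_rules.get(word, naive_singularize(word))

print(custom_singular("viri"))      # Output: virus
print(custom_singular("curtains"))  # Output: curtain
```

Because the override table is checked first, domain-specific exceptions never reach the generic fallback rules.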

When should you choose TextBlob and why?

One of the main advantages of TextBlob is its easy-to-use syntax and API, which makes it beginner-friendly. Apart from singularization/pluralization tasks, we can also perform a diverse set of tasks including part-of-speech tagging, noun phrase extraction, sentiment analysis, and more, as it comes with pre-trained models. It integrates seamlessly with other tools commonly used in the Python ecosystem, such as NLTK.

On the other hand, TextBlob may not be the best choice for large-scale text-processing tasks that are computationally intensive or when precise control is needed. In these cases, you can consider alternative packages such as spaCy and Stanford CoreNLP. spaCy provides robust capabilities for handling pluralization and singularization while dealing with large-scale text processing. You can check out the spaCy documentation here.


Conclusion

The TextBlob library's built-in methods pluralize() and singularize() are efficient ways to perform word inflection tasks. These methods can handle the different patterns of singular/plural nouns in the English language. TextBlob can also handle most irregular patterns. These features and functionalities make TextBlob a great choice for word inflection tasks.

If you want to understand how to perform other NLP techniques in TextBlob, you can check these articles. Pattern is another useful package if you want to define custom rules of inflection for your domain-specific terms. You can evaluate TextBlob and its alternatives based on your performance needs. I hope you enjoyed the read!

Categories: FLOSS Project Planets

Russell Coker: Considering Convergence

Planet Debian - Mon, 2023-05-29 03:41
What is Convergence

In 2013 Kyle Rankin (at the time Linux Journal columnist and CSO of Purism) wrote a Linux Journal article about Linux convergence [1] (which means using a phone and a dock to replace a desktop), featuring the Nokia N900 smart phone and a chroot environment on the Motorola Droid 4 Android phone. Both of them had very limited hardware even by the standards of the day, and neither was a system I’d consider using all the time. None of the Android phones I used at that time were at all comparable to any sort of desktop system I’d want to use.

Hardware for Convergence – Comparing a Phone to a Laptop

The first hardware issue for convergence is docks and other accessories to attach a small computer to hardware designed for larger computers. Laptop docks have been around for decades and for decades I haven’t been using them because they have all been expensive and specific to a particular model of laptop. Having an expensive dock at home and an expensive dock at the office and then replacing them both when the laptop is replaced may work well for some people but wasn’t something I wanted to do. The USB-C interface supports data, power, and DisplayPort video over the same cable, and now USB-C docks start at about $20 on eBay and dock functionality is built into many new monitors. I can take a USB-C device to the office of any large company and know there’s a good chance that there will be a USB-C dock ready for me to use. The fact that USB-C is a standard feature for phones gives obvious potential for convergence.

The next issue is performance. The Passmark benchmark seems like a reasonable way to compare CPUs [2]. It may not be the best benchmark but it has an excellent set of published results for Intel and AMD CPUs. I ran that benchmark on my Librem5 [3] and got a result of 507 for the CPU score. At the end of 2017 I got a Thinkpad X301 [4] which rates 678 on Passmark. So the Librem5 has 3/4 the CPU power of a laptop that was OK for my use in 2018. Given that the X301 was about the minimum specs for a PC that I can use (for things other than serious compiles, running VMs, etc) the Librem 5 has 3/4 the CPU power, only 3G of RAM compared to 6G, and 32G of storage compared to 64G. Here is the Passmark page for my Librem5 [5]. As an aside my Librem5 is apparently 25% faster than the other results for the same CPU – did the Purism people do something to make their device faster than most?

For me the Librem5 would be at the very low end of what I would consider a usable desktop system. A friend’s N900 (like the one Kyle used) won’t complete the Passmark test apparently due to the “Extended Instructions (NEON)” test failing. But of the rest of the tests most of them gave a result that was well below 10% of the result from the Librem5 and only the “Compression” and “CPU Single Threaded” tests managed to exceed 1/4 the speed of the Librem5. One thing to note when considering the specs of phones vs desktop systems is that the MicroSD cards designed for use in dashcams and other continuous recording devices have TBW ratings that compare well to SSDs designed for use in PCs, so swap to a MicroSD card should work reasonably well and be significantly faster than the hard disks I was using for swap in 2013!

In 2013 I was using a Thinkpad T420 as my main system [6], it had 8G of RAM (the same as my current laptop) although I noted that 4G was slow but usable at the time. Basically it seems that the Librem5 was about the sort of hardware I could have used for convergence in 2013. But by today’s standards and with the need to drive 4K monitors etc it’s not that great.

The N900 hardware specs seem very similar to the Thinkpads I was using from 1998 to about 2003. However a device for convergence will usually do more things than a laptop (IE phone and camera functionality), and software became significantly more bloated over the 1998 to 2013 period. A Linux desktop system performed reasonably with 32MB of RAM in 1998, but by 2013 even 2G was limiting.

Software Issues for Convergence

Jeremiah Foster (Director PureOS at Purism) wrote an interesting overview of some of the software issues of convergence [7]. One of the most obvious is that the best app design for a small screen is often very different from that for a large screen. Phone apps usually have a single window that shows a view of only one part of the data that is being worked on (EG an email program that shows a list of messages or the contents of a single message but not both). Desktop apps of any complexity will either have support for multiple windows for different data (EG two messages displayed in different windows) or a single window with multiple different types of data (EG message list and a single message). What we ideally want is for all the important apps to support changing modes when the active display is changed to one of a different size/resolution. The Purism people are doing some really good work in this regard. But it is a large project that needs to involve a huge range of apps.

The next thing that needs to be addressed is the OS interface for managing apps and metadata. On a phone you swipe from one part of the screen to get a list of apps while on a desktop you will probably have a small section of a large monitor reserved for showing a window list. On a desktop you will typically have an app to manage a list of items copied to the clipboard while on Android and iOS there is AFAIK no standard way to do that (there is a selection of apps in the Google Play Store to do this sort of thing).

Purism has a blog post by Sebastian Krzyszkowiak about some of the development of the OS to make it work better for convergence and the status of getting it in Debian [8].

The limitations in phone hardware force changes to the software. Software needs to use less memory because phone RAM can’t be upgraded. The OS needs to be configured for low RAM use which includes technologies like the zram kernel memory compression feature.
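On Debian-based systems one low-effort way to get zram swap is the zram-tools package, which reads a small config file; an illustrative fragment (the values shown are assumptions for a low-RAM phone, not tested recommendations):

```
# /etc/default/zramswap (illustrative values)
ALGO=zstd      # compression algorithm for the zram device
PERCENT=50     # percentage of RAM to allocate as compressed swap
```

After editing, restarting the zramswap service applies the change, and swapon --show should then list the zram device alongside any other swap.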


When mobile phones first came out they were used for less sensitive data. Loss of a phone was annoying and expensive but not a security problem. Now phone theft for the purpose of gaining access to resources stored on the phone is becoming a known crime; here is a news report about a thief stealing credit cards and phones to receive the SMS notifications from banks [9]. We should expect that trend to continue: stealing mobile devices for ssh keys, management tools for cloud services, etc is something we should expect to happen.

A problem with mobile phones in current use is that they have one login used for all access, from trivial things done in low security environments (EG paying for public transport) to sensitive things done in more secure environments (EG online banking and healthcare). Some applications take extra precautions here, EG the Android app I use for online banking requires authentication before performing any operations. The Samsung version of Android has a system called Knox for running a separate secured workspace [10]. I don’t think that the Knox approach would work well for a full Linux desktop environment, but something that provides some similar features would be a really good idea. Also running apps in containers as much as possible would be a good security feature; this is done by default in Android, and desktop OSs could benefit from it.

The Linux desktop security model of logging in to a single account and getting access to everything has been outdated for a long time, probably ever since single-user Linux systems became popular. We need to change this for many reasons and convergence just makes it more urgent.


I have become convinced that convergence is the way of the future. It has the potential to make transporting computers easier, purchasing cheaper (buying just a phone instead of separate desktop and laptop systems), and access to data more convenient. The Librem5 doesn’t seem up to the task for my use due to being slow and having short battery life; the PinePhone Pro has more powerful hardware and allegedly has better battery life [11], so it might work for my needs. The PinePhone Pro probably won’t meet the desktop computing needs of most people, but hardware keeps getting faster and cheaper, so eventually most people could have their computing needs satisfied with a phone.

The current state of software for convergence and for Linux desktop security needs some improvement. I have some experience with Linux security so this is something I can help work on.

To work on improving this I asked Linux Australia for a grant for me and a friend to get PinePhone Pro devices and a selection of accessories to go with them. Having both a Librem5 and a PinePhone Pro means that I can test software in different configurations which will make developing software easier. Also having a friend who’s working on similar things will help a lot, especially as he has some low level hardware skills that I lack.

Linux Australia awarded the grant and now the PinePhones are in transit. Hopefully I will have a PinePhone in a couple of weeks to start work on this.

Related posts:

  1. More About the Librem 5 I concluded my previous post about the Purism Librem 5...
  2. Long-term Device Use It seems to me that Android phones have recently passed...
  3. Huawei Mate9 Warranty Etc I recently got a Huawei Mate 9 phone....
Categories: FLOSS Project Planets

LN Webworks: The Ultimate Drupal Security Checklist to Safeguard Your Website

Planet Drupal - Mon, 2023-05-29 03:13

Cyber threats are escalating, and individuals are actively prioritizing their online safety by verifying website authenticity and safeguarding private data. Even with considerable efforts, websites are not immune to malware, brute force attacks, SQL injections, and DDoS attacks, posing a constant risk of unauthorized access and the compromise of customer information.

Hackers don't target only large corporations: according to a report by AdvisorSmith, 42% of small and medium businesses are affected by cyber attacks. That's why following a proven Drupal security checklist is imperative to safeguard your business against hackers and other malicious actors. With the right strategy and careful planning, you can make your website robust and avoid these potential threats. Read the full article and work through its 17 security checklist items to keep your business thriving.

Categories: FLOSS Project Planets

The Drop Times: A Journey of Growth and Transformation

Planet Drupal - Mon, 2023-05-29 02:27

Today, let us deeply explore life's journey and the various phases we all experience. Like a captivating story, life unfolds through diverse chapters that offer unique lessons, challenges, and opportunities for personal growth and transformation. Let's embark on this exploration together and discover the beauty in embracing the different phases of life.

The Spring of Youth: Embracing Boundless Possibilities

Youth—the phase of exuberance and discovery. During this period, we plant the seeds of our dreams and ambitions, exploring the world with curiosity and an unyielding spirit. Let's celebrate the energy and potential of youth and cherish the memories we create along the way.

The Summer of Exploration: Embracing Self-Discovery

As we transition into adulthood, we enter the summer of our lives. This phase invites us to embark on a journey of self-discovery, to dive deep into our identities, values, and aspirations. It's a time to explore various paths, make important life choices, and build the foundation for our future. Let's embrace this season of exploration and savor the joy of discovering who we truly are.

The Autumn of Wisdom: Embracing Growth and Reflection

In the autumn of life, we find ourselves at a crossroads—a time of introspection and reflection. This phase offers an opportunity to reap the wisdom accumulated through experiences, reflect on our achievements, and redefine our priorities. Let's celebrate the wealth of knowledge we have acquired and embrace the beauty of personal growth that continues to unfold.

The Winter of Serenity: Embracing Transformation and Legacy

Winter, a season marked by serenity and introspection, represents the later stages of life. During this phase, we reflect on our accomplishments, pass down our wisdom to future generations, and leave a lasting legacy. Let's embrace the winter of life with grace, appreciating the profound impact we can make and the beauty of our journey.

As we navigate these phases, it's important to remember that life is not a linear path but a cycle of growth and transformation. Each stage brings challenges and rewards; by embracing them, we can fully appreciate the richness of our existence.

Now, Let's dive into the essential picks from the past week.

Last week, The Drop Times was privileged to conduct two insightful interviews. Our first interview was with Martin Anderson-Clutz, a distinguished speaker at EvolveDrupal in Montreal. In this interview, Martin shares his valuable insights on the challenges and prospects of Drupal. The second interview was with Marcin Maruszewski, an experienced Drupal professional. Marcin graciously shared his personal journey and extensive Drupal experience, offering valuable perspectives for newcomers and seasoned practitioners.

A Kwall blog post delved into the intricacies of two of the most popular platforms in the web development realm: Drupal and WordPress. RS Websols published an informative blog post titled "The Benefits of Using Drupal: Unleashing its Potential for Web Development."

A recently released research report focusing on the Content Management Systems (CMS) market for 2023-2030 provides valuable insights into the fundamental dynamics driving the sector. AltaGrade, a leading web development company, has published an informative blog post titled "Upgrading to Drupal 10: Why It's Important and How AltaGrade Can Help."

Drupar, a reliable source of Drupal tutorials, recently shared a step-by-step guide demonstrating how to create a new text format in Drupal without relying on CKEditor. DrupalCon, the premier conference for the Drupal community, has announced an enticing giveaway exclusively for attendees of DrupalCon Pittsburgh 2023; more details are available in the announcement.

In a recently shared blog post, Drupal India highlights the compelling reasons for choosing Drupal CMS when developing fully functional and easily navigable travel agency websites. SystemSeed, a trusted source of Drupal expertise, has published a blog post offering tips and tricks for finding the ideal Drupal maintenance partner.

Stay tuned for more updates, interviews, and informative articles in the upcoming editions of The Drop Times. Feel free to contact us if you have any suggestions, contributions, or feedback. Thank you for being a part of our community!

That is all for the week.
Yours sincerely,

Kazima Abbas
Sub-Editor, TheDropTimes

Categories: FLOSS Project Planets

Russ Allbery: Book haul

Planet Debian - Mon, 2023-05-29 00:31

I think this is partial because I also have a stack of other books that I missed recording. At some point, I should stop using this method to track book acquisitions in favor of one of the many programs intended for this purpose, but it's in the long list of other things I really should do one of these days.

As usual, I have already read and reviewed a few of these. I might be getting marginally better at reading books shortly after I acquire them? Maybe?

Steven Brust — Tsalmoth (sff)
C.L. Clark — The Faithless (sff)
Oliver Darkshire — Once Upon a Tome (non-fiction)
Hernan Diaz — Trust (mainstream)
S.B. Divya — Meru (sff)
Kate Elliott — Furious Heaven (sff)
Steven Flavall — Before We Go Live (non-fiction)
R.F. Kuang — Babel (sff)
Laurie Marks — Dancing Jack (sff)
Arkady Martine — Rose/House (sff)
Madeline Miller — Circe (sff)
Jenny Odell — Saving Time (non-fiction)
Malka Older — The Mimicking of Known Successes (sff)
Sabaa Tahir — An Ember in the Ashes (sff)
Emily Tesh — Some Desperate Glory (sff)
Valerie Valdes — Chilling Effect (sff)

Categories: FLOSS Project Planets

Louis-Philippe Véronneau: Python 3.11, pip and (breaking) system packages

Planet Debian - Mon, 2023-05-29 00:00

As we get closer to Debian Bookworm's release, I thought I'd share one change in Python 3.11 that will surely affect many people.

Python 3.11 implements the new PEP 668, Marking Python base environments as “externally managed” [1]. If you use pip regularly on Debian, it's likely you'll eventually hit the externally-managed-environment error:

error: externally-managed-environment

× This environment is externally managed
╰─> To install Python packages system-wide, try apt install
    python3-xyz, where xyz is the package you are trying to
    install.

    If you wish to install a non-Debian-packaged Python package,
    create a virtual environment using python3 -m venv path/to/venv.
    Then use path/to/venv/bin/python and path/to/venv/bin/pip. Make
    sure you have python3-full installed.

    If you wish to install a non-Debian packaged Python application,
    it may be easiest to use pipx install xyz, which will manage a
    virtual environment for you. Make sure you have pipx installed.

    See /usr/share/doc/python3.11/README.venv for more information.

note: If you believe this is a mistake, please contact your Python
installation or OS distribution provider. You can override this, at
the risk of breaking your Python installation or OS, by passing
--break-system-packages.
hint: See PEP 668 for the detailed specification.

With this PEP, Python tools can now distinguish between packages that have been installed by the user with a tool like pip and ones installed using a distribution's package manager, like apt.

This is generally great news: it was previously too easy to break a system by mixing the two types of packages. This PEP will simplify our role as a distribution, as well as improve the overall Python user experience in Debian.

Sadly, it's also likely this change will break some of your scripts, especially CI jobs that (legitimately) install packages via pip alongside system packages. For example, I use the following gitlab-ci snippet to make sure my PRs don't break my build process [2]:

build:flit:
  stage: build
  script:
    - apt-get update && apt-get install -y flit python3-pip
    - FLIT_ROOT_INSTALL=1 flit install
    - metalfinder --help

With Python 3.11, this snippet will error out, as pip will refuse to install packages alongside the system's. The fix is to tell pip it's OK to "break" your system packages, either using the --break-system-packages parameter, or the PIP_BREAK_SYSTEM_PACKAGES=1 environment variable [3].
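For a gitlab-ci job like the one shown earlier, one way to opt in (a sketch of the same snippet; whether this is appropriate depends on your CI image) is to set the variable job-wide:

```yaml
build:flit:
  stage: build
  variables:
    # Tell pip it may install alongside Debian-managed packages (CI only!)
    PIP_BREAK_SYSTEM_PACKAGES: "1"
  script:
    - apt-get update && apt-get install -y flit python3-pip
    - FLIT_ROOT_INSTALL=1 flit install
    - metalfinder --help
```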

This, of course, is not something you should be using in production to restore the old behavior! The "proper" way to fix this issue, as the externally-managed-environment error message aptly (har har) informs you, is to use virtual environments.
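A minimal sketch of that virtual environment workflow (the venv path and package name are illustrative):

```shell
# Create an isolated virtual environment; this never touches system packages
python3 -m venv /tmp/demo-venv

# The venv's own pip installs into the venv, so no --break-system-packages
/tmp/demo-venv/bin/pip --version

# Packages then go into the venv, e.g.:
#   /tmp/demo-venv/bin/pip install metalfinder
```

pipx automates the same idea for standalone applications, creating and managing one venv per installed tool.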

Happy hacking!

  [1] Kudos to our own Matthias Klose, Stefano Rivera and Elana Hashman, who worked on designing and implementing this PEP!

  [2] Which is something that bit me before... You push some changes to your git repository, everything seems fine and all the tests pass, so you merge it and make a new git tag. When the time comes to build and upload this tag to PyPi, you find out some minor thing broke your build system (which you weren't testing) and you have to scramble to make a point-release to fix the issue. Sad!

Categories: FLOSS Project Planets

#! code: Drupal 10: Using A Lazy Builder To Create A Dynamic Button

Planet Drupal - Sun, 2023-05-28 13:30

Adding dynamic and interactive elements to a web page can be a challenge, and there are a few techniques available in Drupal to allow for this.

One solution might be to add a form to the page, but this can cause problems with the cache system. Adding forms makes it difficult to cache the page properly, and pages containing forms often show poor cache hit rates.

Rather than adding a form to the site (and the complexities that come with that) it is possible to create a fully dynamic element that can be used to perform actions by the user. This is done using a combination of different techniques, all of which are built into Drupal and just need to be plugged together.

In this article I will look at using lazy builders to create a dynamic button that won't cause problems with the page cache, and will even work for anonymous users.

The Problem

For some context I thought I would talk about some of the work that went into putting this example together.

I was recently tasked with creating a button on an Event content type that would act as the registration action for that event. The button needed to take into account different factors like the role of the user, the type of event, and the remaining capacity of the room. When the user clicked the button they would be booked onto the event and the content of the button would change to inform them of this.

The button, therefore, needed to be fully dynamic for the user and the page they were visiting. In order to allow the button to be unique to each user and event I used a lazy builder to offset the generation of the button so that it wouldn't interfere with the caching of the page.

Read more

Categories: FLOSS Project Planets