Feeds

Stefano Zacchiroli: In memory of Lunar

Planet Debian - Thu, 2024-11-14 08:56

In memory of Lunar

I've had the incredible fortune to share the geek path of Lunar through life on multiple occasions. First, in Debian, beginning some 15+ years ago, where we were fellow developers and participated in many DebConf editions together.

Then, on the deontology committee of Nos Oignons, a non-profit organization initiated by Lunar to operate Tor relays in France. This was with the goal of diversifying relay operators and increasing access to censorship-resistance technology for everyone in the world. It was something truly innovative and unheard of at the time in France.

Later, as a member of the steering committee of Reproducible Builds, a project that Lunar brought to widespread geek popularity with a seminal "Birds of a Feather" session at DebConf13 (and then many other talks with fellow members of the project in the years to come). A decade later, Reproducible Builds is having a major impact throughout the software industry, primarily due to growing fears about the security of the software supply chain.

Finally, we had the opportunity to recruit Lunar a couple of years ago at Software Heritage, where he insisted on working until he was able to, as part of a team he loved, and that loved him back. In addition to his numerous technical contributions to the initiative, he also facilitated our first ever multi-day team seminar. The event was so successful that it has been confirmed as a long-awaited yearly recurrence by all team members.

I fondly remember one of the last conversations I had with Lunar, a few months ago, when he told me how proud he was not only of having started Nos Oignons and contributed to the ignition of Reproducible Builds, but specifically about the fact that both initiatives were now thriving without being dependent on him. He was likely thinking about a future world without him, but also realizing how impactful his activism had been on the past and present world.

Lunar changed the world for the better and left behind a trail of love and fond memories.

Che la terra ti sia lieve, compagno.

--- Zack

Categories: FLOSS Project Planets

PyCharm: Inline AI Prompting, Coding Assistance for the dataclass_transform Decorator (PEP 681), and More in PyCharm 2024.3!

Planet Python - Thu, 2024-11-14 08:42

Code smarter, optimize performance, and stay focused on what matters most with the latest updates in PyCharm 2024.3. From enhanced support for AI Assistant and Jupyter notebooks to new features like no-code data filtering, there’s so much to explore.

Learn about all the updates on our What’s New page, download the latest version from our website, or update your current version through our free Toolbox App.

Download PyCharm 2024.3 Key features of PyCharm 2024.3 AI Assistant Inline AI prompting

Get help with code, generate documentation, or write tests by prompting AI directly in PyCharm’s editor. Just type your request on a new line and hit Enter.

Edits made by AI are marked in purple in the gutter, so changes are easy to spot. Need a fresh suggestion? Press Tab, Ctrl+/ ( ⌘/ on macOS), or manually edit the purple input text yourself. This feature is available for Python, JavaScript, TypeScript, JSON, YAML, and Jupyter notebooks.

For a personalized AI chat experience, you can now also choose from Google Gemini, OpenAI, or your own local models. Moreover, enhanced context management now lets you control what AI Assistant takes into consideration. The brand-new UI auto-includes open files and selected code and comes with options to add or remove files and attach project-wide instructions to guide responses across your codebase.

Ability to convert for loops into list comprehensions

Refactor your code faster with AI Assistant, which can now help you change massive for loops into list comprehensions. This feature works for all for loops, including nested and while loops.

Local multiline AI code completion PyCharm Professional

PyCharm Professional now provides local multiline AI code completion suggestions based on the proprietary JetBrains ML model used for Full Line Code Completion. Note that we don’t use your data to train the model.

Local multiline code completion typically generates 2–4 lines of code in scenarios where it can predict the next sequence of logical steps, such as within loops, when handling conditions, or when completing common code patterns and boilerplate sections.

Coding assistance for the dataclass_transform decorator (PEP 681)

PyCharm now supports intelligent coding assistance for custom data classes created with libraries using the dataclass_transform decorator. Enjoy the same support as for standard data classes, including attribute code completion and type inference for constructor signatures.

Download PyCharm 2024.3 Jupyter Notebook PyCharm Professional Auto-installation for multiple packages

PyCharm 2024.3 makes it easier to install packages that are imported in your code. A new quick-fix is available for bulk auto-installations, allowing you to download and install several packages in one click.

Ability to open Jupyter table outputs in the Data View window

View Jupyter table outputs in the Data View tool window to access powerful features like heatmaps, formatting, slicing, and AI functions for enhanced dataframe analysis. Just click on the Open in Data View icon to get started.

No-code data filtering

Effortlessly filter data in the Data View tool window or within dataframes without writing any code. Just click the Filter icon in the upper-right corner, choose your filter options and see results in the same window. This functionality works with all supported Python frameworks, including pandas, Polars, NumPy, PyTorch, TensorFlow, and Hugging Face Datasets.

Debug port specification PyCharm Professional

PyCharm now allows you to specify a single debugger port for all communications, simplifying debugging in restricted environments like Docker or WSL. After you set the port in the debugger settings, the debugger runs as a server and all communication between it and the IDE flows through the specified port.

Download PyCharm 2024.3

Visit our What’s New page or check out the full release notes for more features and additional details about the features mentioned here. Please report any bugs on our issue tracker so we can address them promptly.

Connect with us on X (formerly Twitter) to share your thoughts on PyCharm 2024.3. We look forward to hearing from you!

Categories: FLOSS Project Planets

Qt Creator 15 RC released

Planet KDE - Thu, 2024-11-14 07:45

We are happy to announce the release of Qt Creator 15 RC!

Categories: FLOSS Project Planets

Metafont, MetaPost and Malayalam font

Planet KDE - Thu, 2024-11-14 07:21

At the International TeX Users Group Conference 2023 (TUG23) in Bonn, Germany, I presented a talk about using Metafont (and its extension Metapost) to develop traditional orthography Malayalam fonts, on behalf of C.V. Radhakrishnan and K.H. Hussain, who were the co-developers and authors. And I forgot to post about it afterwards — as always, life gets in between.

In early 2022, CVR started toying with Metafont to create a few complicated letters of Malayalam script and he showed us a wonderful demonstration that piqued many of our interest. With the same code base, by adjusting the parameters, different variations of the glyphs can be generated, as seen in a screenshot of that demonstration: 16 variations of the same character ഴ generated from same Metafont source.

Hussain, quickly realizing that the characters could be programmatically assembled from a set of base/repeating components, collated an excellent list of basic shapes for Malayalam script.

Excerpts from the Malayalam character basic shape components documented by K.H. Hussain.

I bought a copy of ‘The Metafontbook’ and started learning and experimenting. We found soon that Metafont, developed by Prof. Knuth in the late 1970’s, generates bitmap/raster output; but its extension MetaPost, developed by his Ph.D. student John Hobby, generates vector output (postscript) which is required for opentype fonts. We also found that ‘Metatype1’ developed by Bogusław Jackowski et al. has very useful macros and ideas.

We had a lot of fun programmatically generating the character components and assembling them, splicing them, sometimes cutting them short, and transforming them in all useful manner. I have developed a new set of tools to generate the font from the vector output (SVG files) generated by MetaPost, which is also used in later projects like Chingam font.

At the annual TUG conference 2023 in Bonn, Germany, I have presented our work, and we received good feedback. There were three presentations about Metafont itself at the conference. Among others, I also had the pleasure to meet Linus Romer who shared some ideas about designing variable width reph-shapes for Malayalam characters.

The video of the presentation is available in YouTube.

The article was published in the TUGboat conference proceedings (volume 44): https://www.tug.org/TUGboat/tb44-2/tb137radhakrishnan-malayalam.pdf

Postscript (no pun intended): after the conference, I visited some of my good friends in Belgium and Netherlands. En route, my backpack with passport, identity cards, laptop, a phone and money etc. was stolen at Liège. I can’t thank enough my friends at Belgium and back at home for all their care and help, in the face of a terrible experience. On the day before my return, the stolen backpack with everything except the money was found by the railway authorities and I was able to claim it just in time.

I made yet another visit to the magnificent Plantin–Moretus Museum (it holds the original Garamond types!), where I myself could ink and print a metal typeset block of sonnet by Christoffel Plantijn in 1575, which now hangs at the office of a good friend.

Categories: FLOSS Project Planets

Real Python: Quiz: Namespaces and Scope in Python

Planet Python - Thu, 2024-11-14 07:00

In this quiz, you’ll test your understanding of Python Namespaces and Scope.

You’ll revisit how Python organizes symbolic names and objects in namespaces, when Python creates a new namespace, how namespaces are implemented, and how variable scope determines symbolic name visibility.

[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

Categories: FLOSS Project Planets

The Open Source Initiative and the Eclipse Foundation to Collaborate on Shaping Open Source AI (OSAI) Public Policy

Open Source Initiative - Thu, 2024-11-14 06:00

BRUSSELS and WEST HOLLYWOOD, Calif. – 14 November 2024 – The Eclipse Foundation, one of the world’s largest open source foundations, and the Open Source Initiative (OSI), the global non-profit educating about and advocating for the benefits of open source and steward of the Open Source Definition, have signed a Memorandum of Understanding (MOU) to collaborate on promoting the interest of the open source community in the implementation of regulatory initiatives on Open Source Artificial Intelligence (OSAI). This agreement underscores the two organisations’ shared commitment to ensuring that emerging AI regulations align with widely recognised OSI open source definitions and open source values and principles.

“AI is arguably the most transformative technology of our generation,” said Stefano Maffulli, executive director, Open Source Initiative. “The challenge now is to craft policies that not only foster growth of AI but ensure that Open Source AI thrives within this evolving landscape. Partnering with the Eclipse Foundation and its expertise, with its experience in European open source development and regulatory compliance, is important to shape the future of Open Source AI.”

“For decades, OSI has been the ‘gold standard’ the open source community has turned to for building consensus around important issues,” said Mike Milinkovich, executive director of the Eclipse Foundation. “As AI reshapes industries and societies, there is no more pressing issue for the open source community than the regulatory recognition of open source AI systems. Our combined expertise – OSI’s global leadership in open standards and open source licences and our extensive work with open source regulatory compliance – makes this partnership a powerful advocate for the design and implementation of sound AI policies worldwide.”

Addressing the Global Challenges of AI Regulation

With AI regulation on the horizon in multiple regions, including the EU, both organisations recognise the urgency of helping policymakers understand the unique challenges and opportunities of OSAI technologies. The rapid evolution of AI technologies, together with new, upcoming complex regulatory landscapes, demand clear, consistent, and aligned guidance rooted in open source principles.

Through this partnership, the Eclipse Foundation and OSI will endeavour to bring clarity in language and terms that industry, community, civil society, and policymakers can rely upon as public policy is drafted and enforced. The organisations will collaborate by leveraging their respective public platforms and events to raise awareness and advocate on the topic. Additionally, they will work together on joint publications, presentations, and other promotional activities, while also assisting one another in educating government officials on policy considerations for OSAI and General Purpose AI (GPAI). Through this partnership, they aim to provide clear, consistent guidance that aligns with open source principles.

Key Areas of Collaboration

The MoU outlines several areas of cooperation, including:

Information Exchange: OSI and the Eclipse Foundation will share relevant insights and information related to public policy-making and regulatory activities on artificial intelligence.
Representation to Policymakers: OSI and the Eclipse Foundation will cooperate in representing the principles and values of open source licences to policymakers and civil society organisations.
Promotion of Open Source Principles: Joint efforts will be made to raise awareness of the role of open source in AI, emphasising how it can foster innovation while mitigating risks.

A Partnership for the Future

As AI continues to revolutionise industries worldwide, the need for thoughtful, balanced regulation is critical. The OSI and Eclipse Foundation are committed to providing the open source community, industry leaders, and policymakers with the tools and knowledge they need to navigate this rapidly evolving field.

This MoU marks the very beginning of a long-term collaboration, with joint initiatives and activities to be announced throughout the remainder of 2024 and into 2025.

About the Eclipse Foundation

The Eclipse Foundation provides our global community of individuals and organisations with a business-friendly environment for open source software collaboration and innovation. We host the Eclipse IDE, Adoptium, Software Defined Vehicle, Jakarta EE, and over 420 open source projects, including runtimes, tools, specifications, and frameworks for cloud and edge applications, IoT, AI, automotive, systems engineering, open processor designs, and many others. Headquartered in Brussels, Belgium, the Eclipse Foundation is an international non-profit association supported by over 385 members. To learn more, follow us on social media @EclipseFdn, LinkedIn, or visit eclipse.org.

About the Open Source Initiative

Founded in 1998, the Open Source Initiative (OSI) is a non-profit corporation with global scope formed to educate about and advocate for the benefits of Open Source and to build bridges among different constituencies in the Open Source community. It is the steward of the Open Source Definition, setting the foundation for the global Open Source ecosystem. Join and support the OSI mission today at https://opensource.org/join.

Third-party trademarks mentioned are the property of their respective owners.

###

Media contacts:

Schwartz Public Relations (Germany)
Gloria Huppert/Marita Bäumer
Sendlinger Straße 42A
80331 Munich
EclipseFoundation@schwartzpr.de
+49 (89) 211 871 -70/ -62

514 Media Ltd (France, Italy, Spain)
Benoit Simoneau
benoit@514-media.com
M: +44 (0) 7891 920 370

Nichols Communications (Global Press Contact)
Jay Nichols
jay@nicholscomm.com
+1 408-772-1551

Categories: FLOSS Research

My first in-person Akademy: Thessaloniki 2023

Planet KDE - Thu, 2024-11-14 05:00

My first in-person Akademy: Thessaloniki 2023

This year, I was finally able to participate in-person at Akademy. Apart from meeting some familiar faces from the Plasma Spring in May this year, I also met lots of new people.

When waiting for the plane in Frankfurt, a group of KDE people formed. Meaning, we had a get-together even before the Akademy had started ;). On the plane to Thessaloniki, I made a merge requests to fix a Kickoff crash due to a KRunner change. Once that was done, everything was in place for the talks!

On Saturday, I talked once again with Nico and also Volker about KF6. This included topics like the remaining challenges, the estimated timeline for KF6 and some practical porting advice. I also gave a talk about KRunner. This was the conference talk of mine that I gave alone, meaning I was a bit nervous 😅. The title was “KRunner: Past, Present, and Future” and it focused on porting, new features and future plans for KF6. Thanks to everyone who was listening to the talk, both in person and online! Some things like the multithreading refactoring are worth their own blog post, which I will do in the next weeks.

The talks from other community members were also quite interesting. Sometimes it was hard to decide to which talk to go :). Multiple talks and BoFs were about energy efficiency and doing measurements. This perfectly aligned with me doing benchmarking of KRunner and the KCoreAddons plugin infrastructure.

The view of the city from the hotel balcony was also quite nice

Our KF6 and Qt6 porting BoFs were also quite productive. On Tuesday, we had our traditional KF6 weekly. Having this in person was definitely a nice refreshment! Apart from some general questions about documentation and KF6 Phabricator tasks, we discussed the release schedule. The main takeaway is that we want to improve the release automation and have created a small team to handle the KDE Frameworks releases. This includes Harald Sitter, Nicolas Fella, David Edmundson and me. Feel free to join the weeklies in our Big Blue Button room https://meet.kde.org/b/ada-mi8-aem at 17:00 CEST each Tuesday.

Since we had so many talented KDE people in one place, I decided to have a KRunner BoF on Tuesday morning. Subject of discussion was for example the sorting of KRunner, how to better organize the categories and the revival of the so-called “single runner mode”. This mode allows you to query only one specific plugin, instead of all available ones. This was previously only available from the D-Bus interface, but I have added a command line option to KRunner. To better visualize this special mode being in use, a tool-button was added as an indicator. This can also be used to go back to querying all runners. Kai will implement clickable categories that allow you to enable this mode without any command line options being necessary!

Finally, I would like to thank everyone who made this awesome experience possible! I am already looking forward to the next Akademy and the next sprints.

Categories: FLOSS Project Planets

Unifying the KRunner sorting mechanisms for Plasma6 & further plans

Planet KDE - Thu, 2024-11-14 05:00

Unifying the KRunner sorting mechanisms for Plasma6 & further plans

In Plasma5, we had different sorting implementations for KRunner and Kicker. This had historical reasons, because Kicker only used a subset of the available KRunner plugins. Due to the increased reliability, we decided to allow all available plugins to be loaded. However, the model still hard-coded the order in which the categories are displayed. This was reported in this bug which received numerous duplicates.

To address this concern, I focused on refactoring and cleaning up KRunner as part of KDE Frameworks 6. Among the significant architectural changes was the integration of KRunner’s model responsible for sorting into the KRunner framework itself. This integration enabled easier code sharing and simplified code maintenance. Consequently, the custom sorting logic previously present in Kicker could be removed.

Further plans

Now you know some of the improvements that have been done, but more interesting might be the future plans! While the sorting in KRunner was in lots of regards better than the one from Kicker, it still has some flaws. For instance, tweaking the order of results from a plugin developer’s perspective proved challenging, since rearranging categories could occur unintentionally. Also, KRunner implements logic to prioritize often launched results. In practice, this did not work quite well, because it only changed one of two sorting factors that are basically the same (sounds messy, I know :D).

The plan is to have two separate sorting values: One for the categories and one for the results within a category. This allows KRunner to more intelligently learn which categories you use more the most and prioritize them for further queries.

Another feature request to configure the sorting of plugins. With the described change, this is far easier to implement. Some of the visuals were already discussed at the Plasma sprint last month.

Stay tuned for updates!

Categories: FLOSS Project Planets

PyPy: Guest Post: Final Encoding in RPython Interpreters

Planet Python - Thu, 2024-11-14 03:42

Introduction

This post started as a quick note summarizing a recent experiment I carried out upon a small RPython interpreter by rewriting it in an uncommon style. It is written for folks who have already written some RPython and want to take a deeper look at interpreter architecture.

Some experiments are about finding solutions to problems. This experiment is about taking a solution which is already well-understood and applying it in the context of RPython to find a new approach. As we will see, there is no real change in functionality or the number of clauses in the interpreter; it's more like a comparison between endo- and exoskeletons, a different arrangement of equivalent bones and plates.

Overview

An RPython interpreter for a programming language generally does three or four things, in order:

Read and parse input programs
Encode concrete syntax as abstract syntax
Optionally, optimize or reduce the abstract syntax
Evaluate the abstract syntax: read input data, compute, print output data, etc.

Today we'll look at abstract syntax. Most programming languages admit a concrete parse tree which is readily abstracted to provide an abstract syntax tree (AST). The AST is usually encoded with the initial style of encoding. An initial encoding can be transformed into any other encoding for the same AST, looks like a hierarchy of classes, and is implemented as a static structure on the heap.

In contrast, there is also a final encoding. A final encoding can be transformed into by any other encoding, looks like an interface for the actions of the interpreter, and is implemented as an unwinding structure on the stack. From the RPython perspective, Python builtin modules like os or sys are final encodings for features of the operating system; the underlying implementation is different when translated or untranslated, but the interface used to access those features does not change.

In RPython, an initial encoding is built from a hierarchy of classes. Each class represents a type of tree nodes, corresponding to a parser production in the concrete parse tree. Each class instance therefore represents an individual tree node. The fields of a class, particularly those filled during .__init__(), store pre-computed properties of each node; methods can be used to compute node properties on demand. This seems like an obvious and simple approach; what other approaches could there be? We need an example.

Final Encoding of Brainfuck

We will consider Brainfuck, a simple Turing-complete programming language. An example Brainfuck program might be:

[-]

This program is built from a loop and a decrement, and sets a cell to zero. In an initial encoding which follows the algebraic semantics of Brainfuck, the program could be expressed by applying class constructors to build a structure on the heap:

Loop(Plus(-1))

A final encoding is similar, except that class constructors are replaced by methods, the structure is built on the stack, and we are parameterized over the choice of class:

lambda cls: cls.loop(cls.plus(-1))

In ordinary Python, transforming between these would be trivial, and mostly is a matter of passing around the appropriate class. Indeed, initial and final encodings are equivalent; we'll return to that fact later. However, in RPython, all of the types must line up, and classes must be determined before translation. We'll need to monomorphize our final encodings, using some RPython tricks later on. Before that, let's see what an actual Brainfuck interface looks like, so that we can cover all of the difficulties with final encoding.

Before we embark, please keep in mind that local code doesn't know what cls is. There's no type-safe way to inspect an arbitrary semantic domain. In the initial-encoded version, we can ask isinstance(bf, Loop) to see whether an AST node is a loop, but there simply isn't an equivalent for final-encoded ASTs. So, there is an implicit challenge to think about: how do we evaluate a program in an arbitrary semantic domain? For bonus points, how do we optimize a program without inspecting the types of its AST nodes?

What follows is a dissection of this module at the given revision. Readers may find it satisfying to read the entire interpreter top to bottom first; it is less than 300 lines.

Core Functionality

Final encoding is given as methods on an interface. These five methods correspond precisely to the summands of the algebra of Brainfuck.

class BF(object): # Other methods elided def plus(self, i): pass def right(self, i): pass def input(self): pass def output(self): pass def loop(self, bfs): pass

Note that the .loop() method takes another program as an argument. Initial-encoded ASTs have other initial-encoded ASTs as fields on class instances; final-encoded ASTs have other final-encoded ASTs as parameters to interface methods. RPython infers all of the types, so the reader has to know that i is usually an integer while bfs is a sequence of Brainfuck operations.

We're using a class to implement this functionality. Later, we'll treat it as a mixin, rather than a superclass, to avoid typing problems.

Monoid

In order to optimize input programs, we'll need to represent the underlying monoid of Brainfuck programs. To do this, we add the signature for a monoid:

class BF(object): # Other methods elided def unit(self): pass def join(self, l, r): pass

This is technically a unital magma, since RPython doesn't support algebraic laws, but we will enforce the algebraic laws later on during optimization. We also want to make use of the folklore that free monoids are lists, allowing callers to pass a list of actions which we'll reduce with recursion:

class BF(object): # Other methods elided def joinList(self, bfs): if not bfs: return self.unit() elif len(bfs) == 1: return bfs[0] elif len(bfs) == 2: return self.join(bfs[0], bfs[1]) else: i = len(bfs) >> 1 return self.join(self.joinList(bfs[:i]), self.joinList(bfs[i:]))

.joinList() is a little bulky to implement, but Wirth's principle applies: the interpreter is shorter with it than without it.

Idioms

Finally, our interface includes a few high-level idioms, like the zero program shown earlier, which are defined in terms of low-level behaviors. In an initial encoding, these could be defined as module-level functions; here, we define them on the mixin class BF.

class BF(object): # Other methods elided def zero(self): return self.loop(self.plus(-1)) def move(self, i): return self.scalemove(i, 1) def move2(self, i, j): return self.scalemove2(i, 1, j, 1) def scalemove(self, i, s): return self.loop(self.joinList([ self.plus(-1), self.right(i), self.plus(s), self.right(-i)])) def scalemove2(self, i, s, j, t): return self.loop(self.joinList([ self.plus(-1), self.right(i), self.plus(s), self.right(j - i), self.plus(t), self.right(-j)])) Interface-oriented Architecture Applying Interfaces

Now, we hack at RPython's object model until everything translates. First, consider the task of pretty-printing. For Brainfuck, we'll simply regurgitate the input program as a Python string:

class AsStr(object): import_from_mixin(BF) def unit(self): return "" def join(self, l, r): return l + r def plus(self, i): return '+' * i if i > 0 else '-' * -i def right(self, i): return '>' * i if i > 0 else '<' * -i def loop(self, bfs): return '[' + bfs + ']' def input(self): return ',' def output(self): return '.'

Via rlib.objectmodel.import_from_mixin, no stressing with covariance of return types is required. Instead, we shift from a Java-esque view of classes and objects, to an OCaml-ish view of prebuilt classes and constructors. AsStr is monomorphic, and any caller of it will have to create their own covariance somehow. For example, here are the first few lines of the parsing function:

@specialize.argtype(1) def parse(s, domain): ops = [domain.unit()] # Parser elided to preserve the reader's attention

By invoking rlib.objectmodel.specialize.argtype, we make copies of the parsing function, up to one per call site, based on our choice of semantic domain. Oleg calls these "symantics" but I prefer "domain" in code. Also, note how the parsing stack starts with the unit of the monoid, which corresponds to the empty input string; the parser will repeatedly use the monoidal join to build up a parsed expression without inspecting it. Here's a small taste of that:

while i < len(s): char = s[i] if char == '+': ops[-1] = domain.join(ops[-1], domain.plus(1)) elif char == '-': ops[-1] = domain.join(ops[-1], domain.plus(-1)) # and so on

The reader may feel justifiably mystified; what breaks if we don't add these magic annotations? Well, the translator will throw UnionError because the low-level types don't match. RPython only wants to make one copy of functions like parse() in its low-level representation, and each copy of parse() will be compiled to monomorphic machine code. In this interpreter, in order to support parsing to an optimized string and also parsing to an evaluator, we need two copies of parse(). It is okay to not fully understand this at first.

Composing Interfaces

Earlier, we noted that an interpreter can optionally optimize input programs after parsing. To support this, we'll precompose a peephole optimizer onto an arbitrary domain. We could also postcompose with a parser instead, but that sounds more difficult. Here are the relevant parts:

def makePeephole(cls): domain = cls() def stripDomain(bfs): return domain.joinList([t[0] for t in bfs]) class Peephole(object): import_from_mixin(BF) def unit(self): return [] def join(self, l, r): return l + r # Actual definition elided... for now... return Peephole, stripDomain

Don't worry about the actual optimization yet. What's important here is the pattern of initialization of semantic domains. makePeephole is an SML-style functor on semantic domains: given a final encoding of Brainfuck, it produces another final encoding of Brainfuck which incorporates optimizations. The helper stripDomain is a finalizer which performs the extraction from the optimizer's domain to the underlying cls that was passed in at translation time. For example, let's optimize pretty-printing:

AsStr, finishStr = makePeephole(AsStr)

Now, it only takes one line to parse and print an optimized AST without ever building it on the heap. To be pedantic, fragments of the output string will be heap-allocated, but the AST's node structure will only ever be stack-allocated. Further, to be shallow, the parser is written to prevent malicious input from causing a stack overflow, and this forces it to maintain a heap-allocated RPython list of intermediate operations inside loops.

print finishStr(parse(text, AsStr())) Performance

But is it fast? Yes. It's faster than the prior version, which was initial-encoded, and also faster than Andrew Brown's classic version (part 1, part 2). Since Brown's interpreter does not perform much optimization, we will focus on how final encoding can outperform initial encoding.

JIT

First, why is it faster than the same interpreter with initial encoding? Well, it still has initial encoding from the JIT's perspective! There is an Op class with a hierarchy of subclasses implementing individual behaviors. A sincere tagless-final student, or those who remember Stop Writing Classes (2012, Pycon US), will recognize that the following classes could be plain functions, and should think of the classes as a concession to RPython's lack of support for lambdas with closures rather than an initial encoding. We aren't ever going to directly typecheck any Op, but the JIT will generate typechecking guards anyway, so we effectively get a fully-promoted AST inlined into each JIT trace. First, some simple behaviors:

class Op(object): _immutable_ = True class _Input(Op): _immutable_ = True def runOn(self, tape, position): tape[position] = ord(os.read(0, 1)[0]) return position Input = _Input() class _Output(Op): _immutable_ = True def runOn(self, tape, position): os.write(1, chr(tape[position])) return position Output = _Output() class Add(Op): _immutable_ = True _immutable_fields_ = "imm", def __init__(self, imm): self.imm = imm def runOn(self, tape, position): tape[position] += self.imm return position

The JIT does technically have less information than before; it no longer knows that a sequence of immutable operations is immutable enough to be worth unrolling, but a bit of rlib.jit.unroll_safe fixes that:

class Seq(Op): _immutable_ = True _immutable_fields_ = "ops[*]", def __init__(self, ops): self.ops = ops @unroll_safe def runOn(self, tape, position): for op in self.ops: position = op.runOn(tape, position) return position

Finally, the JIT entry point is at the head of each loop, just like with prior interpreters. Since Brainfuck doesn't support mid-loop jumps, there's no penalty for only allowing merge points at the head of the loop.

class Loop(Op): _immutable_ = True _immutable_fields_ = "op", def __init__(self, op): self.op = op def runOn(self, tape, position): op = self.op while tape[position]: jitdriver.jit_merge_point(op=op, position=position, tape=tape) position = op.runOn(tape, position) return position

That's the end of the implicit challenge. There's no secret to it; just evaluate the AST. Here's part of the semantic domain for evaluation, as well as the "functor" to optimize it. In AsOps.join() are the only isinstance() calls in the entire interpreter! This is acceptable because Seq is effectively a type wrapper for an RPython list, so that a list of operations is also an operation; its list is initial-encoded and available for inspection.

class AsOps(object): import_from_mixin(BF) def unit(self): return Shift(0) def join(self, l, r): if isinstance(l, Seq) and isinstance(r, Seq): return Seq(l.ops + r.ops) elif isinstance(l, Seq): return Seq(l.ops + [r]) elif isinstance(r, Seq): return Seq([l] + r.ops) return Seq([l, r]) # Other methods elided! AsOps, finishOps = makePeephole(AsOps)

And finally here is the actual top-level code to evaluate the input program. As before, once everything is composed, the actual invocation only takes one line.

tape = bytearray("\x00" * cells) finishOps(parse(text, AsOps())).runOn(tape, 0) Peephole Optimization

Our peephole optimizer is an abstract interpreter with one instruction of lookahead/rewrite buffer. It implements the aforementioned algebraic laws of the Brainfuck monoid. It also implements idiom recognition for loops. First, the abstract interpreter. The abstract domain has six elements:

class AbstractDomain(object): pass meh, aLoop, aZero, theIdentity, anAdd, aRight = [AbstractDomain() for _ in range(6)]

We'll also tag everything with an integer, so that anAdd or aRight can be exact annotations. This is the actual Peephole.join() method:

def join(self, l, r): if not l: return r rv = l[:] bfHead, adHead, immHead = rv.pop() for bf, ad, imm in r: if ad is theIdentity: continue elif adHead is aLoop and ad is aLoop: continue elif adHead is theIdentity: bfHead, adHead, immHead = bf, ad, imm elif adHead is anAdd and ad is aZero: bfHead, adHead, immHead = bf, ad, imm elif adHead is anAdd and ad is anAdd: immHead += imm if immHead: bfHead = domain.plus(immHead) elif rv: bfHead, adHead, immHead = rv.pop() else: bfHead = domain.unit() adHead = theIdentity elif adHead is aRight and ad is aRight: immHead += imm if immHead: bfHead = domain.right(immHead) elif rv: bfHead, adHead, immHead = rv.pop() else: bfHead = domain.unit() adHead = theIdentity else: rv.append((bfHead, adHead, immHead)) bfHead, adHead, immHead = bf, ad, imm rv.append((bfHead, adHead, immHead)) return rv

If this were to get much longer, then implementing a DSL would be worth it, but this is a short-enough method to inline. The abstract interpretation is assumed by induction for the left-hand side of the join, save for the final instruction, which is loaded into a rewrite register. Each instruction on the right-hand side is inspected exactly once. The logic for anAdd followed by anAdd is exactly the same as for aRight followed by aRight because they both have underlying Abelian groups given by the integers. The rewrite register is carefully pushed onto and popped off from the left-hand side in order to cancel out theIdentity, which itself is merely a unifier for anAdd or aRight of 0.

Note that we generate a lot of garbage. For example, parsing a string of n '+' characters will cause the peephole optimizer to allocate n instances of the underlying domain.plus() action, from domain.plus(1) up to domain.plus(n). An older initial-encoded version of this interpreter used hash consing to avoid ever building an op more than once, even loops. It appears more efficient to generate lots of immutable garbage than to repeatedly hash inputs and search mutable hash tables, at least for optimizing Brainfuck incrementally during parsing.

Finally, let's look at idiom recognition. RPython lists are initial-coded, so we can dispatch based on the length of the list, and then inspect the abstract domains of each action.

def isConstAdd(bf, i): return bf[1] is anAdd and bf[2] == i def oppositeShifts(bf1, bf2): return bf1[1] is bf2[1] is aRight and bf1[2] == -bf2[2] def oppositeShifts2(bf1, bf2, bf3): return (bf1[1] is bf2[1] is bf3[1] is aRight and bf1[2] + bf2[2] + bf3[2] == 0) def loop(self, bfs): if len(bfs) == 1: bf, ad, imm = bfs[0] if ad is anAdd and imm in (1, -1): return [(domain.zero(), aZero, 0)] elif len(bfs) == 4: if (isConstAdd(bfs[0], -1) and bfs[2][1] is anAdd and oppositeShifts(bfs[1], bfs[3])): return [(domain.scalemove(bfs[1][2], bfs[2][2]), aLoop, 0)] if (isConstAdd(bfs[3], -1) and bfs[1][1] is anAdd and oppositeShifts(bfs[0], bfs[2])): return [(domain.scalemove(bfs[0][2], bfs[1][2]), aLoop, 0)] elif len(bfs) == 6: if (isConstAdd(bfs[0], -1) and bfs[2][1] is bfs[4][1] is anAdd and oppositeShifts2(bfs[1], bfs[3], bfs[5])): return [(domain.scalemove2(bfs[1][2], bfs[2][2], bfs[1][2] + bfs[3][2], bfs[4][2]), aLoop, 0)] if (isConstAdd(bfs[5], -1) and bfs[1][1] is bfs[3][1] is anAdd and oppositeShifts2(bfs[0], bfs[2], bfs[4])): return [(domain.scalemove2(bfs[0][2], bfs[1][2], bfs[0][2] + bfs[2][2], bfs[3][2]), aLoop, 0)] return [(domain.loop(stripDomain(bfs)), aLoop, 0)]

This ends the bonus question. How do we optimize an unknown semantic domain? We must maintain an abstract context which describes elements of the domain. In initial encoding, we ask an AST about itself. In final encoding, we already know everything relevant about the AST.

The careful reader will see that I didn't really answer that opening question in the JIT section. Because the JIT still ranges over the same operations as before, it can't really be slower; but why is it now faster? Because the optimizer is now slightly better in a few edge cases. It performs the same optimizations as before, but the rigor of abstract interpretation causes it to emit slightly better operations to the JIT backend.

Concretely, improving the optimizer can shorten pretty-printed programs. The Busy Beaver Gauge measures the length of programs which search for solutions to mathematical problems. After implementing and debugging the final-encoded interpreter, I found that two of my entries on the Busy Beaver Gauge for Brainfuck had become shorter by about 2%. (Most other entries are already hand-optimized according to the standard algebra and have no optimization opportunities.)

Discussion

Given that initial and final encodings are equivalent, and noting that RPython's toolchain is written to prefer initial encodings, what did we actually gain? Did we gain anything?

One obvious downside to final encoding in RPython is interpreter size. The example interpreter shown here is a rewrite of an initial-encoded interpreter which can be seen here for comparison. Final encoding adds about 20% more code in this case.

Final encoding is not necessarily more code than initial encoding, though. All AST encodings in interpreters are subject to the Expression Problem, which states that there is generally a quadratic amount of code required to implement multiple behaviors for an AST with multiple types of nodes; specifically, n behaviors for m types of nodes require n × m methods. Initial encodings improve the cost of adding new types of nodes; final encodings improve the cost of adding new behaviors. Final encoding may tend to win in large codebases for mature languages, where the language does not change often but new behaviors are added frequently and maintained for long periods.

Optimizations in final encoding require a bit of planning. The abstract-interpretation approach is solid but relies upon the monoid and its algebraic laws. In the worst case, an entire class hierarchy could be required to encode the abstraction.

It is remarkable to find a 2% improvement in residual program size merely by reimplementing an optimizer as an abstract interpreter respecting the algebraic laws. This could be the most important lesson for compiler engineers, if it happens to generalize.

Final encoding was popularized via the tagless-final movement in OCaml and Scala, including famously in a series of tutorials by Kiselyov et al. A "tag", in this jargon, is a runtime identifier for an object's type or class; a tagless encoding effectively doesn't allow isinstance() at all. In the above presentation, tags could be hacked in, but were not materially relevant to most steps. Tags were required for the final evaluation step, though, and the tagless-final insight is that certain type systems can express type-safe evaluation without those tags. We won't go further in this direction because tags also communicate valuable information to the JIT.

Summarizing Table Initial Encoding Final Encoding hierarchy of classes signature of interfaces class constructors method calls built on the heap built on the stack traversals allocate stack traversals allocate heap tags are available with isinstance() tags are only available through hacks cost of adding a new AST node: one class cost of adding a new AST node: one method on every other class cost of adding a new behavior: one method on every other class cost of adding a new behavior: one class Credits

Thanks to folks in #pypy on Libera Chat: arigato for the idea, larstiq for pushing me to write it up, and cfbolz and mattip for reviewing and finding mistakes. The original IRC discussion leading to this blog post is available here.

This interpreter is part of the rpypkgs suite, a Nix flake for RPython interpreters. Readers with Nix installed can run this interpreter directly from the flake:

$ nix-prefetch-url https://github.com/MG-K/pypy-tutorial-ko/raw/refs/heads/master/mandel.b $ nix run github:rpypkgs/rpypkgs#bf -- /nix/store/ngnphbap9ncvz41d0fkvdh61n7j2bg21-mandel.b

Categories: FLOSS Project Planets

Python Bytes: #409 We've moved to Hetzner write-up

Planet Python - Thu, 2024-11-14 03:00

Topics covered in this episode: <ul> <li><a href="https://github.com/willmcgugan/terminal-tree?featured_on=pythonbytes">terminal-tree</a></li> <li><a href="https://posting.sh?featured_on=pythonbytes">posting: The API client that lives in your terminal</a></li> <li>Extra, extra, extra</li> <li><a href="https://micro.webology.dev/2024/11/03/uv-does-everything.html?featured_on=pythonbytes">UV does everything or enough that I'm not sure what else it needs to do</a></li> <li>Extras</li> <li>Joke</li> </ul><a href='https://www.youtube.com/watch?v=vg6VLG0jKek' style='font-weight: bold;'data-umami-event="Livestream-Past" data-umami-event-episode="409">Watch on YouTube</a> About the show Sponsored by: <ul> <li><a href="https://pythonbytes.fm/scout">ScoutAPM</a> - Django Application Performance Monitoring</li> <li><a href="https://pythonbytes.fm/codeium">Codeium</a> - Free AI Code Completion & Chat </li> </ul> Connect with the hosts <ul> <li>Michael: <a href="https://fosstodon.org/@mkennedy">@mkennedy@fosstodon.org</a></li> <li>Brian: <a href="https://fosstodon.org/@brianokken">@brianokken@fosstodon.org</a></li> <li>Show: <a href="https://fosstodon.org/@pythonbytes">@pythonbytes@fosstodon.org</a></li> </ul> Join us on YouTube at <a href="https://pythonbytes.fm/stream/live">pythonbytes.fm/live</a> to be part of the audience. Usually Monday at 10am PT. Older video versions available there too. Finally, if you want an artisanal, hand-crafted digest of every week of the show notes in email form? Add your name and email to <a href="https://pythonbytes.fm/friends-of-the-show">our friends of the show list</a>, we'll never share it. Michael #1: <a href="https://github.com/willmcgugan/terminal-tree?featured_on=pythonbytes">terminal-tree</a> <ul> <li>An experimental filesystem navigator for the terminal, built with <a href="https://github.com/textualize/textual?featured_on=pythonbytes">Textual</a></li> <li>Tested in macOS only at this point. Chances are very high it works on Linux. Slightly lower chance (but non-zero) that it works on Windows. <ul> <li>Can confirm it works on Linux</li> </ul></li> </ul> Brian #2: <a href="https://posting.sh?featured_on=pythonbytes">posting: The API client that lives in your terminal</a> <ul> <li>Also uses Textual</li> <li>From Darren Burns</li> <li>Interesting that the installation instructions recommends using uv: <ul> <li>uv tool install --python 3.12 posting</li> </ul></li> <li>Very cool. Great docs. Beautiful. keyboard centric, but also usable with a mouse.</li> <li>“Fly through your API workflow with an approachable yet powerful keyboard-centric interface. Run it locally or over SSH on remote machines and containers. Save your requests in a readable and version-control friendly format.”</li> <li>Able to save multiple environments</li> <li>Great colors</li> <li>Allows scripting to run Python code before and after requests to prepare headers, set variables, etc.</li> </ul> Michael #3: Extra, extra, extra <ul> <li><a href="https://training.talkpython.fm/courses/getting-started-with-spacy?featured_on=pythonbytes">spaCy course</a> swag give-away, <a href="https://forms.gle/MJPWh3VCB58Peegj7?featured_on=pythonbytes">enter for free</a></li> <li>New essay: <a href="https://mkennedy.codes/posts/opposite-of-cloud-native-is-stack-native/?featured_on=pythonbytes">Opposite of Cloud Native is?</a></li> <li>News: <a href="https://talkpython.fm/blog/posts/we-have-moved-to-hetzner/?featured_on=pythonbytes">We've moved to Hetzner</a></li> <li>New package: <a href="https://mkennedy.codes/posts/introducing-the-chameleon-flask-package/?featured_on=pythonbytes">Introducing chameleon-flask package</a></li> <li>New release: <a href="https://github.com/mikeckennedy/listmonk?featured_on=pythonbytes">Listmonk Python client</a></li> <li><a href="https://www.tiobe.com/tiobe-index/?featured_on=pythonbytes">TIOBE Update</a></li> <li><a href="https://peps.python.org/pep-0750/?featured_on=pythonbytes">PEP 750 – Template Strings</a></li> <li><a href="https://canarymail.io?featured_on=pythonbytes">Canary email</a></li> <li>Left Omnivore, for Pocket, left Pocket for, …, landed on <a href="https://www.instapaper.com?featured_on=pythonbytes">Instapaper</a> <ul> <li>Supports direct import from Omnivore and Pocket</li> <li>Though <a href="https://hoarder.app/?featured_on=pythonbytes">Hoarder</a> is compelling</li> </ul></li> <li>Trying out <a href="https://zen-browser.app/?featured_on=pythonbytes">Zen Browser</a> <ul> <li>Wasn’t a fan of Arc (<a href="https://www.yahoo.com/tech/arc-browser-creator-moving-project-151945233.html?featured_on=pythonbytes">especially</a><a href="https://www.yahoo.com/tech/arc-browser-creator-moving-project-151945233.html?featured_on=pythonbytes"> now)</a> but the news turned me on to Zen</li> </ul></li> </ul> Brian #4: <a href="https://micro.webology.dev/2024/11/03/uv-does-everything.html?featured_on=pythonbytes">UV does everything or enough that I'm not sure what else it needs to do</a> <ul> <li>Jeff Triplett</li> <li>“UV feels like one of those old infomercials where it solves everything, which is where we have landed in the Python world.”</li> <li>“My favorite feature is that UV can now bootstrap a project to run on a machine that does not previously have Python installed, along with installing any packages your application might require.”</li> <li>Partial list (see Jeff’s post for his complete list) <ul> <li>uv pip install replaces pip install</li> <li>uv venv replaces python -m venv</li> <li>uv run, uv tool run, and uv tool install replaces pipx</li> <li>uv build - Build your Python package for pypi</li> <li>uv publish - Upload your Python package to pypi, replacing twine and flit publish</li> </ul></li> </ul> Extras Brian: <ul> <li><a href="https://nedbatchelder.com/blog/202411/coveragepy_originally.html?featured_on=pythonbytes">Coverage.py originally </a>was just one file</li> <li>Trying out BlueSky <a href="https://bsky.app/profile/brianokken.bsky.social?featured_on=pythonbytes">brianokken.bsky.social</a> <ul> <li>Not because of Taylor Swift, but nice. </li> <li>There are a lot of Python people there.</li> </ul></li> </ul> Joke: <a href="https://devhumor.com/media/how-programmers-sleep?featured_on=pythonbytes">How programmers sleep</a>

Categories: FLOSS Project Planets

Stefan Scherfke: Publishing to PyPI with a Trusted Publisher from GitLab CI/CD

Planet Python - Thu, 2024-11-14 01:18

PyPA’s Trusted Publishers let you upload Python packages directly from your CI pipeline to PyPI. And you don’t need any long-lived secrets like API tokens. This makes uploading Python packages not only easier than ever and more secure, too.

In this article, we’ll look at what Trusted Publishers are and how they’re more secure than using API tokens or a username/password combo. We’ll also learn how to set up our GitLab CI/CD pipeline to:

continuously test the release processes with the TestPyPI on every push to main,
automatically perform PyPI releases on every Git tag, and
additionally secure the process with GitLab (deployment) environments.

The official documentation explains most of this, but it doesn’t go into much depth regarding GitLab pipelines and leaves a few details unexplained.

Why should I want to use this?

API tokens aren’t inherently insecure, but they do have a few drawbacks:

If they are passed as environment variables, there’s a chance they’ll leak (think of a debug env | sort command in your pipeline).
If you don’t watch out, bad co-maintainers can steal the token and do mischief with it.
You have to manually renew the token from time to time, which can be annoying in the long run.

Trusted Publishers can avoid these problems or, at the very least, reduce their risk:

You don’t have to manually renew any long-lived tokens.
All tokens are short-lived. Even if they leak, they can’t be misused for long.

After we’ve learned how Trusted Publishers and protected GitLab environments work, we will take another look at security considerations.

How do Trusted Publishers work?

The basic idea of Trusted Publishers is quite simple:

In PyPI’s project settings, you add a Trusted Publisher and configure it with the GitLab URL of your project.
PyPI will then only accept package uploads if the uploader can prove that the upload comes from a CI pipeline of that project.

The technical process behind this is based on the OpenID Connect (OIDC) standard.

Essentially, the process works like this:

In your CI pipeline, you request an ID token for PyPI.
GitLab injects the short-lived token into your pipeline as a (masked) environment variable. It is cryptographically signed by GitLab and contains, among other things, your project’s path with namespace.
You use this token to authenticate with PyPI and request another token for the actual package upload.
This API token can now be used just like “normal” project-scoped API tokens.

The Trusted Publishers documentation explains this in more detail.

One problem remains, though: An ID token can be requested in any pipeline job and in any branch. Malicious contributors could sneak in a pipeline job and make a corrupted release.

This is where environments come in.

Environments

GitLab environments represent your deployed code in your infrastructure. Think of your code running in a container in your production or testing Kubernetes cluster; or your Python package living on PyPI. :-)

The most important feature of environments in this context is access control: You can protect environments, restricting deployments to them. For protected environments, you can define users or roles that are allowed to perform deployments and that must approve deployments. For example, you could restrict deployments (uploads to PyPI) to all maintainers of your project, but only after you yourself have approved each release.

Note

Protected environments are a premium feature.

Non-profit open source projects/organizations can apply for a free ultimate subscription.

It seems that very old projects also have this feature enabled. Otherwise I can’t explain why I have it for Typed Settings but not for my other projects…

To use an environment in your CI/CD pipeline, you need to add it to a job in the .gitlab-ci.yml.

If we also store the name of the environment in the PyPI deployment settings, only uploads from that environment will be allowed, i.e. only uploads that have been authorized by selected people.

Only maintainers can deploy to the release environment and only after Stefan approved it. Only maintainers can deploy to the release environment and only after Stefan approved it. Security Considerations

The last two sections have already hinted at this: GitLab environments are only truly secure if you can protect them.

Let’s take a step back and consider what threats we’re trying to protect against, so that we’ll then be able to choose the right approach:

Random people doing a merge request for your project.
Contributors with the developer role committing directly into your project.
Co-maintainers with more permissions then a developer.
A Jia Tan which you trust even more than the other maintainers.

What can we do about it?

Code in other people’s forks doesn’t have access to your project’s CI variables nor can it request OIDC ID tokens in your project’s name. But you need to carefully review each MR!
Contributors with only developer permissions can still request ID tokens. If you cannot use protected environments, using an API token stored in a protected CI/CD variable is a more secure approach. You should also protect your main branch and all tags (using the * pattern), so that devleopers only have access to feature branches. You’ll find it under Settings → Repository → Protected branches/tags.
Protected CI/CD variables do not protect you from malicious maintainers, though. Even if you only allow yourself to create tags, other maintainers still have access to protected variables. Protected environments with only a selected set of approvers is the most secure approach.
If a very trusted co-maintainer becomes malicious, there’s very little you can do. Carefully review all commits and read the audit logs (Secure → Audit Events).

So that means for you:

If you are the only maintainer of a small open source project, just use a Trusted Publisher with (unprotected) environments.
If you belong to a larger project with multiple maintainers, consider applying for GitLab for Open Source and use a Trusted Publisher with a protected environment.
If there are multiple contributors and you don’t have access to protected environments, use an API token stored in a protected CI/CD variable and try only grant developer permissions to contributors.

Gizra.com: Drupal on Azure - Forging Docker Image and Beyond

Planet Drupal - Wed, 2024-11-13 19:00

When you switch from PaaS to IaaS, suddenly you have a series of new responsibilities. There is a lot to learn and also a lot to mimic from existing PaaS providers. We would like to share our experience after successfully migrating four larger Drupal sites from Pantheon to Azure cloud. The overview is in chronological order, detailing how we proceeded with the implementation. Infrastructure Basics We worked with an excellent infrastructure team who provisioned the following architecture for us:

Categories: FLOSS Project Planets

Seth Michael Larson: Early promising results with SBOMs and Python packages

Planet Python - Wed, 2024-11-13 19:00

Early promising results with SBOMs and Python packages About • Blog • Cool URLs Early promising results with SBOMs and Python packages

Published 2024-11-14 by Seth Larson
Reading time: minutes

This critical role would not be possible without funding from the Alpha-Omega project. Massive thank-you to Alpha-Omega for investing in the security of the Python ecosystem!

I've kicked off a project to reduce the "phantom dependency" problem for Python. The phantom dependency problem is where distinct software (sometimes written in Python, but often C, C++, Rust, etc) is included in a Python package but then isn't recorded anywhere in the package metadata.

These distinct pieces of software aren't not recorded because of lack of time or awareness, there is no standardized method to record this information in Python package metadata.

This means that when a software composition analysis (SCA) tool looks at the Python package the tool will "miss" all the software that's included in the package aside from the top-level package itself.

For example, the popular Python image manipulation library "Pillow" is not only "Pillow", the wheel files contain many more libraries to comply with the "manylinux" package platform:

# (the below libraries are bundled by auditwheel) $ unzip -l pillow-11.0.0-cp312-cp312-manylinux_2_28_x86_64.whl | grep 'pillow.libs' pillow.libs/ pillow.libs/libharfbuzz-144af51e.so.0 pillow.libs/libxcb-b8a56d01.so.1.1.0 pillow.libs/libpng16-4cc6a9fc.so.16.44.0 pillow.libs/libXau-154567c4.so.6.0.0 pillow.libs/libbrotlicommon-3ecfe81c.so.1 pillow.libs/liblzma-c9407571.so.5.6.3 pillow.libs/libfreetype-e7d5437d.so.6.20.1 pillow.libs/liblcms2-e69eef39.so.2.0.16 pillow.libs/libopenjp2-05423b53.so pillow.libs/libtiff-0a86184d.so.6.0.2 pillow.libs/libjpeg-45e70d75.so.62.4.0 pillow.libs/libbrotlidec-ba690955.so.1 pillow.libs/libwebp-2fd3cdca.so.7.1.9 pillow.libs/libsharpyuv-898c0cb5.so.0.1.0 pillow.libs/libwebpdemux-f2642bcc.so.2.0.15 pillow.libs/libwebpmux-d524b4d5.so.3.1.0

I see many recognizable projects in the list of shared objects, like libjpeg, libwebp, libpng, xz-utils (liblzma), etc. If we try to scan this installed wheel with a tool like Syft we receive this report:

$ syft dir:venv ✔ Indexed file system venv ✔ Cataloged packages [2 packages] NAME VERSION TYPE pillow 11.0.0 python pip 24.2 python

Syft isn't able to find any of the compiled libraries! So if we were to run a vulnerability scanner we would only receive vulnerability records for Pillow and pip. My plan is to help fix this problem with Software Bill-of-Materials documents (SBOMs) included in a standardized way inside of Python packages.

To test how well this proposal works with today's tools, I forked auditwheel and created a rudimentary patch which:

For each shared library which is being bundled into a wheel, record the original file path and checksum. Bundle the shared libraries into the wheel as normal.
Using platform-specific manager query each file path back to the package that provides the file. In this specific case rpm was used (rpm -qf <path>) because manylinux_2_28_x86_64 uses AlmaLinux 8 as the distribution.
Gather information about that package using rpm, such as the name, version, etc.
For each package, create the intrinsic "package URL" (PURL) software identifier for later use. This includes information about the packaging format, package name, version, but also the distro and architecture. For example, the PURL for the copy of libwebp used by the wheel is: pkg:rpm/almalinux/libwebp@1.0.0-9.el8_9.1?arch=x86_64&distro=almalinux-8
Generate a CycloneDX SBOM file containing the above gathered information split into components and with relationship links between the top-level component (Pillow) and the bundled libraries.
Embed that generated SBOM file into the wheel.

Let's run through building Pillow from source and using our forked auditwheel:

# The manylinux image may differ depending on your platform. $ docker run --rm -it -v.:/tmp/wheelhouse \ quay.io/pypa/manylinux_2_28_x86_64 # Install dependencies for Pillow $ yum install --nogpgcheck libtiff-devel \ libjpeg-devel openjpeg2-devel zlib-devel \ freetype-devel lcms2-devel libwebp-devel \ tcl-devel tk-devel harfbuzz-devel \ fribidi-devel libxcb-devel # Create a virtualenv and install auditwheel fork $ /usr/local/bin/python3.12 -m venv venv $ source venv/bin/activate $ python -m pip install build $ python -m pip install git+https://github.com/sethmlarson/auditwheel@sboms # Download the Pillow source from PyPI $ python -m pip download --no-binary=pillow pillow==11.0.0 $ tar -xzvf pillow-11.0.0.tar.gz # Build a non-manylinux wheel for Pillow $ python -m build ./pillow-11.0.0/ # Repair the wheel using auditwheel $ auditwheel repair ./pillow-11.0.0/dist/ ... Fixed-up wheel written to /wheelhouse/pillow-11.0.0-cp312-cp312-manylinux_2_28_x86_64.whl # Inspect the wheel for our SBOM, there it is! $ unzip -l /wheelhouse/pillow-11.0.0-*.whl | grep '.cdx.json' 5703 11-14-2024 19:39 pillow.libs/auditwheel.cdx.json # Move the wheel outside our container $ mv /wheelhouse/pillow-11.0.0-cp312-cp312-manylinux_2_28_x86_64.whl /tmp/wheelhouse/

So now we have a wheel file that contains an SBOM partially describing its contents. Let's try installing that wheel and running Syft:

$ syft dir:venv ✔ Indexed file system /tmp/venv-pillow ✔ Cataloged packages [13 packages] NAME VERSION TYPE Pillow 11.0.0 python bzip2-libs 1.0.6-26.el8 rpm freetype 2.9.1-9.el8 rpm jbigkit-libs 2.1-14.el8 rpm lcms2 2.9-2.el8 rpm libXau 1.0.9-3.el8 rpm libjpeg-turbo 1.5.3-12.el8 rpm libpng 1.6.34-5.el8 rpm libtiff 4.0.9-33.el8_10 rpm libwebp 1.0.0-9.el8_9.1 rpm libxcb 1.13.1-1.el8 rpm openjpeg2 2.4.0-5.el8 rpm pip 24.2 python

Woo hoo! Now the proper libraries are showing up in Syft. That means we'll be able to get vulnerability information from all the contained software components. This isn't the end, there are many many MANY ways that software ends up in a Python package. This quick validation test only shows that even with today's SBOM and SCA tools that embedding SBOM documents into wheels can be useful for downstream tools. Onwards to even more! 🚀

If you're interested in this project, follow the repository on GitHub and participate in the kick-off discussion on Python Discourse.

That's all for this post! 👋 If you're interested in more you can read the last report.

Have thoughts or questions? Let's chat over email or social:

sethmichaellarson@gmail.com
@sethmlarson@fosstodon.org

Want more articles like this one? Get notified of new posts by subscribing to the RSS feed or the email newsletter. I won't share your email or send spam, only whatever this is!

Want more content now? This blog's archive has ready-to-read articles. I also curate a list of cool URLs I find on the internet.

Find a typo? This blog is open source, pull requests are appreciated.

Thanks for reading! ♡ This work is licensed under CC BY-SA 4.0

︎

Categories: FLOSS Project Planets

FSF Blogs: PERA Act votes tomorrow - A major step back for software freedom

GNU Planet! - Wed, 2024-11-13 15:54

Categories: FLOSS Project Planets

PERA Act votes tomorrow - A major step back for software freedom

FSF Blogs - Wed, 2024-11-13 15:54

Categories: FLOSS Project Planets

drunomics: Was mir das DrupalCamp Berlin über “Volunteering” beigebracht hat

Planet Drupal - Wed, 2024-11-13 13:06

What DrupalCamp Berlin taught me about "volunteering" Alte Münze arthur.lorenz Thu, 11/14/2024 - 09:15 From hesitation to enthusiasm: when I decided to volunteer at DrupalCamp Berlin, I didn't know what to expect. In hindsight, it was an unforgettable experience that allowed me to immerse myself deeply in the community. In this post, I share behind-the-scenes insights and what it really means to help shape an event like this.

Categories: FLOSS Project Planets

Drupal Association blog: Governance in the Drupal Ecosystem

Planet Drupal - Wed, 2024-11-13 12:24

The Summary

To ensure Drupal’s stability and independence, the project is managed through a well-established, transparent governance system. Dries Buytaert, the Founder and Project Lead, helped design a model that distributes power and prevents any single person or entity — even himself — from making unilateral decisions that could alter the project unexpectedly. The independent Drupal Association oversees Drupal.org and other key infrastructure, free from commercial pressures. This approach ensures that Drupal.org is reliable and creates a fair playing field for all contributors, embodying true open-source leadership.

Just as the Drupal software has grown and changed significantly over its 23-year history, so has its governance. And, while there’s always room for improvement, it is safe to say that Drupal’s seasoned governance is what allows it to be one of the largest, independent open source projects in the world.

The Detail

Dries Buytaert, as the founder and project lead, ultimately guides the direction of Drupal, and is responsible for shaping the project’s philosophy and core principles.

While Dries started Drupal on his own, he has helped evolve the governance model over the years to be mature and resilient. To help govern the project's technical aspects, Dries established the core committer team and other supporting groups. To oversee non-technical areas, he co-founded the Drupal Association. These initiatives were intentional efforts to scale and strengthen Drupal’s governance.

On the technical side, the governance model for Drupal core is very mature, as described in the Drupal Project Governance. Technical decision-making is distributed among the core committers and other maintainers, promoting a transparent, structured, and collaborative approach to managing Drupal core.

Many other aspects of Drupal governance are managed by the Drupal Association, which is a U.S. 501(c)3 nonprofit organization formed in 2008 to support the Drupal project and the Drupal community. I am currently the Chief Executive Officer of the Association. Our mission is to drive innovation and adoption of Drupal as a high-impact digital public good, hand-in-hand with our open source community. A fundamental obligation of the Drupal Association is to ensure that Drupal is available to anyone, anywhere in the world free of charge. We primarily accomplish this task through Drupal.org.

The Drupal Association is a bona fide non-profit organization (not a pass-through), with assets of just over $3 million and an operating budget of over $4 million. We publish our finances annually (see: Find the reports in the Accountability section of D.org). The Association is not controlled or funded by any single entity nor does it pass revenues onto another entity. The Association’s revenue comes from hundreds of organizations and thousands of individuals. No single financial contributor accounts for more than 10% of our revenue. This diverse support base prevents any one entity from having too much influence.

The Drupal Association employs a full-time team of 19 professionals located throughout the world. These people include engineers, marketers, accountants, communication staff, and program administration team members. I say all this to demonstrate that we have the capacity to legitimately, and independently, carry out our mission.

The Drupal Association owns and controls important components of the Drupal ecosystem that allow Drupal to be one of the largest independent FOSS projects in the world.

The Drupal Association owns and/or controls the infrastructure that powers Drupal.org. The Drupal Association has complete control over who accesses Drupal.org, how they access it, and what they can do when accessing it. These are covered by our Terms of Service.

In administering Drupal.org, the Drupal Association controls a number of services, including:

The database of Drupal.org users/project contributors
A self-hosted GitLab instance that includes all of the Drupal code repositories for core and contrib, testing with GitLab CI and documentation through GitLab Pages
Drupal software packaging (the actual .zip and .tar.gz files containing Drupal code)
Drupal Updates (the Updates.xml feed, Automatic Updates endpoint, Secure Signing server, and Packages.Drupal.org- the composer endpoint for Drupal projects).
The Drupal namespace on GitHub
The Drupal namespace on Packagist
The Drupal namespace on NPM
The Drupal Infrastructure namespace on gitlab.com (separate from our self-hosted instance)
The contribution credit system
Usage data about Drupal core and extensions

The Drupal Association also owns and controls the primary means by which the community communicates and gathers. We organize DrupalCons and manage Drupal Slack. We issue The Drupal Association Newsletter and TheWeeklyDrop (together with Bob Kepford). We control and manage Mastodon, X/Twitter, YouTube, LinkedIn (Drupal, Drupal Association, Drupal Jobs), Facebook, Instagram, and TikTok.

Drupal has the Maker/Taker Problem that nearly all open source projects face. There are companies that profit off Drupal who don’t give back to help maintain the project. The Drupal Association has chosen to address this issue by restructuring our Drupal Certified Partner program to focus exclusively on those companies that give back to the community. The goal is to incentivize the creation of a culture of contribution within companies that work in Drupal that provide the Drupal Project with sufficient resources to innovate and grow. There is always work to be done in creating a more equitable program, but it is beginning to work as we have more than doubled the number of Drupal Certified Partners in the past 15 months.

The Drupal Association is governed by a 12-person Board of Directors that meets several times a year, including two public meetings at DrupalCons. Nine directors are selected by a Nominating Committee of the board and two directors are elected by members of the Drupal Association. The final seat is the “Founding Director”. This is a voting seat that can only be filled by Dries Buytaert. Like all board seats, this is an unpaid, voluntary role that carries with it a single vote on the board. It has to be approved annually by the Board of Directors. Except for the trademark licensing, the Drupal Association has no contracts or agreements with Dries Buytaert or the Drupal Project, and Dries receives no funding from the Drupal Association or its operation of Drupal.org.

Dries Buytaert owns the trademark “Drupal”. He has transparently communicated the Drupal Trademark and Logo Policy by which these are governed. Under the policy, any changes to the policy go into effect sixty (60) days after publication. Dries Buytaert also owns the domain names “drupal.org”, “drupal.com” and “drupalcon.org”.

Dries has granted the Drupal Association an exclusive license to use “Drupal”, “Drupal.org”, and “DrupalCon” and a non-exclusive license to use Drupal for non-commercial uses. This license allows the Drupal Association to support the Drupal Project by providing the infrastructure to host and maintain the official version of Drupal and to organize its contributors. It also allows the Association to support the Drupal Community in their work with Drupal.

The net effect of this arrangement is that Dries Buytaert retains ultimate control over what software can be named “Drupal” and what website can be named “Drupal.org.” He can thus ensure that any software that calls itself “Drupal” or website that uses “Drupal.org” conforms with his vision. This would likely cause the Drupal Association to fork the software and maintain it under a new name and url. The high cost of such an action to both parties makes this option highly unlikely and unable to execute quickly.

What the trademark does not allow him to do is to block any person or organization from using any component of Drupal core or any modules housed on Drupal.org. Those decisions are the sole discretion of the Drupal Association. To date, we have exercised this authority in a very limited manner to protect and safeguard the website and its content from attacks and misuse.

Twenty-three years ago, Dries chose to release Drupal under an open-source license, inspiring tens of thousands to build careers and champion an Open Web. However, fulfilling this vision required more than just a General Public License. By creating the Drupal Association, setting up Drupal core's governance, and licensing the trademark, Dries ensured Drupal remained open-source without commercial entanglements, securing a strong, independent foundation.

Along with Dries Buytaert and many contributors, the Drupal Association is focused on the future of Drupal (see: Starshot Initiative). How can we support its adoption through marketing and create sustainable revenue streams for Drupal to flourish? These are tough questions that confront many open source projects. Our governance allows us to move forward in this work with great certainty.

Categories: FLOSS Project Planets

Bojan Mihelac: Building docker images with private python packages

Planet Python - Wed, 2024-11-13 11:37

Installing private python packages into Docker container can be tricky because container does not have access to private repositories and you do not want to leave trace of private ssh key in docker…

Categories: FLOSS Project Planets

Bojan Mihelac: Django app name translation in admin

Planet Python - Wed, 2024-11-13 11:37

"Django app name translation in admin" is small drop-in django application that overrides few admin templates thus allowing app names in Django admin to be translated.

Categories: FLOSS Project Planets

Bojan Mihelac: Django-sites-ext

Planet Python - Wed, 2024-11-13 11:37

Categories: FLOSS Project Planets

Search form

Tag cloud

Feeds

Stefano Zacchiroli: In memory of Lunar

PyCharm: Inline AI Prompting, Coding Assistance for the dataclass_transform Decorator (PEP 681), and More in PyCharm 2024.3!

Qt Creator 15 RC released

Metafont, MetaPost and Malayalam font

Real Python: Quiz: Namespaces and Scope in Python

The Open Source Initiative and the Eclipse Foundation to Collaborate on Shaping Open Source AI (OSAI) Public Policy

My first in-person Akademy: Thessaloniki 2023

Unifying the KRunner sorting mechanisms for Plasma6 & further plans

PyPy: Guest Post: Final Encoding in RPython Interpreters

Python Bytes: #409 We've moved to Hetzner write-up

Stefan Scherfke: Publishing to PyPI with a Trusted Publisher from GitLab CI/CD

Gizra.com: Drupal on Azure - Forging Docker Image and Beyond

Seth Michael Larson: Early promising results with SBOMs and Python packages

FSF Blogs: PERA Act votes tomorrow - A major step back for software freedom

PERA Act votes tomorrow - A major step back for software freedom

drunomics: Was mir das DrupalCamp Berlin über “Volunteering” beigebracht hat

Drupal Association blog: Governance in the Drupal Ecosystem

Bojan Mihelac: Building docker images with private python packages

Bojan Mihelac: Django app name translation in admin

Bojan Mihelac: Django-sites-ext

Pages

Recent Publications

FLOSS Project Planets

FLOSS Research

Search form

Tag cloud

You are here

Feeds

Pages

Recent Publications

FLOSS Project Planets

FLOSS Research