Feeds

Debian Brasil: Debian Day Brasil - call for organizers

Planet Debian - Sun, 2024-06-09 11:00

August 16 marks the anniversary of the Debian Project, and every year communities around the world organize meetups to celebrate the date.

Known as Debian Day, the event always has a significant number of Brazilian communities organizing activities in their cities on the 16th (or on the nearest Saturday).

In 2024 Debian Day will celebrate the 31 years of the Debian Project, and August 16 falls on a Friday, so most communities will probably hold their activities on Saturday the 17th.

We are putting out a call for organizers for Debian Day 2024. The idea is to bring together, in a Telegram group, the people interested in coordinating the activities of their local communities, so they can exchange experiences, help newcomers, and discuss the possibility of the Debian Project helping the communities financially.

Debian Day in your city can be anything from a get-together at a pizzeria/bar/restaurant to bring people together, to a larger event with talks/workshops. So there is no requirement about what the meetup should look like; it all depends on what you and your community want and are able to do.

There is the possibility of asking the Debian Project Leader to reimburse some expenses, for example producing stickers, paying for the pizzas, ordering a cake, etc.

Come join the Debian Day BR group on Telegram and discuss ideas: https://t.me/debian_day_br

If you are up for the challenge and will organize a Debian Day in your city, be sure to add your city with the necessary information here.

Categories: FLOSS Project Planets

Ed Crewe: Software development with Generative AI

Planet Python - Sun, 2024-06-09 09:25

The Current State of AI Software Generation

The user tries to describe what they want generated in terms of a snippet of high level programming language code using standard English. They submit it to the AI tool. So what are they asking the AI to generate and how does it do it?

The high level language

High level programming languages are human languages composed of English and maths symbols, designed for the comprehension and composition of precise computer instructions. The language makes no more sense than English to a computer. It has to be compiled or interpreted to computer language for it to run. So it may compile to an intermediate bytecode language and then maybe to human readable assembly language, before final translation into the unreadable machine code that the computer runs.

A programmer learns the high level language and becomes fluent in it. They can read and understand the functionality of that code, with the complexity of the machine specific implementation stripped away, leaving just the precise functional maths and English / symbology that describes the computer functionality. They think in that code, in order to write it.

Even then, the majority of a programmer's time is spent debugging the high level language and fixing what they have written to be bug free, because it is difficult to think clearly in code, pre-determining all edge cases etc.

Unlike the English language, it can succinctly describe computer functionality in a few lines.

The AI

A detailed English language description of the required functionality, plus the name of a high level programming language, is submitted to the AI tool.

It does a search of the web, eg. Stack Overflow etc. for results for that code language. For Chatbot use (eg. ChatGPT) it applies an English language Large Language Model, LLM (a numeric encoding of learning of the English language) to generate a well phrased aggregation of the most popular results that match the English prompt.

For software use (eg. CoPilot) it works just the same, but the LLM learns English to high level software language aggregate translation from code example data, eg. GitHub, to generate what the code syntax might be to match the English description of it.

Finally it returns an untested snippet of generated high level code.

The Non-Developer

The non-developer pastes it in place and tries to run the program with it in.

They may be able to puzzle out the high level language - but don't naturally think in it, just as people without mathematics skills can only think as far as basic arithmetic and are dyslexic when it comes to complex equations.

It seems to work around 50% of the time. When it fails, they go back to square one and try to rephrase their English prompt.

They patch together block after block of prompt generated code. A crazy paving of a program that likely has a number of bugs and inappropriate features in it. But it kind of works; for the non-developer, that is good enough.

The code gets pushed out there with all its imperfections, and starts to populate the web of code data that is used to generate the next AI code snippet.

Or the Developer
The developer reads the code and understands it, determines if it should do what they want. Or if they just want to use some of it as an example.

They cut, paste and rewrite it, using it as a hint tool, or as an extension to their IDE's existing auto-code generation tools that work using templated code and language / import library searches.

Hopefully their IDE is set up to clearly distinguish between real code completions and possible generative code completions. Otherwise the percentage of nonsense code created by the generative AI pollutes the 100% reliability of IDE code completion, and harms productivity.

Then they run their code and debug as usual.

At least 75% of programming time is not spent on writing code, but on making sure that the high level instructions are exactly correct for generating bug free machine code. So iteratively refining the lines of code. With code, a single comma out of place can break the whole program. When language has to be so carefully groomed, succinct minimal language is essential.
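To make the comma point concrete, here is a small Python sketch. The function names are made up for illustration; the only difference between the two functions is one trailing comma, which silently turns a number into a one-element tuple.

```python
def total_price(prices):
    """Sum a list of prices and return a number."""
    subtotal = sum(prices)
    return subtotal


def total_price_with_stray_comma(prices):
    """Identical, except for one trailing comma after sum(prices)."""
    subtotal = sum(prices),  # the comma makes this a one-element tuple
    return subtotal


good = total_price([2, 3])
bad = total_price_with_stray_comma([2, 3])

print(good * 2)  # doubles the number
print(bad * 2)   # repeats the tuple instead of doubling the value
```

Nothing crashes at the line with the comma; the program only misbehaves later, wherever the value is used as a number.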

For many developers, adding an imprecise, non mathematical language such as English, which is entirely unsuited to defining machine code instructions, to generate such code is problematic. It introduces a whole layer of imprecision, complexity and bugs to the process, slowing it right down. It also requires developers to write a lot more sentences (in English) rather than just quickly typing out the succinct lines of Python (or similar) programming language they have in their head.

The generative AI can help students and others who are learning to code in a computer language, but can it actually improve productivity for real, full time, developers who are fluent in that language?

I think that question is currently debatable. Because I believe the goal of adding yet another language to the stack of languages that need to be interpreted for humans authoring computer code, especially one as unsuited as English, is only useful for people who are far from fluent in the software language.

Once we move beyond error prone early releases of LLMs like ChatGPT-4, tools such as CoPilot may start to become much more effective at authoring software, and actually produce code that is as likely to work first time and have the same amount of bugs as your average software developer's first cut of the code. We may reach that point within a year or two. At which point professional software developers will need to be adept at using it as part of their toolset.

Even so, I believe the whole conception of the application of AI to writing software could benefit from more work on a computer centric alternative to the current approach, which is focussed on generating plausible human language responses. That approach only dominates because of all the efforts related to NLP and human interaction. But taking it and sticking it on to writing human software languages is more about creating a revenue stream than attempting to have AI do the main work of software development.

Until then, AI will never be able to replace me, as a software developer. Only be another IDE tool I need to learn ... in time when it improves sufficiently to increase productivity.

NOTE - June 2024 Update
Having come back to CoPilot 6 months later, I have come to appreciate some of its new features, so I have added a new blog post that accepts that it now provides utility even for the seasoned programmer.

Another Way

Copilot and the like currently use the ChatGPT approach of a Chatbot front end tied to an English language LLM to generate aggregate search engine results in a human language. But there is no domain specific machine learning knowledge about the semantics of the content. So it doesn't understand, and certainly doesn't pre-check the code. Just as ChatGPT doesn't understand the search engine content. Since currently there are no domain specific trained models for the content in the loop. So if asked a question about pharmacy it doesn't plug in one of the AI models that has learnt pharmacy and is used by that industry to aid in the development of medicines. It understands nothing, it is a chatbot, just a constructor of plausible answers based on search popularity.
Similarly CoPilot has learnt how to predict what code somebody might be trying to write, but it hasn't learnt how to code.

This approach cannot lead to AI generating innovative new coding approaches, full self-coding computers, or remove the need for human readable high level programming languages.

There have been experiments with applying test driven development to AI generated code, but I have not heard of serious attempts to address the bigger picture...

Move all functional code writing to be AI only.
Remove the need for any high level computer language for humans to gain fluency in.
Have AI develop software by hundreds of thousands of iterative composition TDD cycles.
Parallel refactoring thousands of solutions to arrive at the optimum one.
Use AI that understands the machine code it is generating by training it on the results of running that code.
The ML training cycle must be running code not matching outputs to pre-ranked static result training sets.
In addition to the static LLM that encodes the learning of machine code authoring, dynamic training cycles should be run as part of the code composition. Task based ephemeral training models.
Get rid of the wasted effort training AI to understand English, Python, Java, Go or any other existing human language evolved for other tasks.
Finally we are left with the job of telling the computer what its software should do.
We do not want to use English for that, it's way too verbose and inaccurate; similarly we don't want a full high level programming language to do it. We need a new half way house. A domain specific language (DSL) for defining functionality only, designed for giving software specifications to AI that it can use to generate automated test suites.

Self-Programming Computers

Exploring the last point in more detail...

Create a higher level pseudo-code language for describing the required functionality that is more English readable than even current high level languages such as Python.

Make that functional DSL focus on defining inputs and outputs - not creating the functionality, but creating the black box functional tests that describe what the working code should do.

Maybe add tools for a slightly no-code approach, with visual generators for the language, eg graphical pipeline builder tools. For people who find thinking visually easier than thinking symbolically.

The software creator uses the DSL to create an extensive set of functional definitions for a project.

The DSL language design and evolution is optimised for LLM interpretation. So it has very tight grammatical and syntactical usage that promote accurate generative outputs.

A new non-developer friendly high level pseudo code language / rigorous AI prompt writing lingo.

Some basic characteristics of the DSL:

auto-formatting (like Go) minimizing syntactical variation
To quote the Zen of Python - 'There should be one-- and preferably only one --obvious way to do it.'
But strictly applied, rather than as a vague principle as Python does
unlike any other high level language, the design needs to be optimized only for specifying functionality, a high level templating language from which test suites are generated.
the language will never be used to implement functionality
uses simple English vocabulary and ideally minimal mathematical symbology

These DSL prompts are written with an LLM for the DSL: it helps create its own prompts, and the code creator uses it to refine all the DSL definitions that specify the full functionality.

The specification DSL auto generates all the required tests in a low level language.

The system should also have a generative AI LLM trained for C or assembly language.
This is what creates the actual functional code, by iteratively running and rewriting it against the specification encoded into the tests.
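As a rough illustration of the loop described above, here is a toy Python sketch. Everything in it is hypothetical: the SPEC dictionary stands in for the DSL, generate_tests stands in for the test generator, and a fixed list of candidate functions stands in for the AI rewriting its code between runs. It only shows the shape of the idea, with black box input/output cases deciding which implementation is accepted.

```python
# Hypothetical specification: inputs and expected outputs only, no implementation.
SPEC = {
    "name": "add",
    "cases": [
        {"inputs": (1, 2), "output": 3},
        {"inputs": (-1, 1), "output": 0},
        {"inputs": (10, 5), "output": 15},
    ],
}


def generate_tests(spec):
    """Turn each input/output case into a black box test for a candidate."""
    def make_test(case):
        def test(candidate):
            return candidate(*case["inputs"]) == case["output"]
        return test
    return [make_test(case) for case in spec["cases"]]


def first_passing(candidates, tests):
    """Stand-in for the run-and-rewrite loop: try candidates until one passes every test."""
    for attempt, candidate in enumerate(candidates, start=1):
        if all(test(candidate) for test in tests):
            return attempt, candidate
    return None


# Stand-ins for successive rewrites of the generated code.
CANDIDATES = [
    lambda a, b: a - b,
    lambda a, b: a * b,
    lambda a, b: a + b,
]

tests = generate_tests(SPEC)
attempt, winner = first_passing(CANDIDATES, tests)
print(f"accepted candidate number {attempt}")
```

The person writing SPEC never reads the winning implementation; they only judge whether the cases describe the behaviour they want.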

The AI tool then generates the tests for that implementation and uses TDD to generate the actual functional code - eventually the system should improve to a level better than most software developers. The code it writes no longer needs to be read by a human - because a human will be unable to debug it at anything like the speed the AI tool can.

So we use generative AI to do the part of the job that actually takes all the time. Debugging, refactoring and maintaining the code, making sure it really does what is required functionally. Rather than the quick job of writing a first cut of it that might run without crashing.

Most importantly we don't introduce the use of the full English language, the language of Shakespeare, the language of puns, double meanings, multiple interpretations, shades of grey, implied feeling and emotions, into a binary world to which it is entirely unsuited.

Also we don't need English or high level computer languages in the stack of mistranslation at all.
Because we are not training the AI to understand human languages. We are training it to write its own machine code language based on defining what behaviour it should implement.
BDD / TDD generative AI if you like.

Humans no longer learn complex mathematical process based languages that can be translated into machine code. They learn a more generic language for specifying functional behaviour.

This gives more freedom to widen the DSL to mature into a general precise AI prompt language.

Whilst allowing computers to evolve more machine learning driven software architectures that are self maintaining and not so constrained into the models imposed by current human intelligence and coding practise based programming languages.

Could AI take my job?

Perhaps if all of the above were in place, then finally we would arrive at a place where AI could replace traditional software development and high level software languages.
With concerted effort it could be in 10 years, if some big companies put some serious investment in trying to replace traditional software development.
Code monkeys will all be automated. Only software architects would be required and they would use a new functional specification AI prompt language, not a programming language.

Of course, if politicians are scared that dumb ChatGPT can already write as good a speech as they can, plus replicate all the prejudices and errors of its training data and trainers, then setting AI free to fully write software, and itself ... will be way more scary in its long term implications.
Meanwhile we are currently at a place where it arguably doesn't even improve productivity for an experienced software developer, only allows non-developers, students and other language newbies to have a go at writing one of the many dialects of human languages, known as computer languages.

Their mix of math, English, symbols, logic and process may appear more like English than musical notation or pure maths, but sadly they are no more suited to creation by an English language Chatbot approach.

Jeremy Epstein: Introducing: Floyd-Warshall CSV Generator

Planet Python - Sat, 2024-06-08 20:00

I built a little Python script called the Floyd-Warshall CSV Generator. It takes a CSV of graph edges as input, and generates a CSV of the edges that are the shortest paths between all pairs of vertices.

The script is a simple wrapper of the SciPy floyd_warshall function, which in turn implements the Floyd-Warshall Algorithm. Hope you find it useful for all your directed (or undirected) weighted graph needs.

Given an input CSV of the following graph edges:

    point_a,point_b,cost
    a,b,5
    b,c,8
    c,d,23
    d,e,6

When the script is called as follows:

    floyd-warshall-csv-generator \
      /path/to/input_data.csv \
      --vertex-i-column-name point_a \
      --vertex-j-column-name point_b \
      --weight-column-name cost \
      --no-directed \
      --max-weight 35

It generates an output CSV that looks like this:

    point_a,point_b,cost
    a,b,5.0
    a,c,13.0
    b,c,8.0
    b,d,31.0
    c,d,23.0
    c,e,29.0
    d,e,6.0

That is, it generates all the possible (indirect) paths from one point to all other points, based on the (direct) paths that are already known, with duplicate (undirected) paths filtered out, and with paths whose cost is more than max-weight filtered out.
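For readers who want to see the idea without installing anything, here is a pure-Python sketch of the same computation on the example data. It is not the script's code (the script wraps SciPy's floyd_warshall); the shortest_paths function and its max_weight parameter are illustrative names only, and the graph is treated as undirected.

```python
import csv
import io
import math


def shortest_paths(edges, max_weight=math.inf):
    """Floyd-Warshall over undirected weighted edges.

    Returns (a, b, cost) rows with a < b, dropping costs above max_weight.
    """
    vertices = sorted({v for a, b, _ in edges for v in (a, b)})
    dist = {
        (i, j): (0.0 if i == j else math.inf)
        for i in vertices
        for j in vertices
    }
    for a, b, cost in edges:
        # Undirected: the known cost applies in both directions.
        dist[a, b] = dist[b, a] = min(dist[a, b], float(cost))
    for k in vertices:
        for i in vertices:
            for j in vertices:
                via_k = dist[i, k] + dist[k, j]
                if via_k < dist[i, j]:
                    dist[i, j] = via_k
    return [
        (i, j, dist[i, j])
        for i in vertices
        for j in vertices
        if i < j and dist[i, j] <= max_weight
    ]


INPUT_CSV = """point_a,point_b,cost
a,b,5
b,c,8
c,d,23
d,e,6
"""

rows = csv.DictReader(io.StringIO(INPUT_CSV))
edges = [(r["point_a"], r["point_b"], float(r["cost"])) for r in rows]
result = shortest_paths(edges, max_weight=35)

print("point_a,point_b,cost")
for a, b, cost in result:
    print(f"{a},{b},{cost}")
```

The a-d, a-e and b-e paths cost 36, 42 and 37 respectively, so they fall above the limit of 35 and are dropped, which is why they are missing from the output.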

I wrote this script in order to generate the "all edges" data that's shown in the World Locality Transit Graph, which I'll also be blogging about real soon. Let me know if you put this script to any other interesting uses!

Pythonicity: GraphQL cursors

Planet Python - Sat, 2024-06-08 20:00

Contrarian view on cursor-based pagination.

GraphQL documentation recommends cursor-based pagination, and it has subsequently become a popular standard.

In general, we’ve found that cursor-based pagination is the most powerful of those designed. Especially if the cursors are opaque, either offset or ID-based pagination can be implemented using cursor-based pagination (by making the cursor the offset or the ID), and using cursors gives additional flexibility if the pagination model changes in the future. As a reminder that the cursors are opaque and that their format should not be relied upon, we suggest base64 encoding them. …

    {
      hero {
        name
        friends(first: 2) {
          totalCount
          edges {
            node {
              name
            }
            cursor
          }
          pageInfo {
            endCursor
            hasNextPage
          }
        }
      }
    }

There are several oversights with this well-intentioned advice.

Cursors and state

Cursors imply state, at least they used to. A database cursor is used for iterating over a result set. Meaning it has transactional integrity to pick up where it left off.

The vast majority of GraphQL APIs are inherently stateless. The “cursor” is being decoded as input to a new request, and offers no guarantees. From this observation, the advice falls apart.

The problem with stateless pagination is inconsistency; items may shift, appear, or disappear. Which gives the client the perception of missing or duplicate items. This happens regardless of whether the pagination is offset or ID based. Arguably worse in the case of IDs, since the reference can move arbitrarily or be gone.

Cursors don’t solve the consistency problem; they give the client the false impression of solving the problem.

Opaqueness and compatibility

The claim is that an opaque cursor is compatible across changes. Changed to do what exactly, would be the more relevant question.

Taking a step back, what is the problem being solved here? We assume there is a list of items, with an inherent ordering, and too many to return to the client with acceptable performance.

Given those assumptions, the first obvious step is an optional size limit. That is not in dispute; the disagreement is over the “offset”. A simple and versatile solution is a range filter over whatever field(s) is relevant to ordering. This is not even remotely controversial when the field in question has a name like date. In other words, “pagination” is not necessarily the problem that needs solving.

Range filters with a size limit are sufficient to implement pagination, and new optional filters are always backwards compatible. They also offer the flexibility of search, whereas cursors can only be used iteratively. And what if the client does not want visibility into the range filters? That is exactly what offset is for; offset is a range filter over an implied index field.
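A minimal sketch of that claim in plain Python, with no GraphQL library involved. The list_items function, its date_after / date_before / limit parameters, and the sample data are all assumptions made for illustration; the dates are unique here so that a strict comparison is enough to pick up where the previous page ended.

```python
from datetime import date

# Sample data, already in its inherent order (by date).
ITEMS = [{"id": i, "date": date(2024, 6, i)} for i in range(1, 8)]


def list_items(items, date_after=None, date_before=None, limit=None):
    """Stateless listing: optional range filters on the ordering field, optional size limit."""
    selected = [
        item
        for item in items
        if (date_after is None or item["date"] > date_after)
        and (date_before is None or item["date"] < date_before)
    ]
    return selected if limit is None else selected[:limit]


# Pagination: the client feeds the last seen value back as a range filter.
page1 = list_items(ITEMS, limit=3)
page2 = list_items(ITEMS, date_after=page1[-1]["date"], limit=3)

# Search: the same filters jump straight to a window, with no iteration.
window = list_items(ITEMS, date_after=date(2024, 6, 2), date_before=date(2024, 6, 5))

print([item["id"] for item in page1])
print([item["id"] for item in page2])
print([item["id"] for item in window])
```

Both filters are optional, so adding them to an existing field does not break clients that never send them.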

There is a reason why the recommendation does not offer a useful example of this supposed compatibility; there isn’t one. The advice is equivocating on the ambiguity of an after: $ID filter. Is the ID field relevant to the ordering?

If yes, then it is just another range filter
If no, then it is just another placeholder for index

There is no third case. There is no future secret field that relates to ordering, is relevant to the client, but somehow still opaque to the client.

Stateless pagination is a combination of range filters and size limits, no matter what the input fields are called. A true stateful cursor is opaque precisely because it does not represent any known field.

Next optimization

The “next” piece of advice is that the cursor implementation should indicate whether another request is worthwhile. Again, in a stateless API, the server can make no such guarantee.

If the server can provide a total count, by all means do so. It solves the “next” problem, and is more generally useful.

If it is not feasible for the server to provide a total count, how is it going to implement whether there are more items? At the data layer, it is going to stop processing at N + 1 items instead of the requested N. The client could do that too. Instead of requesting the next 10, it could go to 11.
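The client-side version of that trick might look like the following sketch. The fetch function is a hypothetical stand-in for any stateless endpoint with an offset and a size limit; fetch_page is the client asking for one item more than it intends to show.

```python
def fetch(items, offset=0, limit=None):
    """Hypothetical stateless endpoint: optional offset plus optional size limit."""
    end = None if limit is None else offset + limit
    return items[offset:end]


def fetch_page(items, offset, n):
    """Ask for n + 1 items, keep n, and use the extra one as the 'has next' signal."""
    rows = fetch(items, offset=offset, limit=n + 1)
    return rows[:n], len(rows) > n


ITEMS = list(range(25))

middle, more_after_middle = fetch_page(ITEMS, offset=10, n=10)
last, more_after_last = fetch_page(ITEMS, offset=20, n=10)

print(len(middle), more_after_middle)
print(len(last), more_after_last)
```

No server-side "next" flag is involved; the extra item is the only information the client needs.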

Better yet, why stop at the server optimizing for N + 0? If it knows there is just 1 more item, why not go ahead and include that last one too. N + 2 anyone? Obsessing over the last “next” is a pointless micro-optimization, all the more so because it is irrelevant whenever the total count is not coincidentally a multiple of N. If N is arbitrary, then optimizing for a particular residue mod N is clearly arbitrary.

API design

Not only is there no good reason to blindly add opaque cursors, there is also no reason to add range filters before needed. A size limit alone solves the first order of magnitude of performance issues. If a client requests the first 10 items, then needs the next 10, actually pressure test whether it is unreasonable to request the first 20. The advantage is the client then has a consistent snapshot of the first 20 regardless of changes, which could provide a better user experience.

A simple strategy for pagination: start with none. Then proceed to next steps as performance warrants.

size limit
range filter on known field(s)
offset

In the unlikely event your API is stateful, you didn’t need this advice because you already had a cursor. Otherwise, cursors are an overly-complicated useless abstraction.

Gaël Varoquaux: Promoting open-source, from inria to :probabl.

Planet Python - Sat, 2024-06-08 18:00

Note

Open-source efforts around scikit-learn at Inria are spinning off to a new enterprise, Probabl, in charge of sustainable development of a data-science commons.

Prelude: funding scikit-learn is hard

Scikit-learn is a central software component in today’s machine learning landscape, and it is open source, governed by a community, easy to install, and well documented. It started many years ago as a project that we did on the side, and we were joined by many volunteers, which was key to the success of the project. We soon decided to ensure that scikit-learn was not only a volunteer-based effort. Over more than a decade, I’ve dedicated a lot of energy to this, using a variety of funding mechanisms: first grants (as an academic), then sponsoring and related contracts with various actors.

Digital commons eliminate scarcity and exclusivity

Funding digital commons is really hard. People build fortunes by leveraging competitive advantages, by creating lock-ins, or selling access to data. What makes a great open-source library, such as scikit-learn, is exactly what prevents these tricks: we are committed to being independent, easy to use and install, lightweight…

The birth of a new ambition

Scikit-learn is very successful, but it could be more. For instance, it does not facilitate pushing to production as much as tensorflow, which can be served, deployed to android… And scikit-learn is not very visible to top decision makers: it’s not a line on their budget, a brand that they know. As a consequence, it is not reaping the benefit of its success [1].

[1] Many commercial tools are sitting on top of open source software like scikit-learn (splunk, sagemaker, to name only a few), making profits, and not helping in any way the open source world that they build upon.

The French government is backing us to push the envelope

3 years ago, the French government challenged us to go further, to consolidate the ecosystem into a consistent data-science commons. The strategic interest of France is to preserve some technological autonomy on data, eg sensitive data. Thus, the government offered us, at Inria, a funding opportunity to go further.

They promised us a lot of money (tens of millions of euros), but with a specific mission to develop a sustainable “data-science commons” [2] ecosystem around scikit-learn. I’ll spare you the details of the number of meetings we had and documents that we wrote to sketch the outline of the project. I pushed forward a vision of technical components that fit in the broader open-source ecosystem, complementing it.

[2] The letter that we received from the French government specifically defines the objective in these words: “data-science common” (“Communs numériques pour la Science des Données”)

As I moved forward, I faced a difficulty: the French government wanted a sustainability plan, and private investment to back it. To be honest, this is not what I’m good at. François Goupil, the COO of the scikit-learn consortium, was helping me, but we needed more for our ambitions. And this is when we started talking to Yann Lechelle, a tech entrepreneur with an impressive track record interested in the impact of France on the global tech world.

Probabl, a mission-driven enterprise

With Yann, we built a new vision. Our challenge is to be long-term sustainable and virtuous for scikit-learn, its broader ecosystem, and its community. Yann brought in a business point of view, and I tried to bring that of open-source communities beyond Probabl [3], for instance avoiding getting in the way of others building businesses that contribute to scikit-learn. Indeed, we are convinced that having a broad and diverse community around scikit-learn is central to its future.

[3] One of the first things that Probabl did (Guillaume Lemaître, to be specific), was submit a grant application (to the Chan Zuckerberg Initiative), to fund, via NumFOCUS, a developer employed by Quansight, with no money transiting via Probabl (one reason being that we have no operations outside of Europe so far).

Our sustainability model is still being fine-tuned. What I can say is that it will involve a mix of professional services, support and sponsorship agreements, as well as a product-based offer, where we supplement scikit-learn with enterprise features. Our focus will be on features that are typically not the focus of open-source developers: integration in large structures, such as access control, LDAP connection, regulatory compliance. We will not shoehorn scikit-learn into open core or dual licensing approaches: we want our incentives to be aligned with scikit-learn, and its ecosystem, being as complete as possible.

Foster growth and adoption of our open-source stack

In a sense, our inspiration is that of RedHat, where the growth of the company fosters the growth and adoption of the software (Linux in the case of RedHat), beyond the company, in an ecosystem, and for a wide variety of applications.

Strong growth will mean external capital. To ensure that we do not lose the focus on our mission, building data-science commons, Yann penciled down a specific governance of the company (and then validated it with many people, as we are a spin-off from a governmental organization). The ultimate share structure, and the board, are divided in three electoral colleges: one for outside investors, one for founders and employees, and one for public institutions. This ensures a balance of power that hopefully will keep us aligned to our mission. I think that this structure sends a strong signal that we are not just another for-profit that will go from creating useful tech to dark money-generating patterns.

Probabl is already having an impact

A strong open-source team. In February, the whole team developing scikit-learn at Inria moved to Probabl, joined by Adrin Jalali, a Berlin-based core developer of scikit-learn and fairlearn. We’ve been hiring excellent people, and we now have 9 people on open-source (see the Probabl team), spending their time contributing to open source (Jérémie, for instance, has been doing the last releases for scikit-learn).

Fostering an ecosystem. Probabl is not only about scikit-learn. We are prioritizing 8 libraries, central to the machine-learning and data science ecosystem: joblib, fairlearn, imbalanced-learn… In general, as we have always done, we will not hesitate to contribute to upstream or related projects. Our goal is to have a healthy open-source ecosystem around data-science.

Not only software. Not everybody sees the important lines of code. I’ve become increasingly aware of the need to do outreach and communication, to coders, but also to decision makers. At Probabl we dedicate energy to be in business meetings, to participate in the tech narrative, to teach how to best do data science, eg with didactic videos. We’re starting a mentoring program, we’ll be organizing sprints… I am convinced that all this is a useful long-term investment.

My position within Probabl, my vested interests

I am a French civil servant (a researcher at Inria, one of our national research institutes). Such a position comes with strong responsibilities to control conflicts of interest. The creation of Probabl underwent strict scrutiny (that took a long, long time). I have recently been cleared to take an active role: 10% of my time is allocated to be a scientific and open-source advisor for Probabl.

I am not paid by Probabl. 100% of my salary comes from Inria (and I was not given a raise because of my involvement in Probabl). I do have financial interests as a founder, but given that I have a small active part, I have one of the smallest numbers of shares among founders.

My main interest in Probabl is really the success of its mission: the long-term growth of an open-source data-science ecosystem. Spinning-off from Inria actually continues my efforts in this direction, but with more agility and breadth. And having on top of open source a variety of complementary commercial activities makes it stronger, by answering better the needs of some actors.

More to come

There are many things that we are still ironing out. Clearing out specific details takes time (for instance, clearing my role took a while). We are still to announce the future of the sponsorship program that we had set up at the Inria foundation. Its mission has been transferred to Probabl. Currently, Probabl’s open source team is ensuring continuity of our work with the existing sponsors. But we will set up broader partnership opportunities, with a similar governance, that enable third parties to invest in open source on a roadmap decided jointly with the open-source community.

I believe that we need a lot of transparency in how we decide upon priorities in our open source team. Our 2024 priorities for scikit-learn are visible here.

I look forward to when Probabl will start adding value to scikit-learn for enterprises with an offer enriching scikit-learn and the broader open-source ecosystem.

I am acutely aware that good open source is made of communities, and that communities need trust and understanding of big players such as Probabl (well, so far we are not that big). I hope that with time our actions will become easy to read and speak for themselves.

Categories: FLOSS Project Planets

Trey Hunner: A beautiful Python monstrosity

Planet Python - Sat, 2024-06-08 17:30

Creating performance tests for Python Morsels exercises is a frequent annoyance

I loathe writing automated tests for performance-related exercises because they’re always flaky. How flaky depends on the exercise, what I’m testing, and the time variability inherent in the particular Python features that a learner might use.

I came up with a solution for flaky tests recently, but it also makes my tests less readable. I then came up with a tool to improve the readability, but that has its own trade-offs.

The code I eventually came up with is a beautiful Python monstrosity.

    @attempt_n_times(10)
    def _():
        nonlocal micro_time, tiny_time
        micro_time = time(micro_numbers)
        tiny_time = time(tiny_numbers)
        self.assertLess(tiny_time, micro_time*n)

I’ll explain what that code does, but first let’s talk about why it’s needed.

The flaky performance tests

My flaky performance tests initially looked like this:

    def test_some_test(self):
        n, m = 2.45, 2.04

        micro_time = time(micro_numbers)
        tiny_time = time(tiny_numbers)
        self.assertLess(tiny_time, micro_time*n)

        small_time = time(small_numbers)
        self.assertLess(small_time, tiny_time*n)
        self.assertLess(small_time, micro_time*n*m)

        medium_time = time(medium_numbers)
        self.assertLess(medium_time, micro_time*n*m*m)
        self.assertLess(medium_time, tiny_time*n*m)
        self.assertLess(medium_time, small_time*n)

The first block runs a performance test for the user’s function on a very small list and on a slightly larger list and then asserts that the slightly larger list didn’t take too much longer to run. The next two blocks run the same code on even larger lists and make further assertions about the relative times that the code took to run.

This roughly approximates the time complexity of this code.
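
The tests above call a time helper that is never shown. Here is a minimal sketch of what such a helper might look like, assuming it runs the learner's function once on a list and returns the elapsed seconds. The names time_call and solution, and the choice of perf_counter, are assumptions for illustration and not the actual Python Morsels code:

```python
from time import perf_counter


def time_call(function, numbers):
    """Return the seconds one call to function(numbers) takes (hypothetical helper)."""
    start = perf_counter()
    function(numbers)
    return perf_counter() - start


def solution(numbers):
    """Stand-in for a learner's function; a single linear pass."""
    return sum(numbers)


micro_numbers = list(range(1_000))
tiny_numbers = list(range(2_000))

micro_time = time_call(solution, micro_numbers)
tiny_time = time_call(solution, tiny_numbers)

# Doubling the input should roughly double the time for a linear
# solution, so this ratio hovers around 2 (with plenty of noise).
ratio = tiny_time / micro_time if micro_time else float("inf")
```

With measurements this short, the ratio can swing widely from run to run, which is exactly the flakiness the rest of the post deals with.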

Running performance checks in a loop

These performance checks need to:

Predictably fail for inefficient solutions
Predictably pass for efficient solutions
Run fast (within just a few seconds) even when the code is inefficient
Avoid the use of threading because they’ll be running on WebAssembly in the browser
Run consistently on pretty much any computer

These 5 requirements together have caused me countless headaches. I get the tests passing well, but they don’t always fail when they should. I get the tests failing and passing when they should, but then they’re too slow. And so on…

Notice the n and m factors in the above assertions:

    self.assertLess(small_time, micro_time*n*m)

If n and m are too big, we’ll get false positives (tests passing when they should fail). If n and m are too small, we’ll get false negatives (tests failing when they should pass).

To avoid both Type I and Type II errors, I decided to keep n and m small but attempt the assertion block multiple times.

Here’s the (far less flaky) revised code:

    def test_some_test(self):
        n, m = 2.45, 2.04

        for attempts_left in reversed(range(10)):
            try:
                micro_time = time(micro_numbers)
                tiny_time = time(tiny_numbers)
                self.assertLess(tiny_time, micro_time*n)
                break
            except AssertionError:
                if attempts_left == 0:
                    raise

        for attempts_left in reversed(range(5)):
            try:
                small_time = time(small_numbers)
                self.assertLess(small_time, tiny_time*n)
                self.assertLess(small_time, micro_time*n*m)
                break
            except AssertionError:
                if attempts_left == 0:
                    raise

        for attempts_left in reversed(range(3)):
            try:
                medium_time = time(medium_numbers)
                self.assertLess(medium_time, micro_time*n*m*m)
                self.assertLess(medium_time, tiny_time*n*m)
                self.assertLess(medium_time, small_time*n)
                break
            except AssertionError:
                if attempts_left == 0:
                    raise

The for loop runs the code multiple times, the break statement stops the code as soon as the assertions all pass, and the except and if ensure that any assertion errors are suppressed until/unless we’re on the final iteration of the loop.
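
To see that control flow in isolation, here is a small self-contained sketch of the same loop shape. The outcomes iterator is a made-up stand-in for a noisy timing measurement (it fails twice and then passes), so the names here are illustrative assumptions only:

```python
import itertools

# Hypothetical stand-in for a noisy measurement: fails twice, then passes.
outcomes = itertools.chain([False, False], itertools.repeat(True))
attempts_made = 0

for attempts_left in reversed(range(5)):  # counts 4, 3, 2, 1, 0
    try:
        attempts_made += 1
        assert next(outcomes), "measurement was too slow"
        break  # the assertion passed, so stop retrying
    except AssertionError:
        if attempts_left == 0:  # final attempt: let the failure surface
            raise
```

The first two failures are swallowed, the third attempt passes, and the break ends the loop with two attempts still unused.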

Let’s call this a for-try-break-except-if-raise pattern. It’s an absurdly verbose name fitting of absurdly verbose code.

This for-try-break-except-if-raise pattern works pretty well! But it’s not pretty.

Like many programmers, I believe that Don’t Repeat Yourself (DRY) need not apply to tests. Tests are allowed to be repetitive if the verbosity improves readability.

But there is so much noise in that code! I decided that removing some noise might improve readability. So I devised a helper utility to reduce the repetition.

In search of a solution

While pondering the repetitive noise in this code, I wondered what Python features I could use to abstract away this for-try-break-except-if-raise pattern.

Could I make a context manager and use a with block? That might help with the try-except, but context managers can’t run their code block multiple times, so that wouldn’t help with the for and the break. So a context manager is out.
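
A quick way to convince yourself of this is to write a generator-based context manager that tries to yield twice. This is only a sketch (run_twice is a made-up name), but it shows that the with block body runs once and that contextlib objects to the second yield:

```python
from contextlib import contextmanager


@contextmanager
def run_twice():
    """A (doomed) attempt to make a with block execute two times."""
    yield  # the with block runs here, once
    yield  # a second yield cannot re-enter the with block


runs = 0
error = None
try:
    with run_twice():
        runs += 1
except RuntimeError as exc:
    # contextlib raises RuntimeError when the generator does not finish
    # after the with block exits.
    error = str(exc)
```

After this runs, runs is 1 and error holds the RuntimeError message.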

Could I abstract this away into a looping helper by implementing a generator function? We are looping and generator functions can break early. But, a generator function can’t catch an exception that’s raised within the body of a loop. So a generator function wouldn’t work either.
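
Here is a minimal sketch of why the generator approach falls short. The attempts generator is a hypothetical helper that wraps its yield in a try-except, hoping to swallow assertion failures from the loop body:

```python
def attempts(n):
    """Yield n attempt numbers, trying (and failing) to swallow errors."""
    for attempt in range(n):
        try:
            yield attempt
        except AssertionError:
            # Never reached: an exception raised in the caller's loop
            # body is not thrown into the generator.
            continue


caught = False
seen = []
try:
    for attempt in attempts(3):
        seen.append(attempt)
        assert attempt > 0, "fails on the first attempt"
except AssertionError:
    caught = True
```

The loop body runs only once: the AssertionError propagates straight out of the for loop to the outer except, and the generator never gets a chance to retry.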

What about a decorator? 🤔

Context managers and decorators both sandwich a block of code. But decorators sandwich functions and they have the power to run the same function repeatedly. A decorator might work!

Here’s a decorator that will run a given function up to 10 times (until no AssertionError is raised):

    def try_10_times(function):
        def wrapper():
            for attempts_left in reversed(range(10)):
                try:
                    return function()
                except AssertionError:
                    if attempts_left == 0:
                        raise
        return wrapper

To use this decorator, we would need to define a function and then call that function:

    @try_10_times
    def assertions():
        micro_time = time(micro_numbers)
        tiny_time = time(tiny_numbers)
        self.assertLess(tiny_time, micro_time*n)

    assertions()

This isn’t quite good enough though…

We need a pattern to run code N times (not necessarily exactly 10)
We reference the variables defined in each block in later blocks, so micro_time and tiny_time will need to be available outside that function
We need this function to run just one time right after it’s defined… could we do that automatically?

All 3 of these problems are solvable:

We need a decorator that accepts arguments
We need to use the rarely seen nonlocal statement
We could have the decorator automatically call the decorated function

The final weird decorator

Here’s the decorator I ended up with:

    def attempt_n_times(n):
        """
        Run tests multiple times if assertions are raised.

        Allows for more forgiving tests when assertions may be a bit flaky.
        """
        def decorator(function):
            """This looks like a decorator, but it actually runs the function!"""
            for attempts_left in reversed(range(n)):
                try:
                    return function()
                except AssertionError:
                    if attempts_left == 0:
                        raise
        return decorator

This decorator accepts an n argument which determines the maximum number of times the decorated function should be called. The decorator then calls the function repeatedly in a for loop and a try-except block. As soon as an AssertionError is not raised during one of these function calls, the looping stops.

The weirdest part about this decorator is that it calls the decorated function. Note that the decorator function doesn’t define a wrapper function within itself… it just runs code right away!

The resulting beautiful Python monstrosity

Here’s the final refactored test code:

    def test_some_test(self):
        n, m = 2.45, 2.04
        micro_time = tiny_time = small_time = medium_time = 0

        @attempt_n_times(10)
        def _():
            nonlocal micro_time, tiny_time
            micro_time = time(micro_numbers)
            tiny_time = time(tiny_numbers)
            self.assertLess(tiny_time, micro_time*n)

        @attempt_n_times(5)
        def _():
            nonlocal small_time
            small_time = time(small_numbers)
            self.assertLess(small_time, tiny_time*n)
            self.assertLess(small_time, micro_time*n*m)

        @attempt_n_times(3)
        def _():
            nonlocal medium_time
            medium_time = time(medium_numbers)
            self.assertLess(medium_time, micro_time*n*m*m)
            self.assertLess(medium_time, tiny_time*n*m)
            self.assertLess(medium_time, small_time*n)

The attempt_n_times decorator immediately calls the function it decorates. Each function is defined and immediately called one or more times, in a try-except block within a loop.

That’s why we’ve named these functions with the throwaway _ name: we don’t care about the name of a function we’re never going to refer to again.

Also note the use of the nonlocal statement. Each function in Python has its own scope and all assignments assign to the local scope by default. That nonlocal statement makes assignments to those variables rebind them in the scope of the outer function instead.
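
A tiny sketch of the difference, using made-up function names, in case nonlocal is unfamiliar:

```python
def outer():
    total = 0

    def without_nonlocal():
        total = 10  # creates a new local name; outer's total is untouched

    def with_nonlocal():
        nonlocal total
        total = 20  # rebinds the variable in outer's scope

    without_nonlocal()
    after_plain = total
    with_nonlocal()
    after_nonlocal = total
    return after_plain, after_nonlocal


result = outer()
```

Here result is (0, 20): the plain assignment never reaches the outer variable, while the nonlocal one does.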

Compare the above code to the for-try-break-except-if-raise version just before this refactor.

I find the refactored version easier to skim.

But that attempt_n_times decorator does abuse the decorator syntax. Decorators aren’t meant to call the function they’re decorating.

Is this misuse of decorators worth it?

Is this worth it?

Decorators aren’t supposed to immediately call the function they decorate. But there’s nothing stopping them from doing so. I feel that I’ve traded “normal code” for a beautiful monstrosity that’s easier to skim at a glance.

The attempt_n_times decorator is pretending that it’s a block-level tool by using a function because there’s no other way to invent such a tool in Python.

I think abstracting away the for-try-break-except-if-raise pattern was worth it, even though I ended up abusing Python’s decorator syntax in the process.

What do you think? Was that attempt_n_times abstraction worth it?

Categories: FLOSS Project Planets

A Selenium Primer - Part 1: An Introduction to Selenium

Planet KDE - Sat, 2024-06-08 15:31

In this video I introduce Selenium AT-SPI for testing KDE applications. I present the KDE goals of sustainable software, accessibility, and automation in system testing, and how Selenium helps achieve all of them.

Selenium AT-SPI is an amazing piece of software written by KDE developer Harald Sitter. It is a tool used in KDE to automate tests of GUI applications. This enables developers to design applications that are accessible for all and increase their energy efficiency. As part of Season of KDE 2024 I decided to make a video tutorial for KDE developers.

If you find this video helpful, you can reach out to me on Gitter. I would love to hear back from you 😃

This video was made by Pradyot Ranjan (@pradyotranjan:gitter.im).

Categories: FLOSS Project Planets

A Selenium Primer - Part 4: Writing Selenium Tests

Planet KDE - Sat, 2024-06-08 15:20

In this video I deep dive into writing tests with Selenium. I create a simple test for the KCalc calculator application and run it with Selenium AT-SPI. Similar steps can be followed to write GUI tests for any KDE application.

If you find this video helpful, you can reach out to me on Gitter. I would love to hear back from you 😃

This video was made by Pradyot Ranjan (@pradyotranjan:gitter.im).

Categories: FLOSS Project Planets

A Selenium Primer - Part 3: Identifying Accessibility Issues

Planet KDE - Sat, 2024-06-08 15:19

In this video I explain how the accerciser utility works. Accerciser is a tool used to identify and test GUI accessibility elements. I also run accerciser on KCalc, the KDE calculator application.

If you find this video helpful, you can reach out to me on Gitter. I would love to hear back from you 😃

This video was made by Pradyot Ranjan (@pradyotranjan:gitter.im).

Categories: FLOSS Project Planets

A Selenium Primer - Part 2: Setting up Selenium

Planet KDE - Sat, 2024-06-08 15:19

In this video I set up Selenium AT-SPI locally on KDE Neon. This video follows the Selenium setup guide found here: https://community.kde.org/Selenium.

If you find this video helpful, you can reach out to me on Gitter. I would love to hear back from you 😃

This video was made by Pradyot Ranjan (@pradyotranjan:gitter.im).

Categories: FLOSS Project Planets

Kate Fun Logo

Planet KDE - Sat, 2024-06-08 15:11

G2 posted some fun logos for Kate on reddit.

I think they are nice and flashy, and well suited if you like that art style and want to show your appreciation for Kate. They are a good addition to our awesome icon and mascot.

Static Version

Animated Version

Licensing

G2 licensed these files under CC BY-NC-SA 4.0. Feel free to share them under this license, with credit to G2.

Comments?

A matching thread for this can be found here on r/KDE.

Categories: FLOSS Project Planets

Thorsten Alteholz: My Debian Activities in May 2024

Planet Debian - Sat, 2024-06-08 13:58

FTP master

This month I accepted 347 and rejected 49 packages. The overall number of packages that got accepted was 348.

Debian LTS

This was my hundred-nineteenth month that I did some work for the Debian LTS initiative, started by Raphael Hertzog at Freexian.

During my allocated time I uploaded or worked on:

[#1070154] bullseye-pu: qtbase-opensource-src/5.15.2+dfsg-9+deb11u1 package upload
[#1064550] bullseye-pu: libjwt 1.10.2-1+deb11u1 has been marked for accept
[#1067544] bullseye-pu: libmicrohttpd 0.9.72-2+deb11u1 has been marked for accept

I also continued to work on tiff and last but not least did a week of FD and attended the monthly LTS/ELTS meeting.

Unfortunately I used lots of time to debug an issue with nghttp2. Please see my odyssey below.

Debian ELTS

This month was the seventieth ELTS month. During my allocated time I uploaded:

[ELA-1104-1-1] nghttp2 security update for one CVE to fix a DoS resulting from bad handling of CONTINUATION frames in Stretch

For some tests I installed the new nghttp2 package on my Stretch VM and started the daemon. Unfortunately I got an unexpected error from getaddrinfo() about ai_socktype not being supported. The daemon was configured to listen on lo, the device was available, but the error remained. I was pretty sure that my patch was not the reason for this, and indeed the unpatched version showed this error as well. I didn’t want to release an untested package, so nghttp2 had to start at least! Therefore I built a minimal example to reproduce the issue. getaddrinfo() failed for hints.ai_socktype=SOCK_STREAM and a numerical IP address. Having no hints at all or “localhost” instead of “127.0.0.1” made the error disappear (as a remark: “localhost” resolves to 127.0.0.1, the IPv6 variant is “ip6-localhost”). I could see that in nghttp2 as well. Configuring it with “localhost” made the error vanish, but the daemon still exited for other reasons. After some time of debugging, I added another network interface to my VM and configured it with a dummy IPv4 address. Voilà, everything worked as expected. According to Wikipedia, IPv6 was ratified as a standard in 2017 and Stretch was also released in 2017. No wonder that an IPv6-only VM had problems back then and that these problems survived to the present.
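
The minimal reproducer itself is not shown here. As a rough illustration of the comparison described above, a Python sketch could look like the following; the probe function, the port number 8080 and the three test cases are assumptions, and the original example was not necessarily written in Python:

```python
import socket


def probe(host, socktype=0):
    """Report whether getaddrinfo() resolves host with the given socktype hint."""
    try:
        results = socket.getaddrinfo(host, 8080, type=socktype)
    except socket.gaierror as exc:
        return f"error: {exc}"
    return f"ok: {len(results)} result(s)"


report = {
    "numeric, SOCK_STREAM": probe("127.0.0.1", socket.SOCK_STREAM),
    "numeric, no hint": probe("127.0.0.1"),
    "name, SOCK_STREAM": probe("localhost", socket.SOCK_STREAM),
}
```

On a typical machine all three lookups are expected to succeed; on a host like the VM described above, comparing the three entries would show which combination of hint and address form triggers the failure.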

I also continued to work on an update for tiff in Jessie and Stretch, did a week of FD and attended the LTS/ELTS meeting.

Debian Printing

This month I uploaded new upstream or bugfix versions of:

… cups-bjnp

This work is generously funded by Freexian!

Debian Astro

This month I uploaded a new upstream or bugfix version of:

Debian IoT

This month I uploaded new upstream or bugfix versions of:

Debian Mobcom

Due to more and more problems with time_t, I removed osmo-iuh and all dependencies from armel, armhf and i386, sorry. If there is really anybody using this software on 32-bit architectures, don’t hesitate to get in touch.

It is official now, the GSoC student working on the Mobcom packages is Nathan Doris. He already finished the hardest part of the job and I could upload the latest version of libosmocore. I really enjoy working with him and look forward to a pleasant SoC :-).

misc

This month I uploaded new upstream or bugfix versions of:

Did I already mention that I love lists with topics I can work on? I print out such lists and enjoy checking off one item after the other. At the end of May, Helmut told me that I am a bit lazy and gave me such a list with all my packages that have one or the other issue with /usr-move. Most of the uploads above are packages on that list and I could check off a lot :-).

Categories: FLOSS Project Planets

Reproducible Builds: Reproducible Builds in May 2024

Planet Debian - Sat, 2024-06-08 06:30

Welcome to the May 2024 report from the Reproducible Builds project! In these reports, we try to outline what we have been up to over the past month and highlight news items in software supply-chain security more broadly. As ever, if you are interested in contributing to the project, please visit our Contribute page on our website.

Table of contents:

A peek into build provenance for Homebrew

Joe Sweeney and William Woodruff on the Trail of Bits blog wrote an extensive post about build provenance for Homebrew, the third-party package manager for macOS. Their post details how each “bottle” (i.e. each release):

[…] built by Homebrew will come with a cryptographically verifiable statement binding the bottle’s content to the specific workflow and other build-time metadata that produced it. […] In effect, this injects greater transparency into the Homebrew build process, and diminishes the threat posed by a compromised or malicious insider by making it impossible to trick ordinary users into installing non-CI-built bottles.

The post also briefly touches on future work, including work on source provenance:

Homebrew’s formulae already hash-pin their source artifacts, but we can go a step further and additionally assert that source artifacts are produced by the repository (or other signing identity) that’s latent in their URL or otherwise embedded into the formula specification.

Distribution news

In Debian this month, Johannes Schauer Marin Rodrigues (aka josch) noticed that the Debian binary package bash version 5.2.15-2+b3 was “uploaded to the archive twice. Once to bookworm and once to sid but with differing content.” This is a problem for reproducible builds in Debian due to its assumption that the package name, version and architecture triplet is unique. However, josch highlighted that:

This example with bash is especially problematic since bash is Essential:yes, so there will now be a large portion of .buildinfo files where it is not possible to figure out with which of the two differing bash packages the sources were compiled.

In response to this, Holger Levsen performed an analysis of all .buildinfo files and found that this needs almost 1,500 binNMUs to fix the fallout from this bug.
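
The uniqueness assumption can be sketched as a simple check: group uploads by their (name, version, architecture) triplet and flag any triplet that maps to more than one distinct build. The records below are hypothetical placeholders (the architecture, the second package and the checksum strings are made up), not real archive data or the analysis that was actually run:

```python
from collections import defaultdict

# Hypothetical records: (package, version, architecture, checksum).
# The checksums are made-up placeholders, not real archive data.
uploads = [
    ("bash", "5.2.15-2+b3", "amd64", "checksum-from-bookworm"),
    ("bash", "5.2.15-2+b3", "amd64", "checksum-from-sid"),
    ("coreutils", "9.1-1", "amd64", "checksum-a"),
]

checksums_by_triplet = defaultdict(set)
for package, version, arch, checksum in uploads:
    checksums_by_triplet[(package, version, arch)].add(checksum)

# A triplet is ambiguous when more than one distinct build claims it.
ambiguous = sorted(
    triplet for triplet, sums in checksums_by_triplet.items() if len(sums) > 1
)
```

Any .buildinfo file that lists an ambiguous triplet cannot say which of the differing packages was actually installed during the build.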

Elsewhere in Debian, Vagrant Cascadian posted about a Non-Maintainer Upload (NMU) sprint to take place during early June, and it was announced that there is now a #debian-snapshot IRC channel on OFTC to discuss the creation of a new source code archiving service to, perhaps, replace snapshot.debian.org. Lastly, 11 reviews of Debian packages were added, 15 were updated and 48 were removed this month adding to our extensive knowledge about identified issues. A number of issue types have been updated by Chris Lamb as well. […][…]

Elsewhere in the world of distributions, deep within a larger announcement from Colin Percival about the release of version 14.1-BETA2, it was mentioned that the FreeBSD kernels are now built reproducibly.

In Fedora, however, the change proposal mentioned in our report for April 2024 was approved, so, per the ReproduciblePackageBuilds wiki page, the add-determinism tool is now running in new builds for Fedora 41 (‘rawhide’). The add-determinism tool is a Rust program which, as its name suggests, adds determinism to files that are given as input by “attempting to standardize metadata contained in binary or source files to ensure consistency and clamping to $SOURCE_DATE_EPOCH in all instances”. This is essentially the Fedora version of Debian’s strip-nondeterminism. However, strip-nondeterminism is written in Perl, and Fedora did not want to pull Perl in the buildroot for every package. The add-determinism tool eliminates many causes of non-determinism and work is ongoing to extend the scope of packages it can operate on.
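
To illustrate what “clamping to $SOURCE_DATE_EPOCH” means for one kind of metadata, here is a minimal Python sketch that lowers file modification times that are newer than the epoch and leaves older ones alone. It is not how add-determinism or strip-nondeterminism is implemented; the clamp_mtimes function, the file names and the fallback epoch value are assumptions for illustration:

```python
import os
import tempfile
from pathlib import Path


def clamp_mtimes(root, source_date_epoch):
    """Lower any file mtime under root that is newer than source_date_epoch."""
    clamped = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        stat = path.stat()
        if stat.st_mtime > source_date_epoch:
            os.utime(path, (stat.st_atime, source_date_epoch))
            clamped.append(path.name)
    return clamped


# Usage with a throwaway directory and an arbitrary example epoch.
epoch = int(os.environ.get("SOURCE_DATE_EPOCH", "1700000000"))
with tempfile.TemporaryDirectory() as tmp:
    old, new = Path(tmp, "old.txt"), Path(tmp, "new.txt")
    old.write_text("old")
    new.write_text("new")
    os.utime(old, (epoch - 100, epoch - 100))
    os.utime(new, (epoch + 100, epoch + 100))
    clamped = clamp_mtimes(tmp, epoch)
    mtimes = {p.name: int(p.stat().st_mtime) for p in (old, new)}
```

Clamping, as opposed to overwriting every timestamp, keeps genuinely old timestamps intact while removing the build-time ones that differ between rebuilds.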

Mailing list news

On our mailing list this month, regular contributor kpcyrd wrote to the list with an update on their source code indexing project, whatsrc.org. The whatsrc.org project, which was launched last month in response to the XZ Utils backdoor, now contains and indexes almost 250,000 unique source code archives. In their post, kpcyrd gives an example of its intended purpose, noting that it has shown that whilst “there seems to be consensus about [the] source code for zsh 5.9” in various Linux distributions, it “does not align with the contents of the zsh Git repository”.

Holger Levsen also posted to the list with a ‘pre-announcement’ of sorts for the 2024 Reproducible Builds summit. In particular:

[Whilst] the dates and location are not fixed yet, however if you don’t help us with finding a suitable location soon, it is very likely that we’ll meet again in Hamburg in the 2nd half of September 2024 […].

Lastly, Frederic-Emmanuel Picca wrote to the list asking for help understanding the “non-reproducible status of the Debian silx package” and received replies from both Vagrant Cascadian and Chris Lamb.

Miscellaneous news

strip-nondeterminism is our tool to remove specific non-deterministic results from a completed build. This month strip-nondeterminism version 1.14.0-1 was uploaded to Debian unstable by Chris Lamb chiefly to incorporate a change from Alex Muntada to avoid a dependency on Sub::Override to perform monkey-patching and break circular dependencies related to debhelper […]. Elsewhere in our tooling, Jelle van der Waa modified reprotest because the pipes module will be removed in Python version 3.13 […].
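
For context, the pipes module is gone in Python 3.13, and code that used it mainly for shell quoting can usually switch to shlex.quote. The snippet below is a generic sketch of that kind of migration and is not taken from the reprotest change itself; the filename is a made-up example:

```python
import shlex

# shlex.quote() covers the shell-escaping use case that code often
# imported pipes.quote() for.
filename = "my file; rm -rf ~.txt"
command = f"ls -l {shlex.quote(filename)}"

# Splitting the command again yields the original filename as one argument.
round_trip = shlex.split(command)
```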

It was also noticed that a new blog post by Daniel Stenberg detailing “How to verify a Curl release” mentions the SOURCE_DATE_EPOCH environment variable. This is because:

The [curl] release tools document also contains another key component: the exact time stamp at which the release was done – using integer second resolution. In order to generate a correct tarball clone, you need to also generate the new version using the old version’s timestamp. Because the modification date of all files in the produced tarball will be set to this timestamp.
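
The effect of pinning the timestamp can be sketched with Python's tarfile module: if every entry carries the same fixed modification time (and the entries are added in a fixed order), building the archive twice yields byte-identical output. This is an illustration of the principle only, not curl's release tooling; the build_tarball function, file contents and timestamp values are assumptions:

```python
import hashlib
import io
import tarfile


def build_tarball(files, timestamp):
    """Return tar bytes whose entries all carry the same modification time."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name in sorted(files):  # fixed ordering
            data = files[name]
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = timestamp  # the pinned release timestamp
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


# Hypothetical file contents and timestamps, for illustration only.
files = {"README": b"hello\n", "src/main.c": b"int main(void) { return 0; }\n"}
first = hashlib.sha256(build_tarball(files, 1700000000)).hexdigest()
second = hashlib.sha256(build_tarball(files, 1700000000)).hexdigest()
other = hashlib.sha256(build_tarball(files, 1700000001)).hexdigest()
```

The first two digests match, while changing the timestamp by a single second produces a different archive, which is why the exact release timestamp has to be published.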

Furthermore, Fay Stegerman filed a bug against the Signal messenger app for Android to report that their ‘reproducible’ builds cannot, in fact, be reproduced. However, Fay is quick to note that she has:

… found zero evidence of any kind of compromise. Some differences are yet unexplained but everything I found seems to be benign. I am disappointed that Reproducible Builds have been broken for months but I have zero reason to doubt Signal’s security in any way.

Lastly, it was observed that there was a concise and diagrammatic overview of “supply chain threats” on the SLSA website.

Two new academic papers

Two new scholarly papers were published this month.

Firstly, Mathieu Acher, Benoît Combemale, Georges Aaron Randrianaina and Jean-Marc Jézéquel of University of Rennes on Embracing Deep Variability For Reproducibility & Replicability. The authors describe their approach as follows:

In this short [vision] paper we delve into the application of software engineering techniques, specifically variability management, to systematically identify and explicit points of variability that may give rise to reproducibility issues (e.g., language, libraries, compiler, virtual machine, OS, environment variables, etc.). The primary objectives are: i) gaining insights into the variability layers and their possible interactions, ii) capturing and documenting configurations for the sake of reproducibility, and iii) exploring diverse configurations to replicate, and hence validate and ensure the robustness of results. By adopting these methodologies, we aim to address the complexities associated with reproducibility and replicability in modern software systems and environments, facilitating a more comprehensive and nuanced perspective on these critical aspects.

(A PDF of this article is available.)

Secondly, Ludovic Courtès, Timothy Sample, Simon Tournier and Stefano Zacchiroli have collaborated to publish a paper on Source Code Archiving to the Rescue of Reproducible Deployment. Their paper was motivated because:

The ability to verify research results and to experiment with methodologies are core tenets of science. As research results are increasingly the outcome of computational processes, software plays a central role. GNU Guix is a software deployment tool that supports reproducible software deployment, making it a foundation for computational research workflows. To achieve reproducibility, we must first ensure the source code of software packages Guix deploys remains available.

(A PDF of this article is also available.)

diffoscope

diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb made a number of changes such as uploading versions 266, 267, 268 and 269 to Debian, making the following changes:

New features:
- Use xz --list to supplement output when comparing .xz archives; essential when metadata differs. (#1069329)
- Include xz --verbose --verbose (ie. double) output. (#1069329)
- Strip the first line from the xz --list output. […]
- Only include xz --list --verbose output if the xz has no other differences. […]
- Actually append the xz --list after the container differences, as it simplifies a lot. […]
Testing improvements:
- Allow Debian testing to fail right now. […]
- Drop apktool from Build-Depends; we can still test APK functionality via autopkgtests. (#1071410)
- Add a versioned dependency for at least version 5.4.5 for the xz tests as they fail under (at least) version 5.2.8. (#374)
- Fix tests for 7zip 24.05. […][…]
- Fix all tests after addition of xz --list. […][…]
Misc:
- Update copyright years. […]

In addition, James Addison fixed an issue where the HTML output showed only the first difference in a file, while the text output shows all differences […][…][…], Sergei Trofimovich amended the 7zip version test for older 7z versions that include the string “[64]“ […][…] and Vagrant Cascadian relaxed the versioned dependency to allow version 5.4.1 for the xz tests […] and proposed updates to guix for versions 267, 268 and pushed version 269 to Guix. Furthermore, Eli Schwartz updated the diffoscope.org website in order to explain how to install diffoscope on Gentoo […].

Website updates

There were a number of improvements made to our website this month, including Chris Lamb making the “print” CSS stylesheet nicer […]. Fay Stegerman made a number of updates to the page about the SOURCE_DATE_EPOCH environment variable […][…][…] and Holger Levsen added some of their presentations to the “Resources” page. Furthermore, IOhannes zmölnig stipulated support for SOURCE_DATE_EPOCH in clang version 16.0.0+ […], Jan Zerebecki expanded the “Formal definition” page and fixed a number of typos on the “Buy-in” page […] and Simon Josefsson fixed the link to Trisquel GNU/Linux on the “Projects” page […].

Upstream patches

This month, we wrote a number of patches to fix specific reproducibility issues, including:

Bernhard M. Wiedemann:
- nauty (CPU-detection issue)
- emacs (ASLR)
Chris Lamb:
- #1070754 filed against gensio.
- #1071064 filed against tkgate.
- #1072094 filed against ruby-pgplot.

Reproducibility testing framework

The Reproducible Builds project operates a comprehensive testing framework running primarily at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility. In May, a number of changes were made by Holger Levsen:

Debian-related changes:
- Enable the rebuilder-snapshot API on osuosl4. […]
- Schedule the i386 architecture a bit more often. […]
- Adapt cleanup_nodes.sh to the new way of running our build services. […]
- Add 8 more workers for the i386 architecture. […]
- Update configuration now that the infom07 and infom08 nodes have been reinstalled as “real” i386 systems. […]
- Make diffoscope timeouts more visible on the #debian-reproducible-changes IRC channel. […]
- Mark the cbxi4a-armhf node as down. […][…]
- Only install the hdmi2usb-mode-switch package on Debian bookworm and earlier […] and only install the haskell-platform package on Debian bullseye […].
Misc:
- Install the ntpdate utility as we need it later. […]
- Document the progress on the i386 architecture nodes at Infomaniak. […]
- Drop an outdated and unnoticed notice. […]
- Add live_setup_schroot to the list of so-called “zombie” jobs. […]

In addition, Mattia Rizzolo reinstalled the infom07 and infom08 nodes […] and Vagrant Cascadian marked the cbxi4a node as online […].

If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

IRC: #reproducible-builds on irc.oftc.net.
Twitter: @ReproBuilds
Mastodon: @reproducible_builds@fosstodon.org
Mailing list: rb-general@lists.reproducible-builds.org

Categories: FLOSS Project Planets

Talk Python to Me: #465: The AI Revolution Won't Be Monopolized

Planet Python - Sat, 2024-06-08 04:00

There hasn't been a boom like the AI boom since the .com days. And it may look like a space destined to be controlled by a couple of tech giants. But Ines Montani thinks open source will play an important role in the future of AI. I hope you join us for this excellent conversation about the future of AI and open source.

Episode sponsors

Sentry Error Monitoring, Code TALKPYTHON: https://talkpython.fm/sentry
Porkbun: https://talkpython.fm/porkbun
Talk Python Courses: https://talkpython.fm/training

Links from the show

Ines Montani on Twitter: @_inesmontani (https://twitter.com/_inesmontani)
spaCy: https://spacy.io
Prodigy App: https://prodi.gy
Ines' presentation at PyCon Lithuania: https://www.youtube.com/watch?v=SsnDN7LI7IY
LM Studio: https://lmstudio.ai
Little Bobby Tables: https://xkcd.com/327/
spaCy and NLP course: https://talkpython.fm/spacy
Watch this episode on YouTube: https://www.youtube.com/watch?v=zaZrWZwKJH4
Episode transcripts: https://talkpython.fm/episodes/transcript/465/the-ai-revolution-wont-be-monopolized

Stay in touch with us

Subscribe to us on YouTube: https://talkpython.fm/youtube
Follow Talk Python on Mastodon: https://fosstodon.org/web/@talkpython
Follow Michael on Mastodon: https://fosstodon.org/web/@mkennedy

Categories: FLOSS Project Planets

GPN22

Planet KDE - Sat, 2024-06-08 03:30

A week ago I attended the 22nd Gulaschprogrammiernacht (GPN22) in Karlsruhe, Germany. That’s a somewhat smaller version of the Chaos Communication Congress, although with 1000+ attendees “small” doesn’t really do it justice. Below are some of my notes from conversations I had there.

Occasionally Itinerary produces weird results in its timeline for complex trips, and this can be hard to debug remotely without asking for a full export of all the personal data in there. Meeting users and contributors at events is a good opportunity to look at such issues on their devices and analyze the problems together.

Some things could be fixed on the spot, and verified shortly afterwards on the affected installations:

Improved trip grouping for incomplete trips and closely adjacent trips.
Improved recovery from wrong timezone information in manually added entries.
Fixed timeline ordering with location change elements without arrival time.

Data syncing via Matrix

Syncing multiple Itinerary instances is an often requested feature. One of the most promising approaches is doing this via Matrix. An in-depth discussion on that at least gave me a much more refined idea of what that could look like, and what we’d need in order to get there.

Specifically:

Use a special room type for this, so it’s not cluttering the room list in “normal” chat clients.
Use one room per trip rather than one room for the entire data set. This avoids scalability problems with state events, and would allow sharing individual trips with other people for joint planning and traveling together.
Use one state event per timeline element, so that there is always a known latest state without having to load/replay the entire room history.
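
The room layout described above can be illustrated with a small in-memory model. This is only a sketch of the idea, not Itinerary or Matrix SDK code: the event type name org.kde.itinerary.element, the state keys and the TripRoom class are all hypothetical. It relies on how Matrix room state behaves, where a newer state event with the same type and state key replaces the older one, so the latest value is known without replaying history.

```python
from dataclasses import dataclass, field

# Hypothetical event type, invented for this illustration only.
ELEMENT_EVENT_TYPE = "org.kde.itinerary.element"


@dataclass
class TripRoom:
    """Toy stand-in for one Matrix room holding one trip."""

    trip_name: str
    # Room state maps (event type, state key) to the latest content.
    state: dict = field(default_factory=dict)
    # Full history, which a client would not need to replay.
    history: list = field(default_factory=list)

    def send_state(self, event_type, state_key, content):
        """Record the event and overwrite the current state entry."""
        self.history.append((event_type, state_key, content))
        self.state[(event_type, state_key)] = content

    def timeline_elements(self):
        """Return the current content of every timeline element."""
        return {
            key: content
            for (event_type, key), content in self.state.items()
            if event_type == ELEMENT_EVENT_TYPE
        }


room = TripRoom("GPN22")
room.send_state(ELEMENT_EVENT_TYPE, "train-1", {"departure": "08:05"})
room.send_state(ELEMENT_EVENT_TYPE, "hotel-1", {"checkin": "15:00"})
# An update reuses the state key, so it replaces the earlier entry.
room.send_state(ELEMENT_EVENT_TYPE, "train-1", {"departure": "08:35"})

current = room.timeline_elements()
print(current["train-1"])  # {'departure': '08:35'}
print(len(room.history), len(current))  # 3 2
```

Three events were sent, but only two state entries exist, one per timeline element. That is the property that makes one state event per element attractive.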

iOS port

I also met someone looking into an iOS port of Itinerary. Obviously that’s in a very early exploratory state, but should that turn out viable it would probably also benefit many other KDE applications.

Project management

We also talked about how to clean up and revive task tracking for Itinerary. For various historic reasons, relevant tasks are currently spread over two Phabricator boards, a Gitlab board, tasks in Itinerary’s Gitlab project and Bugzilla, none of which is perfectly maintained, current or complete.

Not sure yet how to best structure this, but it clearly needs to change, and I need to more actively use this rather than having things in my local notes.

KDE Platform & Frameworks

Password and credential management

What could credential/secrets storage/management look like for our platform in a world beyond KWallet? KWallet’s design, UX and security model go back more than 20 years by now, and it shows. And since this is a sensitive and complex topic to touch, it’s probably worth working out first what we would want this to look like eventually.

Things to consider include:

Standardized access APIs for applications like XDG portals or the secret service API.
Data exchange formats with other password management systems.
Integration with cloud-based password managers for synchronization between different devices.
Supporting external hardware tokens and TPMs.
Supporting Passkeys and WebAuthn.
Unlocking UI nicely integrated into the shell, and also supporting e.g. biometric unlocking.
Hiding the complexity of all that from the user during setup to the extent possible.
Browser integration and integration with SSH and GPG key stores.
Should 2FA tools like Keysmith use this or remain separate?
Minimizing the need for configuration, while keeping much of the above opt-in.

Given the complexity we probably should find a place to collect all the ideas and requirements, and then get a few interested people together for a few days to turn this into a plan.

Application debug and support infrastructure

We do have functionality in Frameworks to help us diagnose issues happening on a user’s device, most prominently KCrash. Some of our applications have additional custom-built diagnostic functions as well though, such as:

NeoChat has some debug output capturing system.
Itinerary has a “debug mode” that shows some additional diagnostics and test functions.

Would it make sense to generalize/standardize some of this and upstream it to Frameworks?

Emergency and weather alerts

One of my main agenda points was meeting with FOSS Warn again, for our joint work on an emergency and weather alert aggregation server.

Topics included:

Quality metrics and monitoring of the input feeds. While we have feeds from 100+ countries, the activity on those ranges between 10+k alerts in the last 30 days to none. The latter could be valid for a small island nation with stable weather recently, but for a larger country this raises questions on how well that feed is maintained.
The data format and data model for feed metadata and API to query that, something my prototype didn’t do at all. This is useful to show in clients what kinds of alerts (or any at all) could be expected for an area of interest.
How to deal with split-up multilingual feeds. The Common Alerting Protocol (CAP) has support for multilingual entries, but in at least two cases (Russia and Saudi Arabia) those seem to be provided in separate single-language feeds instead.
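
The feed activity metric from the first point could be computed along these lines. This is a minimal sketch under stated assumptions, not code from the aggregation server: the function names, the three labels and the large_country flag are hypothetical, and a real monitoring setup would likely use finer-grained criteria.

```python
from datetime import datetime, timedelta, timezone


def alerts_in_window(alert_times, now, days=30):
    """Count alerts issued within the last `days` days before `now`."""
    cutoff = now - timedelta(days=days)
    return sum(1 for t in alert_times if cutoff <= t <= now)


def classify_feed(alert_times, now, large_country):
    """Return a coarse, hypothetical label for a feed's recent activity."""
    count = alerts_in_window(alert_times, now)
    if count > 0:
        return "active"
    # Silence is plausible for a small country, suspicious for a large one.
    return "needs review" if large_country else "quiet"


# Fixed reference time so the example is reproducible.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
times = [now - timedelta(days=1), now - timedelta(days=45)]

print(alerts_in_window(times, now))  # 1
print(classify_feed(times, now, large_country=True))  # active
print(classify_feed([], now, large_country=True))  # needs review
print(classify_feed([], now, large_country=False))  # quiet
```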

Transitous

Improvements for Transitous also came out of this.

While I was chatting with a bunch of people, the acquisition of Greyhound by FlixBus came up, and somebody asked whether that meant Transitous includes the Greyhound network (given that FlixBus in Europe is covered). It turned out not to, so the next person did a quick web search and found data for North America as well. Yet another person who had recently travelled that way was able to confirm that the data was valid and current. A pull request for Transitous followed soon after, and within less than an hour this was available on the production instance.

Barely four months after this all started at FOSDEM this was just awesome to watch.

And that’s probably the biggest value of such events: a random hallway conversation can result in a significant improvement to public transport routing connectivity on another continent :)

Categories: FLOSS Project Planets

These past two weeks in KDE: massive stability work for Plasma 6.1

Planet KDE - Sat, 2024-06-08 01:33

Sorry for the interruption last week; I was on vacation. While I was vacating, my colleagues were in full-on fix-everything mode in preparation for the upcoming Plasma 6.1 release in a little over a week. And what a release it promises to be! I think this is going to be a good one, folks. Lots of great features, improved performance and smoothness, and oodles of fixes for all kinds of strange bugs with your wild and wacky hardware devices!

New Features

Plasma’s Networks widget now supports WebAuth for SAML-based authentication. I don’t know what this means, but if you do, that probably means you’re happy about it! (Joel Holdsworth, Plasma 6.2.0. Link)

The network QR code that you can show in Plasma’s Networks widget is now draggable, so you can get an image file out of it wherever you want! (Fushan Wen, Plasma 6.2.0. Link)

System Monitor’s data backend now supports getting CPU/memory/IO/etc. pressure data from /proc/pressure/ (when supported by the Linux kernel) and displaying it in sensors. Hopefully this data will start to show up in user-facing features soon! (Adrian Edwards, Plasma 6.2.0. Link)
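
For readers curious what the files under /proc/pressure/ contain, here is a small parsing sketch. It is not how System Monitor’s backend does it; it is a standalone Python illustration, and the sample values are made up. It assumes the kernel’s pressure stall information format, where each line starts with "some" or "full" followed by avg10, avg60, avg300 and total fields.

```python
def parse_pressure(text):
    """Parse the contents of a /proc/pressure/<resource> file."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # First token is "some" or "full", the rest are key=value pairs.
        kind, *pairs = line.split()
        values = {}
        for pair in pairs:
            key, _, value = pair.partition("=")
            # "total" is a stall time counter; the averages are percentages.
            values[key] = int(value) if key == "total" else float(value)
        result[kind] = values
    return result


# Made-up sample in the same layout as /proc/pressure/cpu.
SAMPLE = (
    "some avg10=1.50 avg60=0.75 avg300=0.10 total=123456\n"
    "full avg10=0.00 avg60=0.00 avg300=0.00 total=0\n"
)

parsed = parse_pressure(SAMPLE)
print(parsed["some"]["avg10"])  # 1.5
print(parsed["some"]["total"])  # 123456
```

On a kernel with pressure information enabled, the same function could be fed the text read from /proc/pressure/cpu, /proc/pressure/memory or /proc/pressure/io.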

UI Improvements

Dolphin once again recommends installing Filelight if you try to get information about free space but it’s not installed (Felix Ernst, Dolphin 24.08.0. Link)

Passwords copied from KWalletManager no longer appear visible in the Clipboard widget (Weng Xuetian, KWalletManager 24.08.0. Link)

Plasma’s Panel Settings dialog no longer obscures the hover pop-ups for individual widgets while it’s open (David Edmundson, Plasma 6.1.0. Link)

In Plasma’s Sticky Notes widget, the colors of the inline buttons already change with the system color scheme. Now they also change appropriately based on the color scheme you select for the sticky note itself! (Evgeniy Harchenko, Plasma 6.1.0. Link)

Removed the “Hide utility windows for inactive applications” option, because it was broken and apparently nobody had noticed or reported this, and also it’s kind of a weird thing to exist in the first place. This also fixes a Plasma panel auto-hide bug caused by using it (Xaver Hugl, Plasma 6.1.0. Link 1 and link 2)

Searching in System Settings no longer sometimes returns nonsensical matches based on keywords that got implicitly joined together in ways that didn’t make sense (Harald Sitter, Plasma 6.1.0. Link)

A better cursor icon is now used when dragging windows (Vlad Zahorodnii, Plasma 6.1.0. Link)

On X11 with an NVIDIA GPU, a floating panel, and adaptive panel transparency, an unfortunate bug in the NVIDIA driver causes windows to lag when moved and resized. For now we’ve added a warning in the panel settings dialog about this (Ivan Tkachenko, Plasma 6.2.0. Link)

Improved the accessibility of the common Kirigami.PlaceholderMessage UI component (Aleix Pol Gonzalez, Frameworks 6.4. Link)

The custom accent color feature (including “accent color from wallpaper”) now does a better job of picking colors for links that will be readable, no matter what colors your base color scheme uses (Akseli Lahtinen, Plasma 6.1.0. Link)

Bug Fixes

Fixed an unusual issue in Elisa that could cause it to not launch with certain DBus setups on certain distros (Jack Hill, Elisa 24.05.1. Link)

Fixed Elisa not launching at all on Windows after a Windows integration library we were using was removed (Jack Hill, Elisa 24.05.1. Link)

Elisa no longer crashes when you try to enqueue the contents of filesystem folders that don’t actually have any music in them (Jack Hill, Elisa 24.05.1. Link)

Spectacle’s highlight annotation once again actually highlights (Noah Davis, Spectacle 24.05.1. Link)

Fixed a way that KWin could crash after the system wakes from sleep with weird screens that turn on strangely (Vlad Zahorodnii, Plasma 6.1.0. Link)

KWin should no longer crash when apps do weird things and for some reason two drag operations end up happening at once (Vlad Zahorodnii, Plasma 6.1.0. Link)

Fixed a way that KWin could crash on X11 when you change Global Themes (Xaver Hugl, Plasma 6.1.0. Link)

Fixed a weird crash in KWin that could happen when you’re pressing keys on the keyboard during the exact moment that XWayland disconnects (David Edmundson, Plasma 6.1.0. Link)

Fixed a cause of Plasma using too much memory and/or freezing due to system notifications containing certain sizes of images (Akseli Lahtinen, Plasma 6.1.0. Link)

Changing a System Monitor widget located on a thick panel to the “Application Table” or “Process Table” display style no longer causes Plasma to freeze (Akseli Lahtinen, Plasma 6.1.0. Link)

Fixed the most common crash in System Settings’ Firewall page which could be experienced by changing settings and then switching pages (Nicolas Fella, Plasma 6.1.0. Link)

Fixed an extremely esoteric Plasma crash that was caused by repeatedly wiggling the pointer over certain menu items of the context menu of the sub-folder popup of a folder located on the desktop (Akseli Lahtinen, Plasma 6.1.0. Link)

When you try to add a sensor to System Monitor or one of its so-named widgets that you’ve already added, you’re now warned and prevented from doing so, instead of being allowed to continue, which would silently break the display of all sensors (Arjen Hiemstra, Plasma 6.1.0. Link)

Connected devices that are nice enough to report battery information to Plasma no longer sometimes randomly get disconnected for no good reason (Ivan Tkachenko, Plasma 6.1.0. Link)

Some monitors that do unusual and exotic things under the hood no longer get their resolutions set to 640×480 after resuming from sleep (Xaver Hugl, Plasma 6.1.0. Link)

The “Allow restore on future sessions” checkbox on the screen sharing dialog once again works (Nicolas Fella, Plasma 6.1.0. Link)

Titlebar context menus of non-maximized Aurorae-decorated windows are no longer misplaced on Wayland (Vlad Zahorodnii, Plasma 6.1.0. Link)

Kicker’s panel button is no longer too large with a thin panel, which was a regression caused by me trying to keep it from getting too large with thick panels! (Niccolò Venerandi, Plasma 6.1.0. Link)

Connecting a new screen that’s larger than any of the current screens no longer causes it to get a small constrained version of the default wallpaper; now it gets the correct size (Marco Martin, Plasma 6.1.0. Link)

When you’ve got a floating Plasma panel on a screen edge that’s shared with another screen, it no longer pushes an odd blurry rectangle onto that adjacent screen when it de-floats (Niccolò Venerandi, Plasma 6.1.0. Link)

Using the “Application” or “Window” layout switching mode no longer causes Plasma’s System Tray popup to close immediately when you try to open it while the widget is located on a panel on the bottom or right screen edge (Marco Martin, Plasma 6.1.0. Link)

Shutting down, restarting, etc. from within KWin’s Overview effect no longer causes it to freeze for 45 seconds before completing the operation (David Edmundson, Plasma 6.2.0. Link)

Fixed a bug that caused items on the Plasma desktop to disappear until Plasma was restarted after their icons changed, either because you changed them manually or because something else changed them automatically, for example by putting items into the trash (Akseli Lahtinen, Frameworks 6.4. Link)

Worked around a Qt regression that prevented toolbar buttons in QtQuick apps from showing that they have keyboard focus (Aleix Pol Gonzalez, Frameworks 6.4. Link)

Other bug information of note:

5 Very high priority Plasma bugs (up from 3 two weeks ago). Current list of bugs
33 15-minute Plasma bugs (down from 36 two weeks ago). Current list of bugs
294 KDE bugs of all kinds fixed over the last two weeks. Full list of bugs

Performance & Technical

Improved the performance of basically everything for cases where the ~/.cache folder is located on a slow disk, such as a hard drive (David Edmundson, Plasma 6.1.0. Link)

Plasma and KWin no longer rocket up in CPU usage for no good reason when using the “Keep the selection and clipboard the same” and “Text selection: Always save in history” Clipboard widget settings (David Edmundson, Plasma 6.1.0. Link)

Re-enabled hardware-accelerated cursor support for Intel GPUs, since the bugs that were causing us to disable it have since been resolved. This should make cursors more responsive and not increase CPU usage when moving them (Vlad Zahorodnii, Plasma 6.2.0. Link)

Implemented Breeze styling for movable splitters in QML software, so we can now start porting things to use it. This will eventually result in, for example, resizable sidebars in places where you’d expect for sidebars to be resizable—imagine that! (Ivan Tkachenko, Frameworks 6.4. Link)

Automation & Systematization

Turned a manual test for the Plasma System Tray into an automatic one, so now it actually gets run regularly (Fushan Wen, link)

Added a few autotests for the Plasma Clipboard widget to verify recent fixes (Fushan Wen, link)

…And Everything Else

This blog only covers the tip of the iceberg! If you’re hungry for more, check out https://planet.kde.org, where you can find more news from other KDE contributors.

How You Can Help

The KDE organization has become important in the world, and your time and labor have helped to bring it there! But as we grow, it’s going to be equally important that this stream of labor be made sustainable, which primarily means paying for it. Right now the vast majority of KDE runs on labor not paid for by KDE e.V. (the nonprofit foundation behind KDE, of which I am a board member), and that’s a problem. We’ve taken steps to change this with paid technical contractors — but those steps are small due to growing but still limited financial resources. If you’d like to help change that, consider donating today!

Otherwise, visit https://community.kde.org/Get_Involved to discover other ways to be part of a project that really matters. Each contributor makes a huge difference in KDE; you are not a number or a cog in a machine! You don’t have to already be a programmer, either. I wasn’t when I got started. Try it, you’ll like it! We don’t bite!

Categories: FLOSS Project Planets

GNUnet News: GNUnet 0.21.2

GNU Planet! - Fri, 2024-06-07 18:00

GNUnet 0.21.2

This is a bugfix release for gnunet 0.21.1. It primarily addresses some connectivity issues introduced with our new transport subsystem.

Links

Source: https://ftpmirror.gnu.org/gnunet/gnunet-0.21.2.tar.gz ( https://ftpmirror.gnu.org/gnunet/gnunet-0.21.2.tar.gz.sig )
Source (meson): https://buildbot.gnunet.org/gnunet-0.21.2-meson.tar.gz ( https://buildbot.gnunet.org/gnunet-0.21.2-meson.tar.gz.sig )
Detailed list of changes: https://git.gnunet.org/gnunet.git/log/?h=v0.21.2
NEWS: https://git.gnunet.org/gnunet.git/tree/NEWS?h=v0.21.2
The list of closed issues in the bug tracker: https://bugs.gnunet.org/changelog_page.php?version_id=440

The GPG key used to sign is: 3D11063C10F98D14BD24D1470B0998EF86F59B6A

Note that due to mirror synchronization, not all links may be functional early after the release. For direct access try https://ftp.gnu.org/gnu/gnunet/

Categories: FLOSS Project Planets

Anwesha Das: Event Driven Ansible, what, why and how?

Planet Python - Fri, 2024-06-07 14:02

Ansible Playbooks is the known term; now there is a new term being floated in the project: Ansible Rulebooks. Today we are going to discuss Ansible's journey from Playbook to Rulebook, or rather Playbook with Rulebook.

What is Event Driven Ansible?

What is Event Driven Ansible? In simple terms, some action is triggered by some event. The idea of EDA comes from event-driven architecture. Event Driven Ansible runs code automatically based on received event notifications.

Some important terms:

What is event in Event Driven Ansible?

The event is the notification of a certain incident.

Where do we get the events from?

We get the events from event sources. Ansible EDA provides different plugins to support various event sources. There are several event source plugins, such as url_check (checking the HTTP status code), webhook (providing and checking events from a webhook), journald (monitoring the journald logs), and the list goes on.

When to take actions?

A rulebook defines conditions and the actions to take when those conditions are fulfilled. Conditions use operators on strings, booleans and numerical data. Actions are what happens once the conditions are met: running a playbook, setting a fact, running a module, etc.

Small example Project

Here is a small example of Event Driven Ansible and how it is run. The idea is that on receiving a message (here the number 42), a playbook will run on the host. There are the following 3 files:

demo_rule.yml

---
- name: Listen for events on a webhook
  hosts: all
  sources:
    - ansible.eda.webhook:
        host: 0.0.0.0
        port: 8000
  rules:
    - name: Say thank you
      condition: event.payload.message == "42"
      action:
        run_playbook:
          name: demo.yml

This is the rulebook. We are using the webhook plugin here as the event source. As a rule, in the event of receiving the message 42 as JSON payload in the webhook, we run the playbook called demo.yml.

demo.yml

- hosts: localhost
  connection: local
  tasks:
    - debug:
        msg: "Thank you for the answer."

demo.yml is the playbook which runs on the occurrence of the event mentioned in the rulebook and prints a debug message.

---
local:
  hosts: localhost

inventory.yml mentions the hosts to run the action against.

Further, there are 2 files, 42.json and 43.json, to test the code.

{ "message" : "42" }
{ "message" : "43" }

First we have to install all related dependencies before we can run the rulebook.

$ python -m venv .venv
$ source .venv/bin/activate
$ python -m pip install ansible ansible-rulebook ansible-runner psycopg
$ ansible-galaxy collection install ansible.eda
$ ansible-rulebook --rulebook demo_rule.yml -i inventory.yml --verbose

Go to another terminal in the same directory and run the following command to test the Rulebook. After receiving the message, the playbook runs.

curl -X POST -H "Content-Type: application/json" -d @42.json 127.0.0.1:8000/endpoint

Output :

2024-06-07 16:48:53,868 - ansible_rulebook.app - INFO - Starting sources
2024-06-07 16:48:53,868 - ansible_rulebook.app - INFO - Starting rules
...
TASK [debug] *******************************************************************
ok: [localhost] => {
    "msg": "Thank you for the answer."
}
PLAY RECAP *********************************************************************
localhost : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
2024-06-07 16:50:08,224 - ansible_rulebook.action.runner - INFO - Ansible runner Queue task cancelled
2024-06-07 16:50:08,225 - ansible_rulebook.action.run_playbook - INFO - Ansible runner rc: 0, status: successful

Now if we send the other JSON file, 43.json, we see that the playbook does not run even though the HTTP status code is 200.

curl -X POST -H "Content-Type: application/json" -d @43.json 127.0.0.1:8000/endpoint

Output :

2024-06-07 18:20:37,633 - aiohttp.access - INFO - 127.0.0.1 [07/Jun/2024:17:20:37 +0100] "POST /endpoint HTTP/1.1" 200 159 "-" "curl/8.2.1"
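
Why the second request is accepted but triggers nothing can be sketched in plain Python. This is only an imitation of the rule's logic, not how ansible-rulebook evaluates conditions internally; the function names are hypothetical. The webhook accepts any event (hence the 200), and the rule then decides whether an action follows.

```python
def say_thank_you_condition(event):
    """Plain Python stand-in for: event.payload.message == "42"."""
    return event.get("payload", {}).get("message") == "42"


def handle(event):
    """Return the action a matching rule would trigger, or None."""
    if say_thank_you_condition(event):
        return "run_playbook: demo.yml"
    return None


# Same payloads as 42.json and 43.json.
print(handle({"payload": {"message": "42"}}))  # run_playbook: demo.yml
print(handle({"payload": {"message": "43"}}))  # None
```

In this sketch the comparison is against the string "42", so a payload carrying the integer 42 would not match either.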

You can try this yourself by following this git repository.

Categories: FLOSS Project Planets

The Drop Times: PHPCamp 2024 in Pune: A Premier Event for PHP Developers

Planet Drupal - Fri, 2024-06-07 08:07

PHPCamp 2024 in Pune offers an immersive experience where PHP developers of all levels can enhance their skills, expand their knowledge, and connect with a vibrant community. Attendees will engage in hands-on workshops, explore the latest trends, and network with industry leaders in a collaborative and inclusive environment. This event is a unique opportunity for developers to be part of a groundbreaking gathering that fosters innovation and professional growth.

Categories: FLOSS Project Planets

Real Python: The Real Python Podcast – Episode #207: Decomposing Software Problems & Avoiding the Trap of Clever Code

Planet Python - Fri, 2024-06-07 08:00

How do you effectively break a software problem into individual steps? What are signs you're writing overly clever code? Christopher Trudeau is back on the show this week, bringing another batch of PyCoder's Weekly articles and projects.


Categories: FLOSS Project Planets
