Feeds

Python Morsels: Using "else" in a comprehension

Planet Python - Sat, 2024-07-20 12:53

While list comprehensions in Python don't support the else keyword directly, conditional expressions can be embedded within list comprehensions.

Table of contents

  1. Do list comprehensions support else?
  2. But else does work in comprehensions...
  3. Python's ternary operator
  4. Conditional expressions in list comprehension
  5. Making more readable list comprehensions
  6. Avoid complex comprehensions
  7. Python's list comprehensions and if-else

Do list comprehensions support else?

This list comprehension isn't valid:

>>> counts = [2, -1, 4, 7, -3, 6]
>>> sanitized_counts = [n for n in counts if n > 0 else 0]
  File "<stdin>", line 1
    sanitized_counts = [n for n in counts if n > 0 else 0]
                                                   ^^^^
SyntaxError: invalid syntax

Python's list comprehensions don't have any support for an else keyword.

But else does work in comprehensions...

But wait! I've seen else …
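
For reference, the else that does work inside a comprehension belongs to a conditional expression rather than to the comprehension syntax itself. A small illustration with the same data (standard Python semantics, not a quote from the full article):

counts = [2, -1, 4, 7, -3, 6]
# "n if n > 0 else 0" is a conditional expression, evaluated once per n
sanitized_counts = [n if n > 0 else 0 for n in counts]
print(sanitized_counts)  # [2, 0, 4, 7, 0, 6]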

Read the full article: https://www.pythonmorsels.com/comprehension-with-else/
Categories: FLOSS Project Planets

GSOC: Accident Week!

Planet KDE - Sat, 2024-07-20 07:34
Yes, that’s right. The title just goes perfectly, with these long weeks! From Week 3 to Week 7! So, first a small back story on the title. On 25th June, my semester exam ended. I was returning to my hometown, and on my way, I got into a bike accident. My right hand got bruised onto the road. Luckily, it didn’t break. But, the sheer pain was enough to make me cry.
Categories: FLOSS Project Planets

These past two weeks in KDE: fixing sticky keys and the worst crashes

Planet KDE - Sat, 2024-07-20 01:07

These past two weeks were big for Wayland accessibility support, as Nicolas Fella did a lot of work to bring sticky keys support up to the state it was in on X11. This work is not complete, but so far it’s taken a big bite out of the open issues! The work lands in a mixture of Plasma 6.1.3 and 6.2.0 (link 1, link 2, link 3, link 4, link 5, link 6).

Beyond this, it’s notable that Plasma developers fixed the five most common Plasma crashes, as well as a bunch of less common ones, which you’ll see mentioned below. These were caused by a mix of Qt regressions and our own code defects. The new automatic crash reporting system has been a huge boon here, allowing us to see which crashes are actually affecting people most. So please do continue to report them!

And of course there’s lots more too. Check it out:

New Features

Elisa now offers a feature to add a track, album, etc. directly to the playlist after the current song, playing it next (Jack Hill, Elisa 24.08.0. Link):

System Settings’ Drawing Tablet page now has a calibration tool (Joshua Goins, Plasma 6.2.0. Link)

The default width of Icons-and-Text Task Manager items is now user-configurable, and should also exhibit slightly smarter shrinking behavior as space gets filled up (Kisaragi Hiu, Plasma 6.2.0. Link)

Added support for Plasma’s charging threshold feature to OpenBSD (Rafael Sadowski, Plasma 6.2.0. Link)

UI Improvements

Dolphin now features a lovely super premium user experience for installing Filelight if it’s not already installed (Felix Ernst, Dolphin 24.08.0. Link):

https://i.imgur.com/My2yyWy.mp4

Filelight now has a more illuminating and welcoming homepage (me: Nate Graham, Filelight 24.08.0. Link):

Spectacle has now adopted the “inline messages touch the header” paradigm (me: Nate Graham, Spectacle 24.08.0. Link):

System Settings’ Night Light page now accepts creative custom times, no longer internally clamping them to a set of acceptable times (Vlad Zahorodnii, Plasma 6.1.3. Link)

Improved the smoothness of resizing Plasma widgets (Vlad Zahorodnii, Plasma 6.1.4. Link)

Made it impossible to remove administrator privileges from your current user unless there’s at least one other admin user on the system, to ensure that someone is an admin and can reverse the decision if needed! (Thomas Duckworth, Plasma 6.2.0. Link)

The “launch this app” shortcut for apps on System Settings’ Shortcuts page is now named “Launch”, making its purpose clear (me: Nate Graham, Plasma 6.2.0. Link)

Added some relevant clipboard-related keywords to System Settings’ General Behavior page, so you can find it in a search for things like “paste” and “selection” and stuff like that (Christoph Wolk, Plasma 6.2.0. Link)

Plasma’s Digital Clock widget now uses a typographic space to separate the time from the date when in single-row mode, so that it looks better especially when using a monospace font (Kisaragi Hiu, Plasma 6.2.0. Link)

Removed the filter for different job types from Plasma’s Printers widget, because it never worked (seriously) and apparently no one ever noticed because we didn’t even have any bug reports about it! (Mike Noe, Plasma 6.2.0. Link)

Bug Fixes

Filelight no longer fails to initiate a second scan after leaving the first one and going back to the Overview page (Harald Sitter, Filelight 24.08.0. Link)

Fixed one of the most common ways that Plasma could crash randomly, out of the blue (Akseli Lahtinen, Plasma 6.1.3. Link)

Fixed one of the most common ways that Plasma could crash on Wayland when a screen turns off (Marco Martin, Plasma 6.1.3. Link)

Fixed another one of the apparently many ways that Plasma can crash while handling various types of clipboard data, which appeared to be very common (David Edmundson, Plasma 6.1.3. Link)

Fixed a somewhat common way that Powerdevil could crash when waking the system from sleep with certain types of monitors (Jakob Petsovits, Plasma 6.1.3. Link)

Fixed an issue that caused Spectacle to crash after finishing a screen recording on systems using PipeWire 1.2.0 (Arjen Hiemstra, Plasma 6.1.3. Link)

Fixed an issue that caused Plasma on Wayland to crash when dragging a Task Manager task with no .desktop file associated with it (such as a Steam game) onto the desktop (Akseli Lahtinen, Plasma 6.1.3. Link)

Fixed a recent regression that caused System Tray icons for GTK2 apps to stop responding after the first time they’re clicked. Added an autotest to make sure this doesn’t regress again, too (David Edmundson and Fushan Wen, Plasma 6.1.3. Link)

It’s now possible to set your user avatar image to a file whose full path contains “special characters” like spaces, ampersands, etc. (Daniil-Viktor Ratkin, Plasma 6.1.3. Link)

It’s now more reliable to change the date or time using System Settings on a distro not using Systemd (Fabio Bas, Plasma 6.1.3. Link)

Plasma’s RDP server now works properly again after a prior failed connection using a non-H.264-capable client app (Arjen Hiemstra, Plasma 6.1.3. Link)

The Fcitx input method’s “show input method info when switching input focus” setting is now compatible with Plasma’s zoom-out style edit mode, and no longer causes it to exit immediately when the input method popup is shown (Weng Xuetian, Plasma 6.1.3. Link)

Fixed a different somewhat common out-of-the-blue Plasma crash (Méven Car, Plasma 6.1.4. Link)

Fixed a case where KWin could crash on X11 when compositing gets toggled on or off (Vlad Zahorodnii, Plasma 6.1.4. Link)

Fixed a regression in Discover that caused various alerts and information items in certain apps’ description pages to not appear as expected (Aleix Pol Gonzalez, Plasma 6.1.4. Link)

Fixed a regression in Plasma 6.0 that caused the “remove this item” hover icons in KRunner’s history view to be invisible (Ivan Tkachenko, Plasma 6.1.4. Link)

OpenVPN VPNs requiring a challenge-response work again with NetworkManager 1.64 or later (Benjamin Robin, Plasma 6.2.0. Link)

“Text Only” System Monitor sensors on horizontal Plasma panels are once again correctly-sized with centered text (Arjen Hiemstra, Plasma 6.2.0. Link)

“Line Chart” System Monitor sensors now show their legends as expected when placed on a wide vertical Plasma panel (Arjen Hiemstra, Plasma 6.2.0. Link)

When using Plasma’s “Raise maximum volume” setting, it now applies to the per-app volume sliders in the System Tray popup as well as the device volume sliders (Roberto Chamorro, Plasma 6.2.0. Link)

Using KWin’s wide color gamut or ICC profile features no longer increases the transparency of already-transparent windows (Xaver Hugl, Plasma 6.2.0. Link)

Fixed the source of bizarre view corruption in Dolphin introduced in the recently-released Frameworks 6.4 (Vlad Zahorodnii, Frameworks 6.4.1. Link)

Fixed a case where Plasma could crash when trying to re-assign a shortcut for a widget to one already used by something else (Arjen Hiemstra, Frameworks 6.5. Link)

On the “Get New [thing]” dialogs, downloading one file from an entry that includes many now works from the details page (Akseli Lahtinen, Frameworks 6.5. Link)

Fixed another fairly common way that Plasma could crash on Wayland when a screen turns off (David Edmundson, Qt 6.7.3. Link)

Fixed yet another common way that Plasma could crash on Wayland, this time when showing notifications (David Edmundson, Qt 6.7.3. Link)

Fixed a case where Plasma could crash when you trigger the Meta+V keyboard shortcut to open the clipboard history menu over and over again in rapid succession (Vlad Zahorodnii, Qt 6.7.3. Link)

Fixed a case where trying to save a file in a Flatpak or Snap app using the standard Save dialog could fail and cause the saving app to quit instead! (Nicolas Fella, Qt 6.7.3. Link)

Other bug information of note:

Performance & Technical

Receiving a Plasma notification no longer blocks KWin’s “direct scan-out” feature, e.g. while playing a game, so it should no longer briefly reduce performance (Xaver Hugl, Plasma 6.1.3. Link)

Improved KWin’s detection for whether triple buffering on Wayland will improve things so that it won’t occasionally turn on and off repeatedly, impairing performance (Xaver Hugl, Plasma 6.1.3. Link)

Plasma’s RDP server is now capable of listening for IPv6 connections (Arjen Hiemstra, Plasma 6.1.3. Link)

KWin now disables 10 bits-per-color (BPC) support for monitors plugged into a dock, as these often limit the signal to 8 BPC but don’t tell KWin, causing issues when KWin tries to enable 10 BPC mode because it thinks it should be possible (Xaver Hugl, Plasma 6.1.4. Link)

KWin now operates with “realtime” capabilities on systems using musl instead of glibc (Vlad Zahorodnii, Plasma 6.1.4. Link)

Plasma’s RDP server now also works as expected on systems using the OpenH264 video codec (Fabian Vogt, Plasma 6.2.0. Link)

Relevant only for cutting-edge distro-builders: It’s now possible to compile KWin with support for only the Wayland session, so support for X11 apps would be provided exclusively through XWayland (Neal Gompa, Plasma 6.2.0. Link)

Improved performance for everything in KDE that uses KFileItem::isHidden (Volker Krause, Frameworks 6.5. Link)

Created a new WindowStateSaver QML object you can add to apps’ windows to make them remember their size, maximization state (and position, on X11) (Joshua Goins, Frameworks 6.5. Link)

Apps storing their transient state data separately from their persistent configuration data now do so by putting the state config file in the standard XDG state folder of ~/.local/state/ (Harald Sitter, Frameworks 6.5. Link)

Automation & Systematization

Added an autotest to test clearing the clipboard history (Fushan Wen, link)

…And Everything Else

This blog only covers the tip of the iceberg! If you’re hungry for more, check out https://planet.kde.org, where you can find more news from other KDE contributors.

How You Can Help

If you have multiple systems or an adventurous personality, you can really help us out by installing beta versions of Plasma using your distro’s available repos and reporting bugs. Arch, Fedora, and openSUSE Tumbleweed are examples of great distros for this purpose. So please do try out Plasma beta versions. It truly does help us! Heck, if you’re very adventurous, live on the nightly repos. I’ve been doing this full-time for 5 years with my sole computer and it’s surprisingly stable.

Does that sound too scary? Consider donating today instead! That helps too.

Otherwise, visit https://community.kde.org/Get_Involved to discover other ways to be part of a project that really matters. Each contributor makes a huge difference in KDE; you are not a number or a cog in a machine! You don’t have to already be a programmer, either. I wasn’t when I got started. Try it, you’ll like it! We don’t bite!

Categories: FLOSS Project Planets

KDE signs petition urging European Union to continue funding free software

Planet KDE - Fri, 2024-07-19 20:00
The European Union must keep funding free software

Initially published by petites singularités. English translation provided by OW2.

If you want to sign this letter, please publish this text on your website and add yourself or your organization to the table you will find on the original site.

Open Letter to the European Commission

Since 2020, Next Generation Internet (NGI) programmes, part of the European Commission's Horizon programme, have funded free software in Europe using a cascade funding mechanism (see for example NLnet's calls).

Quite a few of KDE's projects have benefited from NGI's funding, including NeoChat, Kaidan, KDE Connect, KMail, and many others. KDE e.V. is a European non-profit with limited resources that relies on donations, sponsors and funding like that offered by NGI, to push the development of our projects forward.

However, this year, according to the Horizon Europe working draft detailing funding programmes for 2025, we notice that Next Generation Internet is not mentioned any more as part of Cluster 4.

NGI programmes have shown their strength and importance in supporting the European software infrastructure, as a generic funding instrument to fund digital commons and ensure their long-term sustainability. We find this transformation incomprehensible, all the more so because NGI has proven efficient and economical at supporting free software as a whole, from the smallest to the most established initiatives. This ecosystem diversity underpins the strength of European technological innovation, and maintaining the NGI initiative to provide structural support to software projects at the heart of worldwide innovation is key to enforcing the sovereignty of a European infrastructure. Contrary to common perception, technical innovations often originate from European rather than North American programming communities, and are mostly initiated by small-scale organizations.

Previous Cluster 4 allocated 27 million euros to:

  • "Human centric Internet aligned with values and principles commonly shared in Europe" ;
  • "A flourishing internet, based on common building blocks created within NGI, that enables better control of our digital life" ;
  • "A structured ecosystem of talented contributors driving the creation of new internet commons and the evolution of existing internet commons".

In the name of these challenges, more than 500 projects received NGI funding in the first 5 years, backed by 18 organisations managing these European funding consortia.

NGI contributes to a vast ecosystem, as most of its budget is allocated to fund third parties by the means of open calls, to structure commons that cover the whole Internet scope - from hardware to application, operating systems, digital identities or data traffic supervision. This third-party funding is not renewed in the current program, leaving many projects short on resources for research and innovation in Europe.

Moreover, NGI allows exchanges and collaborations across all the Eurozone countries as well as "widening countries"1, currently both a success and a work in progress, much like the Erasmus programme before it. NGI also contributes to opening up and supporting longer-term relationships than strict project funding does. It encourages implementing projects funded as pilots, backing collaboration, identification and reuse of common elements across projects, interoperability in identification systems and beyond, and setting up development models that mix diverse scales and types of European funding schemes.

While the USA, China or Russia deploy huge public and private resources to develop software and infrastructure that massively capture private consumer data, the EU can't afford this renunciation. Free and open source software, as supported by NGI since 2020, is by design the opposite of potential vectors for foreign interference. It lets us keep our data local and favors a community-wide economy and know-how, while allowing an international collaboration.

This is all the more essential in the current geopolitical context: the challenge of technological sovereignty is central, and free software allows us to address it while acting for peace and sovereignty in the digital world as a whole.

1 As defined by Horizon Europe, widening Member States are Bulgaria, Croatia, Cyprus, the Czech Republic, Estonia, Greece, Hungary, Latvia, Lithuania, Malta, Poland, Portugal, Romania, Slovakia and Slovenia. Widening associated countries (under condition of an association agreement) include Albania, Armenia, Bosnia, the Faroe Islands, Georgia, Kosovo, Moldova, Montenegro, Morocco, North Macedonia, Serbia, Tunisia, Turkey and Ukraine. Widening overseas regions are: Guadeloupe, French Guiana, Martinique, Réunion, Mayotte, Saint-Martin, the Azores, Madeira and the Canary Islands.

Categories: FLOSS Project Planets

Dirk Eddelbuettel: dtts 0.1.3 on CRAN: More Maintenance

Planet Debian - Fri, 2024-07-19 16:49

Leonardo and I are happy to announce the release of another maintenance release 0.1.3 of our dtts package which has been on CRAN for a good two years now.

dtts builds upon our nanotime package as well as the beloved data.table to bring high-performance and high-resolution indexing at the nanosecond level to data frames. dtts aims to offer the time-series indexing versatility of xts (and zoo) combined with the immense power of data.table while supporting the highest nanosecond resolution.

This release contains two nice and focussed contributed pull requests. Tomas Kalibera, who as part of R Core looks after everything concerning R on Windows, and then some, needed an adjustment for pending / upcoming R on Windows changes for builds with LLVM which is what Arm-on-Windows uses. We happily obliged: neither Leonardo nor I see much of Windows these decades. (Easy thing to say on a day like today with its crowdstrike hammer falling!) Similarly, Michael Chirico supplied a PR updating one of our tests to an upcoming change at data.table which we are of course happy to support.

The short list of changes follows.

Changes in version 0.1.3 (2024-07-18)
  • Windows builds use localtime_s with LLVM (Tomas Kalibera in #16)

  • Tests code has been adjusted for an upstream change in data.table tests for all.equal (Michael Chirico in #18 addressing #17)

Courtesy of my CRANberries, there is also a report with diffstat for this release. Questions, comments, issue tickets can be brought to the GitHub repo. If you like this or other open-source work I do, you can now sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

Categories: FLOSS Project Planets

PyPy: Mining JIT traces for missing optimizations with Z3

Planet Python - Fri, 2024-07-19 13:01

In my last post I've described how to use Z3 to find simple local peephole optimization patterns for the integer operations in PyPy's JIT. An example is int_and(x, 0) -> 0. In this post I want to scale up the problem of identifying possible optimizations to much bigger instruction sequences, also using Z3. For that, I am starting with the JIT traces of real benchmarks, after they have been optimized by the optimizer of PyPy's JIT. Then we can ask Z3 to find inefficient integer operations in those traces.

Starting from the optimized traces of real programs has some big advantages over the "classical" superoptimization approach of generating and then trying all possible sequences of instructions. It avoids the combinatorial explosion that happens with the latter approach. Also, starting from the traces of benchmarks or (even better) actual programs makes sure that we actually care about the missing optimizations that are found in this way. And because the traces are analyzed after they have been optimized by PyPy's optimizer, we only get reports for missing optimizations that the JIT isn't able to do (yet).

The techniques and experiments I describe in this post are again the result of a bunch of discussions with John Regehr at a conference a few weeks ago, as well as reading his blog posts and papers. Thanks John! Also thanks to Max Bernstein for super helpful feedback on the drafts of this blog post (and for poking me to write things in general).

High-Level Approach

The approach that I took works as follows:

  • Run benchmarks or other interesting programs and then dump the IR of the JIT traces into a file. The traces have at that point been already optimized by the PyPy JIT's optimizer.
  • For every trace, ignore all the operations on non-integer variables.
  • Translate every integer operation into a Z3 formula.
  • For every operation, use Z3 to find out whether the operation is redundant (how that is done is described below).
  • If the operation is redundant, the trace is less efficient than it could have been, because the optimizer could also have removed the operation. Report the inefficiency.
  • Minimize the inefficient programs by removing as many operations as possible to make the problem easier to understand.

In the post I will describe the details and show some pseudocode of the approach. I'll also make the proper code public eventually (but it needs a healthy dose of cleanups first).

Dumping PyPy Traces

PyPy will write its JIT traces into the file out if the environment variable PYPYLOG is set as follows:

PYPYLOG=jit-log-opt:out pypy <program.py>

This environment variable works for PyPy, but also for other virtual machines built with RPython.

(This is really a side point for the rest of the blog post, but since the question came up I wanted to clarify it: Operations on integers in the Python program that the JIT is running don't all correspond 1-to-1 with the int_... operations in the traces. The int_... trace operations always operate on machine words. The Python int type supports arbitrarily large integers. PyPy will optimistically try to lower the operations on Python integers into machine word operations, but adds the necessary guards into the trace to make sure that overflow outside of the range of machine words is caught. In case one of these guards fails the interpreter switches to a big integer heap-allocated representation.)

Encoding Traces as Z3 formulas

The last blog post already contained the code to encode the results of individual trace operations into Z3 formulas, so we don't need to repeat that here. To encode traces of operations we introduce a Z3 variable for every operation in the trace and then call the z3_expression function for every single one of the operations in the trace.

For example, for the following trace:

[i1]
i2 = uint_rshift(i1, 32)
i3 = int_and(i2, 65535)
i4 = uint_rshift(i1, 48)
i5 = int_lshift(i4, 16)
i6 = int_or(i5, i3)
jump(i6, i2) # equal

We would get the Z3 formula:

z3.And(i2 == LShR(i1, 32), i3 == i2 & 65535, i4 == LShR(i1, 48), i5 == i4 << 16)

Usually we won't ask for the formula of the whole trace at once. Instead we go through the trace operation by operation and try to find inefficiencies in the current one we are looking at. Roughly like this (pseudo-)code:

def newz3var(name):
    return z3.BitVec(name, INTEGER_WIDTH)

def find_inefficiencies(trace):
    solver = z3.Solver()
    var_to_z3var = {}
    for input_argument in trace.inputargs:
        var_to_z3var[input_argument] = newz3var(input_argument)
    for op in trace:
        var_to_z3var[op] = z3resultvar = newz3var(op.resultvarname)
        arg0 = op.args[0]
        z3arg0 = var_to_z3var[arg0]
        if len(op.args) == 2:
            arg1 = op.args[1]
            z3arg1 = var_to_z3var[arg1]
        else:
            z3arg1 = None
        res, valid_if = z3_expression(op.name, z3arg0, z3arg1)
        # checking for inefficiencies, see the next sections
        ...
        if ...:
            return "inefficient", op
        # not inefficient, assert op into the solver and continue with the next op
        solver.add(z3resultvar == res)
    return None # no inefficiency found

Identifying constant booleans with Z3

To get started finding inefficiencies in a trace, we can first focus on boolean variables. For every operation in the trace that returns a bool we can ask Z3 to prove that this variable must be always True or always False. Most of the time, neither of these proofs will succeed. But if Z3 manages to prove one of them, we know we have found an inefficiency: instead of computing the boolean result (e.g. by executing a comparison), the JIT's optimizer could have replaced the operation with the corresponding boolean constant.
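
In practice, "proving" a claim with Z3 means checking that its negation is unsatisfiable under the solver's current assertions. The prove helper used in the pseudo-code below could be a thin wrapper around that check; a minimal sketch of my own, not code from the post:

import z3

def prove(solver, claim):
    # the claim is proven if assuming its negation is contradictory
    # given everything already asserted on the solver
    solver.push()
    solver.add(z3.Not(claim))
    result = solver.check()
    solver.pop()
    return result == z3.unsat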

Here's an example of an inefficiency found that way: if x < y and y < z are both true, PyPy's JIT could conclude that x < z must also be true. However, currently the JIT cannot make that conclusion because it only reasons about the concrete ranges (lower and upper bounds) for every integer variable, but it has no way to remember anything about relationships between different variables. This kind of reasoning would quite often be useful to remove list/string bounds checks. Here's a talk about how LLVM does this (but it might be too heavyweight for a JIT setting).

Here are some more examples found that way:

  • x - 1 == x is always False
  • x - (x == -1) == -1 is always False. The pattern x - (x == -1) happens a lot in PyPy's hash computations: To be compatible with the CPython hashes we need to make sure that no object's hash is -1 (CPython uses -1 as an error value on the C level).

Here's pseudo-code for how to implement checking boolean operations for inefficiencies:

def find_inefficiencies(trace):
    ...
    for op in trace:
        ...
        res, valid_if = z3_expression(op.name, z3arg0, z3arg1)
        # check for boolean constant result
        if op.has_boolean_result():
            if prove(solver, res == 0):
                return "inefficient", op, 0
            if prove(solver, res == 1):
                return "inefficient", op, 1
        # checking for other inefficiencies, see the next sections
        ...
        # not inefficient, add op to the solver and continue with the next op
        solver.add(z3resultvar == res)
    return None # no inefficiency found

Identifying redundant operations

A more interesting class of redundancy is to try to find two operations in a trace that compute the same result. We can do that by asking Z3 to prove, for each pair of different operations in the trace, that the result is always the same. If a previous operation returns the same result, the JIT could have re-used that result instead of re-computing it, saving time. Doing this search for equivalent operations with Z3 is quadratic in the number of operations, but since traces have a maximum length it is not too bad in practice.

This is the real workhorse of my script so far, it's what finds most of the inefficiencies. Here's a few examples:

  • The very first and super useful example the script found is int_eq(b, 1) == b if b is known to be a boolean (i.e. an integer 0 or 1). I have already implemented this optimization in the JIT.
  • Similarly, int_and(b, 1) == b for booleans.
  • (x << 4) & -0xf == x << 4
  • (((x >> 63) << 1) << 2) >> 3 == x >> 63. In general the JIT is quite bad at optimizing repeated shifts (the infrastructure for doing better with that is already in place, so this will be a relatively easy fix).
  • (x & 0xffffffff) | ((x >> 32) << 32) == x. Having the JIT optimize this would maybe require first recognizing that (x >> 32) << 32 can be expressed as a mask: (x & 0xffffffff00000000), and then using (x & c1) | (x & c2) == x & (c1 | c2)
  • A commonly occurring pattern is variations of this one: ((x & 1345) ^ 2048) - 2048 == x & 1345 (with different constants, of course). xor is add without carry, and x & 1345 does not have the bit 2048 set. Therefore the ^ 2048 is equivalent to + 2048, which the - 2048 cancels. More generally, if a & b == 0, then a + b == a | b == a ^ b. I don't understand at all why this appears so often in the traces, but I see variations of it a lot. LLVM can optimize this, but GCC can't, thanks to Andrew Pinski for filing the bug!
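
Any of these identities can be double-checked with a few lines of Z3. For example, here is a quick standalone check (my own sketch, not from the post) of the xor/add pattern in the last bullet:

import z3

x = z3.BitVec('x', 64)
lhs = ((x & 1345) ^ 2048) - 2048
rhs = x & 1345

solver = z3.Solver()
solver.add(lhs != rhs)   # look for a counterexample
print(solver.check())    # prints "unsat": the two expressions are equivalent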

And here's some implementation pseudo-code again:

def find_inefficiencies(trace):
    ...
    for op in trace:
        ...
        res, valid_if = z3_expression(op.name, z3arg0, z3arg1)
        # check for boolean constant result
        ...
        # searching for redundant operations
        for previous_op in trace:
            if previous_op is op:
                break # done, reached the current op
            previous_op_z3var = var_to_z3var[previous_op]
            if prove(solver, previous_op_z3var == res):
                return "inefficient", op, previous_op
        ...
        # more code here later
        ...
        # not inefficient, add op to the solver and continue with the next op
        solver.add(z3resultvar == res)
    return None # no inefficiency found

Synthesizing more complicated constants with exists-forall

To find out whether some integer operations always return a constant result, we can't simply use the same trick as for those operations that return boolean results, because enumerating 2⁶⁴ possible constants and checking them all would take too long. Like in the last post, we can use z3.ForAll to find out whether Z3 can synthesize a constant for the result of an operation for us. If such a constant exists, the JIT could have removed the operation, and replaced it with the constant that Z3 provides.

Here a few examples of inefficiencies found this way:

  • (x ^ 1) ^ x == 1 (or, more generally: (x ^ y) ^ x == y)
  • if x | y == 0, it follows that x == 0 and y == 0
  • if x != MAXINT, then x + 1 > x

Implementing this is actually slightly annoying. The solver.add calls for non-inefficient ops add assertions to the solver, which are now confusing the z3.ForAll query. We could remove all assertions from the solver, then do the ForAll query, then add the assertions back. What I ended up doing instead was instantiating a second solver object that I'm using for the ForAll queries, which remains empty the whole time.

def find_inefficiencies(trace):
    solver = z3.Solver()
    empty_solver = z3.Solver()
    var_to_z3var = {}
    ...
    for op in trace:
        ...
        res, valid_if = z3_expression(op.name, z3arg0, z3arg1)
        # check for boolean constant result
        ...
        # searching for redundant operations
        ...
        # checking for constant results
        constvar = z3.BitVec('find_const', INTEGER_WIDTH)
        condition = z3.ForAll(
            var_to_z3var.values(),
            z3.Implies(
                z3.And(*solver.assertions()),
                res == constvar
            )
        )
        if empty_solver.check(condition) == z3.sat:
            model = empty_solver.model()
            const = model[constvar].as_signed_long()
            return "inefficient", op, const
        # not inefficient, add op to the solver and continue with the next op
        solver.add(z3resultvar == res)
    return None # no inefficiency found

Minimization

Analyzing an inefficiency by hand in the context of a larger trace is quite tedious. Therefore I've implemented a (super inefficient) script to try to make the examples smaller. Here's how that works:

  • First throw out all the operations that occur after the inefficient operation in the trace.
  • Then we remove all "dead" operations, i.e. operations that don't have their results used (all the operations that we can analyze with Z3 are without side effects).
  • Now we try to remove every guard in the trace one by one and check afterwards, whether the resulting trace still has an inefficiency.
  • We also try to replace every single operation with a new argument to the trace, to see whether the inefficiency is still present.
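
The core shrinking loop behind these steps can be as simple as repeatedly dropping one operation at a time and re-checking. A deliberately simplified sketch of that greedy idea (my own, ignoring the guard-removal and argument-replacement steps described above):

def shrink(trace, still_inefficient):
    # greedily drop operations while the inefficiency stays reproducible
    changed = True
    while changed:
        changed = False
        for index in range(len(trace)):
            candidate = trace[:index] + trace[index + 1:]
            if still_inefficient(candidate):
                trace = candidate
                changed = True
                break
    return trace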

The minimization process is sort of inefficient and I should probably be using shrinkray or C-Reduce instead. However, it seems to work well in practice and the runtime isn't too bad.

Results

So far I am using the JIT traces of three programs: 1) Booting Linux on the Pydrofoil RISC-V emulator, 2) booting Linux on the Pydrofoil ARM emulator, and 3) running the PyPy bootstrap process on top of PyPy.

I picked these programs because most Python programs don't contain interesting amounts of integer operations, and the traces of the emulators contain a lot of them. I also used the bootstrap process because I still wanted to try a big Python program and personally care about the runtime of this program a lot.

The script identifies 94 inefficiencies in the traces, a lot of them come from repeating patterns. My next steps will be to manually inspect them all, categorize them, and implement easy optimizations identified that way. I also want a way to sort the examples by execution count in the benchmarks, to get a feeling for which of them are most important.

I didn't investigate the full set of Python benchmarks that PyPy uses yet, because I don't expect them to contain interesting amounts of integer operations, but maybe I am wrong about that? Will have to try eventually.

Conclusion

This was again much easier to do than I would have expected! Given that I had the translation of trace ops to Z3 already in place, it was a matter of about a day of programming to use this infrastructure to find the first problems and minimize them.

Reusing the results of existing operations or replacing operations by constants can be seen as "zero-instruction superoptimization". I'll probably be rather busy for a while to add the missing optimizations identified by my simple script. But later extensions to actually synthesize one or several operations in the attempt to optimize the traces more and find more opportunities should be possible.

Finding inefficiencies in traces with Z3 is significantly less annoying and also less error-prone than just manually inspecting traces and trying to spot optimization opportunities.

Random Notes and Sources

Again, John's blog posts:

and papers:

I remembered recently that I had seen the approach of optimizing the traces of a tracing JIT with Z3 a long time ago, as part of the (now long dead, I think) SPUR project. There's a workshop paper from 2010 about this. SPUR was trying to use Z3 built into the actual JIT (as opposed to using Z3 only to find places where the regular optimizers could be improved). In addition to bitvectors, SPUR also used the Z3 support for arrays to model the C# heap and remove redundant stores. This is still another future extension for all the Z3 work I've been doing in the context of the PyPy JIT.

Categories: FLOSS Project Planets

mark.ie: My Drupal Core Contributions for week-ending July 19th, 2024

Planet Drupal - Fri, 2024-07-19 11:24

Here's what I've been working on for my Drupal contributions this week. Thanks to Code Enigma for sponsoring the time to work on these.

Categories: FLOSS Project Planets

Web Review, Week 2024-29

Planet KDE - Fri, 2024-07-19 10:11

Let’s go for my web review for the week 2024-29.

The graying open source community needs fresh blood • The Register

Tags: tech, foss, community

This is indeed a problem. Somehow it became much harder to attract younger developers.

https://www.theregister.com/2024/07/15/opinion_open_source_attract_devs/


“Privacy-Preserving” Attribution: Mozilla Disappoints Us Yet Again

Tags: tech, browser, mozilla, privacy

You’d expect Mozilla to know better. This is disappointing; they’re not living up to their responsibility.

https://blog.privacyguides.org/2024/07/14/mozilla-disappoints-us-yet-again-2/


Commission sends preliminary findings to X for breach of DSA

Tags: tech, twitter, social-media, law

The European Commission starts showing its muscles. Twitter is an obvious one to pursue since it became the X cesspool.

https://ec.europa.eu/commission/presscorner/detail/en/IP_24_3761


Goldman Sachs: AI Is Overhyped, Wildly Expensive, and Unreliable

Tags: tech, ai, machine-learning, gpt, economics, ecology, criticism

I’m rarely on the side of a Goldman Sachs… Still this paper seems to be spot on. The equation between the costs (financial and ecological) and the value we get out of generative AI isn’t balanced at all. Also, since it is stuck on trying to improve mostly on model scale and amount of data it is doomed to plateau in its current form.

https://www.404media.co/goldman-sachs-ai-is-overhyped-wildly-expensive-and-unreliable/


Facebook Is the ‘Zombie Internet’

Tags: tech, ai, social-media, facebook

Or examples of the collapse of a shared reality. This has nothing to do with “social” media anymore. Very nice investigation in any case.

https://www.404media.co/email/24eb6cea-6fa6-4b98-a2d2-8c4ba33d6c04/


AT&T says criminals stole phone records of ‘nearly all’ customers in new data breach

Tags: tech, security, leak

Wow! This is a really bad data breach. Apparently related to the recent data theft on the Snowflake end.

https://techcrunch.com/2024/07/12/att-phone-records-stolen-data-breach/


git-pr: A new git collaboration service

Tags: tech, git, codereview, tools

Interesting approach to building a new code review system. I somehow doubt it’ll get traction unfortunately but it has nice ideas baked in.

https://pr.pico.sh/


gpu.cpp: portable GPU compute for C++ with WebGPU – Answer.AI

Tags: tech, c++, gpu, computation

Looks like an interesting library to build portable GPU compute workloads. Cleverly tries to leverage WebGPU.

https://www.answer.ai/posts/2024-07-11–gpu-cpp.html


C++ Design Patterns For Low-Latency Applications

Tags: tech, c++, performance, optimization, pattern

A paper listing patterns to reduce latency as much as possible. There are lesser known tricks in there.

https://hackaday.com/2024/07/13/c-design-patterns-for-low-latency-applications/


22 Common Filesystem Tasks in C++20

Tags: tech, c++

Nice little reference of what can be done with std::filesystem.

https://www.cppstories.com/2024/common-filesystem-cpp20/


C++ Must Become Safer

Tags: tech, c++, safety, memory

Definitely this. C++ isn’t going away anytime soon. Rewrites won’t be worth it in important cases, so improving the safety of the language matters.

https://www.alilleybrinker.com/blog/cpp-must-become-safer/


Django Migration Operations aka how to rename Models

Tags: tech, django, databases

Django doesn’t always generate the migration you’d expect. Read them before going to production. Also it’s fine to adjust them.

https://micro.webology.dev/2024/07/15/django-migration-operations.html


Gotchas with SQLite in Production

Tags: tech, backend, sqlite, databases

Where are the limitations of using SQLite in production for web applications? Here is a good list.

https://blog.pecar.me/sqlite-prod


SQLite Transactions

Tags: tech, databases, sqlite

Some improvements coming in SQLite transactions. Here are some early tests.

https://reorchestrate.com/posts/sqlite-transactions/


Tests you love to read, write and change

Tags: tech, tests

Three good pieces of advice on writing automated tests. This is necessary but not sufficient though.

https://jaywhy13.hashnode.dev/tests-you-love-to-read-write-and-change


Lessons learned in 35 years of making software – Jim Grey

Tags: tech, career

Quite a few good lessons in there. Again it’s more about social skills than technical skills.

https://dev.jimgrey.net/2024/07/03/lessons-learned-in-35-years-of-making-software/


Story Points are Pointless, Measure Queues

Tags: tech, project-management, product-management, estimates, agile

A bit long, and with a couple of mistakes when pointing out the flaws of story points. Still, it’s definitely a worthwhile read. Quite a lot of the criticism of story points is warranted and the proposed approach based on queue theory is welcome. This is stuff you can find in Kanban-like approaches and mature XP.

https://www.brightball.com/articles/story-points-are-pointless-measure-queues


Managing Underperformers | Jack Danger

Tags: management

Good advice for dealing with underperforming teams or individuals. Making the distinction between refusal to align and failure to execute is particularly useful.

https://jackdanger.com/managing-underperformers/


All I Need to Know About Engineering Leadership I Learned From Leave No Trace - Jacob Kaplan-Moss

Tags: tech, leadership, engineering, ecology, funny

Funny experiment at drawing parallels between engineering leadership and how you should behave when hiking in nature. This works surprisingly well.

https://jacobian.org/2024/jul/12/lnt-for-engineering-leadership/


Progress can be slow

Tags: work, life, improving, coaching, habits

This one is more self-help than I’m usually comfortable with… somehow something rang true to me with it. It’s indeed a good reminder that changing habits takes a while. It’s an exercise in patience and there are good reasons for it.

https://jeanhsu.substack.com/p/progress-can-be-slow?isFreemail=true&post_id=146457673


German Navy still uses 8-inch floppy disks, working on emulating a replacement | Ars Technica

Tags: tech, hardware, storage, history

We keep finding floppies in use in surprising places. There’s clearly a lot of inertia before technologies get replaced.

https://arstechnica.com/gadgets/2024/07/german-navy-still-uses-8-inch-floppy-disks-working-on-emulating-a-replacement/


Bye for now!

Categories: FLOSS Project Planets

Bits from Debian: New Debian Developers and Maintainers (May and June 2024)

Planet Debian - Fri, 2024-07-19 10:00

The following contributors got their Debian Developer accounts in the last two months:

  • Dennis van Dok (dvandok)
  • Peter Wienemann (wiene)
  • Quentin Lejard (valde)
  • Sven Geuer (sge)
  • Taavi Väänänen (taavi)
  • Hilmar Preusse (hille42)
  • Matthias Geiger (werdahias)
  • Yogeswaran Umasankar (yogu)

The following contributors were added as Debian Maintainers in the last two months:

  • Bernhard Miklautz
  • Felix Moessbauer
  • Maytham Alsudany
  • Aquila Macedo
  • David Lamparter
  • Tim Theisen
  • Stefano Brivio
  • Shengqi Chen

Congratulations!

Categories: FLOSS Project Planets

Real Python: The Real Python Podcast – Episode #213: Constraint Programming & Exploring Python's Built-in Functions

Planet Python - Fri, 2024-07-19 08:00

What are discrete optimization problems? How do you solve them with constraint programming in Python? Christopher Trudeau is back on the show this week, bringing another batch of PyCoder's Weekly articles and projects.

Categories: FLOSS Project Planets

Real Python: Quiz: Python Strings and Character Data

Planet Python - Fri, 2024-07-19 08:00

This quiz will evaluate your understanding of Python’s string data type and test your knowledge about manipulating textual data with string objects. You’ll cover the basics of creating strings using literals and the str() function, applying string methods, using operators and built-in functions with strings, indexing and slicing strings, and more!

Take this quiz after reading our Strings and Character Data in Python tutorial.

Categories: FLOSS Project Planets

Debug Academy: Evaluating Acquia storage limits - "Emergency upsize" notification

Planet Drupal - Fri, 2024-07-19 06:54
Evaluating Acquia storage limits - "Emergency upsize" notification

In addition to best-in-class Drupal training courses (pun intended), we at Debug Academy provide Drupal development services and Drupal 7 migration services. Our clients may self-host, host with Acquia, Pantheon, or even host with us.

Recently, a client who hosts their Drupal 9 website on Acquia reached out to ask us to investigate an alert they had received from Acquia. Acquia sent an email which said "This email is to inform you that we emergency upsized your storage from 200GB to 300GB FS in Case #[redacted]. The cost to keep this upsize in place [is..]"

ashrafabed Fri, 07/19/2024
Categories: FLOSS Project Planets

PyBites: How to convert a Python script into a web app, a product others can use

Planet Python - Fri, 2024-07-19 06:02

So, you’re a Python developer or you just use Python to make your life easier by writing utility scripts to automate repetitive tasks and boost your productivity.
Such Python scripts are usually local to your machine, run from the command line and require some Python skills to use including for instance setting up a virtual environment and knowing the basics of using the CLI.
You may have wondered what it involves to convert a script like this into a web app to make it available for other users such as your family members and co-workers.

In this article, I’ll walk through the general process I followed to “convert” a Python utility script into a full-blown web app by building a backend API using Django and Django Rest Framework and a single-page-app frontend using React.js.

But first, here is a bit of a background…

My family often needed me to merge (combine) PDF files for them. When they apply for online services they’re often required to upload all documents as a single PDF file.

As they only use Windows, they couldn’t find a reliable tool for merging PDFs. The apps they found only worked some of the time, and they also didn’t want to upload sensitive docs to random PDF-merging websites.
Being the family tech guy, they would ask me for help. They would send me the files to merge and I’d send them back a single PDF file; merged and ready to upload.

I used pdftk for a while. It’s available only on Linux (as far as I know) and the command looks like this:

pdftk 1.pdf 2.pdf 3.pdf cat output out.pdf

I then upgraded to a tiny Python script using PyPDF. Something like this:

from pathlib import Path
from pypdf import PdfWriter

merger = PdfWriter()
path = Path.home() / "Documents" / "uni-application"

for file in path.glob('*.pdf'):
    merger.append(file)

merger.write("merged.pdf")
merger.close()

I had to set the path manually every time and rename the PDFs so that they are merged in the desired order.

At some point, I felt I was merging PDFs for the family so often that I needed a way to expose this script to them somehow. I thought I should just build a web interface for it, host it on my local home server and let them merge as many files as they want without needing my help.

There are many ways to package and distribute a Python script including making a stand-alone executable. However, unless you build an intuitive GUI for it, average users may not be able to benefit from it.
Of all the GUI options out there, I chose to make it a web app mainly because web apps have a common look and expected functionality, users are used to interacting with them almost every day, and they run (hopefully) everywhere.

So, as a side little project over months, I built a basic REST-ish API for the backend that allows you to create a merge request, add PDF files to it, merge the files and get back a single PDF file to download and use.
I also built a simple frontend for it using React.js.

For deployment, I used docker to run the app on my home server and expose it on the LAN with a custom “.home” local domain. Now, my family use it almost everyday to merge PDFs on their own and everyone is happy.

I’ll go over the general process I followed to build this small web app that serves only one purpose: merging PDF files.
You can swap PDF merging for any functionality like: converting audio/video files, resizing images, extracting text from PDFs or images, etc. The process is pretty much the same.

This is not a step-by-step guide. I’ll just explain the overall process. Tutorials on any technology mentioned here can be easily found online. Also, I won’t clutter the article with code examples as you can refer to the source code in the Github repo.

Backend API and Auth

I chose to go with Django and Django Rest Framework for building the backend as I’m more familiar with these two than other Python options like Flask and FastAPI. In addition, being a batteries-included framework, Django comes with a lot of nice features I needed for this specific app, the most important of which are authentication and the ORM.

Why the API? Why not just full-stack Django?

I could have just built a typical Django app that works as the backend and frontend at the same time. But I knew for sure I was going to build a REST API alongside it at some point down the road, so why not just start with an API?
Moreover, the API-based backend is frontend-agnostic. It doesn’t assume a specific client-side technology. I can swap React for Vue for example. I can also build more clients to consume the API like a mobile or desktop app.

Building the API:

I started by thinking about what the original Python script does and what the user usually wants to get back from using it. In this case the user wants a way to:

  • upload at least 2 PDF files,
  • merge the PDF files on the server,
  • and download the single merged PDF file.
Models

To satisfy these requirements, I started a new Django project with a “merger” app, installed Django Rest Framework, added the models to save everything in the DB. Meaning, when a user creates a merge request, it’s saved to the DB to track various things about it like how many files it has and whether it’s been merged or not. Later, we can utilize this data and build upon it if we decide for example to put each user on a plan/package with a limited number of merges/files per merge etc.

I used the name “order” for the merge request effectively treating it as a transaction. My idea is to make things easier later if the app turns into a paid service.
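
A minimal sketch of what such models could look like (field and model names here are illustrative guesses, not necessarily the project's exact schema; see the repo for the real thing):

import uuid

from django.conf import settings
from django.db import models


class Order(models.Model):
    """A merge request: a named batch of PDFs belonging to one user."""
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    name = models.CharField(max_length=255)
    owner = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE,
                              related_name="orders")
    is_merged = models.BooleanField(default=False)
    download_url = models.CharField(max_length=500, blank=True)
    created_at = models.DateTimeField(auto_now_add=True)


class PDFFile(models.Model):
    """A single uploaded PDF attached to an order."""
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    order = models.ForeignKey(Order, on_delete=models.CASCADE, related_name="files")
    file = models.FileField(upload_to="pdfs/")
    is_merged = models.BooleanField(default=False)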

Serializers

Then I created DRF serializers which are responsible for converting data from Python to JSON format and back (APIs expect and return JSON, not Python data types like dicts).

They say: don’t trust user uploads. So, in the serializers, I also validate each uploaded file to make sure it is indeed a PDF, by verifying that the content type is ‘pdf’ and checking the MIME type by reading the first 2084 bits of the file.
This is not a bullet-proof validation system. It may keep away casual malicious users but it probably won’t protect against determined attackers. Nevertheless, it prevents common user errors like uploading the wrong file format.

The validation also checks the file size to make sure it doesn’t exceed the limit set in settings.py, otherwise users can intentionally or naively upload huge files and block server response while uploading.
If the file fails to validate, the API will return a message containing the reason for rejection.

Although you can leave some of this validation to the frontend, you should always expect that some advanced users may get around frontend validation by sending requests directly to the API or even building their own frontend. After all, the API is open and can be consumed from any client. This of course doesn’t mean that they can bypass auth or permissions, but with validation in the backend, even a custom-built client can’t bypass those checks.
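
As an illustration, a serializer-level check along these lines might look like the following (the serializer fields and the MAX_PDF_FILE_SIZE setting name are assumptions for this sketch, not the project's actual identifiers):

from django.conf import settings
from rest_framework import serializers

from .models import PDFFile  # assumed model name


class PDFFileSerializer(serializers.ModelSerializer):
    class Meta:
        model = PDFFile
        fields = ["id", "order", "file", "is_merged"]
        read_only_fields = ["is_merged"]

    def validate_file(self, value):
        # reject anything whose declared content type isn't PDF
        if value.content_type != "application/pdf":
            raise serializers.ValidationError("Only PDF files are accepted.")
        # reject files above the configured size limit (assumed setting name)
        if value.size > settings.MAX_PDF_FILE_SIZE:
            raise serializers.ValidationError("File is too large.")
        # the real project additionally sniffs the file's leading bytes
        return value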

Views & Endpoints

I then switched to building the views which are essentially the endpoints the frontend will call to interact with the backend.
Based on the requirements of this specific app, following are the API endpoints with a short description of what each does:

GET: /orders/

when a GET request is sent to this endpoint, it returns a list of all orders that belong to the currently authenticated user.

POST: /orders/

A POST request with a payload containing the merge name like: {“name”: “my merge”} creates a new order (merge request).

GET: /orders/{uuid}/

returns details of an order identified by its id

DELETE /orders/{uuid}/

deletes an order

POST /orders/{uuid}/add_files

A POST request with a payload containing a file adds that file to an order after validating the file type and the max-files limit. It also checks whether the order has already been merged, in which case no files can be added.

GET /orders/{uuid}/files/

lists files of an order

GET /orders/{uuid}/merge/

merges the order identified by its id.
This is where the core feature of the app lives. This view/endpoint does some initial checks to verify that the order has at least 2 files to merge and has not been previously merged, then it hands work over to a utility function that performs the actual merging of all PDF files associated with the order:

import os
import uuid

from pypdf import PdfWriter
from pypdf.errors import PdfReadError, PyPdfError
from django.conf import settings

from .exceptions import MergeException


def merge_pdf_files(pdf_files):
    merger = PdfWriter()
    for f in pdf_files:
        try:
            merger.append(f.file)
        except FileNotFoundError:
            raise MergeException

    merged_path = os.path.join(settings.MEDIA_ROOT, f"merged_pdfs/merged_pdf_{uuid.uuid4()}.pdf")

    try:
        merger.write(merged_path)
    except FileNotFoundError:
        raise MergeException
    except PdfReadError:
        raise MergeException
    except PyPdfError:
        raise MergeException

    merger.close()

    if os.path.isfile(merged_path):
        return merged_path
    return None

Nothing fancy here, the function takes a list of PDF files, merges them using PyPDF and returns the path for the merged PDF.

On a successful merge, the view takes the path and sets it as the value of the order instance’s download_url property for the next endpoint to use. It also marks the order and all of its files as merged. This can be used to clean up all merged orders and their associated files to save server space.

GET /orders/{uuid}/download/

download the merged PDF of an order after verifying that it has been merged and is ready for download. The API allows only a maximum number of downloads per order and a maximum time on the server. This prevents users from keeping merged files on the server forever and sharing the links, which would basically turn the app into a free file-sharing service.

DELETE /files/{uuid}/

delete a file identified by its id

I then wired the urls to the views connecting each endpoint to the corresponding view.
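
To give a rough idea of what the merge endpoint and the URL wiring can look like in DRF, here is a trimmed-down sketch (the class, serializer and helper names are assumptions based on the description above, not the exact code from the repo):

# views.py
from rest_framework import status, viewsets
from rest_framework.decorators import action
from rest_framework.response import Response

from .models import Order
from .serializers import OrderSerializer
from .utils import merge_pdf_files


class OrderViewSet(viewsets.ModelViewSet):
    serializer_class = OrderSerializer

    def get_queryset(self):
        # users only ever see their own orders
        return Order.objects.filter(owner=self.request.user)

    @action(detail=True, methods=["get"])
    def merge(self, request, pk=None):
        order = self.get_object()
        if order.is_merged:
            return Response({"detail": "Order already merged."},
                            status=status.HTTP_400_BAD_REQUEST)
        files = order.files.all()
        if files.count() < 2:
            return Response({"detail": "At least 2 files are needed."},
                            status=status.HTTP_400_BAD_REQUEST)
        merged_path = merge_pdf_files(files)
        if merged_path is None:
            return Response({"detail": "Merge failed."},
                            status=status.HTTP_500_INTERNAL_SERVER_ERROR)
        order.download_url = merged_path
        order.is_merged = True
        order.save()
        files.update(is_merged=True)
        return Response({"detail": "Merged.", "download_url": order.download_url})


# urls.py
from rest_framework.routers import DefaultRouter

router = DefaultRouter()
router.register("orders", OrderViewSet, basename="order")
urlpatterns = router.urls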

Auth and Permissions:

To ensure privacy (not necessarily security), and to allow the app to be used as a paid service later, I decided to require an account for every user of the app. Thanks to Django’s built-in features, this can be done fairly easily with the auth module. However, I didn’t want to use the traditional session-based authentication flow. I just prefer token-based auth as it’s more suitable for API-based apps.

I chose to use JWT (JSON Web Token). Basically it’s a way to allow the user to exchange their credentials for a set of tokens: an access token and a refresh token.
The access token, prefixed with “Bearer ”, will be attached in the header of every request that requires authentication; otherwise the request will fail. This token has a shorter lifespan (usually hours or even minutes), and when it expires, the user can send the refresh token to get a new access token and continue using the app.
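
From a client’s point of view, the whole exchange can be illustrated with a few requests calls (a rough sketch assuming a local dev server and the auth endpoints listed below; not part of the app’s own code):

import requests

BASE = "http://localhost:8000"  # assumed local development address

# exchange credentials for a token pair
tokens = requests.post(f"{BASE}/auth/jwt/create",
                       json={"email": "user@example.com", "password": "s3cret"}).json()

# call a protected endpoint with the access token in the Authorization header
headers = {"Authorization": f"Bearer {tokens['access']}"}
orders = requests.get(f"{BASE}/orders/", headers=headers).json()

# when the access token expires, trade the refresh token for a new one
new_access = requests.post(f"{BASE}/auth/refresh",
                           json={"refresh": tokens["refresh"]}).json()["access"]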

There are a number of packages that can add JWT auth flow to Django. Most of them use simple-jwt which you can use directly but it requires you to write more code yourself to implement the minimum register/login/out flow.
I went with Djoser which is a REST implementation of Django’s auth system.
It allowed me to use a custom user model to have extra fields in the user table and most importantly, utilize email/password for registration and login instead of Django’s default username/password although I had to tweak the models in the example project to work as intended.
Djoser also gave me the following endpoints for free:

  • GET /auth/users/ : lists all users if you’re an admin. Only returns your user info if you aren’t admin.
  • POST /auth/users/ : a POST request with a payload containing: name, email and password will register a new user.
  • POST /auth/jwt/create: a POST request with a payload containing: email and password will return access and refresh tokens to use for authenticating subsequent requests.
  • POST /auth/refresh : a POST request with a payload containing: the refresh token will return a new access token.
    as well as some other useful endpoints for changing and resetting passwords.

On top of Django’s auth, DRF uses permissions to decide what API resources a user can access. In short, in the API, I check if the user is logged in first. Then I have only two permissions:

  • check if the user is the owner of the order before allowing them to access it, upload files to it, merge it or download it.
  • check if the user is the owner of the order that a file is associated with before allowing them to view the file details or delete it.
    Failure to meet required permissions causes the API to raise a permission error and return a descriptive message.
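
Both ownership checks boil down to the same comparison, so they could be expressed with a small custom permission class; a minimal sketch (the class name and related attribute are assumptions, not the repo's exact code):

from rest_framework import permissions


class IsOwner(permissions.BasePermission):
    """Allow access only to the owner of an order, or of a file's parent order."""

    def has_object_permission(self, request, view, obj):
        # a file exposes its parent order via obj.order; an order is checked directly
        order = getattr(obj, "order", obj)
        return order.owner == request.user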

During development of the backend, I used the browser extension “ModHeader” to include the access token in all requests I made through the browser (for testing via DRF built-in web UI).

Frontend

I chose React.js for building the frontend as a single-page app (SPA) because it's relatively easy to use for small apps and has a huge community. I won't go into much detail about how the frontend is built (this is a Python blog after all), but I will touch on the main points.

Auth in the frontend

First, here is a brief description of the auth workflow in the frontend:

  • when a user first visits the app, they're asked to log in or sign up to continue.
  • the user can sign up by filling out their name, email, and password. This form data is sent via a POST request to the backend endpoint /auth/users to register the user.
  • for existing users, email and password are sent in a POST request to /auth/jwt/create, which returns a pair of tokens: refresh and access. These tokens are saved in browser cookies. The access token is sent in the header of every subsequent request to authenticate the user. When it expires, the frontend requests a new access token on behalf of the user by sending the refresh token to /auth/refresh. If both tokens expire, the user is redirected to log in again and obtain a new set of tokens. (The sketch after this list shows the same flow as plain API calls.)
  • when a user navigates to /logout, all tokens are cleared from browser cookies and the user is redirected to the /login route.
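
To make the flow concrete, here is a small sketch of the same token exchange using Python's requests library, purely to illustrate the API calls the React frontend makes; the base URL, credentials, and the list endpoint are placeholders and assumptions.

    # Sketch of the token flow; URL, credentials, and endpoints are placeholders.
    import requests

    BASE = "http://localhost:8000"  # assumed development URL

    # Exchange credentials for a pair of tokens
    tokens = requests.post(
        f"{BASE}/auth/jwt/create",
        json={"email": "user@example.com", "password": "secret"},
    ).json()

    # Attach the access token to authenticated requests
    headers = {"Authorization": f"Bearer {tokens['access']}"}
    orders = requests.get(f"{BASE}/orders/", headers=headers).json()  # assumed list endpoint

    # When the access token expires, use the refresh token to obtain a new one
    new_access = requests.post(
        f"{BASE}/auth/refresh",
        json={"refresh": tokens["refresh"]},
    ).json()["access"]
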
Routes:

I built the necessary React components and routes for the app to roughly match the backend endpoints I discussed earlier. These are the routes the app exposes to authenticated users:

  • / => home screen to choose between creating a new merge or listing previous merges.
  • /orders => list all merges for the currently logged-in user, with links to edit or delete any order.
  • /create => create a new merge.
  • /order/{id} => details of a merge by id.
  • /logout => logs the user out.
Merging workflow:

The workflow for merging PDF files from the frontend is as follows:

  • when the user creates a new merge, they're redirected to the merge detail route, which shows the merge they've just created with three buttons: add PDFs, merge, and download.
  • the merge detail route allows the user to upload PDF files to the merge. If there are fewer than two files, only "Add PDF files" is enabled. Once the user has added two files, the "merge" button is activated. When the files reach the maximum number allowed in a single merge (currently 5), the "Add PDFs" button is disabled and the upload form is hidden.
  • when the user is done adding PDFs, they can click "merge", which merges the files on the server and then activates only the download button.
  • clicking the download button opens a new browser tab showing the merged PDF.
UI & CSS:

For UI, the app uses daisyUI, a free component library that makes it easier to use TailwindCSS. The latter is super popular in the frontend world as a utility-first CSS framework.

Deployment:

I haven't deployed the app to a real production server yet, as the home server environment is very forgiving and lets you skip steps you wouldn't skip in a production deployment.
For now, I just have a basic Dockerfile and a docker-compose file to spin up the backend API (a regular Django project) and have it ready to accept calls from the frontend.

Likewise, a set of Docker files is used to spin up the frontend. After building it with "npm run build", the Dockerfile copies the deployable app from the "dist" folder to the Nginx document root inside the container and serves it like any other website hosted on a web server.
This setup is probably enough for development and local hosting. When it comes time to publish the app on the web, "real" deployment considerations must be addressed.

It's worth noting that I keep separate repos for the backend and the frontend to keep both ends decoupled from each other. The backend API can be consumed from any frontend, be it a web, mobile, or desktop app.

Further improvements:

The app in its current state works and does the intended job. It’s far from perfect and can use some improvements. I’ll include these in the README of the repos.

Source code:

Conclusion

In this article I walked through the general process of how I built a web app around a Python script to make it available to average end users.
Python scripts are great starting points for apps. They're also a source of inspiration for app ideas. If you have a script that performs a common daily-life task, consider building an app for it. The process will teach you a ton about the lifecycle of app development. It forces you to think of and account for aspects you don't usually consider when writing a stand-alone script.
As you build the app though, always remember to:

  • keep it simple. Don't overcomplicate things.
  • ship fast. Aim at building an MVP (Minimum Viable Product) with the necessary functionality. Don't wait until you've built every feature you think the app should have. Instead, ship it, then iterate on it and add features gradually as they're needed.
  • not feel intimidated by other mature projects out there. They've most likely been built over a long period of time and iterated on tens or even thousands of times before reaching the mature state they're in today.

I hope you found this article helpful and I look forward to seeing you in a future one.

Categories: FLOSS Project Planets

GNU Taler news: Video interview with Mikolai Gütschow on payments for the Internet of Things

GNU Planet! - Fri, 2024-07-19 04:52
On the occasion of the Point Zero Forum's Innovation Tour, Evgeny Grin has interviewed Mikolai Gütschow who designed and implemented solutions for the payments in the Internet of Things (IoT).
Categories: FLOSS Project Planets

Matt Layman: Activation Email Job - Building SaaS #196

Planet Python - Thu, 2024-07-18 20:00
In this episode, we chatted about managing dependencies and the cost of maintenance. Then we got into some feature work and began building a job that will send users an email as reminder to activate their account shortly before it expires.
Categories: FLOSS Project Planets

Quansight Labs Blog: The convoluted story behind `np.top_k`

Planet Python - Thu, 2024-07-18 20:00
In this blog post, I describe my experience as a first-time contributor to NumPy and talk about the story behind `np.top_k`.
Categories: FLOSS Project Planets

GNUnet News: The European Union must keep funding free software

GNU Planet! - Thu, 2024-07-18 18:00
The European Union must keep funding free software

The GNUnet project was granted NGI funding via NLnet . Other FOSS related projects also benefit from NGI funding. This funding is now at risk for future projects.

The following is an open letter initially published in French by the Petites Singularités association. To co-sign it, please publish it on your website in your preferred language, then add yourself to this table .

Open Letter to the European Commission.

Since 2020, Next Generation Internet ( NGI ) programmes, part of European Commission’s Horizon programme, fund free software in Europe using a cascade funding mechanism (see for example NLnet’s calls ). This year, according to the Horizon Europe working draft detailing funding programmes for 2025, we notice that Next Generation Internet is not mentioned any more as part of Cluster 4.

NGI programmes have shown their strength and importance in supporting the European software infrastructure, as a generic funding instrument to fund digital commons and ensure their long-term sustainability. We find this transformation incomprehensible, especially since NGI has proven efficient and economical in supporting free software as a whole, from the smallest to the most established initiatives. This ecosystem diversity backs the strength of European technological innovation, and maintaining the NGI initiative to provide structural support to software projects at the heart of worldwide innovation is key to enforcing the sovereignty of a European infrastructure. Contrary to common perception, technical innovations often originate from European rather than North American programming communities, and are mostly initiated by small-scale organizations.

Previous Cluster 4 allocated 27 million euros to:

  • “Human centric Internet aligned with values and principles commonly shared in Europe” ;
  • “A flourishing internet, based on common building blocks created within NGI, that enables better control of our digital life” ;
  • “A structured ecosystem of talented contributors driving the creation of new internet commons and the evolution of existing internet commons”.

In the name of these challenges, more than 500 projects received NGI funding in the first 5 years, backed by 18 organisations managing these European funding consortia.

NGI contributes to a vast ecosystem, as most of its budget is allocated to fund third parties by the means of open calls, to structure commons that cover the whole Internet scope - from hardware to application, operating systems, digital identities or data traffic supervision. This third-party funding is not renewed in the current program, leaving many projects short on resources for research and innovation in Europe.

Moreover, NGI allows exchanges and collaborations across all the Euro zone countries as well as “widening countries” 1 , which are currently both a success and a work in progress, much like the Erasmus programme before it. NGI also contributes to opening and supporting longer-term relationships than strict project funding does. It encourages implementing projects funded as pilots, backing collaboration, identification and reuse of common elements across projects, interoperability in identification systems and beyond, and setting up development models that mix diverse scales and types of European funding schemes.

While the USA, China or Russia deploy huge public and private resources to develop software and infrastructure that massively capture private consumer data, the EU can’t afford this renunciation. Free and open source software, as supported by NGI since 2020, is by design the opposite of potential vectors for foreign interference. It lets us keep our data local and favors a community-wide economy and know-how, while allowing an international collaboration. This is all the more essential in the current geopolitical context: the challenge of technological sovereignty is central, and free software allows addressing it while acting for peace and sovereignty in the digital world as a whole.

  1. As defined by Horizon Europe, widening Member States are Bulgaria, Croatia, Cyprus, Czechia, Estonia, Greece, Hungary, Latvia, Lithuania, Malta, Poland, Portugal, Romania, Slovakia, and Slovenia. Widening associated countries (under condition of an association agreement) include Albania, Armenia, Bosnia, the Faroe Islands, Georgia, Kosovo, Moldova, Montenegro, Morocco, North Macedonia, Serbia, Tunisia, Türkiye, and Ukraine. Widening overseas regions are Guadeloupe, French Guiana, Martinique, Réunion, Mayotte, Saint-Martin, the Azores, Madeira, and the Canary Islands. ↩︎

Categories: FLOSS Project Planets

Cailean Osborne: voices of the Open Source AI Definition

Open Source Initiative - Thu, 2024-07-18 13:09

The Open Source Initiative (OSI) is running a series of stories about a few of the people involved in the Open Source AI Definition (OSAID) co-design process. Today, we are featuring Cailean Osborne, one of the volunteers who has helped to shape and are shaping the OSAID.

Question: What’s your background related to Open Source and AI?

My interest in Open Source AI began around 2020 when I was working in AI policy at the UK Government. I was surprised that Open Source never came up in policy discussions, given its crucial role in AI R&D. Having been a regular user of libraries like scikit-learn and PyTorch in my previous studies, I followed Open Source AI trends in my own time, and eventually I decided to do a PhD on the topic. When I started my PhD back in 2021, Open Source AI still felt like a niche topic, so it’s been exciting to watch it become a major talking point over the years.

Beyond my PhD, I’ve been involved in Open Source AI community as a contributor to scikit-learn and as a co-developer of the Model Openness Framework (MOF) with peers from the Generative AI Commons community. Our goal with the MOF is to provide guidance for AI researchers and developers to evaluate the completeness and openness of “Open Source” models based on open science principles. We were chuffed that the OSI team chose to use the 16 components from the MOF as the rubric for reviewing models in the co-design process. 

Question: What motivated you to join this co-design process to define Open Source AI?

The short answer is: to contribute to establishing an accurate definition for “Open Source AI” and to learn from all the other experts involved in the co-design process. The longer answer is: There’s been a lot of confusion about what is or is not “Open Source AI,” which hasn’t been helped by open-washing. “Open source” has a specific definition (i.e. the right to use, study, modify, and redistribute source code) and what is being promoted as “Open Source AI” deviates significantly from this definition. Rather than being pedantic, getting the definition right matters for several reasons; for example, for the “Open Source” exemptions in the EU AI Act to work (or not work), we need to know precisely what “Open Source” models actually are. Andreas Liesenfeld and Mark Dingemanse have written a great piece about the issues of open-washing and how they relate to the AI Act, which I recommend reading if you haven’t yet. So, I got involved to help develop a definition and to learn from all the other experts involved. It hasn’t been easy (it’s a pretty divisive topic!), but I think we’ve made good progress.

Question: Can you describe your experience participating in this process? What did you most enjoy about it and what were some of the challenges you faced?

First off, I have to give credit to Stef and Mer for maintaining momentum throughout the process. Coordinating a co-design effort with volunteers scattered around the globe, each with varying levels of availability and (strong) opinions on the matter, is no small feat. So, well done! I also enjoyed seeing how others agreed or disagreed when reviewing models. The moments of disagreement were the most interesting; for example, about whether training data should be available versus documented and if so, in how much detail… Personally, the main challenge was searching for information about the various components of models that were apparently “Open Source” and observing how little information was actually provided beyond weights, a model card, and if you’re lucky an arXiv preprint or technical report.

Question: Why do you think AI should be Open Source?

When talking about the benefits of Open Source AI, I like to point folks to a 2007 paper, in which 16 researchers highlighted “The Need for Open Source Software in Machine Learning” due to basically the complete lack of OSS for ML/AI at the time. Fast forward to today, AI R&D is practically unthinkable without OSS, from data tooling to the deep learning frameworks used to build LLMs. Open source and openness in general have many benefits for AI, from enabling access to SOTA AI technologies and transparency which is key for reproducibility, scrutiny, and accountability to widening participation in their design, development, and governance. 

Question: What do you think is the role of data in Open Source AI?

If the question is strictly about the role of data in developing open AI models, the answer is pretty simple: Data plays a crucial role because it is needed for training, testing, aligning, and auditing models. But if the question is asking “should the release of data be a condition for an open model to qualify as Open Source AI,” then the answer is obviously much more complicated. 

Companies are in no rush to share training data due to a handful of reasons: be it competitive advantage, data protection, or frankly being sued for copyright infringement. The copyright concern isn’t limited to companies: EleutherAI has also been sued and had to take down the Books3 dataset from The Pile. There are also many social and cultural concerns that restrict data sharing; for example, the Kōrero Kaitiakitanga license has been developed to protect the interests of indigenous communities in New Zealand. So, the data question isn’t easy and perhaps we shouldn’t be too dogmatic about it.  

Personally, I think the compromise in v. 0.0.8, which states that model developers should provide sufficiently detailed information about data if they can’t release the training dataset itself, is a reasonable halfway house. I also hope to see more open pre-training datasets like the one developed by the community-driven BigScience Project, which involved open deliberation about the design of the dataset and provides extensive documentation about data provenance and processing decisions (e.g. check out their Data Catalogue). The FineWeb dataset by Hugging Face is another good example of an open pre-training dataset, which they released with pre-processing code, evaluation results, and super detailed documentation.

Question: Has your personal definition of Open Source AI changed along the way? What new perspectives or ideas did you encounter while participating in the co-design process?

To be honest, my personal definition hasn’t changed much. I am not a big fan of the use of “Open Source AI” when folks specifically mean “open models” or “open-weight models”. What we need to do is raise awareness about appropriate terminology and point out “open-washing”, as people have done, and I must say that subjectively I’ve seen improvements: less “Open Source models” and more “open models”. But I will say that I do find “Open Source AI” a useful umbrella term for the various communities of practice that intertwine in the development of open models, including OSS, open data, and AI researchers and developers, who all bring different perspectives and ways of working to the overarching “Open Source AI” community.

Question: What do you think the primary benefit will be once there is a clear definition of Open Source AI?

We’ll be able to reduce confusion about what is or isn’t “Open Source AI” and more easily combat open-washing efforts. As I mentioned before, this clarity will be beneficial for compliance with regulations like the AI Act which includes exemptions for “Open Source” AI.  

Question: What do you think are the next steps for the community involved in Open Source AI?

We still have many steps to take but I’ll share three for now.

First, we urgently need to improve the auditability and therefore the safety of open models. With OSS, we know that (1) the availability of source code and (2) open development enable the distributed scrutiny of source code. Think Linus’ Law: “Given enough eyeballs, all bugs are shallow.” Yet open models are more complex than just source code, and the lack of openness of many key components like training data is holding back adoption because would-be adopters can’t adequately run due diligence tests on the models. If we want to realise the benefits of “Open Source AI,” we need to figure out how to increase the transparency and openness of models —we hope the Model Openness Framework can help with this. 

Second, I’m really excited about grassroots initiatives that are leading community-driven approaches to developing open models and open datasets like the BigScience project. They’re setting an example of how to do “Open Source AI” in a way that promotes open collaboration, transparency, reproducibility, and safety from the ground up. I can still count such initiatives with my fingers but I am hopeful that we will see more community-driven efforts in the future.
Third, I hope to see the public sector and non-profit foundations get more involved in supporting public interest and grassroots initiatives. France has been a role model on this front: providing a public grant to train the BigScience project’s BLOOM model on the Jean Zay supercomputer, as well as funding the scikit-learn team to build out a data science commons.

Categories: FLOSS Research

The Drop Times: The Chief Who Drives and Is Driven by Drupal: A Talk with Dries Buytaert

Planet Drupal - Thu, 2024-07-18 12:07
Join The DropTimes (TDT) for its milestone 100th interview with Dries Buytaert, the innovative founder of Drupal. Interviewed by Anoop John, Founder and Lead of The DropTimes, this conversation explores Drupal’s rich history and transformative journey. Dries shares key moments that boosted Drupal’s adoption, insights on community growth through events like DrupalCon, and the impact of the Drupal Starshot initiative. He discusses strategies for making Drupal more accessible, integrating AI, and effective community communication. This interview captures Drupal’s evolution and future aspirations, offering valuable insights for both seasoned users and newcomers. Don’t miss this engaging discussion celebrating Drupal’s ongoing impact and future.
Categories: FLOSS Project Planets

mark.ie: My LocalGov Drupal contributions for week-ending July 19th, 2024

Planet Drupal - Thu, 2024-07-18 12:00

Here's what I've been working on for my LocalGov Drupal contributions this week. Thanks to Big Blue Door for sponsoring the time to work on these.

Categories: FLOSS Project Planets
