Feeds

The Drop Times: Detailed Overview of the 2024 Drupal Developer Survey Results

Planet Drupal - Tue, 2024-05-07 09:08
The 2024 Drupal Developer Survey, led by Jeff Geerling, Chris Urban, and Michael Richardson, provided a comprehensive overview of the global Drupal community. With 648 developers from 65 countries, including significant contributions from the United States, France, and India, the survey showcased a mature developer base, with 76% aged between 30 and 49. Despite regional variations in sentiment, community engagement remained strong, with 65% participating in Drupal events, and 80% expressing optimism for Drupal's future. Notable trends included the rise of decoupled architectures and the endorsement of DDEV as the preferred choice for integrated development environments (IDEs) and local environment managers. The survey also highlighted opportunities for the Drupal Association to enhance its visibility and communication efforts. These insights will inform strategies for fostering growth and innovation within the ecosystem.
Categories: FLOSS Project Planets

GNU Guix: Authenticate your Git checkouts!

GNU Planet! - Tue, 2024-05-07 08:08

You clone a Git repository, then pull from it. How can you tell its contents are “authentic”—i.e., coming from the “genuine” project you think you’re pulling from, written by the fine human beings you’ve been working with? With commit signatures and “verified” badges ✅ flourishing, you’d think this has long been solved—but nope!

Four years after Guix deployed its own tool to allow users to authenticate updates fetched with guix pull (which uses Git under the hood), the situation hasn’t changed all that much: the vast majority of developers using Git simply do not authenticate the code they pull. That’s pretty bad. It’s the modern-day equivalent of sharing unsigned tarballs and packages like we’d blissfully do in the past century.

The authentication mechanism Guix uses for channels is available to any Git user through the guix git authenticate command. This post is a guide for Git users who are not necessarily Guix users but are interested in using this command for their own repositories. Before looking into the command-line interface and how we improved it to make it more convenient, let’s dispel any misunderstandings or misconceptions.

Why you should care

When you run git pull, you’re fetching a bunch of commits from a server. If it’s over HTTPS, you’re authenticating the server itself, which is nice, but that does not tell you who the code actually comes from—the server might be compromised and an attacker pushed code to the repository. Not helpful. At all.

But hey, maybe you think you’re good because everyone on your project is signing commits and tags, and because you’re disciplined, you routinely run git log --show-signature and check those “Good signature” GPG messages. Maybe you even have those fancy “✅ verified” badges as found on GitLab and on GitHub.

Signing commits is part of the solution, but it’s not enough to authenticate a set of commits that you pull; all it shows is that, well, those commits are signed. Badges aren’t much better: the presence of a “verified” badge only shows that the commit is signed by the OpenPGP key currently registered for the corresponding GitLab/GitHub account. It’s another source of lock-in and makes the hosting platform a trusted third-party. Worse, there’s no notion of authorization (which keys are authorized), let alone tracking of the history of authorization changes (which keys were authorized at the time a given commit was made). Not helpful either.

Being able to ensure that when you run git pull, you’re getting code that genuinely comes from authorized developers of the project is basic security hygiene. Obviously it cannot protect against efforts to infiltrate a project to eventually get commit access and insert malicious code—the kind of multi-year plot that led to the xz backdoor—but if you don’t even protect against unauthorized commits, then all bets are off.

Authentication is something we naturally expect from apt update, pip, guix pull, and similar tools; why not hold git pull to the same standard?

Initial setup

The guix git authenticate command authenticates Git checkouts, unsurprisingly. It’s currently part of Guix because that’s where it was brought to life, but it can be used on any Git repository. This section focuses on how to use it; you can learn about the motivation, its design, and its implementation in the 2020 blog post, in the 2022 peer-reviewed academic paper entitled Building a Secure Software Supply Chain with GNU Guix, or in this 20-minute presentation.

To support authentication of your repository with guix git authenticate, you need to follow these steps:

  1. Enable commit signing on your repo: git config commit.gpgSign true. (Git now supports other signing methods but here we need OpenPGP signatures.)

  2. Create a keyring branch containing all the OpenPGP keys of all the committers, along these lines:

    git checkout --orphan keyring
    git reset --hard
    gpg --export alice@example.org > alice.key
    gpg --export bob@example.org > bob.key
    …
    git add *.key
    git commit -m "Add committer keys."

    All the files must end in .key. You must never remove keys from that branch: keys of users who left the project are necessary to authenticate past commits.

  3. Back to the main branch, add a .guix-authorizations file, listing the OpenPGP keys of authorized committers—we’ll get back to its format below.

  4. Commit! This becomes the introductory commit from which authentication can proceed. The introduction of your repository is the ID of this commit and the OpenPGP fingerprint of the key used to sign it.

That’s it. From now on, anyone who clones the repository can authenticate it. The first time, run:

guix git authenticate COMMIT SIGNER

… where COMMIT is the commit ID of the introductory commit, and SIGNER is the OpenPGP fingerprint of the key used to sign that commit (make sure to enclose it in double quotes if there are spaces!). As a repo maintainer, you must advertise this introductory commit ID and fingerprint on a web page or in a README file so others know what to pass to guix git authenticate.
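For example, with placeholder values (an illustrative commit ID and fingerprint, not those of any real project):

guix git authenticate 0123456789abcdef0123456789abcdef01234567 \
  "AAAA BBBB CCCC DDDD EEEE FFFF 0000 1111 2222 3333"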

On that first run, the commit and signer are recorded in .git/config; from then on, you can run the command without any arguments:

guix git authenticate

The other new feature is that the first time you run it, the command installs pre-push and post-merge hooks (unless preexisting hooks are found) so that your repository is automatically authenticated from then on, every time you run git pull or git push.

guix git authenticate exits with a non-zero code and an error message when it stumbles upon a commit that lacks a signature, that is signed by a key not in the keyring branch, or that is signed by a key not listed in .guix-authorizations.

Maintaining the list of authorized committers

The .guix-authorizations file in the repository is central: it lists the OpenPGP fingerprints of authorized committers. Any commit that is not signed by a key listed in the .guix-authorizations file of its parent commit(s) is considered inauthentic—and an error is reported. The format of .guix-authorizations is based on S-expressions and looks like this:

;; Example ‘.guix-authorizations’ file.

(authorizations
 (version 0)                        ;current file format version
 (("AD17 A21E F8AE D8F1 CC02 DBD9 F8AE D8F1 765C 61E3"
   (name "alice"))
  ("2A39 3FFF 68F4 EF7A 3D29 12AF 68F4 EF7A 22FB B2D5"
   (name "bob"))
  ("CABB A931 C0FF EEC6 900D 0CFB 090B 1199 3D9A EBB5"
   (name "charlie"))))

The name bits are hints and do not have any effect; what matters is the fingerprints that are listed. You can obtain them with GnuPG by running commands like:

gpg --fingerprint charlie@example.org

At any time you can add or remove keys from .guix-authorizations and commit the changes; those changes take effect for child commits. For example, if we add Billie’s fingerprint to the file in commit A, then Billie becomes an authorized committer in descendants of commit A (we must make sure to add Billie’s key as a file in the keyring branch, too, as we saw above); Billie is still unauthorized in branches that lack A. If we remove Charlie’s key from the file in commit B, then Charlie is no longer an authorized committer, except in branches that start before B. This should feel rather natural.

That’s pretty much all you need to know to get started! Check the manual for more info.

All the information needed to authenticate the repository is contained in the repository itself—it does not depend on a forge or key server. That’s a good property to allow anyone to authenticate it, to ensure determinism and transparency, and to avoid lock-in.

Interested? You can help!

guix git authenticate is a great tool that you can start using today so you and fellow co-workers can be sure you’re getting the right code! It solves an important problem that, to my knowledge, hasn’t really been addressed by any other tool.

Maybe you’re interested but don’t feel like installing Guix “just” for this tool. Maybe you’re not into Scheme and Lisp and would rather use a tool written in your favorite language. Or maybe you think—and rightfully so—that such a tool ought to be part of Git proper.

That’s OK, we can talk! We’re open to discussing with folks who’d like to come up with alternative implementations—check out the articles mentioned above if you’d like to take that route. And we’re open to contributing to a standardization effort. Let’s get in touch!

Acknowledgments

Thanks to Florian Pelz and Simon Tournier for their insightful comments on an earlier draft of this post.

Categories: FLOSS Project Planets

Shannon -jj Behrens: Python: My Favorite Python Tricks for LeetCode Questions

Planet Python - Tue, 2024-05-07 07:44

I've been spending a lot of time practicing on LeetCode recently, so I thought I'd share some of my favorite intermediate-level Python tricks. I'll also cover some newer features of Python you may not have started using yet. I'll start with basic tips and then move to more advanced ones.

Get help()

Python's documentation is pretty great, and some of these examples are taken from there.

For instance, if you just google "heapq", you'll see the official docs for heapq, which are often enough.

However, it's sometimes also helpful to quickly use help() in the shell. Here, suppose I can't remember that push() is actually called append():

>>> help([])
>>> dir([])
>>> help([].append)

enumerate()

If you need to loop over a list, you can use enumerate() to get both the item as well as the index. As a mnemonic, I like to think for (i, x) in enumerate(...):

for (i, x) in enumerate(some_list):
    ...

items()

Similarly, you can get both the key and the value at the same time when looping over a dict using items():

for (k, v) in some_dict.items():
    ...

[] vs. get()

Remember, when you use [] with a dict, if the value doesn't exist, you'll get a KeyError. Rather than see if an item is in the dict and then look up its value, you can use get():

val = some_dict.get(key)  # It defaults to None.
if val is None:
    ...

Similarly, .setdefault() is sometimes helpful.
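A quick illustration of setdefault(), grouping values under a key in one step:

groups = {}
for word in ['apple', 'avocado', 'banana']:
    # Fetch the list for this first letter, creating an empty one if missing.
    groups.setdefault(word[0], []).append(word)
# groups == {'a': ['apple', 'avocado'], 'b': ['banana']}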

Some people prefer to just use [] and handle the KeyError since exceptions aren't as expensive in Python as they are in other languages.

range() is smarter than you think

for item in items:
    ...

for index in range(len(items)):
    ...

# Count by 2s.
for i in range(0, 100, 2):
    ...

# Count backward from 100 to 0 inclusive.
for i in range(100, -1, -1):
    ...

# Okay, Mr. Smarty Pants, I'm sure you knew all that, but did you know
# that you can pass a range object around, and it knows how to reverse
# itself via slice notation? :-P
r = range(100)
r = r[::-1]
print(f'{r}')  # range(99, -1, -1)

f-string debugging

Have you switched to Python's new format strings yet? They're more convenient and safer (from injection vulnerabilities) than % and .format(). They even have a syntax for outputting the thing as well as its value:

print(f'Got {2+2=}')  # Got 2+2=4

for else

Python has a feature that I haven't seen in other programming languages. Both for and while can be followed by an else clause, which is useful when you're searching for something.

for item in some_list:
    if is_what_im_looking_for(item):
        print(f"Yay! It's {item}.")
        break
else:
    print("I couldn't find what I was looking for.")

Use a list as a stack

The cost of using a list as a stack is (amortized) O(1):

elements = []
elements.append(element)  # Not push
element = elements.pop()

Note that inserting something at the beginning of the list or in the middle is more expensive, because it has to shift everything to the right--see deque below.

sort() vs. sorted()

# sort() sorts a list in place.
my_list.sort()

# Whereas sorted() returns a sorted *copy* of an iterable:
my_sorted_list = sorted(some_iterable)

And, both of these can take a key function if you need to sort objects.
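For example, a key function lets you sort tuples or objects by one field:

# Sort a list of (name, age) pairs by age.
people = [('alice', 30), ('bob', 25), ('carol', 35)]
people.sort(key=lambda p: p[1])  # In place.
by_age_desc = sorted(people, key=lambda p: p[1], reverse=True)  # Sorted copy.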

set and frozenset

Sets are so useful for so many problems! Just in case you didn't know some of these tricks:

# There is now syntax for creating sets.
s = {'Von'}

# There are set "comprehensions" which are like list comprehensions, but for sets.
s2 = {f'{name} the III' for name in s}
# {'Von the III'}

# If you can't remember how to use union, intersection, difference, etc.:
help(set())

# If you need an immutable set, for instance, to use as a dict key, use frozenset.
frozenset((1, 2, 3))

deque

If you find yourself needing a queue or a list that you can push and pop from either side, use a deque:

>>> from collections import deque
>>>
>>> d = deque()
>>> d.append(3)
>>> d.append(4)
>>> d.appendleft(2)
>>> d.appendleft(1)
>>> d
deque([1, 2, 3, 4])
>>> d.popleft()
1
>>> d.pop()
4

Using a stack instead of recursion

Instead of using recursion (Python's default recursion limit is about 1000 frames), you can use a while loop and manually manage a stack yourself. Here's a slightly contrived example:

work = [create_initial_work()]
while work:
    work_item = work.pop()
    result = process(work_item)
    if is_done(result):
        return result
    work.append(result.pieces[0])
    work.append(result.pieces[1])

Using yield from

If you don't know about yield, you can go spend some time learning about that. It's awesome.

Sometimes, when you're in one generator, you need to call another generator. Python now has yield from for that:

def my_generator():
    yield 1
    yield from some_other_generator()
    yield 6

So, here's an example of backtracking:

class Solution:
    def problem(self, digits: str) -> List[str]:
        def generate_possibilities(work_so_far, remaining_work):
            if not remaining_work:
                if work_so_far:
                    yield work_so_far
                return
            first_part, remaining_part = remaining_work[0], remaining_work[1:]
            for i in things_to_try:
                yield from generate_possibilities(work_so_far + i, remaining_part)

        output = list(generate_possibilities(no_work_so_far, its_all_remaining_work))
        return output

This is appropriate if you have less than 1000 "levels" but a ton of possibilities for each of those levels. This won't work if you're going to need more than 1000 layers of recursion. In that case, switch to "Using a stack instead of recursion".

Updated: On the other hand, if you can have the recursive function append to some list of answers instead of yielding it all the way back to the caller, that's faster.

Pre-initialize your list

If you know how long your list is going to be ahead of time, you can avoid needing to resize it multiple times by just pre-initializing it:

dp = [None] * len(items)

collections.Counter()

How many times have you used a dict to count up something? It's built-in in Python:

>>> from collections import Counter
>>> c = Counter('abcabcabcaaa')
>>> c
Counter({'a': 6, 'b': 3, 'c': 3})

defaultdict

Similarly, there's defaultdict:

>>> from collections import defaultdict
>>> d = defaultdict(list)
>>> d['girls'].append('Jocylenn')
>>> d['boys'].append('Greggory')
>>> d
defaultdict(<class 'list'>, {'girls': ['Jocylenn'], 'boys': ['Greggory']})

Notice that I didn't need to set d['girls'] to an empty list before I started appending to it.

heapq

I had heard of heaps in school, but I didn't really know what they were. Well, it turns out they're pretty helpful for several of the problems, and Python has a list-based heap implementation built-in.

If you don't know what a heap is, I recommend this video and this video. They'll explain what a heap is and how to implement one using a list.

The heapq module is a built-in module for managing a heap. It builds on top of an existing list:

import heapq

some_list = ...
heapq.heapify(some_list)
# The head of the heap is some_list[0].
# The len of the heap is still len(some_list).
heapq.heappush(some_list, item)
head_item = heapq.heappop(some_list)

The heapq module also has nlargest and nsmallest built-in so you don't have to implement those things yourself.
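For example:

>>> import heapq
>>> heapq.nlargest(3, [5, 1, 8, 3, 9])
[9, 8, 5]
>>> heapq.nsmallest(2, [5, 1, 8, 3, 9])
[1, 3]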

Keep in mind that heapq is a minheap. Let's say that what you really want is a maxheap, and you're not working with ints, you're working with objects. Here's how to tweak your data to get it to fit heapq's way of thinking:

heap = []
heapq.heappush(heap, (-obj.value, obj))
(ignored, first_obj) = heapq.heappop(heap)

Here, I'm using - to make it a maxheap. I'm wrapping things in a tuple so that it's sorted by the obj.value, and I'm including the obj as the second value so that I can get it.

Use bisect for binary search

I'm sure you've implemented binary search before. Python has it built-in. It even has keyword arguments that you can use to search in only part of the list:

import bisect

insertion_point = bisect.bisect_left(sorted_list, some_item, lo=lo, hi=hi)

Pay attention to the key argument, which is sometimes useful but may take a little work to behave the way you want.
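For example, on Python 3.10+ you can search a list sorted by length; note that the key function is applied to the list elements, not to the needle:

import bisect

words = ['fig', 'apple', 'banana']  # Sorted by length.
# Index of the first word whose length is >= 5.
i = bisect.bisect_left(words, 5, key=len)  # i == 1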

namedtuple and dataclasses

Tuples are great, but it can be a pain to deal with remembering the order of the elements or unpacking just a single element in the tuple. That's where namedtuple comes in.

>>> from collections import namedtuple
>>> Point = namedtuple('Point', ['x', 'y'])
>>> p = Point(5, 7)
>>> p
Point(x=5, y=7)
>>> p.x
5
>>> q = p._replace(x=92)
>>> p
Point(x=5, y=7)
>>> q
Point(x=92, y=7)

Keep in mind that tuples are immutable. I particularly like using namedtuples for backtracking problems. In that case, the immutability is actually a huge asset. I use a namedtuple to represent the state of the problem at each step. I have this much stuff done, this much stuff left to do, this is where I am, etc. At each step, you take the old namedtuple and create a new one in an immutable way.

Updated: Python 3.7 introduced dataclasses. These have multiple advantages:

  • They can be mutable or immutable (although, there's a small performance penalty).
  • You can use type annotations.
  • You can add methods.

from dataclasses import dataclass

@dataclass  # Or: @dataclass(frozen=True)
class InventoryItem:
    """Class for keeping track of an item in inventory."""
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand

item = InventoryItem(name='Box', unit_price=19, quantity_on_hand=2)

dataclasses are great when you want a little class to hold some data, but you don't want to waste much time writing one from scratch.

Updated: Here's a comparison between namedtuples and dataclasses. It leads me to favor dataclasses since they have faster property access and use 30% less memory :-/ Per the Python docs, using frozen=True is slightly slower than not using it. In my (extremely unscientific) testing, using a normal class with __slots__ is faster and uses less memory than a dataclass.
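For reference, here's a minimal __slots__ version of the earlier Point; a sketch, not a benchmarked implementation:

class Point:
    # No per-instance __dict__, which saves memory and speeds up attribute access.
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y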

int, decimal, math.inf, etc.

Thankfully, Python's int type supports arbitrarily large values by default:

>>> 1 << 128
340282366920938463463374607431768211456

There's also the decimal module if you need to work with things like money where a float isn't accurate enough or when you need a lot of decimal places of precision.
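For example, floats accumulate binary rounding error where Decimal doesn't:

>>> 0.1 + 0.2
0.30000000000000004
>>> from decimal import Decimal
>>> Decimal('0.1') + Decimal('0.2')
Decimal('0.3')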

Sometimes, they'll say the range is -2^32 to 2^32 - 1. You can get those values via bit shifting:

>>> -(2 ** 32) == -(1 << 32)
True
>>> (2 ** 32) - 1 == (1 << 32) - 1
True

Sometimes, it's useful to initialize a variable with math.inf (i.e. infinity) and then try to find new values less than that.

Updated: If you want to save memory by not importing the math module, just use float("inf").
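The typical pattern looks like this:

import math

best = math.inf
for cost in [7, 3, 9]:
    best = min(best, cost)
# best == 3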

Closures

I'm not sure every interviewer is going to like this, but I tend to skip the OOP stuff and use a bunch of local helper functions so that I can access things via closure:

class Solution():  # This is what LeetCode gave me.
    def solveProblem(self, arg1, arg2):  # Why they used camelCase, I have no idea.
        def helper_function():
            # I have access to arg1 and arg2 via closure. I don't have to
            # store them on self or pass them around explicitly.
            return arg1 + arg2

        counter = 0

        def can_mutate_counter():
            # By using nonlocal, I can even mutate counter. I rarely use this
            # approach in practice. I usually pass it in as an argument and
            # return a value.
            nonlocal counter
            counter += 1

        can_mutate_counter()
        return helper_function() + counter

match statement

Did you know Python now has a match statement?

# Taken from: https://learnpython.com/blog/python-match-case-statement/
>>> command = 'Hello, World!'
>>> match command:
...     case 'Hello, World!':
...         print('Hello to you too!')
...     case 'Goodbye, World!':
...         print('See you later')
...     case other:
...         print('No match found')

It's actually much more sophisticated than a switch statement, so take a look, especially if you've never used match in a functional language like Haskell.

OrderedDict

If you ever need to implement an LRU cache, it'll be quite helpful to have an OrderedDict.

Python's dicts are now ordered by default. However, the docs for OrderedDict say that there are still some cases where you might need OrderedDict; I can't remember the details. If you ever need your dicts to be ordered, just read the docs and figure out whether you need an OrderedDict or whether a normal dict will do.
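If you do implement an LRU cache by hand, a minimal sketch of the usual LeetCode-style get/put interface on top of OrderedDict might look like this:

from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return -1
        self.data.move_to_end(key)  # Mark as most recently used.
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # Evict the least recently used item.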

@functools.cache

If you need a cache, sometimes you can just wrap your code in a function and use functools.cache:

from functools import cache

@cache
def factorial(n):
    return n * factorial(n - 1) if n else 1

print(factorial(5))
...
factorial.cache_info()  # CacheInfo(hits=3, misses=8, maxsize=None, currsize=8)

Debugging ListNodes

A lot of the problems involve a ListNode class that's provided by LeetCode. It's not very "debuggable". Add this code temporarily to improve that:

def list_node_str(head):
    seen_before = set()
    pieces = []
    p = head
    while p is not None:
        if p in seen_before:
            pieces.append(f'loop at {p.val}')
            break
        pieces.append(str(p.val))
        seen_before.add(p)
        p = p.next
    joined_pieces = ', '.join(pieces)
    return f'[{joined_pieces}]'

ListNode.__str__ = list_node_str

Saving memory with the array module

Sometimes you need a really long list of simple numeric (or boolean) values. The array module can help with this, and it's an easy way to decrease your memory usage after you've already gotten your algorithm working.

>>> import array
>>> array_of_bytes = array.array('b')
>>> array_of_bytes.frombytes(b'\0' * (array_of_bytes.itemsize * 10_000_000))

Pay close attention to the type of values you configure the array to accept. Read the docs.

I'm sure there's a way to use individual bits for an array of booleans to save even more space, but it'd probably cost more CPU, and I generally care about CPU more than memory.
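For what it's worth, Python's arbitrary-precision ints make a basic bit-set simple, trading some CPU for memory:

bits = 0
bits |= 1 << 42            # Set flag 42.
is_set = (bits >> 42) & 1  # Read flag 42 (1 here).
bits &= ~(1 << 42)         # Clear flag 42.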

Using an exception for the success case rather than the error case

A lot of Python programmers don't like this trick because it's equivalent to goto, but I still occasionally find it convenient:

class Eureka(StopIteration):
    """Eureka means "I found it!" """
    pass

def do_something_else():
    some_value = 5
    raise Eureka(some_value)

def do_something():
    do_something_else()

try:
    do_something()
except Eureka as exc:
    print(f'I found it: {exc.args[0]}')

Updated: Enums

Python now has built-in enums:

from enum import Enum

# Either:
class Color(Enum):
    RED = 1
    GREEN = 2
    BLUE = 3

# Or:
Color = Enum('Color', ['RED', 'GREEN', 'BLUE'])

However, in my experience, when coding for LeetCode, just having some local constants (even if the values are strings) is a tad faster and requires a tad less memory:

RED = "RED"
GREEN = "GREEN"
BLUE = "BLUE"

Using strings isn't actually slow if all you're doing is pointer comparisons.

Updated: Using a profiler

You'll need some sample data. Make your code crash when it sees a test case with a lot of data. Grab the data in order to get your code to run on its own. Run something like the following. It'll print out enough information to figure out how to improve your code.

import cProfile

cProfile.run("Solution().someMethod(sampleData)")

Using VS Code, etc.

VS Code has a pretty nice Python extension. If you highlight the code and hit shift-enter, it'll run it in a shell. That's more convenient than just typing everything directly in the shell. Other editors have something similar, or perhaps you use a Jupyter notebook for this.

Another thing that helps me is that I'll often have separate files open with separate attempts at a solution. I guess you can call this the "fast" approach to branching.

Write English before Python

One thing that helps me a lot is to write English before writing Python. Just write all your thoughts. Keep adding to your list of thoughts. Sometimes you have to start over with a new list of thoughts. Get all the thoughts out, and then pick which thoughts you want to start coding first.

Conclusion

Well, those are my favorite tricks off the top of my head. I'll add more if I think of any.

This is just a single blog post, but if you want more, check out Python 3 Module of the Week.

Categories: FLOSS Project Planets

Marcos Dione: Collating, processing, managing, backing up and serving a gallery of a 350GiB, 60k picture collection

Planet Python - Tue, 2024-05-07 07:06

In the last two days I have commented a little bit on how I process and manage my photos. I'm not a very avid photographer; I have around 350 gigabytes of photos, most of them not yet processed, around 60,000 in all. So I will comment a little bit more on how I manage all that.

I start with the camera, a 24Mpx camera, just a couple of lenses, nothing fancy. Go out, take some pictures, come back home.

I put the SD card in my computer and I use my own software to import the photos. The import process is not fancy: it just empties the SD card, checks every file for the EXIF information, uses the date and time to create the filename (plus a sequence number if needed), and puts them all in a single incoming directory where all the current unprocessed images are [1].

Then I use this software I developed in PyQt5. It's very, very basic, but it's really quick, and it's mostly keyboard based. It reads the EXIF information and presents some of the tags at the left of the screen: things like date, time, size, orientation, and then focal length, aperture, ISO and various other data I can get from the images. It's mostly focused on my current camera and the previous one, both Nikons [2]. The previous one was an N90; right now it's an N7200. The image occupies most of the window, and the program is always in full screen. At the bottom there's the filename and a couple of toggles.

I can do several things with this:

  • Go forwards, backwards, by one, by ten, by a hundred and by a thousand, because that incoming directory right now has almost seven years of history, probably ten thousand pictures.

  • Move randomly, which allows me to pick up a new thing to collate when I get bored with the current one but I want to keep doing it to reduce the backlog.

  • Mark the images in different ways. The main ones are about selecting for storing, with two modes: one is to keep the image in the original size. I usually use this for my best landscape or astro photos. The other one will resize it down to twelve megapixels [3], from 6000x4000 pixels to 4500x3000 pixels, 75% on each dimension.

  • Rotate the images, just in case the camera did not guess the orientation correctly, usually when I'm taking pictures right upward or right downwards.
  • Select several pictures for stitching, which will use hugin to do so. It's not 100% automatic, but it at least puts the pictures in a stitch directory and points hugin there.

  • Select a picture for cropping or editing; I'm not going to develop a whole image editor, so I just delegate to an existing program, gwenview.

  • Select images for deleting and delete them permanently.

  • Select several images for comparison and enter/exit comparison mode, which means that going backwards and forwards applies only to this set. This is good for pictures of the same subject that are not necessarily consecutive in the original shooting sequence, which for me makes culling images faster.

  • It has two zoom levels, fit to screen and full size. I don't have much need for other options.
  • 99% of the pictures I take are freehand, so in a sequence there's always some movement between images. In full size I can put every image on its own position, aligning the whole sequence and allowing culling based on blurriness or other factors.

  • Also in full size, I can lock the view, so when I pan one of the images and I switch to another one, it will also pan that second image to that position. It also helps when I'm checking for details between two different images of the same thing.

  • Move all the selected images, resize them if needed, and put them in a folder. It also hardlinks each image from my categorization folders into a folder that collects all the images by date; there's one folder for each month and year with all the pictures of that month inside. Using hardlinks means the image file isn't duplicated, saving space (see the sketch after this list).

  • It also has a readonly mode, so I can hand the computer to my kids to watch the photos.
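To give a flavor of that by-date hardlink step, here is a minimal sketch; the paths and function name are illustrative, not the actual code:

import os

def link_by_date(src_path, by_date_root, year, month):
    # Illustrative sketch: one folder per year and month, e.g. by-date/2024-05/.
    dst_dir = os.path.join(by_date_root, f'{year:04d}-{month:02d}')
    os.makedirs(dst_dir, exist_ok=True)
    dst_path = os.path.join(dst_dir, os.path.basename(src_path))
    if not os.path.exists(dst_path):
        os.link(src_path, dst_path)  # Hardlink: the image data is not duplicated.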

When culling, I use the comparison mode and individual position and lock view features a lot, going back and forth between images, discarding until only one is left.

That's the first part, the one I must spend my time on, just basic culling, selection and storage. My main tree is just a tree based on my way of categorizing the images.

My program doesn't have a directory view; instead, I just use gwenview again.

Notice there's no photo editing in this workflow. I rarely shoot in RAW for two reasons: a) I'm really bad at postprocessing; and b) even if I were good, I don't have the time to do it; my free time is shared among several hobbies. I only do it for astrophotography and a few rare occasions.

The third tool I use is digikam. I use it for two things, which are related: semi-automatic and manual tagging. The semi-automatic part is face detection; digikam can find and guess faces, but requires manual confirmation [4]. The fully manual part is plain tagging, mostly with location [5] and sometimes some other info. I sometimes also rate my pictures; I mostly use four and five, sometimes three, and only for my best pictures.

Then there's another script that reads the digikam database and uses the tags to create another directory for the tags, which also uses hardlinks. It still doesn't do anything about the rating, but I could easily add that.

That's all on my personal computer. I use rsync to make a copy on my home server that has two purposes. One, it's a backup, which includes all the original 24Mpx images that I hadn't culled yet, which I think is the biggest part of my collection.

The second purpose: it feeds a gallery program developed in PHP by a guy named Karl. It's probably the only paid piece of software I use. It's a single PHP file that you put at the root of your gallery; you enable PHP processing in your web server (in my case, Apache), and it generates the gallery on the fly, just reading the directories and creating all the necessary thumbnails and such. I did make a small change to this program: the original algorithm creates thumbnails based on each file's path (and 4 or 5 other attributes, I think), but because I have all these hardlinks, it created duplicated thumbnail files. So I changed it to use the filename instead of the filepath [6].

I don't have any kind of synchronization with my phone. Most of the pictures I take with it are not the kind I usually put in my own gallery, except for the times I go out without my camera and end up taking pictures anyway. I still don't have a workflow for that; it's mostly manual. So if I ever lose my phone, I'm fscked, because I definitely have no backups of it.

That lack of synchronization also means that the only way to see the pictures in my phone is by opening the gallery in the browser. It's not the best, but I don't do that that often. I have tried to use alternatives like NextCloud, which I also have installed on my home server. I have some issues with permissions because, again, this is a backup directory, so it has all the owner information that belongs to me, instead of the web server. That means it doesn't have the proper permissions to let NextCloud manage those files. Luckily files.gallery just needs a subdirectory.

Another reason is that before this I was using static gallery generators: sigal, gallerpy or even nikola, which drives this blog. All of those can generate the gallery statically, so serving them is much easier. My old home server died at some point and I had to come up with something; I had a spare old laptop lying around and used that. Now it's enough to generate the gallery on the fly. I have plans to make something bigger, but that's for another time.

  1. In fact I have another directory for all the unprocessed photos from another era, and I'm thinking of starting a new era. 

  2. Even though EXIF is a standard for storing tags, there's no standard for the tag names, so every manufacturer has its own set, which even changes between camera lines. For a better idea of what I'm talking about, just peruse Image::ExifTool's source code.

  3. I currently own no screen that is 4500 pixels wide, let alone 6000. Maybe my kids will, but by then the Mpx count will be so different that it won't make any sense to accommodate it. Right now storage is expensive for me, so I'll keep it this way.

  4. Or rejection: the false positive rate is higher than I would like, and it doesn't have a way to say 'yes, this is that person, but don't train on this image'. This is the case for pictures where the face is semi-occluded, sometimes painted, badly lit, or mostly just blurry.

  5. Most of my pictures don't have GPS info, not even the ones in the phone. The latter I only enable when I really need the info later, mostly for mapping. Later I either discard the photo or remove the info. 

  6. For a while now I'm even making this distinction in my own code, filename vs filepath. 

Categories: FLOSS Project Planets

Qt Creator 13.0.1 released

Planet KDE - Tue, 2024-05-07 06:59

We are happy to announce the release of Qt Creator 13.0.1!

Categories: FLOSS Project Planets

Why datasets built on public domain might not be enough for AI

Open Source Initiative - Tue, 2024-05-07 06:00

There is tension between copyright laws and the large datasets suitable for training large language models. Common Corpus is a dataset that only uses text from copyright-expired sources to bypass the legal issues. It's a useful achievement, paving the path to research without immediate risk of lawsuits. But I also fear that this approach may lead to bad policies that reinforce the power of copyright holders: not the small creators, but large corporations.

A dataset built on public domain sources

In March 2024 Common Corpus was released as an open access dataset for training large language models (LLMs). Announcing the release, lead developer Pierre-Carl Langlais said "Common Corpus shows it is possible to train fully open LLMs on sources without copyright concerns." The dataset contains 500 billion words in multiple European languages and from different cultural heritages. It is a project coordinated by the French startup Pleias and supported by organizations committed to open science such as Occiglot, Eleuther AI and Nomic AI, as well as being partly funded by the French government. The stated intention of Common Corpus is to democratize access to large quality datasets. It has many other positive characteristics, highlighted also by Open Future's summary of a talk given by Langlais.

The commons needs more data

The debates sparked by the Deep Dive: AI process on the role of training data highlighted that AI practitioners encounter many obstacles assembling datasets. At the same time, we discovered that tech giants have an incredible advantage over researchers and startups. They've been slurping data for decades, have the financial means to go to court and can enter into bilateral agreements to license data. These strategies are inaccessible to small competitors and academics. Accepting that the only path to creating open large datasets suitable for training Open Source AI systems is to use public domain sources risks cementing the dominant positions of existing large corporations.

The open landscape already faces issues with big tech and their ability to influence legislation. The big corporations have lobbied to extend the duration of copyright, introduced the DMCA, are opposing the right to repair, and have the resources to continue lobbying and sue any new entrant who they deem to get too close. There are plenty of examples showing an unequal advantage in protecting what they think is theirs. The non-profit Fairly Trained certifies companies “willing to prove that they’ve trained their AI models on data that they own, have licensed, or that is in the public domain,” respecting copyright law: who’s going to benefit from this approach?

Unsuitable for public policies

Initiatives like Common Corpus and The Stack (used to train Starcoder2) are important achievements as they allow researchers to develop new AI systems while mitigating the risk of being sued. They also push the technical boundaries of what can be achieved with smaller datasets that don’t require a nuclear power plant to train new models. But I think they mask the underlying issue: AI needs data and limiting open datasets to only public domain sources will never give them a chance to match the size of the proprietary ones. The lobby for copyright maximalists is always looking for ways to expand scope and extend terms for copyright laws, and when they succeed it is a one-way ratchet. It would be a tragedy for society if legislators listened to their sophistry and made new laws doing this based on the apparent consensus that creators need protection from AI.
The role of data for training machine learning systems is a divisive topic and a complex one. Having datasets like Common Corpus is a very useful way for the science of AI to progress with better sources. For policies, we’d be better off pushing for something like the proposal advanced by Open Future and Creative Commons in their paper Towards a Books Data Commons for AI Training.

Categories: FLOSS Research

MBition becomes a KDE patron

Planet KDE - Tue, 2024-05-07 05:30

MBition supports the work of the KDE community with its generous sponsorship.

MBition designs and implements the infotainment system for future generations of Mercedes-Benz cars and relies on KDE's technology and know-how for its products.

"After multiple years of collaboration across domains, we feel that becoming a patron of KDE e.V is the next step in deepening our partnership and furthering our open-source strategy" says Marcus Mennemeier, Chief of Technology at MBition.

"We are delighted to welcome MBition as a Patron," says Lydia Pintscher, Vice President of KDE e.V. "MBition has been contributing to KDE software and the stack we build on it for some time now. This is a great step to bring us even closer together and support the KDE community, and further demonstrates the robustness and hardware readiness of KDE's software products."

MBition joins KDE e.V.'s other patrons: Blue Systems, Canonical, g10 Code, Google, Kubuntu Focus, Slimbook, SUSE, The Qt Company and TUXEDO Computers, who support free open source software and KDE development through KDE e.V.

Categories: FLOSS Project Planets

Robin Wilson: Simple segmentation of geospatial images

Planet Python - Tue, 2024-05-07 05:30

I had a need to do some segmentation of some satellite imagery the other day, for a client. Years ago I was quite experienced at doing segmentation and classification using eCognition but that was using the university’s license, and I don’t have a license myself (and they’re very expensive). So, I wanted a free solution.

There are various segmentation tools in the scikit-image library, but I’ve often struggled using them on satellite or aerial imagery – the algorithms seem better suited to images with a clear foreground and background.

Luckily, I remembered RSGISLib – a very comprehensive library of remote sensing and GIS functions. I last used it many years ago, when most of the documentation was for using it from C++, and installation was a pain. I’m very pleased to say that installation is nice and easy now, and all of the examples are in Python.

So, doing segmentation – using an algorithm specifically designed for segmenting satellite/aerial images – is actually really easy now. Here’s how:

First, install RSGISLib. By far the easiest way is to use conda, but there is further documentation on other installation methods, and there are Docker containers available.

conda install -c conda-forge rsgislib

Then it’s a simple matter of calling the relevant function from Python. The documentation shows the segmentation functions available, and the one you’re most likely to want to use is the Shepherd segmentation algorithm, which is described in this paper). So, to call it, run something like this:

from rsgislib.segmentation.shepherdseg import run_shepherd_segmentation

run_shepherd_segmentation(input_image, output_seg_image,
                          gdalformat='GTiff',
                          calc_stats=False,
                          num_clusters=20,
                          min_n_pxls=300)

The parameters are fairly self-explanatory – it will take the input_image filename (any GDAL-supported format will work), produce an output in output_seg_image filename in the gdalformat given. The calc_stats parameter is important if you’re using a format like GeoTIFF, or any format that doesn’t support a Raster Attribute Table (these are mostly supported by somewhat more unusual formats like KEA). You’ll need to set it to False if your format doesn’t support RATs – and I found that if I forgot to set it to false then the script crashed when trying to write stats.

The final two parameters control how the segmentation algorithm itself works. I’ll leave you to read the paper to find out the details, but the names are fairly self-explanatory.

The output of the algorithm will look something like this:

It’s a raster where the value of all the pixels in the first segment are 1, the pixels in the second segment are 2, and so on. The image above uses a greyscale ‘black to white’ colormap, so as the values of the segments increase towards the bottom of the image, they show as more white.

You can convert this raster output to a set of vector polygons, one for each segment, by using any standard raster to vector ‘polygonize’ algorithm. The easiest is probably using GDAL, by running a command like:

gdal_polygonize.py SegRaster.tif SegVector.gpkg

This will give you a result that looks like the red lines on this image:

So, there’s a simple way of doing satellite image segmentation in Python. I hope it was useful.

Categories: FLOSS Project Planets

Python Bytes: #382 A Simple Game

Planet Python - Tue, 2024-05-07 04:00
Topics covered in this episode:

  • act: Run your GitHub Actions locally! (https://github.com/nektos/act)
  • portr (https://portr.dev)
  • Annotating args and kwargs in Python (https://rednafi.com/python/annotate_args_and_kwargs/)
  • github badges (https://github.com/Envoy-VC/awesome-badges)
  • Extras
  • Joke

Watch on YouTube: https://www.youtube.com/watch?v=v3x4WqEwamg

About the show

Sponsored by ScoutAPM: pythonbytes.fm/scout

Connect with the hosts

  • Michael: @mkennedy@fosstodon.org
  • Brian: @brianokken@fosstodon.org
  • Show: @pythonbytes@fosstodon.org

Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Tuesdays at 11am PT. Older video versions available there too.

Finally, if you want an artisanal, hand-crafted digest of every week of the show notes in email form, add your name and email to our friends of the show list (https://pythonbytes.fm/friends-of-the-show); we'll never share it.

Brian #1: act: Run your GitHub Actions locally!

  • Why?
      ◦ "Fast Feedback - Rather than having to commit/push every time you want to test out the changes you are making to your .github/workflows/ files (or for any changes to embedded GitHub actions), you can use act to run the actions locally. The environment variables and filesystem are all configured to match what GitHub provides."
      ◦ "Local Task Runner - I love make. However, I also hate repeating myself. With act, you can use the GitHub Actions defined in your .github/workflows/ to replace your Makefile!"
  • Docs: nektosact.com
  • Uses Docker to run containers for each action.

Michael #2: portr

  • Open source ngrok alternative designed for teams
  • Expose local http, tcp or websocket connections to the public internet
  • Warning: Portr is currently in beta. Expect bugs and anticipate breaking changes.
  • Server setup (docker basically): https://portr.dev/server/

Brian #3: Annotating args and kwargs in Python

  • Redowan Delowar
  • I don't think I've ever tried, but this is a fun rabbit hole.
  • Leveraging bits of PEP-589, PEP-646, PEP-655, and PEP-692.
  • Punchline:

from typing import TypedDict, Unpack  # Python 3.12+
# from typing_extensions import TypedDict, Unpack  # < Python 3.12

class Kw(TypedDict):
    key1: int
    key2: bool

def foo(*args: Unpack[tuple[int, str]], **kwargs: Unpack[Kw]) -> None:
    ...

  • A recent pick from Redowan's blog: TypeIs does what I thought TypeGuard would do in Python (https://rednafi.com/python/typeguard_vs_typeis/)

Michael #4: github badges

  • 😎 A curated list of GitHub badges for your next project

Extras

Brian:

  • Fake job interviews target developers with new Python backdoor
  • Later this week, course.pythontest.com will shift from Teachable to Podia
      ◦ Same great content. Just a different backend.
      ◦ To celebrate, get 25% off at pythontest.podia.com now through this Sunday using coupon code PYTEST
  • Getting the most out of PyCon, including juggling - Rob Ludwick
      ◦ Latest PythonTest episode, also cross posted to pythonpeople.fm
  • 3D visualization of dom

Michael:

  • Djangonauts Space Session 2 Applications Open! More background at Djangonauts, Ready for Blast-Off on Talk Python.
  • Self-Hosted Open Source - Michael Kennedy on Django Chat

Joke: silly games (https://www.reddit.com/r/programminghumor/comments/1ceo0ds/just_a_silly_little_game/)

Closing song: Permission Granted (https://www.youtube.com/watch?v=pGbodliLFVE)
Categories: FLOSS Project Planets

The Drop Times: Tim Hestenes Lehnen Delves into 'Fog & Fireflies': A Journey of Magic and Metaphor

Planet Drupal - Tue, 2024-05-07 03:45
Discover the enchanting world of 'Fog & Fireflies' as Tim Hestenes Lehnen shares the story behind his latest fantasy novel in an exclusive interview with Kazima Abbas. Uncover the inspiration, themes, and secrets that await within the pages of this captivating tale.
Categories: FLOSS Project Planets

The Drop Times: Introducing Drupal Starshot and Charting a New Course for the Future

Planet Drupal - Tue, 2024-05-07 03:30
Discover the highlights from DrupalCon Portland 2024, where Dries Buytaert presents the latest innovations including Drupal Starshot. Explore the significant strides toward enhancing usability, inclusivity, and the global impact of Drupal on maintaining an open, accessible web. Join the movement shaping the future of digital experiences.
Categories: FLOSS Project Planets

Specbee: Using Drupal 10’s Asset Library to Streamline Asset Handling

Planet Drupal - Tue, 2024-05-07 02:07
Drupal 7 lacked a streamlined mechanism for handling assets, which necessitated the development of more efficient solutions like the Asset Library introduced in Drupal 8 and later versions. The asset library solves the problem of loading JS and CSS files on every page; unless told to, Drupal does not load these assets, since loading them everywhere can hurt front-end performance. Let's learn more about asset libraries in Drupal 10 and how to work with them.

What is an Asset Library in Drupal

An asset library in Drupal is nothing but a YAML data structure inside a THEMENAME.libraries.yml file, and it contains only CSS and JS files. Asset libraries are the bundles of CSS and JavaScript files that live inside a module or theme and work together for style and functionality. The Asset Library in Drupal provides a centralized and organized repository for managing various types of digital assets, and it boasts several features designed to enhance usability, scalability, and flexibility:

  • It is designed to support responsive web design, ensuring that assets are displayed consistently on various devices.
  • Drupal places a strong emphasis on accessibility, and the Asset Library follows these standards to ensure a positive user experience for all.
  • It includes version control features, allowing users to manage and track changes to assets over time.
  • Performance optimization.

Define an Asset Library

Let's declare a new asset library named custom-slider:

custom-slider:
  version: 1.0
  css:
    theme:
      css/custom-slider-theme.css: {}
  js:
    js/custom-slider.js: {}

Some of the attributes you can use include:

  • minified: If the file is already minified, set this to true to avoid minifying it again; the default is false.
  • preprocess: The default is true; set it to false to exclude a file from aggregation.
  • type (JavaScript only):
      ◦ The default value is file, if you leave it blank.
      ◦ For external files, set the type to external, like: //cdn.com/js/example.js: { type: external }

Assets Loading Order

By default, all JS files are loaded in the order in which the files are listed, and JS files are loaded in the footer. Set header: true on a library to have it loaded in the header. For example:

jquery.ui:
  header: true
  js:
    assets/vendor/jquery.ui/ui/core-min.js: {}

SMACSS Categorization

Drupal follows a SMACSS-style categorization: all CSS files are loaded first based on their category and then by their order. SMACSS categorization is used to set the weight of CSS files; it does not apply to JS files. To set CSS weights there are 5 different levels:

      ◦ base – This rule consists of styling HTML elements only. CSS_BASE = -200
      ◦ layout – Macro management of the page or arrangement of elements on the page, including any grid system. CSS_LAYOUT = -100
      ◦ component – Components are reusable and discrete UI elements. CSS_COMPONENT = 0
      ◦ state – Styles that deal mostly with client-side changes such as hovering links, opening a modal dialog, etc. CSS_STATE = 100
      ◦ theme – Purely visual styling such as box-shadow, backgrounds, borders, colors, etc. CSS_THEME = 200

Attach an Asset Library

1. Globally: We can attach the asset library globally via the THEMENAME.info.yml file, but this approach only works for a theme. For modules you should use hook_page_attachments_alter() or similar. For example:

name: 'My Custom Theme'
type: theme
description: 'A custom Drupal 9 theme for demonstration purposes.'
package: Custom
core_version_requirement: ^8 || ^9 || ^10
base theme: false
libraries:
  - THEMENAME/global-styling
  - THEMENAME/global-scripts

2. Conditionally, via a preprocess function using #attached: If you need to restrict the library to a particular page or element, this is the best way to add libraries. For example, to attach a library to a page we can use hook_page_attachments_alter():

/**
 * Implements hook_page_attachments_alter().
 */
function custom_module_page_attachments_alter(array &$attachments) {
  // Add a stylesheet to the page.
  $attachments['#attached']['library'][] = 'custom_module/custom-styles';
  // Add a custom JavaScript file to the page.
  $attachments['#attached']['library'][] = 'custom_module/custom-scripts';
}

Or hook_preprocess_page():

/**
 * Implements hook_preprocess_page().
 */
function custom_module_preprocess_page(&$variables) {
  // Add a stylesheet to the page.
  $variables['#attached']['library'][] = 'custom_module/custom-styles';
}

Similarly, in different preprocess functions we can attach a library conditionally using the #attached render array property:

/**
 * Implements hook_page_attachments_alter().
 */
function custom_module_page_attachments_alter(array &$page) {
  // Get the current path.
  $path = \Drupal::service('path.current')->getPath();
  // If we're on the node listing page, add our retro library.
  if ($path == '/node') {
    $page['#attached']['library'][] = 'custom_module/custom-styles';
  }
}

3. Inside a Twig template file: Use attach_library() in the Twig template.

{# Attach a CSS library #}
{{ attach_library('my_theme/global-styling') }}

{# Attach a JavaScript library #}
{{ attach_library('my_theme/global-scripts') }}

Final Thoughts

The Asset Library in Drupal (versions 8 and above) has a profound impact on web development. It centralizes the management of CSS and JavaScript files within modules or themes, ensuring consistency and ease of maintenance across a website or application. By bundling these assets together, developers can efficiently control the presentation and functionality of their digital creations. If you're looking to implement fantastic Drupal features like this one in your next project, we have a team of Drupal experts who can help. We'd love to talk!
Categories: FLOSS Project Planets

Capellic: Frontend performance optimization for Drupal websites: Part 3

Planet Drupal - Tue, 2024-05-07 00:00
This is part 3 of a series of articles that defines our approach to frontend performance optimization. In this part we challenge the prevailing wisdom of the monolithic CSS file.
Categories: FLOSS Project Planets

Krita Monthly Update – Edition 15

Planet KDE - Mon, 2024-05-06 20:00

It is time for the monthly news update brought to you by the Krita-promo team. Let us take a look at the highlights of the Krita community and development for this month.

Development report
  • Our users on Chromebooks faced a nasty bug which crashed Krita on startup, so we made a 5.2.2.1 hotfix release for the Android Play Store only to fix it. It also contains other fixes from the stable branch, but be warned: there is a known crash regression with importing audio.

  • A proper 5.2.3 release for all supported platforms will be made as soon as possible, hopefully in the next few weeks.

  • At the time of writing, nightly builds for macOS are still blocked by a signing-related issue. Once that is resolved, automated builds for all supported platforms will be up and running again. That is the culmination of months of work by lead developer Dmitry Kazakov, together with macOS developer Iván Yossi, Android developer Sharaf Zaman, Windows contributor Simon Ra, and others, in a refactor of Krita’s build system.

  • The feature request "Palette in Toolbar" has been marked “solved” by freyalupen’s most recently merged code. The new "Add Docker Box" toolbar widget allows the user to add any docker to the toolbar in a temporary popup widget similar to the “choose brush preset” one in Painter’s Tools.

  • A problem with certain RGBA brushes has been solved and will be part of the next release. Users were experiencing lagging and freezing when accessing these brushes. The thread makes an interesting read, as it’s a “live” look at an issue being revealed, and it shows how helpful it is when users conduct testing. You can read the thread here.

  • Ken_Lo has been accepted as a student for Google Summer of Code, to work on pixel-perfect hand-drawn lines.

  • In addition to various recorder-related fixes by @freyalupen, the FFmpeg profiles in the recorder docker have been improved by @Ralek. We congratulate @Ralek on their first contribution to Krita.

  • When entering canvas-only mode, the document used to jump abruptly and reposition itself. @YRH helped in solving this issue.

  • Deif_Lou has improved the performance of the fill tool.

  • Ken Lo added an option in the settings to pick the default export file type.

  • Grum999 has looked into improving Krita’s API for Python plugins and, as a start, chose to implement a scratchpad API that adds functionality to the scratchpad.

  • Emir Sari sent patches to help Krita build on Haiku OS.

Community report

April 2024 Monthly Art Challenge

The April Monthly Art Challenge, Animal Curiosity, inspired submissions from 26 artists. @jimplex was voted the winner with this creative piece: Firefly by jimplex

The theme for the May 2024 challenge is “reflection.” You can get all the details here. We already have some ideas and pre-work flying around in the discussion and WIP thread. Have a look – something might inspire your creativity.

Featured artwork

Krita-Artists members nominated 9 images for the featured artwork banner. When the mid-month poll ended, these 5 had won a place on the banner. All 5 will be entered into the Best of Krita-Artists 2024 competition next January.

Cabin in the woods-RH by Rohit Hela

Detailed Portrait by denjay5

Nier Automata by IvanGilbertt

Alien Senator by DavB

My uni project by smollbirb

Nominations for the April/May poll are open until May 11, 2024.

Noteworthy plugin

Blender-Krita link plugin for texture editing by heisenshark

This plugin has a fresh update that the author describes as a “big overhaul of how the plugin works.” Check out the thread on Krita-artists.org here.

Tutorial of the month

Krita’s newest tutorial by Ramon Miranda features an interview with Rakurri, the creator of Rakurri’s brush pack containing more than 200 brushes made just for Krita. Ramon demonstrates his favorite ones, such as Glow FX, Liquid Bristle, and the vegetation brushes.

Notable changes in code

This section has been compiled by freyalupen. It covers Apr 3 - May 2, 2024.

Stable branch (5.2.2+):

Bugfixes:

Nightly build regression bugfixes:

  • [Layer Stack] Fix wrong layer being active on opening document. In the case of single-layer documents, no layer was active, which caused crashes under some circumstances. (BUG:480718) (merge request, Dmitry Kazakov)
Unstable branch (5.3.0-prealpha):

Features:

  • [Toolbars, Shortcuts] Add a Docker Box action that shows a docker in a temporary box, which can be added to a toolbar or assigned to a shortcut. (merge request, Freya Lupen)
  • [Canvas Input Shortcuts] Add a new Tool Invocation action, "Activate with Other Color". This can be bound to a key+mousebutton combination, where holding those keys will cause, for instance, the Freehand Brush to paint with the background instead of the foreground color. (merge request, ziplantil)

Bugfixes:

Nightly build regression bugfixes:

These changes are made available for testing in the following Nightly builds:

Like what we are doing? Help support us

Krita is a free and open source project. Please consider supporting the project with donations or by buying training videos or the artbook! With your support, we can keep the core team working on Krita full-time.

Donate | Buy something
Categories: FLOSS Project Planets

Horizontal Digital Blog: Try this one weird trick with the Migrate API

Planet Drupal - Mon, 2024-05-06 18:00
One of the key concepts of the Drupal Migrate API is the so-called process pipeline, in which we pass a value that is transformed by a series of process plugins. From time to time we find ourselves in the middle of a process pipeline wishing we could easily reference the current value in the process pipeline. I even created an issue on Drupal.org asking for this feature. As it turns out, the feature already exists! That is, as long as you know this one weird trick...
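For readers who have not met the Migrate API yet, a process pipeline is declared in a migration's YAML definition. The following is a generic sketch, not the trick from the article itself: the destination field, source property, and plugin chain are all illustrative.

process:
  # Each process plugin receives the output of the previous one.
  title:
    - plugin: callback
      callable: trim
      source: name
    - plugin: default_value
      default_value: 'Untitled'

Here the value starts as the source row's name property, gets trimmed by the callback plugin, and falls back to 'Untitled' if it comes out empty; the article's trick concerns referencing that in-flight value from the middle of such a chain.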
Categories: FLOSS Project Planets

Talking Drupal: Talking Drupal #449 - Agile Methodologies

Planet Drupal - Mon, 2024-05-06 14:00

Today we are talking about Agile Methodologies, how to pick the best one, and why they matter, with guest Chris Wells. We’ll also cover CKEditor Text Transformation / AutoCorrect as our module of the week.

For show notes visit: www.talkingDrupal.com/449

Topics
  • Drupal FL Camp talk
  • Fundamentals of Agile
  • How do you square long term planning
  • What is Redfin Solutions’ preferred methodology
  • What is Crystal Agile Methodology
  • Do other methodologies have web specific versions
  • Would you agree that large companies can use different agile methodologies
  • Have you ever used Scrumban
  • Listener Question: Shivan xamount: Story points are usually equated to Fibonacci numbers. These are not supposed to correlate to hours; what do you think about that?
Resources

Guests

Chris Wells - chrisfromredfin.dev chrisfromredfin

Hosts

Nic Laflin - nLighteneddevelopment.com nicxvan John Picozzi - epam.com johnpicozzi Matthew Grasmick - grasmash

MOTW Correspondent

Martin Anderson-Clutz - mandclu

  • Brief description:
    • Have you ever wanted CKEditor to autocorrect symbols like the copyright mark, the “not equals” sign, and fractions, from their text equivalents? There’s a module for that.
  • Module name/project name:
  • Brief history
    • How old: created in Mar 2024 by Gedvan Dias of Redfin Solutions
    • Versions available: 1.0.0-alpha1, which works with CKEditor 4 on Drupal 8, and 2.0.0-alpha1, which works with CKEditor 5 on Drupal 9 and 10
  • Maintainership
    • Actively maintained, was released just a few weeks ago
    • Not much documentation of its own, but the module leverages CKEditor’s Automatic text transformation, which has a fair bit of documentation on CKEditor.com
    • Number of open issues: only 1 open issue, which is the Project Update Bot’s automatically-created Drupal 11 compatibility issue
  • Usage stats:
    • 8 sites
  • Module features and usage
    • By default the module enables four categories of transformations: 'symbols', 'mathematical', 'typography', and 'quotes'
    • You can override the module’s plugin if you want a different set enabled, but the module also provides a hook you can use to alter the active sets or define custom transformations, similar to using emojis in Slack, for example
Categories: FLOSS Project Planets

Drupal Association blog: Drupal lead Dries Buytaert announces a completely new Drupal CMS 23 years after its creation

Planet Drupal - Mon, 2024-05-06 12:52

PORTLAND, Ore., 6 May 2024—Twenty-three years after creating Drupal as a university student and hundreds of thousands of websites later, Dries Buytaert announced today that a new version of Drupal will launch at the end of 2024. Drupal is an Open Source CMS that is foundational to a great digital experience platform. Its reliable, highly secure, and flexible tools build the versatile, structured content needed to create dynamic web experiences.

This new version of Drupal will incorporate the best of the 50,000+ modules created over the past decade into a curated, out-of-the-box experience for organizations wishing to build powerful websites quickly.

“We built this amazing platform to power the most robust digital experiences. And now we will make it more accessible to non-developers,” said Dries. “Drupal Starshot is an initiative that will deliver this new version of Drupal within eight months.”

“The Drupal Association is excited to support the Drupal Starshot initiative and to begin marketing the new version of Drupal as the first, best stop for those interested in understanding what Drupal can do,” said Owen Lansbury, Chair of the Drupal Association’s board of directors.

On 6 May, founder and project lead Dries Buytaert gave an inspiring keynote—also known as the Driesnote—introducing this completely new version of Drupal: Drupal Starshot.

Dries described how, much like the race to space in the 1960s, Drupal is also in a race. The web is moving forward, with or without Drupal. Drupal has a long history of being a leader in the Open Web, but it needs its “Moonshot” moment. Dries reiterated how the future of Drupal’s success will come from broadening its usability to a wider audience. The way to do this, Dries said, is to open Drupal’s powerful tools to non-developers.

What is Drupal Starshot, and how does it differ from the traditional version of Drupal? Drupal Starshot will leverage Drupal Core but have a different governance model that allows it to move fast and innovate more quickly.

After Drupal Starshot is introduced, when someone visits the Drupal.org download page, both traditional Drupal Core and Drupal Starshot will be available (under a different name, still to be determined). When Drupal Starshot is selected, it will automatically download the features that the user wants for their use case, making it easier for new users to try and test out Drupal, all from right in their browser. Drupal Core will still be the fundamental building block of Drupal Starshot and can still be used independently from Drupal Starshot for custom builds.

How will this new Drupal be different?

The Drupal that exists today, known as “Drupal Core,” will continue to exist and will be maintained by core maintainers. The Drupal Starshot initiative will introduce a new version of Drupal with a fully featured out-of-box experience.

Features that Drupal Starshot will include are:

  • Next generation page builder
  • Project Browser + Recipes
  • Automatic updates
  • Key contributed modules
  • Easy configuration
  • Default content
  • And possibly more!

Drupal community members who are interested in contributing to the development of Drupal Starshot can submit their interest via this interest form or join Dries at several Birds of a Feather sessions happening during DrupalCon Portland.

Watch the full Driesnote on the Drupal Association YouTube Channel. 

About DrupalCon

This year, DrupalCon North America is a four-day conference held in Portland, Oregon, from 6-9 May. Over 1,300 professionals and Drupal users collaborate on the project for a week. The Drupal Association is a non-profit organization that caters to the needs of Drupal and its worldwide community. It focuses on the growth of the Drupal community and supports the project’s vision to create a safe, secure, and Open Web for everyone. 

About Drupal and the Drupal Association

Drupal is a powerful open-source content management system for everyone, from small nonprofits to enterprises. It is used by millions of people and organizations worldwide, made possible by a community of 100,000-plus contributors and enabling more than 1.3 million users on Drupal.org. The Drupal Association is a non-profit organization dedicated to accelerating the Drupal software project, fostering the community, and supporting its growth.

Categories: FLOSS Project Planets

Open Source AI Definition – Weekly update May 6

Open Source Initiative - Mon, 2024-05-06 12:02
Definition validation: Seeking volunteers

The process has entered a new phase: We are now seeking volunteers to validate the Open Source AI Definition, using it to review existing AI systems. The objective of the phase is to confirm that the Definition works as intended and understand where it fails.  

  • You are given a spreadsheet in which you locate and link to the license, research paper, or other document that grants rights or provides information for each required component.
  • Systems include, but are not limited to:
    • Arctic
    • BLOOM
    • Falcon
    • Grok
    • Llama 2
    • Mistral
    • OLMo
    • OpenCV
    • Phi-2
    • Pythia
    • T5
  • To volunteer, please contact Mer on the forum by May 20th.
Summary of comments received on the Definition draft
  • Grammatical and wording corrections 
    • Some minor grammatical suggestions were made. These reorder and reword the text slightly, though the overall message remains the same.
    • One user suggested explaining what Open Source is under the “preamble” and “Why we need open source AI”. Instead of speaking about why Open Source is important, the section should rather introduce what it is and why it matters for AI.
    • Under “Preferred form to make modifications to machine-learning systems” and “data information”, clarification is needed regarding “the training data set used”. It is not clear whether this means that all training data must be open source for the whole model to be considered Open Source.
      • Stefano Maffulli added here that the intention is to know what dataset was used, not necessarily to have it made available, and that it indeed seems to need clarification.
  • Technical points
    • Under “Preferred form to make modifications to machine-learning systems” the release of checkpoints is mentioned as an example of required components, under “model parameters”. An objection was raised, arguing that this poses an unnecessary burden: It’d be like requiring that for software to be Open Source, it should include past versions of the program.
      • Maffulli reiterated that this was merely an example, but that this might need an entry on the FAQ page
    • Under “Preferred form to make modifications to machine-learning systems” and “data information”, a “skilled person” is mentioned in the context of requiring sufficient information about the training data used to create a model. A question was raised regarding what skill has to do with acquiring data.
      • Clarification was given by Maffulli, pointing out that this is in the context of getting information about the data so that a “skilled person” can use, study, share and modify the AI system.
      • A user suggested that this confusion could be solved by changing the wording around “a skilled person can recreate” from “using the same or similar data” to “if able to gain access to the same or similar data”.
      • A user points out that “skilled person”, as a legal term used in patent law, might not be appropriate, as it has different legal connotations and precedents in different countries.
  • Discussion on why specifically we focus on machine learning (ML) as an AI system
    • A question was raised regarding why we explicitly mention ML systems under “Preferred form to make modifications to machine-learning systems” and subsequently the “checklist”, pointing out that not all AI systems are ML.
      • Maffulli replied that we address ML because such systems need special and urgent attention, whereas rule-based AI systems can already fit under the Open Source Definition. This needs to be addressed in the FAQ.
Town hall announcement 
  • The 9th town hall meeting was held on the 3rd of May. Access the recording here if you missed it!
Categories: FLOSS Research

Drupalize.Me: DrupalCon Portland 2024: Issue Queue Initiatives

Planet Drupal - Mon, 2024-05-06 12:00

This Wednesday, May 8, I'm speaking at DrupalCon Portland 2024 as part of the Drupal Project Initiatives Keynote. The keynote kicks off Contribution Day first thing Wednesday morning. I'll be highlighting initiatives and programs that are helping people contribute in a strategic way and, as a result, increasing throughput in the core issue queue. Throughput is the rate at which a project’s issues are resolved and committed, and it’s one way to gauge the health of an open source project like Drupal.

Check out these resources to learn more about the initiative and programs I highlight in this presentation.

Amber Matz Mon, 05/06/2024 - 11:00
Categories: FLOSS Project Planets

Thomas Lange: Removing tens of thousands of web pages

Planet Debian - Mon, 2024-05-06 10:58

In January I removed tens of thousands of web pages from www.debian.org. Have you noticed it?

In the past

From 1997 onwards, we had web pages for security announcements. We had to manually prepare a .data and a .wml file, which then generated a web page for each security announcement (DSA or DLA). The 6 most recent messages were shown in a short list created from these files. Most of the work that went into the Debian web pages was creating these files.

Our search engine often listed the pages with security announcements instead of a more relevant web page for a particular topic.

Preparation

At DebConf Kosovo (2022) I started with a proof of concept and wrote a script that generates this list without using the .data/.wml files in the Git repository, instead reading the primary sources of security information[1]. This new list now includes links to the security tracker and the email of the announcement.

The following web pages and scripts were also using these .data and .wml files:

  • OVAL files
  • RSS feeds for security announcements (and LTS)
  • Apache config file for mapping URLs from dsa-NNN to YEAR/dsa-NNN
  • A huge list of cross-references between DSA and CVE numbers

Before I could remove all the security web pages, I had to adjust the scripts that create the above information.

When I looked at the OVAL files and the Apache logs of our web server, I saw that more than 99% of the web traffic was generated by these XML files (134TB of 135TB total in two weeks). They were not compressed and were around 50MB in size. With the help of Carsten Schönert we managed to modify the Python scripts that generate this OVAL file without using the .data/.wml files, and now we only provide bzip2-compressed XML files[2].

The RSS feeds are created by the new Perl script, which reads the DSA/DLA list from the security tracker and determines the URL of the announcement email for each entry. This script also generates the list of the most recent DSA/DLA entries. Currently we show the last 350 entries, which covers more than the last year and includes links to the announcement email and the security tracker.

The huge list of cross-references is not needed anymore, since the mapping of CVE to DSA is already included in the DSA list[3] of the security tracker.

The number of translations of the DSA/DLA pages varied a lot between languages. French translations were almost all done, but all other languages only translated for a couple of months or years. E.g. in 2022, Italian had 2 translations, Russian 15, Danish 212, and French and English each 279. But from 2023 on, only French translations were made. By generating the list of DSA/DLA entries we lost the ability to translate these web pages, but since these announcements are made of simple, identical sentences, it is easy to use an automatic translation service if needed.

Now the translation statistics of all web pages are more accurate. Instead of 12200 pages that need to be translated (including all these old DSA/DLA pages), there are now only 2500 pages to translate[4]. Languages that had a lot of old DSA/DLA translations lost some percentage points, but languages that are translating newer web pages gained in the statistics of how many pages are translated. Examples:

Before:
  German (de)   3501 pages translated (28.5%)
  Italian (it)  1005 pages translated (8.2%)
  Danish (da)   6336 pages translated (51.7%)

After:
  German (de)   1486 pages translated (59.0%)
  Italian (it)   909 pages translated (36.1%)
  Danish (da)    982 pages translated (39.0%)

Cleanup of all the security web pages

Finally, in January, I could remove all web pages of the security announcements in one git commit[5]. Using several git rm -rf commands, this commit removed 54335 files, including around 9650 DSA/DLA data files, 44189 wml files, and nearly 500 Makefiles.

Outcome

No more manual work is needed from the security team, and we now have direct links from a DSA-NNN/DLA-NNN to the email in our mailing list archive, which was not possible before. The search results have also become more accurate.

But we still host a lot of other old content on the Debian web pages which may be removed in the future.

[1] https://www.debian.org/security/#infos

[2] https://www.debian.org/security/oval/

[3] https://salsa.debian.org/security-tracker-team/security-tracker/-/raw/master/data/DSA/list

[4] https://www.debian.org/devel/website/stats

[5] https://salsa.debian.org/webmaster-team/webwml/-/commit/2aa73ff15bfc4eb2afd85c

Categories: FLOSS Project Planets
