Feeds

Hugo van Kemenade: A surprising thing about PyPI's BigQuery data

Planet Python - Sun, 2024-11-24 15:45

You can get download numbers for PyPI packages (or projects) from a Google BigQuery dataset. You need a Google account and credentials, and Google gives 1 TiB of free quota per month.

Each month, I have automation to fetch the download numbers for the 8,000 most popular packages over the past 30 days, and make it available as more accessible JSON and CSV files at Top PyPI Packages. This data is widely used for research in academia and industry.

However, as more packages and releases are uploaded to PyPI, and there are more and more downloads logged, the amount of billed data increases too.

This chart shows the amount of data billed per month.

At first, I was only collecting downloads data for 4,000 packages, and it was fetched for two queries: downloads over 365 days and over 30 days. But as time passed, it started using up too much quota to download data for 365 days.

So I ditched the 365-day data, and increased the 30-day data from 4,000 to 5,000 packages. Later, I checked how much quota was being used and increased from 5,000 packages to 8,000 packages.

But then I exceeded the BigQuery monthly quota of 1 TiB fetching data for July 2024.

To fetch the missing data and investigate what's going in, I started Google Cloud's 90-day, $300 (€277.46) free-trial 💸

Here's what I found!

Finding: it costs more to get data for downloads from only pip than from all installers

I use the pypinfo client to help query BigQuery. By default, it only fetches downloads for pip.

Only pip

This command gets one day's download data for the top 10 packages, for pip only:

$ pypinfo --limit 10 --days 1 "" project Served from cache: False Data processed: 58.21 GiB Data billed: 58.21 GiB Estimated cost: $0.29

Results:

project download count boto3 37,251,744 aiobotocore 16,252,824 urllib3 16,243,278 botocore 15,687,125 requests 13,271,314 s3fs 12,865,055 s3transfer 12,014,278 fsspec 11,982,305 charset-normalizer 11,684,740 certifi 11,639,584 Total 158,892,247 All installers

Adding the --all flag gets one day's download data for the top 10 packages, for all installers:

$ pypinfo --all --limit 10 --days 1 "" project Served from cache: False Data processed: 46.63 GiB Data billed: 46.63 GiB Estimated cost: $0.23 project download count boto3 39,495,624 botocore 17,281,187 urllib3 17,225,121 aiobotocore 16,430,826 requests 14,287,965 s3fs 12,958,516 charset-normalizer 12,781,405 certifi 12,647,098 setuptools 12,608,120 idna 12,510,335 Total 168,226,197

So we can see the default pip-only costs an extra 25% data processed and data billed, and costs an extra 25% in dollars.

Unsurprisingly, the actual download counts are higher for all installers. The ranking has changed a bit, but I expect we're still getting more-or-less the same packages in the top thousands of results.

Queries

It sends a query like this to BigQuery for only pip:

SELECT file.project as project, COUNT(*) as download_count, FROM `bigquery-public-data.pypi.file_downloads` WHERE timestamp BETWEEN TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -2 DAY) AND TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -1 DAY) AND details.installer.name = "pip" GROUP BY project ORDER BY download_count DESC LIMIT 10

And for all installers:

SELECT file.project as project, COUNT(*) as download_count, FROM `bigquery-public-data.pypi.file_downloads` WHERE timestamp BETWEEN TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -2 DAY) AND TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -1 DAY) GROUP BY project ORDER BY download_count DESC LIMIT 10

These queries are the same, except the default has an extra AND details.installer.name = "pip" condition. It seems reasonable it would cost more to do extra filtering work.

Installers

Let's look at the installers:

$ pypinfo --all --limit 100 --days 1 "" installer Served from cache: False Data processed: 29.49 GiB Data billed: 29.49 GiB Estimated cost: $0.15 installer name download count pip 1,121,198,711 uv 117,194,833 requests 29,828,272 poetry 23,009,454 None 8,916,745 bandersnatch 6,171,555 setuptools 1,362,797 Bazel 1,280,271 Browser 1,096,328 Nexus 593,230 Homebrew 510,247 Artifactory 69,063 pdm 62,904 OS 13,108 devpi 9,530 conda 2,272 pex 194 Total 1,311,319,514

pip still by far the most popular, and unsurprising uv is up there too, with about 10% of pip's downloads.

The others are about 25% or less of uv. A lot of them are mirroring services that we wanted to exclude before.

I think given uv's importance, and my expectation that it will continue to take a bigger share of the pie, plus especially the extra cost for filtering by just pip, means that we should switch to fetching data for all downloaders. Plus the others don't account for that much of the pie.

Finding: the number of packages doesn't affect the cost

This was the biggest surprise. Earlier I'd been increasing or decreasing the number to try and remain under quota. But it turns out it makes no difference how many packages you query!

I fetched data for just one day and all installers for different package limits: 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000. Sample query:

SELECT file.project as project, COUNT(*) as download_count, FROM `bigquery-public-data.pypi.file_downloads` WHERE timestamp BETWEEN TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -2 DAY) AND TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -1 DAY) GROUP BY project ORDER BY download_count DESC LIMIT 8000

Result: Interestingly, the cost is the same for all limits (1000-8000): $0.31.

Repeating with one day but filtering for pip only:

Result: Cost increased to $0.39 but again the same for all limits.

Let's repeat with all installers, but for 30 days, and this time query in decreasing limits, in case we were only paying for incremental changes: 8000, 7000, 6000, 5000, 4000, 3000, 2000, 1000:

Result: Again, the cost is the same regardless of package limit: $4.89 per query.

Well then, let's repeat with the limit increasing by powers of ten, up to 1,000,000! This last one fetches data for all 531,022 packages on PyPI:

limit projects count estimated cost bytes billed bytes processed 1 1 0.20 43,447,746,560 43,447,720,943 10 10 0.20 43,447,746,560 43,447,720,943 100 100 0.20 43,447,746,560 43,447,720,943 1000 1,000 0.20 43,447,746,560 43,447,720,943 8000 8,000 0.20 43,447,746,560 43,447,720,943 10000 10,000 0.20 43,447,746,560 43,447,720,943 100000 100,000 0.20 43,447,746,560 43,447,720,943 1000000 531,022 0.20 43,447,746,560 43,447,720,943

Result: Again, same cost, whether for 1 package or 531,022 packages!

Finding: the number of days affects the cost

No surprise. I'd earlier noticed 365 days too took much quota, and I could continue with 30 days.

Here's the estimated cost and bytes billed (for one package, all installers) between one and 30 days (f"pypinfo --all --json --indent 0 --days {days} --limit 1 '' project"), showing a roughly linear increase:

Conclusion
  • It doesn't matter how many packages I fetch data for, I might as well fetch all and make it available to everyone, depending on the size of the data file. It will make sense to still offer a smaller file with 8,000 or so packages: often you just need a large-ish yet manageable number.

  • It costs more to filter for only downloads from pip, so I've switched to fetching data for all installers.

  • The number of days affects the cost, so I will need to decrease this in the future to stay within quota. For example, at some point I may need to switch from 30 to 25 days, and later from 25 to 20 days.

More details from the investigation, the scripts and data files can be found at
hugovk/top-pypi-packages#36.

And let me know if you know any tricks to reduce costs!

Header photo: "The Balancing Rock, Stonehenge, Near Glen Innes, NSW" by the Royal Australian Historical Society, with no known copyright restrictions.

Categories: FLOSS Project Planets

Django Weblog: 2024 Malcolm Tredinnick Memorial Prize awarded to Rachell Calhoun

Planet Python - Sun, 2024-11-24 13:39

This year it was hard to decide, and we wanted to also show who else got nominated, because they also deserve recognition, so it took a bit longer than we expected.

The Django Software Foundation Board is pleased to announce that the 2024 Malcolm Tredinnick Memorial Prize has been awarded to Rachell Calhoun.

Rachell Calhoun is an influential figure within the Django community, well known for being cheerful and always willing to help others. She consistently empowers folks behind the scenes.

Rachell got her start in the Django community through a Django Girls Seoul event. Being an educator, she started organizing Django Girls Seoul events. Her contributions to Django Girls Seoul and Django Girls Grand Rapids exemplify her commitment to sharing knowledge, spreading Django and lifting others up. Rachell is now a trustee for Django Girls +, contributing to its mission of helping women and other underrepresented groups around the world learn programming with Django.

In 2022, Rachell co-founded Djangonaut Space, an initiative designed to support new contributors to the Django ecosystem, encouraging leadership and growth. Rachell’s willingness to help people achieve their goals and celebrate their achievements has been imprinted in Djangonaut Space’s culture. Rachell and Djangonaut Space have done a stellar job on helping people become contributors and Django community members.

Her commitment to fostering diversity and inclusion extends beyond her organizational work; she has volunteered at multiple DjangoCon US events, bringing her welcoming and inclusive spirit to the community. A long-time volunteer and speaker at DjangoCon US and DjangoCon Europe from 2016 to 2024, she has shared her expertise and insights on various topics related to Django and web development.

Rachell has contributed to Django for many years, she has been instrumental in creating spaces where people of all backgrounds can thrive, making her a beloved and respected member of the global Django ecosystem.

Some quotes from the thirteen people who nominated Rachell had this to say about her:

Rachell advocates for others constantly through sponsorship, inclusivity, and connection. She is extremely empathic and seeks to not only welcome others in, but to actively bring them into the group.

She has been one of the core members of Djangonaut Space which has done a lot for bringing new contributors into the Django community. This program has done a lot to excite and energize the Django community and Rachell is one of the major reasons why. --
Throughout her career she's been involved in Django Girls starting about a decade ago in South Korea. She was a major organizer of the Grand Rapids, MI branch, before moving into the trustee role she occupies now.

Rachell is one of my favorite people and she's been doing an excellent job at growing Django and helping others feel more welcome here. Rachell is an excellent choice for the Malcolm Tredinnick 2024 award!

— Tim Shilling

Rachell is an extremely skillful leader who is always nurturing newcomers into leaders. She has been pivotal to my experience with the Djangonaut Space Program.
I started out as a nervous Djangonaut who didn’t schedule my 1:1s until Rachell checked in with me and made sure I knew the program was a safe space to discuss anything.

When I joined the program organizers as a Navigator Coordinator, I was initially much more of a follower. Rachell knew to step back while continuing to provide her support, so I could step into the leadership role and get my job done.

Rachell shows people that she believes in them. She does this in a friendly, gentle, and encouraging manner. She never forces anyone to make decisions that they don’t feel comfortable with. The community is really lucky to have Rachell.

— Lilian

Rachell Calhoun, one of the organizers and founders of Djangonaut Space, has been an open, supportive, and educational help on my Django journey. Her contributions to the Djangonaut Space program are invaluable—a program I hold quite dearly as a cornerstone of my technical interactions and growth.

Rachell's ideals of nurturing and guiding have shone through the program, for which I am grateful. Encouraging wonderful conversations, organizing and fostering mentorship, and being a great person!

I believe Rachell is an embodiment of the Malcolm Tredinnick spirit and am confident that should she win the prize, she would go on to create more impact for the Django community and the world at large.

— Emmanuel Katchy

Other nominations for this year included:

Anna Makarudze, Fundraising Coordinator at Django Girls+ Foundation, chair of the first DjangoCon Africa, previously served the DSF board as president.

Benjamin Balder Bach, chair of the DSF social media working group, organizer of Django Day Copenhagen for many years.

Black Python Devs, community founded by Jay Miller, to increase diversity and inclusion of typically underrepresented people.

Bhuvnesh Sharma, co-chair of the DSF social media working group, and co-founded and organized Django India.

Carlton Gibson, previously a Django fellow, co-host of Django Chat, volunteers in DjangoCon Europe and provides useful advice in forum and discord.

Christoph Bulter, active helper of the official and unofficial Django Discord.

Django Girls+, a non-profit organization and a community that empowers and helps women to organize free, one-day programming workshops by providing tools, resources and support.

Django Discord moderators and helpers, which are moderating the discord and provide help to keep the place welcoming and inclusive to everyone.

Daniel Moran, active contributor in various open-source projects, including django-tasks-scheduler. He is an administrator of the Django Commons organization.

Ester Beltrami, PyCon Italia and Django London organizer, is also a volunteer and a speaker in events such as EuroPython or DjangoCon Europe.

Felipe de Morais, co-founder of AfroPython, participant of Djangonaut Space program, organized and advised multiple Django Girls workshops across Brazil and Chile.

Jake Howard, speaker and contributor to Django, known for his work around background tasks.

Matt Westcott, frequent speaker and lead the development of Wagtail.

Russel Keith-Magee, python core contributor and previously Django core contributor and also served in the DSF board as President.

Ryan Cheley, django contributor and mentor (navigator) in Djangonaut Space program.

Simon Charette, long-time django contributor, previously member of the Django 5.x steering council

Sheena O’Connell, frequent speaker and DjangoCon Africa organizer.

Tom Carrick, Django Accessibility team creator and member, django contributor for many years and mentor (navigator) in Djangonaut Space program.

Tim Schilling, DEFNA secretary, DjangoCon Us organizer and co-founder of Djangonaut Space.

Will Vincent, former board member of the DSF, co-host of Django Chat and co-writer of Django News.

Each year we receive many nominations, and it is always hard to pick the winner. This year, as always, we received many nominations for the Malcolm Tredinnick Memorial Prize with some being nominated multiple times. Some have been nominated in multiple years. If your nominee didn’t make it this year, you can always nominate them again next year.

Malcolm would be very proud of the legacy he has fostered in our community!

Congratulations Rachell on the well-deserved honor!

Categories: FLOSS Project Planets

GNU Guix: Guix/Hurd on a Thinkpad X60

GNU Planet! - Sun, 2024-11-24 13:00

A lot has happened with respect to the Hurd since our Childhurds and GNU/Hurd Substitutes post. As long as two years ago some of you have been asking for a progress update and although there have been rumours on a new blog post for over a year, we were kind of waiting for the rumoured x86_64 support.

With all the exciting progress on the Hurd coming available after the recent (last?) merger of core-updates we thought it better not to wait any longer. So here is a short overview of our Hurd work over the past years:

NetDDE and Rumpdisk support

Back in 2020, Ricardo Wurmus added the NetDDE package that provides Linux 2.6 network drivers. At the time we didn't get to integrate and use it though and meanwhile it bitrotted.

After we resurrected the NetDDE build, and with kind help of the Hurd developers we finally managed to support NetDDE for the Hurd.. This allows the usage of the Intel 82573L Gigabit Ethernet Controller of the Thinkpad X60 (and many other network cards, possibly even WIFI). Instead of using the builtin kernel driver in GNU Mach, it would be running as a userland driver.

What sparked this development was upstream's NetBSD rumpdisk support that would allow using modern hard disks such as SSDs, again running as a userland driver. Hard disk support builtin in GNU Mach was once considered to be a nice hack but it only supported disks up to 128 GiB…

First, we needed to fix the cross build on Guix.

After the initial attempt at rumpdisk support for the Hurd it took (v2) some (v3) work (v4) to finally arrive at rumpdisk support for the Hurd, really, *really* (v5)

Sadly when actually using them, booting hangs:

start: pci.arbiter:

What did not really help is that upstream's rumpkernel archive was ridiculously large. We managed to work with upstream to remove unused bits from the archive. Upstream created a new archive that instead of 1.8 GiB (!) now “only” weighs 670 MiB.

Anyway, after a lot of building, rebuilding, and debugging and some more with kind help from upstream we finally got Rumpdisk and NetDDE to run in a Childhurd.

Initial Guix/Hurd on the Thinkpad X60

Now that the last (!) core-updates merge has finally happened (thanks everyone!), the recipe of installing Guix/Hurd has been much simpfilied. It goes something along these lines.

  1. Install Guix/Linux on your X60,

  2. Reserve a partition and format it for the Hurd:

    mke2fs -o hurd -L hurd /dev/sdaX
  3. In your config.scm, add some code to add GRUB menuentries for booting the Hurd, and mount the Hurd partition under /hurd:

    (use-modules (srfi srfi-26) (ice-9 match) (ice-9 rdelim) (ice-9 regex) (gnu build file-systems)) (define %hurd-menuentry-regex "menuentry \"(GNU with the Hurd[^{\"]*)\".*multiboot ([^ \n]*) +([^\n]*)") (define (text->hurd-menuentry text) (let* ((m (string-match %hurd-menuentry-regex text)) (label (match:substring m 1)) (kernel (match:substring m 2)) (arguments (match:substring m 3)) (arguments (string-split arguments #\space)) (root (find (cute string-prefix? "root=" <>) arguments)) (device-spec (match (string-split root #\=) (("root" device) device))) (device (hurd-device-name->device-name device-spec)) (modules (list-matches "module ([^\n]*)" text)) (modules (map (cute match:substring <> 1) modules)) (modules (map (cute string-split <> #\space) modules))) (menu-entry (label label) (device device) (multiboot-kernel kernel) (multiboot-arguments arguments) (multiboot-modules modules)))) (define %hurd-menuentries-regex "menuentry \"(GNU with the Hurd[^{\"]*)\" \\{([^}]|[^\n]\\})*\n\\}") (define (grub.cfg->hurd-menuentries grub.cfg) (let* ((entries (list-matches %hurd-menuentries-regex grub.cfg)) (entries (map (cute match:substring <> 0) entries))) (map text->hurd-menuentry entries))) (define (hurd-menuentries) (let ((grub.cfg (with-input-from-file "/hurd/boot/grub/grub.cfg" read-string))) (grub.cfg->hurd-menuentries grub.cfg))) ... (operating-system ... (bootloader (bootloader-configuration (bootloader grub-bootloader) (targets '("/dev/sda")) (menu-entries (hurd-menuentries)))) (file-systems (cons* (file-system (device (file-system-label "guix")) (mount-point "/") (type "ext4")) (file-system (device (file-system-label "hurd")) (mount-point "/hurd") (type "ext2")) %base-file-systems)) ...)
  4. Create a config.scm for your Hurd system. You can get inspiration from bare-hurd.tmpl and inherit from %hurd-default-operating-system. Use grub-minimal-bootloader and add a static-networking-service-type. Something like:

    (use-modules (srfi srfi-1) (ice-9 match)) (use-modules (gnu) (gnu system hurd)) (operating-system (inherit %hurd-default-operating-system) (bootloader (bootloader-configuration (bootloader grub-minimal-bootloader) (targets '("/dev/sda")))) (kernel-arguments '("noide")) ... (services (cons* (service static-networking-service-type (list %loopback-static-networking (static-networking (addresses (list (network-address (device "eth0") (value "192.168.178.37/24")))) (routes (list (network-route (destination "default") (gateway "192.168.178.1")))) (requirement '()) (provision '(networking)) (name-servers '("192.168.178.1"))))) ...)))
  5. Install the Hurd. Assuming you have an ext2 filesystem mounted on /hurd, do something like:

    guix system build --target=i586-pc-gnu vuurvlieg.hurd --verbosity=1 sudo -E guix system init --target=i586-pc-gnu --skip-checks \ vuurvlieg.hurd /hurd sudo -E guix system reconfigure vuurvlieg.scm
  6. Reboot and...

Hurray!

We now have Guix/Hurd running on Thinkpad.

Guix/Hurd on Real Iron

While the initial manual install on the X60 was an inspiring milestone, we can do better. As mentioned above, just recently the installer learnt about the Hurd, right after some smaller problems were addressed, like guix system init creating essential devices for the Hurd, not attempting to run a cross-built grub-install to install Grub, soft-coding the hard-coded part:1:device:wd0 root file-system, adding support for booting Guix/Hurd more than once.

To install Guix/Hurd, first, build a 32bit installation image and copy it to a USB stick:

guix system image --image-type=iso9660 --system=i686-linux gnu/system/install.scm dd if=/gnu/store/cabba9e-image.iso of=/dev/sdX status=progress sync

then boot it on a not-too-new machine that has wired internet (although installation over WIFI is possible, there is currently no WIFI support for the installed Hurd to use it). On the new Kernel page:

choose Hurd. Do not choose a desktop environment, that's not available yet. On the Network management page:

choose the new Static networking service. In the final Configuration file step, don't forget to edit:

and fill-in your IP and GATEWAY:

You may want to add some additional packages such as git-minimal from (gnu packages version-control) and sqlite from (gnu packages sqlite).

If you also intend to do Guix development on the Hurd—e.g., debugging and fixing native package builds—then you might want to include all dependencies to build the guix package, see the devel-hurd.tmpl for inspiration on how to do that. Note that any package you add must already have support for cross-building.

Good luck, and let us know if it works for you and on what kind of machine, or why it didn't!

What's next?

In an earlier post we tried to answer the question “Why bother with the Hurd anyway?” An obvious question because it is all too easy to get discouraged, to downplay or underestimate the potential social impact of GNU and the Hurd.

The most pressing problem currently is that the guix-daemon fails when invoking guix authenticate on the Hurd, which means that we cannot easily keep our substitutes up to date. guix pull is known to work but only by pulling from a local branch doing something like:

mkdir -p ~/src/guix cd src/guix git clone https://git.savannah.gnu.org/git/guix.git master guix pull --url=~/src/guix/master

kinda like we did it in the old days. Sometimes it seems that guix does not grok the sqlite store database. This is needs to be resolved but as sqlite does read it this can be easily worked around by doing something like:

mv /var/guix/db/db.sqlite /var/guix/db/db.sqlite.orig sqlite3 /var/guix/db/db.sqlite.orig .dump > /var/guix/db/db.sqlite.dump sqlite3 -init /var/guix/db/db.sqlite.dump /var/guix/db/db.sqlite .quit

Also, the boot process still fails to handle an unclean root file system. Whenever the Hurd has suffered an unclean shutdown, cleaning it must currently be done manually, e.g., by booting from the installer USB stick.

Upstream now has decent support for 64bit (x86_64) albeit with more bugs and fewer packages. As mentioned in the overview we are now working to have that in Guix and have made some progress:

more on that in another post. Other interesting task for Guix include:

  • Have guix system reconfigure work on the Hurd,
  • Figure out WiFi support with NetDDE (and add it to installer!),
  • An isolated build environment (or better wait for, err, contribute to the Guile guix-daemon?),
  • An installer running the Hurd, and,
  • Packages, packages, packages!

We tried to make Hurd development as easy and as pleasant as we could. As you have seen, things start to work pretty nicely and there is still plenty of work to do in Guix. In a way this is “merely packaging” the amazing work of others. Some of the real work that needs to be done and which is being discussed and is in progress right now includes:

All these tasks look daunting, and indeed that’s a lot of work ahead. But the development environment is certainly an advantage. Take an example: surely anyone who’s hacked on device drivers or file systems before would have loved to be able to GDB into the code, restart it, add breakpoints and so on—that’s exactly the experience that the Hurd offers. As for Guix, it will make it easy to test changes to the micro-kernel and to the Hurd servers, and that too has the potential to speed up development and make it a very nice experience.

Join #guix and #hurd on libera.chat or the mailing lists and get involved!

Categories: FLOSS Project Planets

Django Weblog: DjangoCon Europe 2026 call for organizers completed

Planet Python - Sun, 2024-11-24 11:57

The DjangoCon Europe 2026 call for organizers is now over. We’re elated to report we received three viable proposals, a clear improvement over recent years.

We’ll let the successful team decide when and how to make their announcement, but in the meantime – thank you to everyone who took part in this process ❤️ We’re elated to have such a strong community in Europe. And for now, look forward to DjangoCon Europe 2025 in Dublin, Ireland! 🍀

What about 2027?

We’re not ready to plan that yet, but if you’re interested in organizing – take a moment to add your name and email to our DjangoCon Europe 2027 expression of interest form. We’ll make sure to reach out once the time is right.

Categories: FLOSS Project Planets

Real Python: Python range(): Represent Numerical Ranges

Planet Python - Sun, 2024-11-24 09:00

In Python, the range() function generates a sequence of numbers, often used in loops for iteration. By default, it creates numbers starting from 0 up to but not including a specified stop value. You can also reverse the sequence with reversed(). If you need to count backwards, then you can use a negative step, like range(start, stop, -1), which counts down from start to stop.

The range() function is not just about iterating over numbers. It can also be used in various programming scenarios beyond simple loops. By mastering range(), you can write more efficient and readable Python code. Explore how range() can simplify your code and when alternatives might be more appropriate.

By the end of this tutorial, you’ll understand that:

  • A range in Python is an object representing an interval of integers, often used for looping.
  • The range() function can be used to generate sequences of numbers that can be converted to lists.
  • for i in range(5) is a loop that iterates over the numbers from 0 to 4, inclusive.
  • The range parameters start, stop, and step define where the sequence begins, ends, and the interval between numbers.
  • Ranges can go backward in Python by using a negative step value and reversed by using reversed().

A range is a Python object that represents an interval of integers. Usually, the numbers are consecutive, but you can also specify that you want to space them out. You can create ranges by calling range() with one, two, or three arguments, as the following examples show:

Python >>> list(range(5)) [0, 1, 2, 3, 4] >>> list(range(1, 7)) [1, 2, 3, 4, 5, 6] >>> list(range(1, 20, 2)) [1, 3, 5, 7, 9, 11, 13, 15, 17, 19] Copied!

In each example, you use list() to explicitly list the individual elements of each range. You’ll study these examples in more detail later on.

A range can be an effective tool. However, throughout this tutorial, you’ll also explore alternatives that may work better in some situations. You can click the link below to download the code that you’ll see in this tutorial:

Get Your Code: Click here to download the free sample code that shows you how to represent numerical ranges in Python.

Construct Numerical Ranges

In Python, range() is built in. This means that you can always call range() without doing any preparations first. Calling range() constructs a range object that you can put to use. Later, you’ll see practical examples of how to use range objects.

You can provide range() with one, two, or three integer arguments. This corresponds to three different use cases:

  1. Ranges counting from zero
  2. Ranges of consecutive numbers
  3. Ranges stepping over numbers

You’ll learn how to use each of these next.

Count From Zero

When you call range() with one argument, you create a range that counts from zero and up to, but not including, the number you provided:

Python >>> range(5) range(0, 5) Copied!

Here, you’ve created a range from zero to five. To see the individual elements in the range, you can use list() to convert the range to a list:

Python >>> list(range(5)) [0, 1, 2, 3, 4] Copied!

Inspecting range(5) shows that it contains the numbers zero, one, two, three, and four. Five itself is not a part of the range. One nice property of these ranges is that the argument, 5 in this case, is the same as the number of elements in the range.

Count From Start to Stop

You can call range() with two arguments. The first value will be the start of the range. As before, the range will count up to, but not include, the second value:

Python >>> range(1, 7) range(1, 7) Copied!

The representation of a range object just shows you the arguments that you provided, so it’s not super helpful in this case. You can use list() to inspect the individual elements:

Python >>> list(range(1, 7)) [1, 2, 3, 4, 5, 6] Copied! Read the full article at https://realpython.com/python-range/ »

[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

Categories: FLOSS Project Planets

Real Python: Efficient String Concatenation in Python

Planet Python - Sun, 2024-11-24 09:00

Python string concatenation is a fundamental operation that combines multiple strings into a single string. In Python, you can concatenate strings using the + operator or the += operator for appending. For more efficient concatenation of multiple strings, the .join() method is recommended, especially when working with strings in a list. Other techniques include using StringIO for large datasets or the print() function for quick screen outputs.

By the end of this tutorial, you’ll understand that:

  • You can concatenate strings in Python using the + operator and the += operator.
  • You can use += to append a string to an existing string.
  • The .join() method is used to combine strings in a list in Python.
  • You can handle a stream of strings efficiently by using StringIO as a container with a file-like interface.

To get the most out of this tutorial, you should have a basic understanding of Python, especially its built-in string data type.

Get Your Code: Click here to download the free sample code that shows you how to efficiently concatenate strings in Python.

Doing String Concatenation With Python’s Plus Operator (+)

String concatenation is a pretty common operation consisting of joining two or more strings together end to end to build a final string. Perhaps the quickest way to achieve concatenation is to take two separate strings and combine them with the plus operator (+), which is known as the concatenation operator in this context:

Python >>> "Hello, " + "Pythonista!" 'Hello, Pythonista!' >>> head = "String Concatenation " >>> tail = "is Fun in Python!" >>> head + tail 'String Concatenation is Fun in Python!' Copied!

Using the concatenation operator to join two strings provides a quick solution for concatenating only a few strings.

For a more realistic example, say you have an output line that will print an informative message based on specific criteria. The beginning of the message might always be the same. However, the end of the message will vary depending on different criteria. In this situation, you can take advantage of the concatenation operator:

Python >>> def age_group(age): ... if 0 <= age <= 9: ... result = "a Child!" ... elif 9 < age <= 18: ... result = "an Adolescent!" ... elif 19 < age <= 65: ... result = "an Adult!" ... else: ... result = "in your Golden Years!" ... print("You are " + result) ... >>> age_group(29) You are an Adult! >>> age_group(14) You are an Adolescent! >>> age_group(68) You are in your Golden Years! Copied!

In the above example, age_group() prints a final message constructed with a common prefix and the string resulting from the conditional statement. In this type of use case, the plus operator is your best option for quick string concatenation in Python.

The concatenation operator has an augmented version that provides a shortcut for concatenating two strings together. The augmented concatenation operator (+=) has the following syntax:

Python string += other_string Copied!

This expression will concatenate the content of string with the content of other_string. It’s equivalent to saying string = string + other_string.

Here’s a short example of how the augmented concatenation operator works in practice:

Python >>> word = "Py" >>> word += "tho" >>> word += "nis" >>> word += "ta" >>> word 'Pythonista' Copied!

In this example, every augmented assignment adds a new syllable to the final word using the += operator. This concatenation technique can be useful when you have several strings in a list or any other iterable and want to concatenate them in a for loop:

Python >>> def concatenate(iterable, sep=" "): ... sentence = iterable[0] ... for word in iterable[1:]: ... sentence += (sep + word) ... return sentence ... >>> concatenate(["Hello,", "World!", "I", "am", "a", "Pythonista!"]) 'Hello, World! I am a Pythonista!' Copied!

Inside the loop, you use the augmented concatenation operator to quickly concatenate several strings in a loop. Later you’ll learn about .join(), which is an even better way to concatenate a list of strings.

Python’s concatenation operators can only concatenate string objects. If you use them with a different data type, then you get a TypeError:

Python >>> "The result is: " + 42 Traceback (most recent call last): ... TypeError: can only concatenate str (not "int") to str >>> "Your favorite fruits are: " + ["apple", "grape"] Traceback (most recent call last): ... TypeError: can only concatenate str (not "list") to str Copied!

The concatenation operators don’t accept operands of different types. They only concatenate strings. A work-around to this issue is to explicitly use the built-in str() function to convert the target object into its string representation before running the actual concatenation:

Python >>> "The result is: " + str(42) 'The result is: 42' Copied!

By calling str() with your integer number as an argument, you’re retrieving the string representation of 42, which you can then concatenate to the initial string because both are now string objects.

Read the full article at https://realpython.com/python-string-concatenation/ »

[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

Categories: FLOSS Project Planets

UX Insights (that we cannot get right now)

Planet KDE - Sun, 2024-11-24 07:50

After the criticism in the last post about the limitations of KUserFeedback (KUF) for doing data-driven UX work — let’s get more detailed and constructive:

What insights do we as KDE UX people need to do even better than we are currently doing?

Let us start with what we already get from KUF. We get usage data, like how many people are using Wayland vs. X11. But we only get usage data according to our telemetry policy. So we do not get any deeper insight into how users configure their sessions when using Wayland compared to X11. But this is the kind of information we would need to do proper data-driven UX. What settings are users changing? How many users have icons on their desktop, and which ones? Are people manually mounting network drives? Which System Tray icons are interacted with the most? And so on.

But while this information is already impossible to gather with our current approach, we’re only scratching the surface. We need even deeper UX insights, like understanding where people click. And where they click next (in terms of Markov chains). That way we can understand if people are using Plasma the way we intended when we designed it. Or, how long does it take them to get from point A to point B? Are they taking detours because we’ve laid out paths that users don’t understand in the way we intended?

None of these questions can be answered with our current approach to telemetry.

The basic problem is that we currently send all the raw data to the KDE servers to get the answers we need. And the data we need to collect in order to get the above described desired user insights could of course be used to “identify a specific user” – which is not allowed by our telemetry policy for good reason.

And yet we need even more data. We want to target all users, or only users who exhibit certain behaviors. We want them to fill out questionnaires to better understand why they behave the way they do, to understand their goals and intentions. This would be extremely helpful in understanding bug reports. Or to support our design discussions with relevant data from real users.

All of this can only be achieved with a fundamental change in the way we do telemetry.

Existing alternatives, such as the opt-out Endless OS metrics system, also do not allow enough user insights and share the problem that the data leaves the property of the data owners, the users. That is why we have been working on the privact ecosystem, which allows all the insights described above, while fully preserving users’ privacy. And because of that, we can not only ask for more intimate data, but we can also make participation opt-out and so get data from substantially more people. And why is that? Because with the privact ecosystem, there is no technical possibility that any individual’s personal data can ever be shared remotely. Never. But it would finally enable good user-data-driven UX work. For the sake of KDE and our users.

Please also join the discussion about this issue on invent.kde.org.

Categories: FLOSS Project Planets

Edward Betts: A mini adventure at MiniDebConf Toulouse

Planet Debian - Sun, 2024-11-24 07:42
A mini adventure at MiniDebConf Toulouse

Last week, I ventured to Toulouse, for a delightful mix of coding, conversation, and crepes at MiniDebConf Toulouse, part of the broader Capitole du Libre conference, akin to the more well-known FOSDEM but with a distinctly French flair.

This was my fourth and final MiniDebConf of the year.

My trek to Toulouse was seamless. I hopped on a bus from my home in Bristol to the airport, then took a short flight. I luxuriated in seat 1A, making me the first to disembark—a mere ten minutes later, I was already on the bus heading to my hotel.

Exploring the Pink City

Once settled, I wasted no time exploring the charms of Toulouse. Just a short stroll from my hotel, I found myself beside a tranquil canal, its waters mirroring the golden hues of the trees lining its banks. Autumn in Toulouse painted the city in warm oranges and reds, creating a picturesque backdrop that was a joy to wander through. Every corner of the street revealed more of the city's rich cultural tapestry and striking architecture. Known affectionately as 'La Ville Rose' (The Pink City) for its unique terracotta brickwork, Toulouse captivated me with its blend of historical allure and vibrant modern life.

MiniDebCamp

Prior to the main event, the MiniDebCamp provided two days of hacking at Artilect FabLab—a space as creative as it was welcoming. It was a pleasure to reconnect with familiar faces and forge new friendships.

Culinary delights

The hospitality was exceptional. Our lunches boasted a delicious array of quiches, an enticing charcuterie board, and a superb selection of cheeses, all perfectly complemented by exquisite petite fours. Each item was not only a feast for the eyes but also a delight for the palate.

Wine and cheese

Leftovers from these gourmet feasts fuelled our impromptu cheese and wine party on Thursday evening—a highlight where informal chats blended seamlessly with serious software discussions.

The river at night

The enchantment of Toulouse doesn't dim with the setting sun; instead, it transforms. My evening strolls took me along the banks of the Garonne, under a sky just turning from twilight to velvet blue. The river, a dark mirror, perfectly reflected the illuminated grandeur of the city's architecture. Notably, the dome of the Hôpital de La Grave stood out, bathed in a warm glow against the night sky. This architectural gem, coupled with the soft lights of the bridge and the serene river, created a breathtaking scene that was both tranquil and awe-inspiring.

Capitole du Libre

The MiniDebConf itself, part of the larger Capitole du Libre event, was a fantastic immersion into the world of free software. Unlike the ticket-free FOSDEM, this conference required QR codes for entry and even had bag searches, adding an unusual layer of security for a software conference.

Highlights included the crepe-making by the organisers, reminiscent of street food scenes from larger festivals. The availability of crepes for MiniDebConf attendees and the presence of food trucks added a festive air, albeit with the inevitable long queues familiar to any festival-goer.

vélôToulouse

The city's bike rental system was a boon—easy to use with handy bike baskets perfect for casual city touring. I chose pedal power over electric, finding it a pleasant way to navigate the streets and absorb the city's vibrant atmosphere.

Markets

Toulouse's markets were a delightful discovery. From a spontaneous visit to a market near my hotel upon arrival, to cycling past bustling marketplaces, each day presented new local flavours and crafts to explore.

The Za'atar flatbread from a Syrian stall was a particularly memorable lunch pick.

La brasserie Les Arcades

Our conference wrapped up with a spontaneous gathering at La Brasserie Les Arcades in Place du Capitole. Finding a café that could accommodate 30 of us on a Sunday evening without a booking felt like striking gold. What began with coffee and ice cream smoothly transitioned into dinner, where I enjoyed a delicious braised duck leg with green peppercorn sauce. This meal rounded off the trip with lively conversations and shared experiences.

The journey back home

Returning from Toulouse, I found myself once again in seat 1A, offering the advantage of being the first off the plane, both on departure and arrival. My flight touched down in Bristol ahead of schedule, and within ten minutes, I was on the A1 bus, making my way back into the heart of Bristol.

Anticipating DebConf 25 in Brittany

My trip to Toulouse for MiniDebConf was yet another fulfilling experience; the city was delightful, and the talks were insightful. While I frequently travel, these journeys are more about continuous learning and networking than escape. The food in Toulouse was particularly impressive, a highlight I've come to expect and relish on my trips to France. Looking ahead, I'm eagerly anticipating DebConf in Brest next year, especially the opportunity to indulge once more in the excellent French cuisine and beverages.

Categories: FLOSS Project Planets

GNU Artanis: new title

GNU Planet! - Sun, 2024-11-24 06:33
Categories: FLOSS Project Planets

Zero to Mastery: Python Monthly Newsletter 💻🐍

Planet Python - Sat, 2024-11-23 20:43
60th issue of Andrei Neagoie's must-read monthly Python Newsletter: Python 3.13, Django Project Ideas, GPU Bubble is Bursting, and much more. Read the full newsletter to get up-to-date with everything you need to know from last month.
Categories: FLOSS Project Planets

Seth Michael Larson: How do I pay the publisher of a web page?

Planet Python - Sat, 2024-11-23 19:00
How do I pay the publisher of a web page? AboutBlogCool URLs How do I pay the publisher of a web page?

Published 2024-11-24 by Seth Larson
Reading time: minutes

Here's an unanswered question:

I have money and I have a URL, how do I send money to the publisher of that URL?

URLs tell you where to get content on the web, but they don't tell you anything about how to support the person who created the content. This story might sound similar to paying open source maintainers, where a user can almost abstract an entire project to a single download URL.

There are tons of people creating content for the web and plenty of ways to get paid (Patreon, Kofi, GitHub Sponsors, YouTube Paid Membership), but there's no standardized way to direct someone interested in paying for the content of a page in the right direction.

We have HTML meta headers for many things, including where to find an RSS feed or what my Fediverse handle is, but none for enumerating options to pay the creator of the content. I wish I could click a button to easily send a "tip" to someone who created something I enjoy or to browse other options for supporting them.

Existing technology Payment Request API

There are things like the web "Payment Request API" which gives you a JavaScript API for generating a payment, but this doesn't fit my criteria.

For one: this means that every person creating content for the web needs to add JavaScript to their page. This is a much higher bar than simply linking to existing payment methods that a creator already likely uses to get paid. Being difficult means it's unlikely for large numbers of people to do the work.

I also don't see being able to automate this because of the JavaScript. Web creators likely have existing payment pages that they'd much rather link out to instead of trying to handle payments themselves individually.

Lastly, this API exists and I don't see it being used by creators today. That should say something about either its ease-of-use or return on investment from potential supporters.

Linking to payment methods in the page

Yeah, we could scrape the payment URLs we know about embedded in the page. But there's a difference between potential URLs in the page due to non-creator generated content (links in comments, etc) and whatever the "authoritative" URLs are for paying the creator of the page. Being able to set <meta> tags in <head> is typically a higher bar than setting arbitrary URLs in the <body>.

Podcasting 2.0 RSS <podcast:funding> tag

Podcasting 2.0 supports basically the exact tag that I want to use which encodes a URL and a human-readable name for that URL into the metadata description of a podcast publication. Really great to see some prior art here.

Thanks to DamonHD for sending me this reference.

Flattr

Flattr is a service that tried to turn a "subscription" from users into micro-payouts based on a users' browsing history. Flattr shut down in 2023. This approach isn't one I'm interested in replicating for a few reasons:

  • Access to the entire browsing history feels like a privacy nightmare. Yes you receive a "complete" sample of which web pages a user has visited but, yeah this doesn't seem great?
  • Flattr tried to create its own payment platform for creators, rather than pushing users to send monetary support through existing stable payment methods like Patreon. They had to do this because of the micro-payments thing.

In general this is making me think micro-payments is extremely hard to do. I think having a handful of dedicated fans for small creators might be enough to "offset" the "loss" of micro-payments? Perhaps there can be a recommendation to note to users when certain creators are "niche" and therefore are receiving fewer payments relative to other creators and thus would benefit more from a contribution / boost?

Thanks to Quentin for sending me this reference.

Brave

I know about Brave, and I would like to avoid crypto in my solution. Also many of the creators I pay for don't use crypto but do have multiple payment methods. I don't think the solution should require creators AND users adopt new technology to work.

What happens now?

I'm no stranger to standards, so maybe I do some research and write a web standard proposal? Seems like fun! I'm imagining something like:

<head> <!-- ... --> <meta property="financial-support" content="https://patreon.com/c/MatthewCarlson"> </head>

Because this is primarily for money, no doubt it will be abused to hell. First-party browsers probably wouldn't do anything with this information for the fear of legitimizing scammers' fake profiles.

The existence of the "Web Payments API" makes me think maybe it's not a huge deal and that whenever money gets involved peoples' spidey-senses start going off about whether a page is legitimate? Not sure.

Let me know what you think!

Have thoughts or questions? Let's chat over email or social:

sethmichaellarson@gmail.com
@sethmlarson@fosstodon.org

Want more articles like this one? Get notified of new posts by subscribing to the RSS feed or the email newsletter. I won't share your email or send spam, only whatever this is!

Want more content now? This blog's archive has ready-to-read articles. I also curate a list of cool URLs I find on the internet.

Find a typo? This blog is open source, pull requests are appreciated.

Thanks for reading! ♡ This work is licensed under CC BY-SA 4.0

Categories: FLOSS Project Planets

Welcome to My Blog

Planet KDE - Sat, 2024-11-23 19:00

Heyho together!

I am from now on writing my posts on GitHub pages. Apart from it being useful to keep my posts versioned using git, I had some issues with my previous blog. The idea was to simply use write.as and publish a post from time to time. This worked well except for more than a month ago me wanting to do a post about my KRunner plugins. It naturally contained a lot of links and thus the publishing was prevented and even the account blocked due to apparent spam. There was no response via mail for over a month.

So here we are not on another blog where I hopefully write more often and also be able to spent more time on KDE!

Categories: FLOSS Project Planets

gnuboot @ Savannah: GNU Boot November 2024 News

GNU Planet! - Sat, 2024-11-23 14:05

A lot has changed since the two last news from the GNU Boot project.

GNU Boot install party in Paris the 7 and 8 December 2024


People involved in the GNU Boot project will be organizing a 100% free
software install party within a bigger event that also has a regular
install party. There will also be a presentation about 100% free
software in there. The event will be mainly in French.

More details are available in French and in English in the following
link:
https://lists.gnu.org/archive/html/help-guix/2024-11/msg00112.html

GNU Boot 0.1 RC4


Many changes were made since the RC3 and since then we fixed an
important bug that prevented Trisquel from booting (If during the
Trisquel installation you chose "LVM2" and didn't encrypt the
storage, GNU Boot images with GRUB would not find the Trisquel
installation).

Because of that we decided to do a new RC4 (release candidate 4)
and to publish new GNU Boot images.

There are still some work needed before doing a 0.1 release as we want
to make it easier for less technical users to install and use GNU
Boot, but more and more of the project structure are getting in place
(website, manual, automatic tests, guix, good development procedures,
enabling build on all distributions, etc) which then makes it easier
to contribute.

We also decided to use Guix for more of the software components
we build, and since this is a big change, we will need people to
help more with testing.

Nonfree software found again, no supported device affected.


The last announcement we made was "Nonfree software found in GNU Boot
releases again, many distros affected"[1].

Some people misunderstood it (maybe we could have been more clear):
the nonfree software that we found was code that GNU Boot didn't use,
so it was easy to remove and it didn't affect the supported devices in
any way.

Finding nonfree software in 100% free distribution is also common:
this is part of the work to ensure these distribution remains 100%
free.

The first time it happened in GNU Boot we publicized it to explain why
we were re-releasing some of the GNU Boot files as it could be very
scary if this happens without any public communication.

The second time we published a news about it mainly to help propagate
the information to the affected distributions and this is probably why
it was misunderstood: it was mainly targeted at GNU Boot users and
maintainers of the affected packages. We also contacted upstream and
some affected distributions directly as well but contacting everybody
takes a lot of time so having a news about it helps. At least Debian
and Trisquel fixed the issue but we still need to contact some
distributions.

After that, and probably thanks to the previous news, Leah Rowe
contacted us on one of the GNU Boot mailing lists[2] to notify us that
she also found additional similar nonfree software in GNU Boot.

So we confirmed that and promptly removed them and re-made again the
source release. And here again even if the work was delayed a bit,
this was fast to do and it doesn't affect the supported devices in any
way.

But we also need help contacting distributions again because one of
the issue she found is very serious because it affects many
distributions and also important devices that GNU Boot doesn't
support.

The ARM trusted firmware ships a nonfree hdcp.bin binary in its source
code. ARM trusted firmware is a dependency of u-boot that is used to
support many ARM computers in other distributions (like Guix, Debian,
etc).

As contacting affected distributions is a tedious task, we also need
help to propagate the information and contact them especially because
we don't know if Leah intend or not to do that work (so far she didn't
reply when asked twice about it), so it's probably up to the GNU Boot
community as a whole (which also includes its maintainers and readers
of this news) to help here.

The details are in the commit 343515aee7ef34695ac45830fad419d9562f9c15
("coreboot: blobs.list: arm-trusted-firmware: Remove RK3399 hdcp.bin
firmware.") in the GNU Boot source code[3].

[1]https://savannah.gnu.org/news/?id=10684
[2]https://lists.gnu.org/archive/html/gnuboot-patches/2024-10/msg00028.html
[3]https://git.savannah.gnu.org/cgit/gnuboot.git/commit/?id=343515aee7ef34695ac45830fad419d9562f9c15

Website and documentation


Jordán (isf) has been contributing some Spanish translations of the
most important website pages (the landing, status and how to
contribute pages). This is important as it could help get more
contributors. These contributions also helped us improve the process
for accepting pseudonymous contributions and enabled us to fix issues.

The work on improving the website in general also continued. Many of
the website pages were reviewed and improved (there is a lot of work
there and mentioning it all would make the news way too long).

The website also now shows the git revision from which it is build and
we also helped the FSF fix some server configuration that created
issue with the deployment of the GNU Boot website (more details are in
the commit message[1]) by reporting the issue to them and testing the
fix.

Patches for making a manual are also being reviewed. While there isn't
much in the manual yet, it also enables to better organize the
documentation and it has the potential to make GNU Boot more
accessible to less technical people.

The next goals is to look how to merge part of the website inside the
manual and continue improving both the website and the manual.

[1]https://git.savannah.gnu.org/cgit/gnuboot.git/commit/?id=d1df672383f6eb8d4218fdef7fbe9ec5e41803e4

Authenticating GNU Boot source code


We now have the ability to verify the source code when downloading it
from git. This is important to avoid certain type of attacks and it
also enables to write code to automatically download, verify and build
the GNU Boot source code.

The source can be verified with the following command (it requires to
have Guix installed):
 $ guix git authenticate $(git rev-parse HEAD) \
  "E23C 26A5 DEEE C5FA 9CDD  D57A 57BC 26A3 6871 16F6" \
  -k origin/keyring

If the authentication works it will print a message like that:

    guix git: successfully
    authenticated commit 05c09293d9660ea1f26b5b705a089b466a0aa718

The 05c09293d9660ea1f26b5b705a089b466a0aa718 might be different in
your case.

The "E23C 26A5 DEEE C5FA 9CDD D57A 57BC 26A3 6871 16F6" part in the
command above is Adrien Bourmault (neox)'s GPG key.

How to use that will be documented more in depth in the upcoming GNU
Boot manual that is currently being reviewed. Its importance will also
be explained in more details for people not familiar with the security
issues it's meant to solve. Also note that we also welcome help for
reviewing patches.

Licensing


The GNU Boot source code has a complex history. It is based on the
last fully free software releases of Libreboot. And the Libreboot
source code history is very complex.

We found some missing authorship information in some of the files that
come from Libreboot and so we started such information from the
various git repositories that were used at some point by Libreboot or
some of the projects it was based on.

To help with this task we also added a page on the GNU Boot website
(https://www.gnu.org/software/gnuboot/web/docs/history/) to track the
status of the reconstruction of the missing authorship and to document
the GNU Boot source code history.

Upstream contributions and easier building of GNU Boot


GNU Boot is just a distribution and like most distributions, it tries
to collaborate with various upstream projects whenever possible.

Since GNU Boot relies on Guix, we improved the Guix documentation
directly to help people install Guix on Trisquel and Parabola. We also
helped Trisquel fix security issues in the Guix package by bug
reporting and testing fixes (some bugs still need to be fixed in
Parabola and Debian, and reporting issues upstream takes time).

Since we also advise to use PureOS or Trisquel to build GNU Boot we
also enabled people with Guix to produce PureOS or Trisquel chroots
with Debootstrap. This was done through contributions to Debootstrap,
and to the Guix Debootstrap package. We could then mention that in the
GNU Boot build documentation
(https://www.gnu.org/software/gnuboot/web/docs/build/) and added a
script (in contrib/start-guix-daemon.py) to support building GNU Boot
in chroots. However there are still issue with the build in chroots
that need to be fixed to producing all released files. Instructions on
how to do build in chroots is also lacking.

In addition we also added the ability to build GNU Boot with Trisquel
11 (aramo).

An apt-cacher-ng package was also contributed in Guix upstream as it
can then be used to speed-up one of the automatic tests used in GNU
Boot but the support for apt-cacher-ng was not integrated yet in GNU
Boot. Last year we also contributed a GRUB package in Guix but we
didn't have the occasion to use it yet. It will probably happen soon
though.

Building GNU Boot


How to build GNU Boot has changed a lot since GNU Boot 0.1 RC3.

Before Guix could only be optionally used to build the website.

In addition to that, Guix is now integrated in the build system so we
can now rely on Guix packages to build GNU Boot images. This also
means that you need to install Guix to build GNU Boot images.

We currently use Guix packages to build some tests. We also build some
installation utilities for the I945 ThinkPads (ThinkPad X60, X60s,
X60T and T60) but we don't have documentation for less technical
people yet on how to use them. We also would need help for testing
these computers as we have no idea if they still work fine or which
fully free distributions still work on them in practice.

We now also support the './configure' and 'make' commands to build GNU
Boot but not yet the 'make install' command as to work we would need
to adapt many of the scripts that are used during the build to be
compatible with that.

There is also less visible work that was done, like cleaning up a lot
of code, adding tests for code quality, documenting a bit the GNU Boot
source code structure, and so on.

Work on making GNU Boot reproducible also started. See
https://reproducible-builds.org/ or
https://en.wikipedia.org/wiki/Reproducible_builds for more detail on
the issue.

We took an extremely strict approach and put the checksum of some of
the things we build directly into GNU Boot and verify it the checksum
during the compilation. This enables us to automatically detect
issues without having to do anything.

We started to enable that for easy things, and we also added the
infrastructure to also use that in Guix packages as well by validating
one of the packages we use during automatic testing.

However at one point this guix package stopped being
reproducible. Since we wanted to keep that code (especially as it was
showing a good example of how to do it), we fixed the bug instead of
removing the test.

This then helped us detect a very subtle and interesting bug in one of
the components we use for automatic tests.

The bug could not be caught during testing because some time
information stored inside the FAT32 file system has a granularity of a
day, and since all the testing happened the same day, it was caught
only later on.

This bug was then fixed and the details are in the fix[1]. A bug
report was also opened upstream because bugs were found in diffoscope
along the way[2]. We still need to do some testing though to
understand if the bug is in diffoscope or one of the underlying
libraries (libguestfs) and then to report the remaining bugs to the
distributions we used during this work.

We also made it easier to update the checksum in the Guix package. If
you package software with Guix, this change is also a good example of
how not to break the '--without-tests' option when you override the
tests in the package you contribute. The commit message[3] and the
change have more details and references on all that.

[1]https://git.savannah.gnu.org/cgit/gnuboot.git/commit/?id=4c3de49fbb3b43940b43f8fdccc8e51ee7df8f46
[2]https://salsa.debian.org/reproducible-builds/diffoscope/-/issues/390
[3]https://git.savannah.gnu.org/cgit/gnuboot.git/commit/?id=40fcb94e2f7ab1df8d320f78311e623f801d8602

LVM2 bug


WodeShengli reported a very important bug[1]: GNU Boot images with
GRUB can't find LVM2 partitions if the partition itself is not
encrypted. For instance if you have LVM2 and no encryption at all or
if the disk is encrypted and that on top you have LVM2, GNU Boot will
not find the partition.

Since this is an extremely serious usability issue (because images
with GRUB are supposed to work out of the box) we spent time to fix
it.

The issue was that the GRUB configuration we ship hardcoded the name
of the LVM volumes to try to boot from. Fixing it required to be able
loop over all the partitions being found, but we found no command to
do that in GRUB (which is probably why the LVM partition names were
hardcoded in the first place).

So we started adding GRUB command options to do that but while the
code worked fine, it didn't integrate in GRUB well. So we contacted
GRUB looking for help as we would have needed to upstream our command
option in GRUB anyway.

And we were told that GRUB already had a way to do what we were
looking for so we used that to fix the issue.

We also added tests that automatically download the Trisquel installer
and installs Trisquel with LVM2 and test if GNU Boot can boot the new
Trisquel installation[2].

While this test is skipped for 32bit computers, it is still good to
have as some people will run it. The test also paves the way to add
more tests that would enable us to improve further the GRUB
configuration without breaking the boot.

[1]https://savannah.gnu.org/bugs/index.php?65663
[2]https://git.savannah.gnu.org/cgit/gnuboot.git/commit/?id=860b00bf1e798d86c8bb2a70d77633599dfa1da2
[3]https://git.savannah.gnu.org/cgit/gnuboot.git/commit/?id=9cc02ddde1e164fabfbddc8bbd3832ef9468d92d

Categories: FLOSS Project Planets

Real Python: How to Iterate Through a Dictionary in Python

Planet Python - Sat, 2024-11-23 09:00

Python offers several ways to iterate through a dictionary, such as using .items() to access key-value pairs directly and .values() to retrieve values only.

By understanding these techniques, you’ll be able to efficiently access and manipulate dictionary data. Whether you’re updating the contents of a dictionary or filtering data, this guide will equip you with the tools you need.

By the end of this tutorial, you’ll understand that:

  • You can directly iterate over the keys of a Python dictionary using a for loop and access values with dict_object[key].
  • You can iterate through a Python dictionary in different ways using the dictionary methods .keys(), .values(), and .items().
  • You should use .items() to access key-value pairs when iterating through a Python dictionary.
  • The fastest way to access both keys and values when you iterate over a dictionary in Python is to use .items() with tuple unpacking.

To get the most out of this tutorial, you should have a basic understanding of Python dictionaries, know how to use Python for loops, and be familiar with comprehensions. Knowing other tools like the built-in map() and filter() functions, as well as the itertools and collections modules, is also a plus.

Get Your Code: Click here to download the sample code that shows you how to iterate through a dictionary with Python.

Take the Quiz: Test your knowledge with our interactive “Python Dictionary Iteration” quiz. You’ll receive a score upon completion to help you track your learning progress:

Interactive Quiz

Python Dictionary Iteration

Dictionaries are one of the most important and useful data structures in Python. Learning how to iterate through a Dictionary can help you solve a wide variety of programming problems in an efficient way. Test your understanding on how you can use them better!

Getting Started With Python Dictionaries

Dictionaries are a cornerstone of Python. Many aspects of the language are built around dictionaries. Modules, classes, objects, globals(), and locals() are all examples of how dictionaries are deeply wired into Python’s implementation.

Here’s how the Python official documentation defines a dictionary:

An associative array, where arbitrary keys are mapped to values. The keys can be any object with __hash__() and __eq__() methods. (Source)

There are a couple of points to notice in this definition:

  1. Dictionaries map keys to values and store them in an array or collection. The key-value pairs are commonly known as items.
  2. Dictionary keys must be of a hashable type, which means that they must have a hash value that never changes during the key’s lifetime.

Unlike sequences, which are iterables that support element access using integer indices, dictionaries are indexed by keys. This means that you can access the values stored in a dictionary using the associated key rather than an integer index.

The keys in a dictionary are much like a set, which is a collection of hashable and unique objects. Because the keys need to be hashable, you can’t use mutable objects as dictionary keys.

On the other hand, dictionary values can be of any Python type, whether they’re hashable or not. There are literally no restrictions for values. You can use anything as a value in a Python dictionary.

Note: The concepts and topics that you’ll learn about in this section and throughout this tutorial refer to the CPython implementation of Python. Other implementations, such as PyPy, IronPython, and Jython, could exhibit different dictionary behaviors and features that are beyond the scope of this tutorial.

Before Python 3.6, dictionaries were unordered data structures. This means that the order of items typically wouldn’t match the insertion order:

Python >>> # Python 3.5 >>> likes = {"color": "blue", "fruit": "apple", "pet": "dog"} >>> likes {'color': 'blue', 'pet': 'dog', 'fruit': 'apple'} Copied!

Note how the order of items in the resulting dictionary doesn’t match the order in which you originally inserted the items.

In Python 3.6 and greater, the keys and values of a dictionary retain the same order in which you insert them into the underlying dictionary. From 3.6 onward, dictionaries are compact ordered data structures:

Python >>> # Python 3.6 >>> likes = {"color": "blue", "fruit": "apple", "pet": "dog"} >>> likes {'color': 'blue', 'fruit': 'apple', 'pet': 'dog'} Copied!

Keeping the items in order is a pretty useful feature. However, if you work with code that supports older Python versions, then you must not rely on this feature, because it can generate buggy behaviors. With newer versions, it’s completely safe to rely on the feature.

Another important feature of dictionaries is that they’re mutable data types. This means that you can add, delete, and update their items in place as needed. It’s worth noting that this mutability also means that you can’t use a dictionary as a key in another dictionary.

Understanding How to Iterate Through a Dictionary in Python Read the full article at https://realpython.com/iterate-through-dictionary-python/ »

[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

Categories: FLOSS Project Planets

This Week in Plasma: Battery Charge Cycles in Info Center

Planet KDE - Fri, 2024-11-22 23:00

This week we of course continued the customary bug-fixing, but got some nice new features and UI improvements too!

Let me also remind folks about KDE's end-of-year fundraiser. We're 84% of the way to our goal, and it would be amazing to get all the way to 100% before December! Then we can focus on those stretch goals from December to January.

Anyway, enough of the sales pitch, back to the free stuff!

And isn't that amazing? Let's zoom out a bit here and remind ourselves just how incredible it is that this software is made available for free, with no contract or license agreement, to everyone. To you, to your school, to community organizations, businesses, governments, even our direct competitors to study and examine (which goes both ways, and helped me fix a bug in GTK this week; read on for details). It's kind of wild, if you think about it. But, here we are, and we want to keep on being a light in a tech world that sometimes seems to be darkening. Help us keep that light glowing!

Notable New Features

Info Center now shows your battery's cycle count. (Kai Uwe Broulik, 6.3.0. Link 1 and link 2)

Added the ability to convert to and from the CFP franc currency in KRunner-powered searches. (someone going by the pseudonym "Mr. Athozus", Frameworks 6.9. Link)

Notable UI Improvements

Middle-clicking on the Brightness and Color widget no longer does anything when the Night Light hasn't been turned on. (Elias Probst, 6.2.4. Link)

Improved some sources of visual awkwardness in System Monitor: now the loading screen no longer sometimes has a scrollbar; and clicking something selected in a table view visibly de-selects it. (Akseli Lahtinen, 6.2.4. Link 1 and link 2)

Improved the way Discover presents external links to be less visually heavy. (Nate Graham, 6.3.0. Link)

Re-did the "Apply Plasma Settings" dialog on System Settings' Login Screen page to look better and more consistent with other dialogs in QML-based software these days. (Oliver Beard, 6.3.0. Link)

Notable Bug Fixes

Fixed a regression in the Power and Battery widget that broke its ability to notice that power-profiles-daemon was installed instead of TLP after some porting work. (Méven Car, 6.2.4. Link)

Fixed a regression that caused the Disks & Devices widget to not show the correct actions for non-mounted optical discs after some porting work. (Kai Uwe Broulik, 6.2.4. Link)

Fixed an issue that caused screenshots and screen recordings to look too dim while using HDR mode. (Xaver Hugl, 6.2.4. Link)

Fixed a case where Plasma could crash when logging in with an external screen connected to a laptop via HDMI. (Marco Martin, 6.2.4. Link)

Fixed a rare case where Plasma could crash when copying data to the clipboard. (David Edmundson, 6.3.0. Link)

Fixed a bug affecting people using panels in "Fit content" mode that could, under certain circumstances, cause them to be too small until you manually entered Edit Mode once. (Niccolò Venerandi, 6.3.0, Link)

KWin now behaves better when you plug in a weird defective TV that asks for an inappropriate resolution. (Xaver Hugl, 6.3.0. Link)

Discover once again shows update-able "Get New [Stuff]" content on the updates page. (Harald Sitter, 6.3.0. Link)

XWayland-using apps can no longer crash KWin with ludicrously large icon sizes. (David Redondo, Frameworks 6.9. Link)

Fixed a bizarre and annoying bug that caused text displayed at fractional scale factors in Plasma and QtQuick-based KDE apps and to look, for lack of a better term, wobbly. Wobbly windows good, wobbly text bad! Text has now been put on the straight and narrow. (David Edmundson, Frameworks 6.9. Link)

Fixed a strange Qt bug that manifested as Plasma notifications sometimes being vertically squished. (David Edmundson, Qt 6.8.1. Link)

GTK 3 apps once again have the correct icon for their spinboxes' "decrease the value" buttons when using the Breeze icon theme or any other icon theme whose list-remove icon isn't a minus sign. (Nate Graham, GTK 3.24.44. Link)

Other bug information of note:

Notable in Performance & Technical

The feature to let you record the screen without re-approval now also works for virtual outputs. Additionally, virtual outputs now have a better name that indicates which app records them. (David Redondo, 6.3.0. Link)

Fixed a memory leak caused by having a lot of OverlayFS mounts, e.g. from Docker containers. (Joshua Goins, Frameworks 6.9. Link)

How You Can Help

KDE has become important in the world, and your time and contributions have helped us get there. As we grow, we need your support to keep KDE sustainable.

You can help KDE by becoming an active community member and getting involved somehow. Each contributor makes a huge difference in KDE — you are not a number or a cog in a machine!

You don’t have to be a programmer, either. Many other opportunities exist:

You can also help us by donating to our yearly fundraiser! Any monetary contribution — however small — will help us cover operational costs, salaries, travel expenses for contributors, and in general just keep KDE bringing Free Software to the world.

To get a new Plasma feature or a bugfix mentioned here, feel free to push a commit to the relevant merge request on invent.kde.org.

Categories: FLOSS Project Planets

Eli Bendersky: GoMLX: ML in Go without Python

Planet Python - Fri, 2024-11-22 18:00

In the previous post I talked about running ML inference in Go through a Python sidecar process. In this post, let's see how we can accomplish the same tasks without using Python at all.

How ML models are implemented

Let's start with a brief overview of how ML models are implemented under the hood [1]. The model is typically written in Python, using one of the ML frameworks like TensorFlow, JAX or PyTorch. The framework takes care of at least 2 high-level concerns for developers:

  • Expressive way to describe the model architecture, including auto-differentiation for training.
  • Efficient implementation of computational primitives on common HW: CPUs, GPUs and TPUs.

In-between these two concerns there exists a standardized model definition format (or several) that helps multiple tools interoperate. While it's by no means the only solution [2], let's look at the OpenXLA stack as a way to run models on diverse hardware:

  • The top layer are the frameworks that provide high-level primitives to define ML models, and translate them to a common interchange format called StableHLO (where "HLO" stands for High-Level Operations). I've added the gopher on the very right - it will soon become clear why.
  • The bottom layer is the HW that executes these models efficiently.
  • In the middle is the OpenXLA system, which includes two major components: the XLA compiler translating HLO to HW machine code, and PJRT - the runtime component responsible for managing HW devices, moving data (tensors) between the host CPU and these devices, executing tasks, sharding and so on.

There's a huge amount of complexity hidden by the bottom layers of this diagram. Efficient compilation and code generation for diverse HW - including using fixed blocks and libraries (like cuDNN), runtime management etc. All of this is really something one shouldn't try to re-implement unless there's a really, really good reason to do so. And the best part? There's no Python there - this is C and C++; Python only exists on the upper layer - in the high-level ML frameworks.

GoMLX

GoMLX is a relatively new Go package for ML that deserves some attention. GoMLX slots in as one of the frameworks, exactly where the Gopher is in the diagram above [3]. This is absolutely the right approach to the problem. There's no point in re-implementing the low-level primitives - whatever works for TF and JAX will work for Go as well! Google, NVIDIA, Intel and several other companies invest huge resources into these systems, and it's a good idea to benefit from these efforts.

In this post I will showcase re-implementations of some of the samples from the previous post, but with no Python in sight. But first, a few words about what GoMLX does.

GoMLX should be familiar if you've used one of the popular Python ML frameworks. You build a computational graph representing your model - the usual operations are supported and sufficient to implement anything from linear regression to cutting-edge transformers. Since GoMLX wraps XLA, it has access to all the same building blocks TF and JAX use (and it adds its own higher-level primitives, similarly to the Python frameworks).

GoMLX supports automatic differentiation to create the backward propagation operations required to update weights in training. It also provides many helpers for training and keeping track of progress, as well as Jupyter notebook support.

An image model for the CIFAR-10 dataset with GoMLX

In the previous post we built a CNN (convolutional neural network) model using TF+Keras in Python, and ran its inference in a sidecar process we could control from Go.

Here, let's build a similar model in Go, without using Python at all; we'll be training it on the same CIFAR-10 dataset we've used before.

The full code for this sample is here; it is heavily based on GoMLX's own example, with some modifications for simplicity and clarity. Here's the code defining the model graph:

func C10ConvModel(mlxctx *mlxcontext.Context, spec any, inputs []*graph.Node) []*graph.Node { batchedImages := inputs[0] g := batchedImages.Graph() dtype := batchedImages.DType() batchSize := batchedImages.Shape().Dimensions[0] logits := batchedImages layerIdx := 0 nextCtx := func(name string) *mlxcontext.Context { newCtx := mlxctx.Inf("%03d_%s", layerIdx, name) layerIdx++ return newCtx } // Convolution / activation layers logits = layers.Convolution(nextCtx("conv"), logits).Filters(32).KernelSize(3).PadSame().Done() logits.AssertDims(batchSize, 32, 32, 32) logits = activations.Relu(logits) logits = layers.Convolution(nextCtx("conv"), logits).Filters(32).KernelSize(3).PadSame().Done() logits = activations.Relu(logits) logits = graph.MaxPool(logits).Window(2).Done() logits = layers.DropoutNormalize(nextCtx("dropout"), logits, graph.Scalar(g, dtype, 0.3), true) logits.AssertDims(batchSize, 16, 16, 32) logits = layers.Convolution(nextCtx("conv"), logits).Filters(64).KernelSize(3).PadSame().Done() logits.AssertDims(batchSize, 16, 16, 64) logits = activations.Relu(logits) logits = layers.Convolution(nextCtx("conv"), logits).Filters(64).KernelSize(3).PadSame().Done() logits.AssertDims(batchSize, 16, 16, 64) logits = activations.Relu(logits) logits = graph.MaxPool(logits).Window(2).Done() logits = layers.DropoutNormalize(nextCtx("dropout"), logits, graph.Scalar(g, dtype, 0.5), true) logits.AssertDims(batchSize, 8, 8, 64) logits = layers.Convolution(nextCtx("conv"), logits).Filters(128).KernelSize(3).PadSame().Done() logits.AssertDims(batchSize, 8, 8, 128) logits = activations.Relu(logits) logits = layers.Convolution(nextCtx("conv"), logits).Filters(128).KernelSize(3).PadSame().Done() logits.AssertDims(batchSize, 8, 8, 128) logits = activations.Relu(logits) logits = graph.MaxPool(logits).Window(2).Done() logits = layers.DropoutNormalize(nextCtx("dropout"), logits, graph.Scalar(g, dtype, 0.5), true) logits.AssertDims(batchSize, 4, 4, 128) // Flatten logits, and apply dense layer logits = graph.Reshape(logits, batchSize, -1) logits = layers.Dense(nextCtx("dense"), logits, true, 128) logits = activations.Relu(logits) logits = layers.DropoutNormalize(nextCtx("dropout"), logits, graph.Scalar(g, dtype, 0.5), true) numClasses := 10 logits = layers.Dense(nextCtx("dense"), logits, true, numClasses) return []*graph.Node{logits} }

As you might expect, the Go code is longer and more explicit (nodes are threaded explicitly between builder calls, instead of being magically accumulated). It's not hard to envision a Keras-like high level library on top of this.

Here's a snippet from the classifier (inference):

func main() { flagCheckpoint := flag.String("checkpoint", "", "Directory to load checkpoint from") flag.Parse() mlxctx := mlxcontext.New() backend := backends.New() _, err := checkpoints.Load(mlxctx).Dir(*flagCheckpoint).Done() if err != nil { panic(err) } mlxctx = mlxctx.Reuse() // helps sanity check the loaded context exec := mlxcontext.NewExec(backend, mlxctx.In("model"), func(mlxctx *mlxcontext.Context, image *graph.Node) *graph.Node { // Convert our image to a tensor with batch dimension of size 1, and pass // it to the C10ConvModel graph. image = graph.ExpandAxes(image, 0) // Create a batch dimension of size 1. logits := cnnmodel.C10ConvModel(mlxctx, nil, []*graph.Node{image})[0] // Take the class with highest logit value, then remove the batch dimension. choice := graph.ArgMax(logits, -1, dtypes.Int32) return graph.Reshape(choice) }) // classify takes a 32x32 image and returns a Cifar-10 classification according // to the models. Use C10Labels to convert the returned class to a string // name. The returned class is from 0 to 9. classify := func(img image.Image) int32 { input := images.ToTensor(dtypes.Float32).Single(img) outputs := exec.Call(input) classID := tensors.ToScalar[int32](outputs[0]) return classID } // ...

Now classify is a function that takes an image.Image and runs it through the network, returning the index of the most likely label out of the list of CIFAR-10 labels.

The README file in the sample explains how to run it locally on a GPU; the model trains and runs successfully, with similar results to the TF+Keras model we trained in Python earlier.

Gemma2 with GoMLX

For a (much) more involved example, GoMLX has a full implementation of Gemma2 inference. The model implementation itself is in the transformers package. It should look fairly familiar if you've seen a transformer implementation in another language.

The official example in that repository shows how to run it with weights downloaded from HuggingFace; since I've already downloaded the Gemma2 weights from Kaggle for the previous post, here's a simple adaptation:

var ( flagDataDir = flag.String("data", "", "dir with converted weights") flagVocabFile = flag.String("vocab", "", "tokenizer vocabulary file") ) func main() { flag.Parse() ctx := context.New() // Load model weights from the checkpoint downloaded from Kaggle. err := kaggle.ReadConvertedWeights(ctx, *flagDataDir) if err != nil { log.Fatal(err) } // Load tokenizer vocabulary. vocab, err := sentencepiece.NewFromPath(*flagVocabFile) if err != nil { log.Fatal(err) } // Create a Gemma sampler and start sampling tokens. sampler, err := samplers.New(backends.New(), ctx, vocab, 256) if err != nil { log.Fatalf("%+v", err) } start := time.Now() output, err := sampler.Sample([]string{ "Are bees and wasps similar?", }) if err != nil { log.Fatalf("%+v", err) } fmt.Printf("\tElapsed time: %s\n", time.Since(start)) fmt.Printf("Generated text:\n%s\n", strings.Join(output, "\n\n")) }

The complete code together with installation and setup instructions is here.

gomlx/gemma demonstrates that GoMLX has sufficiently advanced capabilities to run a real production-grade open LLM, without Python in the loop.

Summary

The previous post discussed some options for incorporating ML inference into a Go project via a minimal Python sidecar process. Here, we take it a step further and implement ML inference in Go without using Python. We do so by leveraging GoMLX, which itself relies on XLA and PJRT to do the heavy lifting.

If we strip down a framework like TensorFlow to its layers, GoMLX reuses the bottom layers (which is where most of the magic lies), and replaces the model builder library with a Go variant.

Since GoMLX is still a relatively new project, it may be a little risky for production uses at this point. That said, I find this direction very promising and will be following the project's development with interest.

Code

The full code for the samples in this post is on GitHub.

[1]This assumes you know the basics of neural network graphs, their training, etc. If not, check out this post and some of my other posts in the Machine Learning category. [2]It's likely the most common production solution, and pretty much the only way to access Google's TPUs. [3]It does so by including Go bindings for both XLA and PJRT; these are wrapped in higher-level APIs for users.
Categories: FLOSS Project Planets

parallel @ Savannah: GNU Parallel 20241122 ('Ahoo Daryaei') released

GNU Planet! - Fri, 2024-11-22 17:22

GNU Parallel 20241122 ('Ahoo Daryaei') has been released. It is available for download at: lbry://@GnuParallel:4

Quote of the month:

  GNU parallel is so satisfying
    -- James Coman @jcoman.bsky.social

New in this release:

  • --pipe --block works similar to --pipepart --block if --block size is negative.
  • DBURLs can be written with / instead of %2F for sqlite and CSV.
  • Bug fixes and man page updates.

News about GNU Parallel:


GNU Parallel - For people who live life in the parallel lane.

If you like GNU Parallel record a video testimonial: Say who you are, what you use GNU Parallel for, how it helps you, and what you like most about it. Include a command that uses GNU Parallel if you feel like it.


About GNU Parallel


GNU Parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. A job can also be a command that reads from a pipe. GNU Parallel can then split the input and pipe it into commands in parallel.

If you use xargs and tee today you will find GNU Parallel very easy to use as GNU Parallel is written to have the same options as xargs. If you write loops in shell, you will find GNU Parallel may be able to replace most of the loops and make them run faster by running several jobs in parallel. GNU Parallel can even replace nested loops.

GNU Parallel makes sure output from the commands is the same output as you would get had you run the commands sequentially. This makes it possible to use output from GNU Parallel as input for other programs.

For example you can run this to convert all jpeg files into png and gif files and have a progress bar:

  parallel --bar convert {1} {1.}.{2} ::: *.jpg ::: png gif

Or you can generate big, medium, and small thumbnails of all jpeg files in sub dirs:

  find . -name '*.jpg' |
    parallel convert -geometry {2} {1} {1//}/thumb{2}_{1/} :::: - ::: 50 100 200

You can find more about GNU Parallel at: http://www.gnu.org/s/parallel/

You can install GNU Parallel in just 10 seconds with:

    $ (wget -O - pi.dk/3 || lynx -source pi.dk/3 || curl pi.dk/3/ || \
       fetch -o - http://pi.dk/3 ) > install.sh
    $ sha1sum install.sh | grep 883c667e01eed62f975ad28b6d50e22a
    12345678 883c667e 01eed62f 975ad28b 6d50e22a
    $ md5sum install.sh | grep cc21b4c943fd03e93ae1ae49e28573c0
    cc21b4c9 43fd03e9 3ae1ae49 e28573c0
    $ sha512sum install.sh | grep ec113b49a54e705f86d51e784ebced224fdff3f52
    79945d9d 250b42a4 2067bb00 99da012e c113b49a 54e705f8 6d51e784 ebced224
    fdff3f52 ca588d64 e75f6033 61bd543f d631f592 2f87ceb2 ab034149 6df84a35
    $ bash install.sh

Watch the intro video on http://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

Walk through the tutorial (man parallel_tutorial). Your command line will love you for it.

When using programs that use GNU Parallel to process data for publication please cite:

O. Tange (2018): GNU Parallel 2018, March 2018, https://doi.org/10.5281/zenodo.1146014.

If you like GNU Parallel:

  • Give a demo at your local user group/team/colleagues
  • Post the intro videos on Reddit/Diaspora*/forums/blogs/ Identi.ca/Google+/Twitter/Facebook/Linkedin/mailing lists
  • Get the merchandise https://gnuparallel.threadless.com/designs/gnu-parallel
  • Request or write a review for your favourite blog or magazine
  • Request or build a package for your favourite distribution (if it is not already there)
  • Invite me for your next conference


If you use programs that use GNU Parallel for research:

  • Please cite GNU Parallel in you publications (use --citation)


If GNU Parallel saves you money:



About GNU SQL


GNU sql aims to give a simple, unified interface for accessing databases through all the different databases' command line clients. So far the focus has been on giving a common way to specify login information (protocol, username, password, hostname, and port number), size (database and table size), and running queries.

The database is addressed using a DBURL. If commands are left out you will get that database's interactive shell.

When using GNU SQL for a publication please cite:

O. Tange (2011): GNU SQL - A Command Line Tool for Accessing Different Databases Using DBURLs, ;login: The USENIX Magazine, April 2011:29-32.


About GNU Niceload


GNU niceload slows down a program when the computer load average (or other system activity) is above a certain limit. When the limit is reached the program will be suspended for some time. If the limit is a soft limit the program will be allowed to run for short amounts of time before being suspended again. If the limit is a hard limit the program will only be allowed to run when the system is below the limit.

Categories: FLOSS Project Planets

Matthew Palmer: Your Release Process Sucks

Planet Debian - Fri, 2024-11-22 15:15

For the past decade-plus, every piece of software I write has had one of two release processes.

Software that gets deployed directly onto servers (websites, mostly, but also the infrastructure that runs Pwnedkeys, for example) is deployed with nothing more than git push prod main. I’ll talk more about that some other day.

Today is about the release process for everything else I maintain – Rust / Ruby libraries, standalone programs, and so forth. To release those, I use the following, extremely intricate process:

  1. Create an annotated git tag, where the name of the tag is the software version I’m releasing, and the annotation is the release notes for that version.

  2. Run git release in the repository.

  3. There is no step 3.

Yes, it absolutely is that simple. And if your release process is any more complicated than that, then you are suffering unnecessarily.

But don’t worry. I’m from the Internet, and I’m here to help.

Sidebar: “annotated what-now?!?”

The annotated tag is one git’s best-kept secrets. They’ve been available in git for practically forever (I’ve been using them since at least 2014, which is “practically forever” in software development), yet almost everyone I mention them to has never heard of them.

A “tag”, in git parlance, is a repository-unique named label that points to a single commit (as identified by the commit’s SHA1 hash). Annotating a tag is simply associating a block of free-form text with that tag.

Creating an annotated tag is simple-sauce: git tag -a tagname will open up an editor window where you can enter your annotation, and git tag -a -m "some annotation" tagname will create the tag with the annotation “some annotation”. Retrieving the annotation for a tag is straightforward, too: git show tagname will display the annotation along with all the other tag-related information.

Now that we know all about annotated tags, let’s talk about how to use them to make software releases freaking awesome.

Step 1: Create the Annotated Git Tag

As I just mentioned, creating an annotated git tag is pretty simple: just add a -a (or --annotate, if you enjoy typing) to your git tag command, and WHAM! annotation achieved.

Releases, though, typically have unique and ever-increasing version numbers, which we want to encode in the tag name. Rather than having to look at the existing tags and figure out the next version number ourselves, we can have software do the hard work for us.

Enter: git-version-bump. This straightforward program takes one mandatory argument: major, minor, or patch, and bumps the corresponding version number component in line with Semantic Versioning principles. If you pass it -n, it opens an editor for you to enter the release notes, and when you save out, the tag is automagically created with the appropriate name.

Because the program is called git-version-bump, you can call it as a git command: git version-bump. Also, because version-bump is long and unwieldy, I have it aliased to vb, with the following entry in my ~/.gitconfig:

[alias] vb = version-bump -n

Of course, you don’t have to use git-version-bump if you don’t want to (although why wouldn’t you?). The important thing is that the only step you take to go from “here is our current codebase in main” to “everything as of this commit is version X.Y.Z of this software”, is the creation of an annotated tag that records the version number being released, and the metadata that goes along with that release.

Step 2: Run git release

As I said earlier, I’ve been using this release process for over a decade now. So long, in fact, that when I started, GitHub Actions didn’t exist, and so a lot of the things you’d delegate to a CI runner these days had to be done locally, or in a more ad-hoc manner on a server somewhere.

This is why step 2 in the release process is “run git release”. It’s because historically, you can’t do everything in a CI run. Nowadays, most of my repositories have this in the .git/config:

[alias] release = push --tags

Older repositories which, for one reason or another, haven’t been updated to the new hawtness, have various other aliases defined, which run more specialised scripts (usually just rake release, for Ruby libraries), but they’re slowly dying out.

The reason why I still have this alias, though, is that it standardises the release process. Whether it’s a Ruby gem, a Rust crate, a bunch of protobuf definitions, or whatever else, I run the same command to trigger a release going out. It means I don’t have to think about how I do it for this project, because every project does it exactly the same way.

The Wiring Behind the Button

It wasn’t the button that was the problem. It was the miles of wiring, the hundreds of miles of cables, the circuits, the relays, the machinery. The engine was a massive, sprawling, complex, mind-bending nightmare of levers and dials and buttons and switches. You couldn’t just slap a button on the wall and expect it to work. But there should be a button. A big, fat button that you could press and everything would be fine again. Just press it, and everything would be back to normal.

  • Red Dwarf: Better Than Life

Once you’ve accepted that your release process should be as simple as creating an annotated tag and running one command, you do need to consider what happens afterwards. These days, with the near-universal availability of CI runners that can do anything you need in an isolated, reproducible environment, the work required to go from “annotated tag” to “release artifacts” can be scripted up and left to do its thing.

What that looks like, of course, will probably vary greatly depending on what you’re releasing. I can’t really give universally-applicable guidance, since I don’t know your situation. All I can do is provide some of my open source work as inspirational examples.

For starters, let’s look at a simple Rust crate I’ve written, called strong-box. It’s a straightforward crate, that provides ergonomic and secure cryptographic functionality inspired by the likes of NaCl. As it’s just a crate, its release script is very straightforward. Most of the complexity is working around Cargo’s inelegant mandate that crate version numbers are specified in a TOML file. Apart from that, it’s just a matter of building and uploading the crate. Easy!

Slightly more complicated is action-validator. This is a Rust CLI tool which validates GitHub Actions and Workflows (how very meta) against a published JSON schema, to make sure you haven’t got any syntax or structural errors. As not everyone has a Rust toolchain on their local box, the release process helpfully build binaries for several common OSes and CPU architectures that people can download if they choose. The release process in this case is somewhat larger, but not particularly complicated. Almost half of it is actually scaffolding to build an experimental WASM/NPM build of the code, because someone seemed rather keen on that.

Moving away from Rust, and stepping up the meta another notch, we can take a look at the release process for git-version-bump itself, my Ruby library and associated CLI tool which started me down the “Just Tag It Already” rabbit hole many years ago. In this case, since gemspecs are very amenable to programmatic definition, the release process is practically trivial. Remove the boilerplate and workarounds for GitHub Actions bugs, and you’re left with about three lines of actual commands.

These approaches can certainly scale to larger, more complicated processes. I’ve recently implemented annotated-tag-based releases in a proprietary software product, that produces Debian/Ubuntu, RedHat, and Windows packages, as well as Docker images, and it takes all of the information it needs from the annotated tag. I’m confident that this approach will successfully serve them as they expand out to build AMIs, GCP machine images, and whatever else they need in their release processes in the future.

Objection, Your Honour!

I can hear the howl of the “but, actuallys” coming over the horizon even as I type. People have a lot of Big Feelings about why this release process won’t work for them. Rather than overload this article with them, I’ve created a companion article that enumerates the objections I’ve come across, and answers them. I’m also available for consulting if you’d like a personalised, professional opinion on your specific circumstances.

DVD Bonus Feature: Pre-releases

Unless you’re addicted to surprises, it’s good to get early feedback about new features and bugfixes before they make it into an official, general-purpose release. For this, you can’t go past the pre-release.

The major blocker to widespread use of pre-releases is that cutting a release is usually a pain in the behind. If you’ve got to edit changelogs, and modify version numbers in a dozen places, then you’re entirely justified in thinking that cutting a pre-release for a customer to test that bugfix that only occurs in their environment is too much of a hassle.

The thing is, once you’ve got releases building from annotated tags, making pre-releases on every push to main becomes practically trivial. This is mostly due to another fantastic and underused Git command: git describe.

How git describe works is, basically, that it finds the most recent commit that has an associated annotated tag, and then generates a string that contains that tag’s name, plus the number of commits between that tag and the current commit, with the current commit’s hash included, as a bonus. That is, imagine that three commits ago, you created an annotated release tag named v4.2.0. If you run git describe now, it will print out v4.2.0-3-g04f5a6f (assuming that the current commit’s SHA starts with 04f5a6f).

You might be starting to see where this is going. With a bit of light massaging (essentially, removing the leading v and replacing the -s with .s), that string can be converted into a version number which, in most sane environments, is considered “newer” than the official 4.2.0 release, but will be superceded by the next actual release (say, 4.2.1 or 4.3.0). If you’re already injecting version numbers into the release build process, injecting a slightly different version number is no work at all.

Then, you can easily build release artifacts for every commit to main, and make them available somewhere they won’t get in the way of the “official” releases. For example, in the proprietary product I mentioned previously, this involves uploading the Debian packages to a separate component (prerelease instead of main), so that users that want to opt-in to the prerelease channel simply modify their sources.list to change main to prerelease. Management have been extremely pleased with the easy availability of pre-release packages; they’ve been gleefully installing them willy-nilly for testing purposes since I rolled them out.

In fact, even while I’ve been writing this article, I was asked to add some debug logging to help track down a particularly pernicious bug. I added the few lines of code, committed, pushed, and went back to writing. A few minutes later (next week’s job is to cut that in-process time by at least half), the person who asked for the extra logging ran apt update; apt upgrade, which installed the newly-built package, and was able to progress in their debugging adventure.

Continuous Delivery: It’s Not Just For Hipsters.

“+1, Informative”

Hopefully, this has spurred you to commit your immortal soul to the Church of the Annotated Tag. You may tithe by buying me a refreshing beverage. Alternately, if you’re really keen to adopt more streamlined release management processes, I’m available for consulting engagements.

Categories: FLOSS Project Planets

Pages