Feeds

Guido Günther: Free Software Activities April 2024

Planet Debian - Wed, 2024-05-01 11:06

A short status update of what happened on my side last month. Maintenance and code review continue to be the top time sinks (in a positive way).

Categories: FLOSS Project Planets

Lullabot: Understanding What Drupal Editors and Authors Need

Planet Drupal - Wed, 2024-05-01 10:58

The Drupal Administration UI initiative, introduced in June 2023, continues to improve the user experience of Drupal core. The initiative was launched with the commitment and goal of improving the experience for everyone using Drupal and reversing the existing negative impression of the Drupal user interface (UI).

Categories: FLOSS Project Planets

Antoine Beaupré: Tor migrates from Gitolite/GitWeb to GitLab

Planet Debian - Wed, 2024-05-01 10:55

Note: I've been awfully silent here for the past ... (checks notes) oh dear, 3 months! But that's not because I've been idle, quite the contrary, I've been very busy but just didn't have time to write about anything. So I've taken it upon myself to write something about my work this week, and published this post on the Tor blog which I copy here for a broader audience. Let me know if you like this or not.

Tor has finally completed a long migration from legacy Git infrastructure (Gitolite and GitWeb) to our self-hosted GitLab server.

Git repository addresses have therefore changed. Many of you probably have made the switch already, but if not, you will need to change:

https://git.torproject.org/

to:

https://gitlab.torproject.org/

in your Git configuration.
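For example, if you have a checkout of the main tor repository (which lives at tpo/core/tor in GitLab), something like this should do the trick:

git remote set-url origin https://gitlab.torproject.org/tpo/core/tor.git
git remote -v  # confirm the new URL

The exact project path varies per repository; the rewrite map discussed below gives the GitLab equivalent of each old gitolite path.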

The GitWeb front page is now an archived listing of all the repositories before the migration. Inactive git repositories were archived in GitLab's legacy/gitolite namespace, and the gitweb.torproject.org and git.torproject.org websites now redirect to GitLab.

Best effort was made to reproduce the original gitolite repositories faithfully and also avoid duplicating too much data in the migration. But it's possible that some data present in Gitolite has not migrated to GitLab.

User repositories are particularly at risk, because they were migrated en masse and "re-forked" from their upstreams to avoid wasting disk space. If a user had a project with a matching name, it was assumed to have the right data, which might be inaccurate.

The two virtual machines responsible for the legacy service (cupani for git-rw.torproject.org and vineale for git.torproject.org and gitweb.torproject.org) have been shut down. Their disks will remain for 3 months (until the end of July 2024) and their backups for another year after that (until the end of July 2025), after which point all the data from those hosts will be destroyed, with only the GitLab archives remaining.

The rest of this article expands on how this was done and what kind of problems we faced during the migration.

Where is the code?

Normally, nothing should be lost. All repositories in gitolite have been either explicitly migrated by their owners, forcibly migrated by the sysadmin team (TPA), or explicitly destroyed at their owner's request.

An exhaustive rewrite map translates gitolite projects to GitLab projects. Some of those projects actually redirect to their parent in cases of empty repositories that were obvious forks. Destroyed repositories redirect to the GitLab front page.
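To give an idea of how such a map can be used outside of Apache, here is a minimal Python sketch that loads a txt-format RewriteMap (one "old new" pair per line, the format Apache expects) and translates an old project path; the file location and sample entry are assumptions for illustration:

def load_rewrite_map(path="gitolite2gitlab.txt"):
    # Apache txt RewriteMap format: "key value" per line,
    # blank lines and #-comments ignored
    mapping = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            old, new = line.split(None, 1)
            mapping[old] = new.strip()
    return mapping

mapping = load_rewrite_map()
# hypothetical lookup: an old gitolite path to its GitLab project
print(mapping.get("user/dgoulet/tor", "NOT_FOUND"))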

Because the migration happened progressively, it's technically possible that commits pushed to gitolite were lost after the migration. We took great care to avoid that scenario. First, we adopted a proposal (TPA-RFC-36) in June 2023 to announce the transition. Then, in March 2024, we locked down all repositories from any further changes. Around that time, only a handful of repositories had changes made after the adoption date, and we examined each repository carefully to make sure nothing was lost.

Still, we built a diff of all the changes in the git references that archivists can peruse to check for data loss. It's large (6MiB+) because a lot of repositories were migrated before the mass migration and then kept evolving in GitLab. Many other repositories were rebuilt in GitLab from their parents to reconstruct fork relationships, which added extra references to those clones.
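For those wondering how such a diff can be built: comparing the references advertised by the old and new servers is enough. A hedged sketch of the general idea (not necessarily the exact method we used), with a sample repository borrowed from the transcripts below; of course, this only worked while both services were still answering:

import subprocess

def refs(url):
    # set of "sha<TAB>refname" lines advertised by the remote
    out = subprocess.run(["git", "ls-remote", url],
                         capture_output=True, text=True, check=True).stdout
    return set(out.splitlines())

old = refs("https://git.torproject.org/project/help/wiki")
new = refs("https://gitlab.torproject.org/legacy/gitolite/project/help/wiki.git")
for line in sorted(old - new):
    print("-", line)  # references missing after the migration
for line in sorted(new - old):
    print("+", line)  # references added after the migration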

A note to amateur archivists out there: it's probably too late for one last crawl now. The Git repositories now all redirect to GitLab and are effectively unavailable in their original form.

That said, the GitWeb site was crawled into the Internet Archive in February 2024, so at least some copy of it is available in the Wayback Machine. At that point, however, many developers had already migrated their projects to GitLab, so the copies there were already possibly out of date compared with the repositories in GitLab.

Software Heritage also has a copy of all repositories hosted on Gitolite since June 2023 and has continuously mirrored the repositories since, where they will hopefully be kept in perpetuity. There's an issue where the main website can't find the repositories when you search for gitweb.torproject.org; search for git.torproject.org instead.

In any case, if you believe data is missing, please do let us know by opening an issue with TPA.

Why?

This project was a long time in the making. The first discussion about migrating from gitolite to GitLab started in 2020 (almost 4 years ago). But going further back, the first GitLab experiment was in 2016, almost a decade ago.

The current GitLab server dates from 2019, replacing Trac for issue tracking in 2020. It was originally supposed to host only mirrors for merge requests and issue trackers but, naturally, one thing led to another and eventually, GitLab had grown a container registry, continuous integration (CI) runners, GitLab Pages, and, of course, hosted most Git repositories.

There was some hesitation about moving to GitLab for code hosting. We had discussions about the increased attack surface and ways to mitigate that, but, ultimately, it seems the issues were not that serious and the community embraced GitLab.

TPA actually migrated its most critical repositories out of shared hosting entirely, into specific servers (e.g. the Puppet Git repository is just on the Puppet server now), leveraging Git's decentralized nature and removing an entire attack surface from our infrastructure. Some of those repositories are mirrored back into GitLab, but the authoritative copy is not on GitLab.

In any case, the proposal to migrate from Gitolite to GitLab was effectively just formalizing a fait accompli.

How to migrate from Gitolite / cgit to GitLab

The progressive migration was a challenge. If you intend to migrate between hosting platforms, we strongly recommend picking a "flag day" on which you migrate all repositories at once. This ensures a smoother transition and avoids elaborate rewrite rules.

When Gitolite access was shut down, we had repositories on both GitLab and Gitolite, without a clear relationship between the two. A priori, the plan then was to import all the remaining Gitolite repositories into the legacy/gitolite namespace, but that seemed wasteful, particularly for large repositories like Tor Browser, which used nearly a gigabyte of disk space. So we took special care to avoid duplicating repositories.

When the mass migration started, only 71 of the 538 Gitolite repositories were marked "Migrated to GitLab" in the gitolite.conf file. So, given that we had hundreds of repositories to migrate, we developed some automation to "save time". We already automate similar ad-hoc tasks with Fabric, so we used that framework here as well. (Our normal configuration management tool is Puppet, which is a poor fit here.)
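For readers unfamiliar with Fabric, a task is just a Python function that runs commands on a remote host over SSH. A trivial, hypothetical example of the kind of ad-hoc task involved (the task name is made up; the path is the real repository root seen in the transcripts below):

from fabric import task

@task
def list_repos(c):
    # runs over SSH on the host selected with -H
    c.run("find /srv/git.torproject.org/repositories -name '*.git' | head")

This would be invoked as fab -H cupani.torproject.org list-repos, just like the migration task shown later.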

So a relatively large amount of Python code was produced to basically do the following:

  1. check that all on-disk repositories are listed in gitolite.conf (and vice versa), and either add missing repositories or delete them from disk if they are garbage
  2. for each repository in gitolite.conf, if its category is marked "Migrated to GitLab", skip it; otherwise:
  3. find a matching GitLab project by name, prompting the user when there are multiple matches
  4. if a match is found and the repository is non-empty, redirect to it
    • we have GitLab projects that look like the real thing, but are only present to host migrated Trac issues
    • in such cases we cloned the Gitolite project locally and pushed to the existing repository instead
  5. otherwise, create a new repository in the legacy/gitolite namespace, using the "import" mechanism in GitLab to automatically import the repository from Gitolite, creating redirections and updating gitolite.conf to document the change (a simplified sketch follows this list)
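As mentioned in step 5, the import itself boils down to creating a GitLab project with an import_url pointing at the old server. A simplified, hedged sketch of that step using python-gitlab; the token and error handling are elided, the namespace lookup is naive (first match wins), and the real task also handles hooks, the rewrite map and gitolite.conf:

import gitlab

gl = gitlab.Gitlab("https://gitlab.torproject.org", private_token="REDACTED")

def import_repo(gitolite_path, name, namespace="legacy/gitolite"):
    # find the numeric id of the target namespace (naive: first match)
    ns = gl.namespaces.list(search=namespace)[0]
    # create the project; GitLab pulls the history from the old server
    return gl.projects.create({
        "name": name,
        "namespace_id": ns.id,
        "import_url": f"https://git.torproject.org/{gitolite_path}",
    })

project = import_repo("project/tor-browser/fastlane", "fastlane")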

User repositories (those under the user/ directory in Gitolite) were handled specially. First, the existing redirection map was checked to see if a similarly named project was migrated (so that, e.g. user/dgoulet/tor is properly treated as a fork of tpo/core/tor). Then the parent project was forked in GitLab and the Gitolite project force-pushed to the fork. This allows us to show the fork relationship in GitLab and, more importantly, benefit from the "pool" feature in GitLab which deduplicates disk usage between forks.
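In terms of the GitLab API, the fork step looks roughly like the following sketch (paths borrowed from the example above; argument names per the GitLab fork API; the force-push is plain git):

import gitlab

gl = gitlab.Gitlab("https://gitlab.torproject.org", private_token="REDACTED")

# fork the known parent into the legacy namespace...
parent = gl.projects.get("tpo/core/tor")
fork = parent.forks.create({"namespace_path": "legacy/gitolite/user/dgoulet",
                            "path": "tor"})
# ... then force-push the gitolite copy into the fork, roughly:
#   git clone --mirror git-rw.torproject.org:/srv/git.torproject.org/repositories/user/dgoulet/tor.git
#   git -C tor.git push --mirror git@gitlab.torproject.org:legacy/gitolite/user/dgoulet/tor.git
# note: protected branches must be removed before the force-push,
# as the second transcript below shows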

Sometimes we found no such relationship; then we simply imported multiple repositories with similar names into the legacy/gitolite namespace, sometimes creating forks between user repositories on a first-come, first-served basis, following the gitolite.conf order.

The code used in this migration is now available publicly. We encourage other groups planning to migrate from Gitolite/GitWeb to GitLab to use (and contribute to) our fabric-tasks repository, even though it does have its fair share of hard-coded assertions.

The main entry point is the gitolite.mass-repos-migration task. A typical migration job looked like:

anarcat@angela:fabric-tasks$ fab -H cupani.torproject.org gitolite.mass-repos-migration
[...]
INFO: skipping project project/help/infra in category Migrated to GitLab
INFO: skipping project project/help/wiki in category Migrated to GitLab
INFO: skipping project project/jenkins/jobs in category Migrated to GitLab
INFO: skipping project project/jenkins/tools in category Migrated to GitLab
INFO: searching for projects matching fastlane
INFO: Successfully connected to https://gitlab.torproject.org
import gitolite project project/tor-browser/fastlane into gitlab legacy/gitolite/project/tor-browser/fastlane with desc 'Tor Browser app store and deployment configuration for Fastlane'? [Y/n]
INFO: importing gitolite project project/tor-browser/fastlane into gitlab legacy/gitolite/project/tor-browser/fastlane with desc 'Tor Browser app store and deployment configuration for Fastlane'
INFO: building a new connect to cupani
INFO: defaulting name to fastlane
INFO: importing project into GitLab
INFO: Successfully connected to https://gitlab.torproject.org
INFO: loading group legacy/gitolite/project/tor-browser
INFO: archiving project
INFO: creating repository fastlane (fastlane) in namespace legacy/gitolite/project/tor-browser from https://git.torproject.org/project/tor-browser/fastlane into https://gitlab.torproject.org/legacy/gitolite/project/tor-browser/fastlane
INFO: migrating Gitolite repository project/tor-browser/fastlane to GitLab project legacy/gitolite/project/tor-browser/fastlane
INFO: uploading 399 bytes to /srv/git.torproject.org/repositories/project/tor-browser/fastlane.git/hooks/pre-receive
INFO: making /srv/git.torproject.org/repositories/project/tor-browser/fastlane.git/hooks/pre-receive executable
INFO: adding entry to rewrite_map /home/anarcat/src/tor/tor-puppet/modules/profile/files/git/gitolite2gitlab.txt
INFO: modifying gitolite.conf to add: "config gitweb.category = Migrated to GitLab"
INFO: rewriting gitolite config /home/anarcat/src/tor/gitolite-admin/conf/gitolite.conf to change project project/tor-browser/fastlane to category Migrated to GitLab
INFO: skipping project project/bridges/bridgedb-admin in category Migrated to GitLab
[...]

In the above, you can see already-migrated repositories being skipped, then the fastlane project being archived into GitLab. Another example with a later version of the script, processing only user repositories and showing the interactive prompt and a force-push into a fork:

$ fab -H cupani.torproject.org gitolite.mass-repos-migration --include 'user/.*' --exclude '.*tor-?browser.*'
INFO: skipping project user/aagbsn/bridgedb in category Migrated to GitLab
[...]
INFO: skipping project user/phw/atlas in category Migrated to GitLab
INFO: processing project user/phw/obfsproxy (Philipp's obfsproxy repository) in category Users' development repositories (Attic)
INFO: Successfully connected to https://gitlab.torproject.org
INFO: user repository detected, trying to find fork phw/obfsproxy
WARNING: no existing fork found, entering user fork subroutine
INFO: found 6 GitLab projects matching 'obfsproxy' (https://gitweb.torproject.org/user/phw/obfsproxy.git)
0 legacy/gitolite/debian/obfsproxy
1 legacy/gitolite/debian/obfsproxy-legacy
2 legacy/gitolite/user/asn/obfsproxy
3 legacy/gitolite/user/ioerror/obfsproxy
4 tpo/anti-censorship/pluggable-transports/obfsproxy
5 tpo/anti-censorship/pluggable-transports/obfsproxy-legacy
select parent to fork from, or enter to abort: ^G4
INFO: repository is not empty: in-pack: 2104, packs: 1, size-pack: 414
fork project tpo/anti-censorship/pluggable-transports/obfsproxy into legacy/gitolite/user/phw/obfsproxy^G [Y/n]
INFO: loading project tpo/anti-censorship/pluggable-transports/obfsproxy
INFO: forking project user/phw/obfsproxy into namespace legacy/gitolite/user/phw
INFO: waiting for fork to complete...
INFO: fork status: started, sleeping...
INFO: fork finished
INFO: cloning and force pushing from user/phw/obfsproxy to legacy/gitolite/user/phw/obfsproxy
INFO: deleting branch protection: <class 'gitlab.v4.objects.branches.ProjectProtectedBranch'> => {'id': 2723, 'name': 'master', 'push_access_levels': [{'id': 2864, 'access_level': 40, 'access_level_description': 'Maintainers', 'deploy_key_id': None}], 'merge_access_levels': [{'id': 2753, 'access_level': 40, 'access_level_description': 'Maintainers'}], 'allow_force_push': False}
INFO: cloning repository git-rw.torproject.org:/srv/git.torproject.org/repositories/user/phw/obfsproxy.git in /tmp/tmp6orvjggy/user/phw/obfsproxy
Cloning into bare repository '/tmp/tmp6orvjggy/user/phw/obfsproxy'...
INFO: pushing to GitLab: https://gitlab.torproject.org/legacy/gitolite/user/phw/obfsproxy
remote:
remote: To create a merge request for bug_10887, visit:
remote:   https://gitlab.torproject.org/legacy/gitolite/user/phw/obfsproxy/-/merge_requests/new?merge_request%5Bsource_branch%5D=bug_10887
remote:
[...]
To ssh://gitlab.torproject.org/legacy/gitolite/user/phw/obfsproxy
 + 2bf9d09...a8e54d5 master -> master (forced update)
 * [new branch]      bug_10887 -> bug_10887
[...]
INFO: migrating repo
INFO: migrating Gitolite repository https://gitweb.torproject.org/user/phw/obfsproxy.git to GitLab project https://gitlab.torproject.org/legacy/gitolite/user/phw/obfsproxy
INFO: adding entry to rewrite_map /home/anarcat/src/tor/tor-puppet/modules/profile/files/git/gitolite2gitlab.txt
INFO: modifying gitolite.conf to add: "config gitweb.category = Migrated to GitLab"
INFO: rewriting gitolite config /home/anarcat/src/tor/gitolite-admin/conf/gitolite.conf to change project user/phw/obfsproxy to category Migrated to GitLab
INFO: processing project user/phw/scramblesuit (Philipp's ScrambleSuit repository) in category Users' development repositories (Attic)
INFO: user repository detected, trying to find fork phw/scramblesuit
WARNING: no existing fork found, entering user fork subroutine
WARNING: no matching gitlab project found for user/phw/scramblesuit
INFO: user fork subroutine failed, resuming normal procedure
INFO: searching for projects matching scramblesuit
import gitolite project user/phw/scramblesuit into gitlab legacy/gitolite/user/phw/scramblesuit with desc 'Philipp's ScrambleSuit repository'?^G [Y/n]
INFO: checking if remote repo https://git.torproject.org/user/phw/scramblesuit exists
INFO: importing gitolite project user/phw/scramblesuit into gitlab legacy/gitolite/user/phw/scramblesuit with desc 'Philipp's ScrambleSuit repository'
INFO: importing project into GitLab
INFO: Successfully connected to https://gitlab.torproject.org
INFO: loading group legacy/gitolite/user/phw
INFO: creating repository scramblesuit (scramblesuit) in namespace legacy/gitolite/user/phw from https://git.torproject.org/user/phw/scramblesuit into https://gitlab.torproject.org/legacy/gitolite/user/phw/scramblesuit
INFO: archiving project
INFO: migrating Gitolite repository https://gitweb.torproject.org/user/phw/scramblesuit.git to GitLab project https://gitlab.torproject.org/legacy/gitolite/user/phw/scramblesuit
INFO: adding entry to rewrite_map /home/anarcat/src/tor/tor-puppet/modules/profile/files/git/gitolite2gitlab.txt
INFO: modifying gitolite.conf to add: "config gitweb.category = Migrated to GitLab"
INFO: rewriting gitolite config /home/anarcat/src/tor/gitolite-admin/conf/gitolite.conf to change project user/phw/scramblesuit to category Migrated to GitLab
[...]

Acute eyes will notice the bell (^G) used as a notification mechanism in this transcript as well.

A lot of the code is now useless for us, but some of it, like "commit and push" or is-repo-empty, lives on in the git module, and, of course, the gitlab module has grown some legs along the way. We've also found fun bugs, like a file descriptor exhaustion in bash, among other oddities. The retirement milestone and issue 41215 have a detailed log of the migration, for those curious.

This was a challenging project, but it feels nice to have this behind us. This gets rid of 2 of the 4 remaining machines running Debian "old-old-stable", which moves us a bit further ahead in our late bullseye upgrades milestone.

Full transparency: we tested GPT-3.5, GPT-4, and other large language models to see if they could answer the question "write a set of rewrite rules to redirect GitWeb to GitLab". This has become a standard LLM test for your faithful writer to figure out how good an LLM is at technical responses. None of them gave an accurate, complete, and functional response, for the record.

The actual rewrite rules as of this writing follow, for humans who actually like working answers provided by expert humans, instead of artificial intelligence, which currently seems to be a glorified, mansplaining intern.

git.torproject.org rewrite rules

Those rules are relatively simple in that they rewrite a single URL to its equivalent GitLab counterpart in a 1:1 fashion. They rely on the rewrite map mentioned above, of course.

RewriteEngine on

# this RewriteMap connects the gitweb projects to their GitLab
# equivalent
RewriteMap gitolite2gitlab "txt:/etc/apache2/gitolite2gitlab.txt"

# if this becomes a performance bottleneck, convert to a DBM map with:
#
# $ httxt2dbm -i mapfile.txt -o mapfile.map
#
# and:
#
# RewriteMap mapname "dbm:/etc/apache/mapfile.map"
#
# according to reports lavamind found online, we hit such a
# performance bottleneck only around millions of entries, which is
# not our case

# those two rules can go away once all the projects are
# migrated to GitLab
#
# this matches the request URI so we can check the RewriteMap
# for a match next
#
# WARNING: this won't match URLs without .git in them, which
# *do* work now. one possibility would be to match the request
# URI (without query string!) with:
#
# /git/(.*)(.git)?/(((branches|hooks|info|objects/).*)|git-.*|upload-pack|receive-pack|HEAD|config|description)?.
#
# I haven't been able to figure out the actual structure of
# those URLs, so it's really hard to figure out the boundaries
# of the project name here. I stopped after poring over the
# http-backend.c code in git
# itself. https://www.git-scm.com/docs/http-protocol is also
# kind of incomplete and unsatisfying.
RewriteCond %{REQUEST_URI} ^/(git/)?(.*).git/.*$
# this makes the RewriteRule match only if there's a match in
# the rewrite map
RewriteCond ${gitolite2gitlab:%2|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(git/)?(.*).git/(.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$2}.git/$3 [R=302,L]

# Fallback everything else to GitLab
RewriteRule (.*) https://gitlab.torproject.org [R=302,L]

gitweb.torproject.org rewrite rules

Those are the vastly more complicated GitWeb to GitLab rewrite rules.

Note that we say "GitWeb", but we were actually running cgit, not GitWeb, as the latter didn't scale for us.

RewriteEngine on

# this RewriteMap connects the gitweb projects to their GitLab
# equivalent
RewriteMap gitolite2gitlab "txt:/etc/apache2/gitolite2gitlab.txt"

# special rule to process targets of the old spec.tpo site and
# bring them to the right redirect on the new spec.tpo site.
# that should turn, for example:
#
# https://gitweb.torproject.org/torspec.git/tree/address-spec.txt
#
# into:
#
# https://spec.torproject.org/address-spec
RewriteRule ^/torspec.git/tree/(.*).txt$ https://spec.torproject.org/$1 [R=302]

# list of endpoints taken from cgit's cmd.c

# those two RewriteCond are necessary because we don't move
# all repositories at once. once the migration is completed,
# they can be removed.
#
# and yes, they are copied all over the place below
#
# create a match for the project name to check if the project
# has been moved to GitLab
RewriteCond %{REQUEST_URI} ^/(.*).git(/.*)?$
# this makes the RewriteRule match only if there's a match in
# the rewrite map
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
# main project page, like summary below
RewriteRule ^/(.*).git/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/ [R=302,L]

# summary
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/summary/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/ [R=302,L]

# about
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/about/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/ [R=302,L]

# commit
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteCond "%{QUERY_STRING}" "(.*(?:^|&))id=([^&]*)(&.*)?$"
RewriteRule ^/(.*).git/commit/? https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commit/%2 [R=302,L,QSD]

RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/commit/? https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commits/HEAD [R=302,L]

# diff, incomplete because can diff arbitrary refs and files in cgit but not in GitLab, hard to parse
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteCond %{QUERY_STRING} id=([^&]*)
RewriteRule ^/(.*).git/diff/? https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commit/%1 [R=302,L,QSD]

# patch
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteCond %{QUERY_STRING} id=([^&]*)
RewriteRule ^/(.*).git/patch/? https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commit/%1.patch [R=302,L,QSD]

# rawdiff, incomplete because can show only one file diff, which GitLab cannot
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteCond %{QUERY_STRING} id=([^&]*)
RewriteRule ^/(.*).git/rawdiff/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commit/%1.diff [R=302,L,QSD]

# log
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteCond %{QUERY_STRING} h=([^&]*)
RewriteRule ^/(.*).git/log/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commits/%1 [R=302,L,QSD]

RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/log/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commits/HEAD [R=302,L]

RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/log(/?.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commits/HEAD$2 [R=302,L]

# atom
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteCond %{QUERY_STRING} h=([^&]*)
RewriteRule ^/(.*).git/atom/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commits/%1 [R=302,L,QSD]

RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/atom/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commits/HEAD [R=302,L,QSD]

# refs, incomplete because two pages in GitLab, defaulting to "tags"
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/refs/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/tags [R=302,L]

RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteCond %{QUERY_STRING} h=([^&]*)
RewriteRule ^/(.*).git/tag/? https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/tags/%1 [R=302,L,QSD]

# tree
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteCond %{QUERY_STRING} id=([^&]*)
RewriteRule ^/(.*).git/tree(/?.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/tree/%1$2 [R=302,L,QSD]

RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/tree(/?.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/tree/HEAD$2 [R=302,L]

# /-/tree has no good default in GitLab, revert to HEAD which is a good
# approximation (we can't assume "master" here anymore)
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/tree/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/tree/HEAD [R=302,L]

# plain
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteCond %{QUERY_STRING} h=([^&]*)
RewriteRule ^/(.*).git/plain(/?.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/raw/%1$2 [R=302,L,QSD]

RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/plain(/?.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/raw/HEAD$2 [R=302,L]

# blame: disabled
#RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
#RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
#RewriteCond %{QUERY_STRING} h=([^&]*)
#RewriteRule ^/(.*).git/blame(/?.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/blame/%1$2 [R=302,L,QSD]
# same default as tree above
#RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
#RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
#RewriteRule ^/(.*).git/blame(/?.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/blame/HEAD/$2 [R=302,L]

# stats
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/stats/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/graphs/HEAD [R=302,L]

# still TODO:
# repolist: once migration is complete
#
# cannot be done:
# atom: needs a feed token, user must be logged in
# blob: no direct equivalent
# info: not working on main cgit website?
# ls_cache: not working, irrelevant?
# objects: undocumented?
# snapshot: pattern too hard to match on cgit's side

# special case, we keep a copy of the main index on the archive
RewriteRule ^/?$ https://archive.torproject.org/websites/gitweb.torproject.org.html [R=302,L]
# Fallback: everything else to GitLab
RewriteRule .* https://gitlab.torproject.org [R=302,L]
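A quick way to sanity-check rules like these is to look at the Location header returned for an old URL, without following the redirect. A small, hedged Python example (the sample URL is the spec.tpo case from the comments above):

import urllib.error
import urllib.request

class NoRedirect(urllib.request.HTTPRedirectHandler):
    # returning None makes urllib raise HTTPError on the 302
    # instead of following it
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None

opener = urllib.request.build_opener(NoRedirect)
url = "https://gitweb.torproject.org/torspec.git/tree/address-spec.txt"
try:
    opener.open(url)
except urllib.error.HTTPError as e:
    # expect: 302 https://spec.torproject.org/address-spec
    print(e.code, e.headers.get("Location"))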

The reference copy of those is available in our (currently private) Puppet git repository.

Categories: FLOSS Project Planets


Ed Crewe: Software Engineering Hiring and Firing

Planet Python - Wed, 2024-05-01 10:47



The jump in interest rates to the highest level in over 20 years, which hit the US, UK and many other countries in summer 2023, is still impacting the software industry. Rates may be due to drop soon, but for now they have choked off investment, upped borrowing costs and led to many software companies making engineers redundant to please the markets.

For the UK, the estimate is that around 8% of software industry jobs were made redundant. Strangely, though, the overall trend in vacancies for software engineers continues to march upwards: the initial surge after the pandemic dipped last summer but has now recovered.
But if you work in the industry, you are bound to have colleagues and friends who have been made redundant, even if you are lucky enough not to have been impacted personally.

Given recent history, I thought it might be worth reflecting on my personal experience of the whole hiring and firing process in the tech industry. It is a UK-centric view, but the companies I have worked for in the last 8 years are US software companies.

I have been fired, I have been hired, and I have conducted technical interviews to hire others, giving me a few different perspectives.

This post is NOT about getting your first Software job

I first got a coding job in the public sector, as a self-taught web developer in the 1990s, before web development was a thing you could get a degree in. I initially got a job in IT support, volunteered to act up (i.e. no pay increase) and built some websites that were needed, then became a full-time web developer through a portfolio of work, i.e. sites.

Today, junior developers may have to prove themselves suitable by artificial measures. I skipped these, so I do not have any professional certifications, nor any to recommend. I also don't know how to ace coding algorithm or personality profile assessments.

Once you are 5-10 years into a software career, none of those approaches are used for hiring decisions.

Only large companies are likely to subject you to them, and that is really out of fairness to the juniors who have to go through them, and to screen out dodgy applicants. Screening just needs to be passed; it will not have any input into whether you get the job. Hence "acing the coding interview", as promoted by sites such as LeetCode, is not really a thing, except for passing coding exercise systems in order to start, or switch to, a career as a developer. I would recommend starting an open source project instead, to demonstrate you can actually code.

The majority of small to medium software companies and of job vacancies require experience and in effect have no vacancies for the most junior software grades with less than 3 years under their belt. So they tend not to use any of these filtering methods. They just want to see proof that you are already a developer, and usually base that on face to face interviews and examples of your code you provide them. So much like how I was originally hired back in the 1990s.

I have only been subject to a LeetCode-style test once, which was for a generic job application, ie hiring for numbers of SREs of various seniority, for a FAANG.


 F I R E D When you get that unexpected one-to-one Zoom call with your manager appearing in your calendar these days, it is unlikely to be great news 😓
In the majority of cases the firing process, or to be more polite, redundancy, is all about balancing the finances of the whole company or institution. As such it is very unlikely to be about you.

Of course people are also fired as individuals for various reasons: not being any good at their job, failing to get along with their manager, being a bad culture fit, jobs turning out not to be what was advertised, or expressing political views. Unlike the UK education sector, where I once worked and where 50% of staff are union members, US software companies have less than 1% union membership, so they don't tend to respond well to dissent.

Mostly this happens via failing probation, at around 15%, then maybe another 5% annually for disciplinary / performance improvement failure.

If you want to try getting individually fired then go the overemployed route. Get two or three jobs at once and test how long it takes before the company notices you giving 110% is now only 40% and fire you. The rule of thumb is the larger the company, the longer it takes!

But this post's focus isn't about individual firing, it's about organizational hiring and firing.

Firing Reasons

  1. A company may be doing badly in a slow long term way, so it has to chop as part of a restructure and downsize to attempt to fix that.
  2. Alternatively the company could be doing really well. So it gets the attention of a big investment company and is bought up and merged with its rival. To fix overlap and justify the merger - both companies lose 20% of staff.
  3. Maybe it needs to pivot towards a new area (currently likely to be AI) and so chop 20% of its staff so it can hire 15% experienced, and pricey, AI developers.
  4. Or it may just have had a one off external impacting event that hit it financially. So to balance the earnings for that year and keep its share price good, it chops a bunch of staff. It will rehire next year, when it suits the balance sheet. This is the example in which I was made redundant along with 5% of staff, it was a big company, so that was a few thousand people globally.
  5. Finally it may be an industry wide phenomenon as it is with the current redundancies in the software industry. A world clamp down on easy cheap loans means investment company driven industries such as tech. are no longer awash with spare cash. Cut backs look good right now, and keep the share price high.
    Hence redundancies that are nothing to do with the industry itself or its future prospects.

That is mirrored in who is fired. Companies do not keep a log book of gold stars and black marks against each employee. They do not use organizational triggered rounds of redundancies to select individuals to fire. They certainly have not got the capability to accurately determine all the best employees and only fire the worst ones. You will be fired based on what part of the organization you are in, how much it is valued in the current strategy and how much you cost vs others who could do your job. If you are currently between teams / or in a new team or role which has yet to establish itself, when the music stops - like musical chairs, bad luck you are out.

The only personal element may be if a whole team is seen as under-performing or difficult to manage, in which case it might be axed, no matter that it contains a star performer. Decisions may also be geographic: let's axe the Greek office, and save by withdrawing engineering from a whole country. That is again how I was made redundant; the rest of my team was in Greece.
Alternatively it may be: fire under 20 staff from each country, to avoid the more burdensome regulation around bulk layoffs.

The organization could also create an insecure, downturn atmosphere to encourage staff to leave, because it's a lot cheaper for people to leave than for the company to pay out redundancy settlements.

Redundancy keeps the average employees 😐

As a result, in response to significant redundancies an organisation will tend to lose more of its best employees, since they are the most able to move, the most likely to get a big pay rise if they move and the least likely to want to stick around if they see negative organisational change. The software industry has very high staff turnover, at almost 20%, outweighing any nominal idea of removing less efficient staff.

If a company handles things well it may only lose a representative productivity range of staff from best to worst. But a bulk redundancy process is likely to lead to the biggest loss in the top talent, get rid of slightly more of the bottom dwellers and so result in maximising the mediocre!

In summary the answer to 'Why me?' in group redundancies is "because you were there" ... and you didn't have a personal friendship with the CEO 😉. Of course that is why new CEOs are often brought in to restructure - the first step of which is to take an axe to the current C-suite. 

Some of the best software engineers I have worked with have been made redundant at some point in their career. Group redundancies are not about you or how well you do your job. But taking it personally and challenging the messenger with "why me?", as demonstrated by recent viral videos, is an understandable emotional response to rejection, born of the misguided belief that work aims to be some form of meritocracy, in the same way college might.

LIFO and FIFO

LIFO rather than FIFO is the norm in firing. New hires are less likely to have established themselves as essential to the company, and have fewer personal connections within it. More importantly, many countries' redundancy legislation doesn't kick in until over 2 years of employment, and the longer you have been employed the more the company will have to pay to terminate you.

Which means a new hire who has uprooted for their new tech job, will be the most likely to find themselves losing that job when bulk redundancies hit.
But FIFO has its place too: next would be older engineers. Some companies don't even hire hands-on engineers much over the age of 40 anyway. But staff near retirement have at most only a few years left to contribute and may cost more for the same grade. So encouraging early retirement can be part of the bulk redundancy process.

Prejudicial Firing

Whilst redundancy is all about costs and not about your personal performance, that is not to say that companies who pass the redundancy choices down to junior managers may not end up firing disproportionate numbers of workers who are not from the same background as their manager, ie white USA males, ideally younger than the manager. But prejudice is not personal either; that is pretty much what defines it as prejudice: a pre-judgement of people based on physical characteristics rather than their ability at the job. People are also least likely to fire staff that they have the most in common with, again resulting in prejudicial firing.
Unfortunately it seems many companies with a good diversity policy for hiring may not have adequate ones for firing, again resulting in losing more of the higher-performing staff.

I have heard of a case where someone got a new manager who, on joining, was told to cut from his team, so he fired everyone outside the USA. The worker was so keen to stay at their current employer that they went over the head of their manager to senior management and asked for their redundancy to be reversed. Since they had been at the company many years and personally knew senior management, this worked.
Alternatively a more purely cost-based restructure may hire all developers from cheaper countries and fire most of those in the US, as happened with Google's Python team recently.

Fight for your job?

The company may set up a pool process for bulk redundancies if numbers are high enough per country, where you can fight for a place on the lifeboat of remaining positions. 

Either way, I would recommend that you don't waste time on a company that doesn't value you. If you do stay you risk dealing with the bulk redundancy aftermath, which will be present unless the redundancies were for a pivot (3) or a one-off event (4):
an increased workload, pay freezes, no bonus, needing to overwork to justify being kept on, plus a negative work atmosphere.

In a case where I stayed after the department I was in was axed, I had to reapply for a new job which was moved to a different division. The work was less worthwhile and at the time, the employment of in-house software developers as a whole, was questioned as being unnecessary for the organisation. I outstayed my welcome for 18 months of legacy commercial software support, before getting the message and quitting.

Lesson learnt: if you must ask to stay in your company, via senior management, a pool or reapplication, make sure you look around and apply for other jobs outside of it at the same time.

By staying, you also miss out on a minimum of a couple of months' tax-free pay as a settlement.

On the other hand, if the redundancy round is for a more minor pivot, and you are happy in the role, it may be well worth staying around to see how things pan out.

Of course you may get no choice in the matter, in which case, get straight into GET HIRED mode and start the job search. If you can manage it fast enough, you will benefit financially from the whole process. Although if the reason is (5), a sector-wide reduction, then it will take longer and be harder to obtain the usual 20% pay increase that a new position can offer.


 H I R E D Why change jobs (aside from being fired!)
  • It is a lot easier to get a pay rise or promotion by changing companies than by being promoted internally, whether to fast-track your career to a principal or architect top IC role, or just to get a pay rise.
  • Changing jobs gives you much wider experience, of different technology, approaches and cultures. Making you a better engineer.
  • If you have been in your current job over 10 years without significant internal promotions or changes of role then it is detrimental to your CV, indicating you are stuck in a rut and unable to handle change, eg. new technology.
  • You want to shift sectors.
    I changed from public sector web developer, to commercial cloud engineer with one move.
  • You want to get into new technology that is not used in your current role.
    I changed from a Python, Ruby config management automation engineer to a Kubernetes Golang engineer with another.
  • You want to change your role in tech, or leave it entirely. For example get out of sales as a solution architect and back into a more technical role as an SRE.
On that basis many software engineers change jobs every 2 or 3 years for part of their careers. It's expected; the average engineer in a FAANG stays less than 3 years.
Of course you probably need to be in a job for at least 2 years to fully master it.
If your CV has loads of similar positions where you barely make it past the probation period, it's marking you out as a failure at those roles == fail hiring at the first step, the HR CV check.
Upskilling
The other problem is that changing jobs to change roles, even if it's just to use a new language or framework, can be blocked by roles requiring experience in that area on the CV to get interviewed in the first place. For software engineering that is less of an issue, since tech changes faster than any other sector.
You just need to prove you have a range of experience and software languages and are willing to learn, early in a technology boom. To catch the cloud engineer bus, I got a job in it in 2016; the US cloud sector was $8 billion back then and is $600 billion now. Similarly, I got on board with Golang and Kubernetes in 2019. In the first few years of a tech boom most companies will initially have to cross-train engineers without direct experience. The corollary is that in the current downturn, attempting to pivot to an established technology, which k8s has become, is going to be much harder.

Market rates
Clearly ML ops and AI data science are the current booming areas. Demand so far outstrips supply that switching to a more junior Python AI role may pay as well as being a senior Django web developer, for example.

So around £60k for a junior role, but in 3-4 years it should jump to at least £100k for a senior AI engineer. For US salaries add 30%, plus usually free medical, life insurance etc. The lower tax rates cancel out the higher cost of living in the US ... so it's UK salary +30% in real terms*. Researching the going rate for the particular role, technical skills and sector you are applying for is a necessary part of the hiring process, so that you don't let recruitment bargain you down too low.

* Note that geographic software pay differences are why you often come across engineers of other nationalities emigrating to, and working in the higher paying countries, USA, Canada and Australia. I have worked with many people from the UK and Europe who live in the USA, and Indians who live there or Europe, for example.
Of course as a cheap foreign worker myself, I too stick with US companies partly because they pay rather more than UK ones, even if a lot lower than what I would get if I moved there 😉

Now is the time when such a switch will be easier to accomplish without having to work nights doing courses, certifications and personal projects. The usual means of demonstrating your ability without any work experience.
The caveat here is that moving jobs in a down turn, as we are arguably experiencing currently, can depress the market salary rates and if you are already at the top of those when made redundant, can mean you have to take a pay cut for a year or two rather than face the cost of long term unemployment.
The hiring process for an experienced software engineer role

Interview to offer should take a month.

If not, then the recruitment is likely for a group of roles in an expansion process, and from screening and CV you are not one of the top candidates. You may be waiting on the backlog of potential interviewees for a couple of extra months before it properly kicks off.
Or you are told the post is no longer available, sorry!
Even if you would eventually get a post, it may stretch your redundancy settlement. Therefore I would not bother pursuing any application process that looks to be stretching past 6 weeks.
Start date will be 5 weeks from contract (partly to cater for notice, referee and compliance checks etc).

That makes it 2 months minimum from applying for a role to starting.


The process will consist of a technical assessment task and at least 3 interviews: screening, manager and technical.
There may be another for introductions to team mates, the office and so on, which is unlikely to have any effect on the hiring decision unless you and your potential new manager take an instant personal dislike to each other.
The HR screening interview just checks you are a genuine candidate for the job. The manager interview similarly is more about checking you will fit in with the company and team, plus that you have basic personal communication skills.
The Technical Interview is what matters
Passing the technical interview is what really decides whether you will get a job offer. Sometimes the tech interview may be split into two: one more task- and questionnaire-based, the other more discussion-based. Often the initial task part will be given as WFH.

The technical interview will consist of technical questions to explore whether you have the knowledge and experience required, plus something to confirm you can write code and discuss that code, for a developer or SRE role. For the former it would likely be application code, whilst for the latter automation code.
For a more purely system administration / IT support role it will involve specifying your processes for resolving issues.

If you are unlucky and it is an in-person interview, you may have to whiteboard pseudo code live in response to a changing task described to you on the spot, although I have only had that once. More common, especially for hybrid / remote roles, is the take-away task, to be completed in a 'few hours' at most.

It is possible that either of the above could be replaced by another source for your code. Talking through one of your open source packages, if you have any. Or talking through one or two longer automated coding exercise assessed tasks. I have never come across either of these though.

The main point is that the core of any technical interview for a developer-related role will involve talking through code you have written, as a kicking-off point to check your understanding of the code: how it could be improved, how you would tackle scaling or a new exemplar functional requirement, its faults and features.

You will be asked to talk through past code or technical work in a more generic manner in response to standard questions along the lines of examples of your past work that show how you fit the job. 

Preparing for the Technical Interview

It doesn't take much to work out that a 20% pay rise is worth a day's worth of work a week (one day in five).
Assuming you stay in your new job for 2 or 3 years, that is equivalent to around 6 months' pay.
On that basis even doing a week of work to apply to, and prepare for, a single job is still very well worth it, if you get the job.

Adopting a scatter-gun approach, ie applying with a generic CV and covering letter to 10 or more jobs, is a waste of time in my view. If you need a new job, then it should be one you are genuinely interested in and research. That means you should probably have a maximum of 3 tailored applications on the go at once. Even when I was made redundant (and about to get married) I think I limited myself to 4 job applications in total, with one primary one that thankfully I did end up getting.

There are many sites that can advise how best to do that, based on the framework I have outlined above for how your hiring will be decided. I think preparing some Challenge-Action-Result stories targeted at the details of the new employer is useful, plus spending a day or so refining that '2 hours' development task, researching the company and preparing specific questions and perhaps suggestions for your interviewers.

Being a Technical Interviewer

From the other side of the table, clearly candidates need to show sufficient competency for the post. They may show it, but only within a totally different technical stack. Smaller companies tend to have less capacity and time to get people up to speed with new tech, so will likely fail these candidates even though they are capable of eventually doing the job.
The technical interviewers will tend to pair on the assessment, to improve its consistency. Swapping partners for interviews regularly also helps.

The assessment process is likely to use some online system such as Jobvite or Greenhouse, where each interviewer assesses the candidate, finally summarising it all with a recommendation of strong pass, pass or fail, sometimes for a specific post and grade; otherwise the assessment can include a grade recommendation. The manager then rubber-stamps that, assuming appropriate funding is available. HR's job is to beat the candidate down to the lowest reasonable price, without going so low the candidate walks away.

A healthy growing company will tend to have a rolling recruitment process as they expect to be increasing head count in proportion to customers and revenue. On that basis they will likely be aiming to recruit anyone with a good pass, plus maybe most of the passes too.

Given that engineering jobs are highly specialised and require relevant experience, I have not seen cases of way more interviewees than jobs. Currently, even with all the redundancies, there is still an under-supply of engineers.
Also the approach for HR will be to set experience and skills prerequisites for the roles that keep the numbers making it to technical interview down to around double the number of vacancies, since each candidate takes out a day of work for every interviewing engineer, to prep, interview and assess.

HIRING SUMMARY

You must pass each of the first 5 or 6 steps to get to the next one and get the job.

  1. HR check written application is a plausible candidate
  2. FAANG-sized company - automated quiz / Leetcode-style challenge - to reduce the numbers, because they get way more speculative applicants.
  3. Recruiter chat to check candidate is genuine and available
  4. CV skills / experience check vs other applicants to shortlist those worth interviewing
  5. Technical task, which could be a takeaway or a whiteboard / questionnaire interview
  6. Technical interview, in person or Zoom with engineers

  7. Manager interview / introduction to team mates. 
  8. Recruiter chat. Negotiate exact salary. Agree start date.
  9. Contract is signed, YOU ARE HIRED.






Categories: FLOSS Project Planets

Real Python: Python Sequences: A Comprehensive Guide

Planet Python - Wed, 2024-05-01 10:00

A phrase you’ll often hear is that everything in Python is an object, and every object has a type. This points to the importance of data types in Python. However, often what an object can do is more important than what it is. So, it’s useful to discuss categories of data types and one of the main categories is Python’s sequence.

In this tutorial, you’ll learn about:

  • Basic characteristics of a sequence
  • Operations that are common to most sequences
  • Special methods associated with sequences
  • Abstract base classes Sequence and MutableSequence
  • User-defined mutable and immutable sequences and how to create them

This tutorial assumes that you’re familiar with Python’s built-in data types and with the basics of object-oriented programming.

Get Your Code: Click here to download the free sample code that you’ll use to learn about Python sequences in this comprehensive guide.

Take the Quiz: Test your knowledge with our interactive “Python Sequences: A Comprehensive Guide” quiz. You’ll receive a score upon completion to help you track your learning progress:

Interactive Quiz

Python Sequences: A Comprehensive Guide

In this quiz, you'll test your understanding of sequences in Python. You'll revisit the basic characteristics of a sequence, operations common to most sequences, special methods associated with sequences, and how to create user-defined mutable and immutable sequences.

Building Blocks of Python Sequences

It’s likely you used a Python sequence the last time you wrote Python code, even if you don’t know it. The term sequence doesn’t refer to a specific data type but to a category of data types that share common characteristics.

Characteristics of Python Sequences

A sequence is a data structure that contains items arranged in order, and you can access each item using an integer index that represents its position in the sequence. You can always find the length of a sequence. Here are some examples of sequences from Python’s basic built-in data types:

>>> # List
>>> countries = ["USA", "Canada", "UK", "Norway", "Malta", "India"]
>>> for country in countries:
...     print(country)
...
USA
Canada
UK
Norway
Malta
India
>>> len(countries)
6
>>> countries[0]
'USA'

>>> # Tuple
>>> countries = "USA", "Canada", "UK", "Norway", "Malta", "India"
>>> for country in countries:
...     print(country)
...
USA
Canada
UK
Norway
Malta
India
>>> len(countries)
6
>>> countries[0]
'USA'

>>> # Strings
>>> country = "India"
>>> for letter in country:
...     print(letter)
...
I
n
d
i
a
>>> len(country)
5
>>> country[0]
'I'

Lists, tuples, and strings are among Python’s most basic data types. Even though they’re different types with distinct characteristics, they have some common traits. You can summarize the characteristics that define a Python sequence as follows:

  • A sequence is an iterable, which means you can iterate through it.
  • A sequence has a length, which means you can pass it to len() to get its number of elements.
  • An element of a sequence can be accessed based on its position in the sequence using an integer index. You can use the square bracket notation to index a sequence.

There are other built-in data types in Python that also have all of these characteristics. One of these is the range object:

>>> numbers = range(5, 11)
>>> type(numbers)
<class 'range'>
>>> len(numbers)
6
>>> numbers[0]
5
>>> numbers[-1]
10
>>> for number in numbers:
...     print(number)
...
5
6
7
8
9
10

You can iterate through a range object, which makes it iterable. You can also find its length using len() and fetch items through indexing. Therefore, a range object is also a sequence.

You can also verify that bytes and bytearray objects, two of Python’s built-in data structures, are also sequences. Both are sequences of integers. A bytes sequence is immutable, while a bytearray is mutable.
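
A quick interpreter session (my own check, not from the article) bears this out:

>>> data = bytes([72, 105])
>>> len(data)
2
>>> data[0]
72
>>> mutable_data = bytearray(data)
>>> mutable_data[0] = 104
>>> mutable_data
bytearray(b'hi')

Indexing either type returns the integer value of the byte, and only the bytearray allows assignment back into a position.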

Special Methods Associated With Python Sequences

In Python, the key characteristics of a data type are determined using special methods, which are defined in the class definitions. The special methods associated with the properties of sequences are the following:

  • .__iter__(): This special method makes an object iterable using Python’s preferred iteration protocol. However, it’s possible for a class without an .__iter__() special method to create iterable objects if the class has a .__getitem__() special method that supports iteration. Most sequences have an .__iter__() special method, but it’s possible to have a sequence without this method.
  • .__len__(): This special method defines the length of an object, which is normally the number of elements contained within it. The len() built-in function calls an object’s .__len__() special method. Every sequence has this special method.
  • .__getitem__(): This special method enables you to access an item from a sequence. The square brackets notation can be used to fetch an item. The expression countries[0] is equivalent to countries.__getitem__(0). For sequences, .__getitem__() should accept integer arguments starting from zero. Every sequence has this special method. This method can also ensure an object is iterable if the .__iter__() special method is missing.

Therefore, all sequences have a .__len__() and a .__getitem__() special method and most also have .__iter__().

However, it’s not sufficient for an object to have these special methods to be a sequence. For example, many mappings also have these three methods but mappings aren’t sequences.

A dictionary is an example of a mapping. You can find the length of a dictionary and iterate through its keys using a for loop or other iteration techniques. You can also fetch an item from a dictionary using the square brackets notation.

This characteristic is defined by .__getitem__(). However, .__getitem__() needs arguments that are dictionary keys and returns their matching values. You can’t index a dictionary using integers that refer to an item’s position in the dictionary. Therefore, dictionaries are not sequences.
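
To make the contrast concrete, here's a minimal sketch of a user-defined sequence; the Countdown class is my own illustration, not from the article:

class Countdown:
    """A tiny user-defined sequence: integers counting down from start."""

    def __init__(self, start):
        self._start = start

    def __len__(self):
        return self._start

    def __getitem__(self, index):
        if index < 0:  # support negative indexing, as sequences do
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError("Countdown index out of range")
        return self._start - index

Because .__getitem__() accepts integer positions starting from zero and raises IndexError past the end, len(Countdown(3)) returns 3, Countdown(3)[0] returns 3, and a for loop over it prints 3, 2, 1, even without an .__iter__() method.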

Slicing in Python Sequences

Read the full article at https://realpython.com/python-sequences/ »


Categories: FLOSS Project Planets

Goal Sprint 2024

Planet KDE - Wed, 2024-05-01 09:00

Last week, like many other people, I was in Berlin for the Mega^WGoals sprint. Naturally the three goals (Sustainability, Accessibility, and Automatability; the three abilities) attracted a diverse crowd of people that brought along other topics too, so it turned into a proper megasprint.

Being interested in all three goals made it a bit challenging to follow all relevant discussions unfortunately, but on the flip side it never got boring.

Most of my goal-related work was towards the automation goal. One thing I was working on is a CI job that checks for spelling mistakes in the code, so that those can be caught when doing a MR. A while ago I created a website that tracks which KDE projects are ported to Qt6. What started out as a joke for a talk turned out to be a useful tool for planning porting work. During the sprint I fixed the site to actually work correctly again and, most importantly, changed the text from “No” to “Yes” since most projects actually use Qt6 now. For a while the site has also had auto-generated reports for other things, like showing which projects don’t yet have clang-format applied to them or which projects don’t enforce passing tests in the CI. I used the latter list to enable this for the remaining projects that don’t have failing tests right now and prepared a change to the CI system that enforces passing tests by default. In the same spirit others and I also fixed some currently failing tests. We also discussed the idea of extending the site with more checks and turning it into a proper KDE site that isn’t hosted on my personal infrastructure.

Harald, Carl, and I worked on a dashboard to show the CI status for our project. This is something we haven’t really had since switching to Gitlab CI but is very useful e.g. as part of the release checklist. We do have a working prototype, but some things remain to be ironed out. As part of this we also fixed some of the currently failing builds.

My main contribution to the sustainability goal was debugging why Nate’s NeoChat was using too much CPU. With a team effort we eventually pinpointed this to an invisible animation constantly repainting the window, which was then promptly fixed.

In terms of accessibility I was mainly involved in discussions about challenges and new developments with accessibility on Wayland. Expect to hear more on this soon.

In “off-topic” topics there were plenty of discussions about visions, ideas, and challenges for our application development story. This included discussions on visions for design/UX, theming, API design, and software distribution. Being KDE’s Software Platform Engineer, it is part of my responsibilities to facilitate these kinds of discussions. Later this year I want to host a sprint dedicated to application design to discuss and establish our vision there. If you are interested, do reach out to me.

All in all it was a fun few days with great people. Thanks to MBition and Aleix for hosting us, and thanks to those donating to KDE to make these sprints possible.

Categories: FLOSS Project Planets

Colin Watson: Free software activity in April 2024

Planet Debian - Wed, 2024-05-01 07:34

My Debian contributions this month were all sponsored by Freexian.

  • I’m trying to get back into bugs.debian.org administration, so I spent some time catching up on my owner@bugs.debian.org mailbox and answering a number of support requests there.
  • I fixed a regression I’d introduced last year where groff’s PDF output had invalid date headers, both upstream and in Debian.
  • I released man-db 2.12.1.
  • openssh:
    • I did a little more testing of Luca Boccassi’s modifications to upstream’s inline systemd notification patch.
    • I did an extensive review of some of the choices in Debian’s OpenSSH packaging, in light of last month’s xz-utils backdoor.
    • I fixed a build failure on ppc64el, forwarded upstream.
    • I proposed reducing shared library linkage in tcp-wrappers; its maintainer accepted this by disabling NIS support.
    • I applied a suggestion to improve ordering of systemd services in relation to nss-user-lookup.target.
  • I updated putty to 0.81.
  • Python team:
  • I did some inconclusive investigation of flaky tests in gcr4. More work is needed there.
  • I proposed a patch for a build failure in gyoto, both upstream and in Debian.

You can support my work directly via Liberapay.

Categories: FLOSS Project Planets

Talking Drupal: Skills Upgrade #9

Planet Drupal - Wed, 2024-05-01 06:53

Welcome back to “Skills Upgrade” a Talking Drupal mini-series following the journey of a D7 developer learning D10. This is the final episode, 9.

Topics
  • Review status of Chad's Smart Date test
  • Panel discussion
    • Chad, What was your biggest takeaway?
    • Mike, How do you approach this type of one on one mentorship differently than your courses?
    • AmyJune, do you think there are other types of focused mentorship like this that would be valuable to the community?
    • Chad, what was the most surprising thing you learned in Modern Drupal vs Drupal 7?
    • Michael, what did you learn through this process?
    • How do you think people will use this journey to help their learning process?
    • Chad, what are your plans for your next contribution?
Resources

Chad's Drupal 10 Learning Curriculum & Journal Chad's Drupal 10 Learning Notes

The Linux Foundation is offering a discount of 30% off e-learning courses, certifications and bundles with the code DRUPAL24 (all uppercase), good until June 5th: https://training.linuxfoundation.org/certification-catalog/

Hosts

Nic Laflin - www.nlightened.net AmyJune Hineline - @volkswagenchick

Guests

Chad Hester - chadkhester.com @chadkhest Mike Anello - DrupalEasy.com @ultimike

Categories: FLOSS Project Planets

Bits from Debian: Infomaniak First Platinum Sponsor of DebConf24

Planet Debian - Wed, 2024-05-01 06:08

We are pleased to announce that Infomaniak has committed to sponsor DebConf24 as a Platinum Sponsor.

Infomaniak is an independent cloud service provider recognised throughout Europe for its commitment to privacy, the local economy and the environment. Recording growth of 18% in 2023, the company is developing a suite of online collaborative tools and cloud hosting, streaming, marketing and events solutions.

Infomaniak uses exclusively renewable energy, builds its own data centers and develops its solutions in Switzerland at the heart of Europe, without relocating. The company powers the website of the Belgian radio and TV service (RTBF) and provides streaming for more than 3,000 TV and radio stations in Europe.

With this commitment as Platinum Sponsor, Infomaniak is contributing to the Debian annual Developers' conference, directly supporting the progress of Debian and Free Software. Infomaniak contributes to strengthen the community that collaborates on Debian projects from all around the world throughout all of the year.

Thank you very much, Infomaniak, for your support of DebConf24!

Become a sponsor too!

DebConf24 will take place from 28th July to 4th August 2024 in Busan, South Korea, and will be preceded by DebCamp, from 21st to 27th July 2024.

DebConf24 is accepting sponsors! Interested companies and organizations should contact the DebConf team through sponsors@debconf.org, or visit the DebConf24 website at https://debconf24.debconf.org/sponsors/become-a-sponsor/.

Categories: FLOSS Project Planets

Tag1 Consulting: 4 Important Improvements to Drupal Core Performance

Planet Drupal - Wed, 2024-05-01 02:04

Curious about Gander and its potential impact on your workflows and site performance? This testing framework was instrumental in four important improvements made to Drupal core performance. Let’s break down each bottleneck and the solutions that led to these improvements. Learn more about this framework that provides a sophisticated way to detect and correct performance issues.

janez Wed, 05/01/2024 - 09:03
Categories: FLOSS Project Planets

Tag1 Consulting: Migrating Your Data from Drupal 7 to Drupal 10: Source Site Audit - An In depth Analysis

Planet Drupal - Wed, 2024-05-01 02:04

In the previous article, we analyzed the site from a high-level perspective. It is time to dive deeper into the site structure.

mauricio Wed, 05/01/2024 - 06:00
Categories: FLOSS Project Planets

Capellic: Frontend performance optimization for Drupal websites: Part 2

Planet Drupal - Wed, 2024-05-01 00:00
This is part 2 of a series of articles that defines our approach to frontend performance optimization. In this part we get into image optimization.
Categories: FLOSS Project Planets

Russ Allbery: Review: To Each This World

Planet Debian - Tue, 2024-04-30 23:39

Review: To Each This World, by Julie E. Czerneda

Publisher: DAW
Copyright: November 2022
ISBN: 0-7564-1543-8
Format: Kindle
Pages: 676

To Each This World is a standalone science fiction novel.

Henry m'Yama t'Nowak is the Arbiter of New Earth. This is somewhat akin to a president, but only in very specific ways. Henry's job is to deal with the Kmet.

New Earth was settled by slower-than-light colony ship from old Earth, our Earth. It is, so far as they know, the last of humanity in the universe. Origin Earth fell silent hundreds of years previous, before the colonists even landed. New Earth is now a carefully and thoughtfully managed world where humans survived, thrived, and at one point sent out six slower-than-light colony ships of its own. All were feared lost after a rushed launch due to a solar storm.

As this story opens, a probe from one of those ships arrives.

This is cause for rejoicing, but there are two small problems. The first is that the culture of New Earth has changed drastically since the days when they launched the Halcyon colony ships. New Earth is now part of the Duality, a new alliance with aliens painstakingly negotiated after their portal appeared in orbit. The Kmet were peaceful, eager to form an alliance and offer new technology, although they struggled with concepts such as individuality and insisted on interacting only with the Arbiter. Their technological gifts and the apparent loss of the Halcyon colony ships refocused New Earth on safety and caution. This unexpected message is a somewhat tricky political problem, a reminder of the path not taken.

The other small problem is that the reaction of the Kmet to this message is... dramatic.

This book has several problems, but the most serious is that it is simply too long. If you have read any other Czerneda novels, you know that she tends towards sprawling world-building, but usually there are enough twists and turns in the plot to keep the story moving while the protagonists slowly puzzle out the scientific mysteries. To Each This World is not sufficiently twisty for 676 pages. I think you could have cut half the novel without losing any major plot points.

The interesting parts of this book, to me, were figuring out what's going on with the Kmet, some of the political tensions within the New Earth government, and understanding what Henry and Pilot Killian's story had to do with the apparently-unrelated but intriguing interludes following Beth Seeker in a strange place called Doublet. All that stuff is in here, but it's alongside a whole lot of Henry wrestling with lifeboat ethics in situations where he thinks he needs to lie to and manipulate people for their own good. We also get several extended tours of societies that, while vaguely interesting in a science fiction world-building way, have essentially nothing to do with the plot.

We also get a whole lot of Henry's eagerly helpful AI polymorph Flip. I wanted to like this character, and I occasionally managed, but I felt like there was a constant mismatch between, in hindsight, how Czerneda meant for me to see Flip and what I thought she was signaling while I was reading. I wanted Flip to either be a fascinatingly weird companion or to be directly relevant to the plot, but instead there were hundreds of pages of unnerving creepiness mixed with obsequiousness and emotional neediness, all of which I think I read more into than Czerneda had intended. The overall experience was more exhausting than fun.

The core of the plot is solid, and if you like SF novels built around world-building and scientific mysteries, there's a lot here to enjoy. I think Czerneda's Species Imperative series (starting with Survival) is a better execution of some of the same ideas, but I liked that series a lot and was willing to read another take on it. Czerneda is one of the SF writers who takes biology seriously and is willing to write very alien aliens, and that leads to a few satisfying twists. Also, Beth Seeker is a great character (I wish we'd seen more of her), and Killian, while a bit generic, is a serviceable protagonist when Czerneda needs someone to go poke things with a stick.

Henry... I'm not sure what I think of Henry, and your enjoyment of this book may depend on how much you click with him.

Henry is a diplomat and an extrovert. His greatest joy and talent is talking to people, navigating political situations, and negotiating. Science fiction is full of protagonists who should be this character, but they rarely are this character, probably because a lot of writers are introverts. I think Czerneda deserves real credit for making her charismatic politician sufficiently accurate that his thought processes occasionally felt alien. For me, Henry was easiest to appreciate when Killian was the viewpoint protagonist and I could look at him through someone else's eyes, but Henry's viewpoint mostly worked as well. There's a lot of competence porn enjoyment in watching him do his thing.

The problem for me is that I thought several of his actions were unforgivably unethical, but no one in the book who matters seems to agree. I can see why he reached those unethical decisions, but they were profound violations of consent. He directly lies to people because he thinks telling the truth would be too risky and not get them to do what he wants them to do, and Czerneda sets up the story to imply that he might be right.

This is not necessarily a bad choice in a novel, but the author has to do some work to bring me along, and Czerneda didn't do enough of that work. I kept wanting there to be some twist or sting or complication that forced Henry to come to terms with what he was doing, but it never happens. He has to pick between two moral principles that I consider rather finely balanced, if not tilted in the opposite direction that he does, and he treats one principle as inviolable and the other as mostly unimportant. The plans he makes on that basis work fine, and those on the other side of that decision are never heard from again. It left a bad taste in my mouth, particularly given how much of the book is built around Henry making tough, tricky decisions under pressure.

I don't know about this book. I have a lot of mixed feelings. Parts of it I quite enjoyed. Parts of it I mostly enjoyed but wish were much less dragged out. Parts of it frustrated or bored me. It's one of those books where the more I thought about it after reading it, the more the parts I disliked annoyed me.

If you like Czerneda's style of world-building and biology, and if you have more tolerance for Henry's decisions than I did, you may well like this, but read Species Imperative first. I should probably also warn that there is a lot of magical technology in this book that blatantly violates some core principles of physics. I have a high tolerance for that sort of thing, but if you don't, you're going to be grumbling.

Rating: 6 out of 10

Categories: FLOSS Project Planets

Matthew Palmer: The Mediocre Programmer's Guide to Rust

Planet Debian - Tue, 2024-04-30 20:00

Me: “Hi everyone, my name’s Matt, and I’m a mediocre programmer.”

Everyone: “Hi, Matt.”

Facilitator: “Are you an alcoholic, Matt?”

Me: “No, not since I stopped reading Twitter.”

Facilitator: “Then I think you’re in the wrong room.”

Yep, that’s my little secret – I’m a mediocre programmer. The definition of the word “hacker” I most closely align with is “someone who makes furniture with an axe”. I write simple, straightforward code because trying to understand complexity makes my head hurt.

Which is why I’ve always avoided the more “academic” languages, like OCaml, Haskell, Clojure, and so on. I know they’re good languages – people far smarter than me are building amazing things with them – but by the time I hear the word “endofunctor”, I’ve lost all focus (and most of my will to live). My preferred languages are the ones that come with less intellectual overhead, like C, PHP, Python, and Ruby.

So it’s interesting that I’ve embraced Rust with significant vigour. It’s by far the most “complicated” language that I feel at least vaguely comfortable with using “in anger”. Part of that is that I’ve managed to assemble a set of principles that allow me to almost completely avoid arguing with Rust’s dreaded borrow checker, lifetimes, and all the rest of the dark, scary corners of the language. It’s also, I think, that Rust helps me to write better software, and I can feel it helping me (almost) all of the time.

In the spirit of helping my fellow mediocre programmers to embrace Rust, I present the principles I’ve assembled so far.

Neither a Borrower Nor a Lender Be

If you know anything about Rust, you probably know about the dreaded “borrow checker”. It’s the thing that makes sure you don’t have two pieces of code trying to modify the same data at the same time, or using a value when it’s no longer valid.

While Rust’s borrowing semantics allow excellent performance without compromising safety, for us mediocre programmers it gets very complicated, very quickly. So, the moment the compiler wants to start talking about “explicit lifetimes”, I shut it up by just using “owned” values instead.

It’s not that I never borrow anything; I have some situations that I know are “borrow-safe” for the mediocre programmer (I’ll cover those later). But any time I’m not sure how things will pan out, I’ll go straight for an owned value.

For example, if I need to store some text in a struct or enum, it’s going straight into a String. I’m not going to start thinking about lifetimes and &'a str; I’ll leave that for smarter people. Similarly, if I need a list of things, it’s a Vec<T> every time – no &'b [T] in my structs, thank you very much.
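
As a concrete (made-up) illustration of that rule, a struct of mine would look like this:

struct Article {
    title: String,      // owned text, not a borrowed &'a str
    tags: Vec<String>,  // an owned list, not a borrowed slice
}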

Attack of the Clones

Following on from the above, I’ve come to not be afraid of .clone(). I scatter them around my code like seeds in a field. Life’s too short to spend time trying to figure out who’s borrowing what from whom, if I can just give everyone their own thing.

There are warnings in the Rust book (and everywhere else) about how a clone can be “expensive”. While it’s true that, yes, making clones of data structures consumes CPU cycles and memory, it very rarely matters. CPU cycles are (usually) plentiful and RAM (usually) relatively cheap. Mediocre programmer mental effort is expensive, and not to be spent on premature optimisation. Also, if you’re coming from most any other modern language, Rust is already giving you so much more performance that you’re probably ending up ahead of the game, even if you .clone() everything in sight.

If, by some miracle, something I write gets so popular that the “expense” of all those spurious clones becomes a problem, it might make sense to pay someone much smarter than I to figure out how to make the program a zero-copy masterpiece of efficient code. Until then… clone early and clone often, I say!
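
A minimal sketch of what that looks like in practice (print_tags is a hypothetical function of mine):

fn main() {
    let tags = vec!["rust".to_string(), "mediocre".to_string()];

    // Hand the function its own copy; no negotiating with the borrow checker.
    print_tags(tags.clone());

    // `tags` is still ours to use afterwards.
    println!("{} tags total", tags.len());
}

fn print_tags(tags: Vec<String>) {
    for tag in tags {
        println!("{tag}");
    }
}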

Derive Macros are Powerful Magicks

If you start .clone()ing everywhere, pretty quickly you’ll be hit with this error:

error[E0599]: no method named `clone` found for struct `Foo` in the current scope

This is because not everything can be cloned, and so if you want your thing to be cloned, you need to implement the method yourself. Well… sort of.

One of the things that I find absolutely outstanding about Rust is the “derive macro”. These allow you to put a little marker on a struct or enum, and the compiler will write a bunch of code for you! Clone is one of the available so-called “derivable traits”, so you add #[derive(Clone)] to your structs, and poof! you can .clone() to your heart’s content.

But there are other things that are commonly useful, and so I’ve got a set of traits that basically all of my data structures derive:

#[derive(Clone, Debug, Default)]
struct Foo {
    // ...
}

Every time I write a struct or enum definition, that line #[derive(Clone, Debug, Default)] goes at the top.

The Debug trait allows you to print a “debug” representation of the data structure, either with the dbg!() macro, or via the {:?} format in the format!() macro (and anywhere else that takes a format string). Being able to say “what exactly is that?” comes in handy so often, not having a Debug implementation is like programming with one arm tied behind your Aeron.
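
For instance (a small sketch of my own, using the Foo struct above):

let foo = Foo::default();
dbg!(&foo);                   // prints the file, line, and Debug representation
println!("made a {:?}", foo); // the {:?} format uses the same Debug impl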

Meanwhile, the Default trait lets you create an “empty” instance of your data structure, with all of the fields set to their own default values. This only works if all the fields themselves implement Default, but a lot of standard types do, so it’s rare that you’ll define a structure that can’t have an auto-derived Default. Enums are easily handled too, you just mark one variant as the default:

#[derive(Clone, Debug, Default)]
enum Bar {
    Something(String),
    SomethingElse(i32),
    #[default]   // <== mischief managed
    Nothing,
}

Borrowing is OK, Sometimes

While I previously said that I like and usually use owned values, there are a few situations where I know I can borrow without angering the borrow checker gods, and so I’m comfortable doing it.

The first is when I need to pass a value into a function that only needs to take a little look at the value to decide what to do. For example, if I want to know whether any values in a Vec<u32> are even, I could pass in a Vec, like this:

fn main() {
    let numbers = vec![0u32, 1, 2, 3, 4, 5];

    if has_evens(numbers) {
        println!("EVENS!");
    }
}

fn has_evens(numbers: Vec<u32>) -> bool {
    numbers.iter().any(|n| n % 2 == 0)
}

However, this gets ugly if I’m going to use numbers later, like this:

fn main() {
    let numbers = vec![0u32, 1, 2, 3, 4, 5];

    if has_evens(numbers) {
        println!("EVENS!");
    }

    // Compiler complains about "value borrowed here after move"
    println!("Sum: {}", numbers.iter().sum::<u32>());
}

fn has_evens(numbers: Vec<u32>) -> bool {
    numbers.iter().any(|n| n % 2 == 0)
}

Helpfully, the compiler will suggest I use my old standby, .clone(), to fix this problem. But I know that the borrow checker won’t have a problem with lending that Vec<u32> into has_evens() as a borrowed slice, &[u32], like this:

fn main() {
    let numbers = vec![0u32, 1, 2, 3, 4, 5];

    if has_evens(&numbers) {
        println!("EVENS!");
    }
}

fn has_evens(numbers: &[u32]) -> bool {
    numbers.iter().any(|n| n % 2 == 0)
}

The general rule I’ve got is that if I can take advantage of lifetime elision (a fancy term meaning “the compiler can figure it out”), I’m probably OK. In less fancy terms, as long as the compiler doesn’t tell me to put 'a anywhere, I’m in the green. On the other hand, the moment the compiler starts using the words “explicit lifetime”, I nope the heck out of there and start cloning everything in sight.

Another example of using lifetime elision is when I’m returning the value of a field from a struct or enum. In that case, I can usually get away with returning a borrowed value, knowing that the caller will probably just be taking a peek at that value, and throwing it away before the struct itself goes out of scope. For example:

struct Foo {
    id: u32,
    desc: String,
}

impl Foo {
    fn description(&self) -> &str {
        &self.desc
    }
}

Returning a reference from a function is practically always a mortal sin for mediocre programmers, but returning one from a struct method is often OK. In the rare case that the caller does want the reference I return to live for longer, they can always turn it into an owned value themselves, by calling .to_owned().
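
A brief usage sketch of my own, following on from that Foo:

let foo = Foo { id: 1, desc: "a thing".to_string() };

let peek: &str = foo.description();              // fine for a quick look
let keep: String = foo.description().to_owned(); // can outlive `foo` if needed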

Avoid the String Tangle

Rust has a couple of different types for representing strings – String and &str being the ones you see most often. There are good reasons for this, however it complicates method signatures when you just want to take some sort of “bunch of text”, and don’t care so much about the messy details.

For example, let’s say we have a function that wants to see if the length of the string is even. Using the logic that since we’re just taking a peek at the value passed in, our function might take a string reference, &str, like this:

fn is_even_length(s: &str) -> bool {
    s.len() % 2 == 0
}

That seems to work fine, until someone wants to check a formatted string:

fn main() {
    // The compiler complains about "expected `&str`, found `String`"
    if is_even_length(format!("my string is {}", std::env::args().next().unwrap())) {
        println!("Even length string");
    }
}

Since format! returns an owned string, String, rather than a string reference, &str, we’ve got a problem. Of course, it’s straightforward to turn the String from format!() into a &str (just prefix it with an &). But as mediocre programmers, we can’t be expected to remember which sort of string all our functions take and add & wherever it’s needed, and having to fix everything when the compiler complains is tedious.

The converse can also happen: a method that wants an owned String, and we’ve got a &str (say, because we’re passing in a string literal, like "Hello, world!"). In this case, we need to use one of the plethora of available “turn this into a String” mechanisms (.to_string(), .to_owned(), String::from(), and probably a few others I’ve forgotten), on the value before we pass it in, which gets ugly real fast.

For these reasons, I never take a String or an &str as an argument. Instead, I use the Power of Traits to let callers pass in anything that is, or can be turned into, a string. Let us have some examples.

First off, if I would normally use &str as the type, I instead use impl AsRef<str>:

fn is_even_length(s: impl AsRef<str>) -> bool {
    s.as_ref().len() % 2 == 0
}

Note that I had to throw in an extra as_ref() call in there, but now I can call this with either a String or a &str and get an answer.
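
So all of these calls now compile (my own quick check):

assert!(is_even_length("abcd"));             // a &str literal
assert!(is_even_length(String::from("ab"))); // an owned String
assert!(!is_even_length(format!("abc")));    // or anything else stringy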

Now, if I want to be given a String (presumably because I plan on taking ownership of the value, say because I’m creating a new instance of a struct with it), I use impl Into<String> as my type:

struct Foo {
    id: u32,
    desc: String,
}

impl Foo {
    fn new(id: u32, desc: impl Into<String>) -> Self {
        Self { id, desc: desc.into() }
    }
}

We have to call .into() on our desc argument, which makes the struct building a bit uglier, but I’d argue that’s a small price to pay for being able to call both Foo::new(1, "this is a thing") and Foo::new(2, format!("This is a thing named {name}")) without caring what sort of string is involved.
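
Spelled out as code, those two calls look like this (name is a made-up variable for the example):

let name = "Bert";
let plain = Foo::new(1, "this is a thing");
let fancy = Foo::new(2, format!("This is a thing named {name}"));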

Always Have an Error Enum

Rust’s error handling mechanism (Results… everywhere), along with the quality-of-life sugar surrounding it (like the short-circuit operator, ?), is a delightfully ergonomic approach to error handling. To make life easy for mediocre programmers, I recommend starting every project with an Error enum, that derives thiserror::Error, and using that in every method and function that returns a Result.

How you structure your Error type from there is less cut-and-dried, but typically I’ll create a separate enum variant for each type of error I want to have a different description. With thiserror, it’s easy to then attach those descriptions:

#[derive(Clone, Debug, thiserror::Error)]
enum Error {
    #[error("{0} caught fire")]
    Combustion(String),

    #[error("{0} exploded")]
    Explosion(String),
}

I also implement functions to create each error variant, because that allows me to do the Into<String> trick, and can sometimes come in handy when creating errors from other places with .map_err() (more on that later). For example, the impl for the above Error would probably be:

impl Error {
    fn combustion(desc: impl Into<String>) -> Self {
        Self::Combustion(desc.into())
    }

    fn explosion(desc: impl Into<String>) -> Self {
        Self::Explosion(desc.into())
    }
}

It’s a tedious bit of boilerplate, and you can use the thiserror-ext crate’s thiserror_ext::Construct derive macro to do the hard work for you, if you like. It, too, knows all about the Into<String> trick.
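Either way, once the constructors exist, callers get the same string flexibility as before (a small sketch; the boiler and reactor strings are made up for the demo):

let e1 = Error::combustion("the boiler");            // &str works
let e2 = Error::explosion(format!("reactor {}", 4)); // so does an owned String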

Banish map_err (well, mostly)

The newer mediocre programmer, who is just dipping their toe in the water of Rust, might write file handling code that looks like this:

use std::fs::File;
use std::io::Read;

fn read_u32_from_file(name: impl AsRef<str>) -> Result<u32, Error> {
    let mut f = File::open(name.as_ref())
        .map_err(|e| Error::FileOpenError(name.as_ref().to_string(), e))?;
    let mut buf = vec![0u8; 30];
    f.read(&mut buf)
        .map_err(|e| Error::ReadError(e))?;
    String::from_utf8(buf)
        .map_err(|e| Error::EncodingError(e))?
        .parse::<u32>()
        .map_err(|e| Error::ParseError(e))
}

This works great (or it probably does, I haven’t actually tested it), but there are a lot of .map_err() calls in there. They take up over half the function, in fact. With the power of the From trait and the magic of the ? operator, we can make this a lot tidier.

First off, assume we’ve written boilerplate error creation functions (or used thiserror_ext::Construct to do it for us). That allows us to simplify the file handling portion of the function a bit:

fn read_u32_from_file(name: impl AsRef<str>) -> Result<u32, Error> {
    let mut f = File::open(name.as_ref())
        // We've dropped the `.to_string()` out of here...
        .map_err(|e| Error::file_open_error(name.as_ref(), e))?;
    let mut buf = vec![0u8; 30];
    f.read(&mut buf)
        // ... and the explicit parameter passing out of here
        .map_err(Error::read_error)?;
    // ...

If that latter .map_err() call looks weird without the |e| and such, that’s because we’re passing the function itself instead of wrapping it in a closure, which saves a few characters of typing. Just because we’re mediocre, doesn’t mean we’re not also lazy.

Next, if we implement the From trait for the other two errors, we can make the string-handling lines significantly cleaner. First, the trait impl:

impl From<std::string::FromUtf8Error> for Error {
    fn from(e: std::string::FromUtf8Error) -> Self {
        Self::EncodingError(e)
    }
}

impl From<std::num::ParseIntError> for Error {
    fn from(e: std::num::ParseIntError) -> Self {
        Self::ParseError(e)
    }
}

(Again, this is boilerplate that can be autogenerated, this time by adding a #[from] tag to the variants you want a From impl on, and thiserror will take care of it for you.)
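For instance, the #[from] version of those two variants might look something like this (a sketch; the variant names match the example above, and the error messages are invented):

#[derive(Debug, thiserror::Error)]
enum Error {
    // ... other variants ...
    #[error("string wasn't valid UTF-8")]
    EncodingError(#[from] std::string::FromUtf8Error),
    #[error("string wasn't a valid number")]
    ParseError(#[from] std::num::ParseIntError),
}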

In any event, no matter how you get the From impls, once you have them, the string-handling code becomes practically error-handling-free:

Ok(
    String::from_utf8(buf)?
        .parse::<u32>()?
)

The ? operator will automatically convert the error types returned by each method into the function’s error type, using From. The only tiny downside is that the ? at the end strips off the Result, so we’ve got to wrap the returned value in Ok() to turn it back into a Result for returning. But I think that’s a small price to pay for the removal of those .map_err() calls.

In many cases, my coding process involves just putting a ? after every call that returns a Result, and adding a new Error variant whenever the compiler complains about not being able to convert some new error type. It’s practically zero effort – outstanding outcome for the mediocre programmer.
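Putting it all together, the finished function might end up looking something like this (an untested sketch, assuming the constructor functions and From impls shown above):

use std::fs::File;
use std::io::Read;

fn read_u32_from_file(name: impl AsRef<str>) -> Result<u32, Error> {
    let mut f = File::open(name.as_ref())
        .map_err(|e| Error::file_open_error(name.as_ref(), e))?;
    let mut buf = vec![0u8; 30];
    f.read(&mut buf).map_err(Error::read_error)?;
    // The two remaining ?s convert their errors via the From impls
    Ok(String::from_utf8(buf)?.parse::<u32>()?)
}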

Just Because You’re Mediocre, Doesn’t Mean You Can’t Get Better

To finish off, I’d like to point out that mediocrity doesn’t imply shoddy work, nor does it mean that you shouldn’t keep learning and improving your craft. One book that I’ve recently found extremely helpful is Effective Rust, by David Drysdale. The author has very kindly put it up to read online, but buying a (paper or ebook) copy would no doubt be appreciated.

The thing about this book, for me, is that it is very readable, even by us mediocre programmers. The sections are written in a way that really “clicked” with me. Some aspects of Rust that I’d had trouble understanding for a long time – such as lifetimes and the borrow checker, and particularly lifetime elision – actually made sense after I’d read the appropriate sections.

Finally, a Quick Beg

I’m currently subsisting on the kindness of strangers, so if you found something useful (or entertaining) in this post, why not buy me a refreshing beverage? It helps to know that people like what I’m doing, and helps keep me from having to sell my soul to a private equity firm.

Categories: FLOSS Project Planets

Amaroking FreeBSD

Planet KDE - Tue, 2024-04-30 18:00

Looking back at my blog, I find lots of mention of Amarok, the KDE audio player, in 2008, 2009, some KDE4-on-OpenSolaris stuff mentions it, and then a long silence until 2021. About a year ago, early 2023, the audio/amarok port was removed from FreeBSD ports. So naturally I was intrigued – maybe even excited – to see Amarok return from vacation with a 3.0 release. And I needed to try it on FreeBSD.

Many of the dependencies are (still) packaged on FreeBSD, so I installed a handful and tried building it on my still-KF5-based X11-based so last-gen Plasma Desktop workstation.

Somewhat to my surprise – I don’t imagine there is FreeBSD CI for Amarok – everything built, ninja install did the right things, and it starts! And toots and parps, clangs, bangs, blips and bloops are reproduced with excellent fidelity.

So, congratulations Amarok folk. It might even return to the FreeBSD ports collection, although – I’m gonna be honest here – I’m not sure it offers me anything in the way of music appreciation that command-line gst-play doesn’t give me. I am just that much more previous-last-gen.

Categories: FLOSS Project Planets

PyCoder’s Weekly: Issue #627 (April 30, 2024)

Planet Python - Tue, 2024-04-30 15:30

#627 – APRIL 30, 2024
View in Browser »

PEP 686: Make UTF-8 Mode Default

This Python Enhancement Proposal outlines making UTF-8 the default throughout Python. This takes the addition of Unicode introduced in Python 3 to its full extent, applying it to file encoding, pipes, and more. Mechanisms for other encodings are still supported. This PEP is targeted for Python 3.15.
PEPS

What’s Lazy Evaluation in Python?

This tutorial explores lazy evaluation in Python and looks at the advantages and disadvantages of using lazy and eager evaluation methods. By the end of this tutorial, you’ll clearly understand which approach is best for you, depending on your needs.
REAL PYTHON

Build Your Own AI CLI Agent with Open Source by Pieces (OSP)

Unlock the power of Pieces, right in your terminal! Our open-source CLI plugin helps you manage code snippets, chat with your on-device AI copilot, and even auto-generate commit messages. Join our community to refine your Python skills and influence a product used by 1000s of devs. Contribute today →
PIECES sponsor

Serverless Python in 2024

Talk Python interviews Tony Sherman and they discuss the current state of serverless computing in the Python world, including some of the newer tools and best practices.
KENNEDY & SHERMAN podcast

Django Developers Survey 2023 Results

JETBRAINS

Djangonauts Space Session 2 Applications Open!

DJANGONAUTS

PyPy v7.3.16 Release

PYPY

PEP 745: Python 3.14 Release Schedule

PEPS

Quiz: Writing Unit Tests for Your Code With unittest

REAL PYTHON

Discussions

High Quality Python Scripts or Small Libraries to Learn From?

HACKER NEWS

Articles & Tutorials

Filter Sensitive Contents From Django’s Error Reports

Django has the ability to automatically email admins when a 500 error occurs. These kinds of errors can potentially contain sensitive information though, so there are decorators to hide these values. This post covers those as well as how to filter data when using Sentry.
GONÇALO VALÉRIO

Asyncio Coroutine Object Methods in Python

The async and await keywords that form Python’s coroutine mechanism can be used for class methods as well as the more common case of functions. This article shows you how you can use asyncio with your objects.
JASON BROWNLEE

How to Prevent Data Leakage in pandas & scikit-learn

How you impute missing values in machine learning data sets can affect the quality of your training. This article teaches you what data leakage is and what steps you should take to avoid it.
DATASCHOOL

An Open Letter Regarding the DjangoCon Europe CfP

Putting on a conference is a complex matter, and an attempt to clarify how future DjangoCons in Europe would be structured has resulted in push-back. This open letter by a Django board member explains the situation and expresses hope for a way forward.
DJANGO SOFTWARE FOUNDATION

Python Basics: Lists and Tuples

In this video course, you’ll learn about Python lists and tuples, including how to define and manipulate them in your code. By the end of the course, you’ll be ready to effectively use lists and tuples in your programming projects.
REAL PYTHON course

Don’t Lie in Interviews

This strongly worded opinion piece by Nat is in reaction to common advice given on Reddit and similar boards. Nat counters it all with “don’t lie in interviews”. Strong language warning.
NAT BENNETT

Fake Job Interviews Target Devs With New Python Backdoor

“A new campaign tracked as “Dev Popper” is targeting software developers with fake job interviews in an attempt to trick them into installing a Python remote access trojan (RAT).”
BILL TOULAS

Why You Need a “WTF Notebook”

There’s a very specific reputation Nat wants to have on a team: “Nat helps me solve my problems. Nat gets things I care about done.” Keeping a WTF notebook helps him do just that.
NAT BENNETT

Write Unit Tests for Your Python Code With ChatGPT

In this tutorial, you’ll learn how to use ChatGPT to generate tests for your Python code. You’ll use the chat to create doctest, unittest, and pytest tests for your code.
REAL PYTHON

Leibniz Formula for Π in Python, JavaScript, and Ruby

This is a bare-bones, side-by-side comparison of the Leibniz formula for calculating pi in Python, JavaScript, and Ruby, along with performance measurements.
PETER BENGTSSON

Better Test Parametrisation in pytest

This “Things I’ve Learned” post discusses how to take advantage of test parameterisation in pytest.
RODRIGO GIRÃO SERRÃO

Projects & Code

All Python 2023 Conference Talks Google Sheet

HH91

zpy: ZSH Helpers for Python Venvs, With Uv or Pip-Tools

GITHUB.COM/ANDYDECLEYRE

django-typescript-routes: Typescript Routes From a URL Conf

GITHUB.COM/BUTTONDOWN

pipxu: Install in Isolated Environments Using UV

GITHUB.COM/BULLETMARK

PyOptInterface: Interface for Mathematical Optimization

GITHUB.COM/METAB0T

Events

Weekly Real Python Office Hours Q&A (Virtual)

May 1, 2024
REALPYTHON.COM

Canberra Python Meetup

May 2, 2024
MEETUP.COM

Sydney Python User Group (SyPy)

May 2, 2024
SYPY.ORG

TOUFU

May 4 to May 5, 2024
OHTOUFU.COM

PyDelhi User Group Meetup

May 4, 2024
MEETUP.COM

Melbourne Python Users Group, Australia

May 6, 2024
J.MP

Happy Pythoning!
This was PyCoder’s Weekly Issue #627.
View in Browser »

[ Subscribe to 🐍 PyCoder’s Weekly 💌 – Get the best Python news, articles, and tutorials delivered to your inbox once a week >> Click here to learn more ]

Categories: FLOSS Project Planets

PyCharm: PyCharm 2024.1.1 Is Here! AI Assistant in Community Edition, Enhanced Endpoints Tool Window, and Navigation and Refactoring Across Notebooks and Scripts

Planet Python - Tue, 2024-04-30 14:28

Enhancements in the Endpoints tool window, extended GitHub gists support for notebooks, and navigation and refactoring across notebooks and scripts – these are just some of the improvements you’ll find in PyCharm 2024.1.1! 

You can download the latest version from our download page or update your current version through our free Toolbox App.

Key features

JetBrains AI Assistant in PyCharm Community Edition

JetBrains AI Assistant is now available in version 2024.1.1 of PyCharm Community Edition! With features ranging from smart suggestions to code generation, now Community Edition users can also enhance their coding journey with AI Assistant.

Improvements to the Endpoints tool window

Search URLs faster and more efficiently with the improved Endpoints tool window in PyCharm 2024.1.1. Use the dedicated Endpoints tab in Search Anywhere to have all your endpoints grouped by application with their routes displayed.

Navigation and refactoring across notebooks and scripts

Enjoy navigation and refactoring between notebooks and Python scripts within a single project in PyCharm. Find declarations or usages easily, use the Rename refactoring, and have our full spectrum of code inspections at your disposal. Changes are synchronized across file types, so if you employ any of these features in a notebook, they will automatically be applied to the related script, and vice versa. 

Learn more

Create gists from Jupyter notebooks

You can share Jupyter notebooks seamlessly and quickly now that PyCharm offers full support for GitHub gists. Create a gist for a single notebook or select several files in the Project tool window and create a Git repo with all of them at once.

DataFrame statistics and distribution histograms

Review essential statistics directly within DataFrame headers in both Jupyter notebooks and Python scripts. Gain instant insights into how your data is distributed via the histograms provided in the DataFrame header.

IPython config file in the console

Save time by configuring your IPython console automatically using config files. Eliminate the need to import dependencies manually every time you use the console.

Download PyCharm 2024.1.1

And that’s not all! Please visit our What’s New page to discover other improvements in PyCharm 2024.1.1. You can also check out our full release notes for all the details to ensure you don’t miss out on trying any of the enhancements.

Thank you for your continued support as we strive to improve your PyCharm experience. Please report any bugs through our issue tracker so we can take care of them as soon as possible. Connect with us on X (formerly Twitter) to share your valuable feedback on PyCharm 2024.1.1!

Categories: FLOSS Project Planets
