Planet Python

Planet Python - http://planetpython.org/
Updated: 11 hours 50 min ago

Eli Bendersky: Unification

Sat, 2024-05-04 22:46

In logic and computer science, unification is a process of automatically solving equations between symbolic terms. Unification has several interesting applications, notably in logic programming and type inference. In this post I want to present the basic unification algorithm with a complete implementation.

Let's start with some terminology. We'll be using terms built from constants, variables and function applications:

  • A lowercase letter represents a constant (could be any kind of constant, like an integer or a string)
  • An uppercase letter represents a variable
  • f(...) is an application of function f to some parameters, which are terms themselves

This representation is borrowed from first-order logic and is also used in the Prolog programming language. Some examples:

  • V: a single variable term
  • foo(V, k): function foo applied to variable V and constant k
  • foo(bar(k), baz(V)): a nested function application

Pattern matching

Unification can be seen as a generalization of pattern matching, so let's start with that first.

We're given a constant term and a pattern term. The pattern term has variables. Pattern matching is the problem of finding a variable assignment that will make the two terms match. For example:

  • Constant term: f(a, b, bar(t))
  • Pattern term: f(a, V, X)

Trivially, the assignment V=b and X=bar(t) works here. Another name for such an assignment is a substitution, which maps variables to their assigned values. In a less trivial case, variables can appear multiple times in a pattern:

  • Constant term: f(top(a), a, g(top(a)), t)
  • Pattern term: f(V, a, g(V), t)

Here the right substitution is V=top(a).

Sometimes, no valid substitution exists. If we change the constant term in the last example to f(top(b), a, g(top(a)), t), then there is no valid substitution because V would have to match top(b) and top(a) simultaneously, which is not possible.
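
As a rough sketch of the idea, here is a small matcher over a throwaway tuple encoding of terms. The encoding (strings starting with an uppercase letter are variables, other strings are constants, tuples are applications) and the name match are assumptions made for this illustration; they are not part of the implementation presented later in the post.

```python
def is_var(t):
    """In this sketch, a variable is a string starting with an uppercase letter."""
    return isinstance(t, str) and t[:1].isupper()


def match(pattern, term, subst=None):
    """Match pattern (may contain variables) against a variable-free term.

    Applications are tuples like ('f', arg1, arg2). Returns a dict mapping
    variable names to terms, or None if no valid substitution exists.
    """
    subst = {} if subst is None else subst
    if is_var(pattern):
        if pattern in subst:
            # A repeated variable must match the same term every time.
            return subst if subst[pattern] == term else None
        return {**subst, pattern: term}
    if isinstance(pattern, tuple) and isinstance(term, tuple):
        if len(pattern) != len(term) or pattern[0] != term[0]:
            return None
        for p, t in zip(pattern[1:], term[1:]):
            subst = match(p, t, subst)
            if subst is None:
                return None
        return subst
    # Constants must be identical.
    return subst if pattern == term else None


top_a = ('top', 'a')
pattern = ('f', 'V', 'a', ('g', 'V'), 't')

# f(top(a), a, g(top(a)), t) matches with V=top(a).
print(match(pattern, ('f', top_a, 'a', ('g', top_a), 't')))
# f(top(b), a, g(top(a)), t) does not match: V can't be both top(b) and top(a).
print(match(pattern, ('f', ('top', 'b'), 'a', ('g', top_a), 't')))
```

The second call returns None because the second occurrence of V disagrees with the binding recorded for the first one.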

Unification

Unification is just like pattern matching, except that both terms can contain variables. So we can no longer say one is the pattern term and the other the constant term. For example:

  • First term: f(a, V, bar(D))
  • Second term: f(D, k, bar(a))

Given two such terms, finding a variable substitution that will make them equivalent is called unification. In this case the substitution is {D=a, V=k}.

Note that a solvable unification problem can have an infinite number of possible unifiers. For example, given:

  • First term: f(X, Y)
  • Second term: f(Z, g(X))

We have the substitution {X=Z, Y=g(X)} but also something like {X=K, Z=K, Y=g(K)} and {X=j(K), Z=j(K), Y=g(j(K))} and so on. The first substitution is the simplest one, and also the most general. It's called the most general unifier or mgu. Intuitively, the mgu can be turned into any other unifier by performing another substitution. For example {X=Z, Y=g(X)} can be turned into {X=j(K), Z=j(K), Y=g(j(K))} by applying the substitution {Z=j(K)} to it. Note that the reverse doesn't work, as we can't turn the second into the first by using a substitution. So we say that {X=Z, Y=g(X)} is the most general unifier for the two given terms, and it's the mgu we want to find.
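
To make the relationship between these unifiers concrete, here is a hedged sketch that applies substitutions to the two terms. It uses an ad hoc tuple encoding (uppercase strings are variables, tuples are applications) and a hypothetical helper named apply; both are assumptions for this illustration only.

```python
def is_var(t):
    """In this sketch, a variable is a string starting with an uppercase letter."""
    return isinstance(t, str) and t[:1].isupper()


def apply(term, subst):
    """Replace variables in term by their bindings, following chains of bindings.

    Assumes subst has no cyclic bindings such as X=f(X).
    """
    if is_var(term):
        return apply(subst[term], subst) if term in subst else term
    if isinstance(term, tuple):
        return (term[0],) + tuple(apply(arg, subst) for arg in term[1:])
    return term


first = ('f', 'X', 'Y')
second = ('f', 'Z', ('g', 'X'))

mgu = {'X': 'Z', 'Y': ('g', 'X')}
specific = {'X': ('j', 'K'), 'Z': ('j', 'K'), 'Y': ('g', ('j', 'K'))}

# Both substitutions make the two terms identical, so both are unifiers.
assert apply(first, mgu) == apply(second, mgu)
assert apply(first, specific) == apply(second, specific)

# Extending the mgu with Z=j(K) gives the same terms as the specific unifier.
extended = {**mgu, 'Z': ('j', 'K')}
assert apply(first, extended) == apply(first, specific)
print(apply(first, extended))
```

Under the mgu alone both terms become f(Z, g(Z)), which still leaves Z free; the more specific unifier is what you get once Z is pinned down further.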

An algorithm for unification

Solving unification problems may seem simple, but there are a number of subtle corner cases to be aware of. In his 1991 paper Correcting a Widespread Error in Unification Algorithms, Peter Norvig noted a common error that exists in many books presenting the algorithm, including SICP.

The correct algorithm is based on J.A. Robinson's 1965 paper "A machine-oriented logic based on the resolution principle". More efficient algorithms have been developed over time since it was first published, but our focus here will be on correctness and simplicity rather than performance.

The following implementation is based on Norvig's, and the full code (with tests) is available on GitHub. This implementation uses Python 3, while Norvig's original is in Common Lisp. There's a slight difference in representations too, as Norvig uses the Lisp-y (f X Y) syntax to denote an application of function f. The two representations are isomorphic, and I'm picking the more classical one which is used in most papers on the subject. In any case, if you're interested in the more Lisp-y version, I have some Clojure code online that ports Norvig's implementation more directly.

We'll start by defining the data structure for terms:

    class Term:
        pass

    class App(Term):
        def __init__(self, fname, args=()):
            self.fname = fname
            self.args = args

        # Not shown here: __str__ and __eq__, see full code for the details...

    class Var(Term):
        def __init__(self, name):
            self.name = name

    class Const(Term):
        def __init__(self, value):
            self.value = value

An App represents the application of function fname to a sequence of arguments.
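
As a quick illustration, the example terms from the start of the post can be built by hand with these classes. The __str__ and __eq__ methods below are minimal guesses standing in for the ones elided above; the full code may define them differently.

```python
class Term:
    pass


class App(Term):
    def __init__(self, fname, args=()):
        self.fname = fname
        self.args = args

    # Assumed stand-ins for the __str__ and __eq__ elided in the post.
    def __str__(self):
        return f"{self.fname}({', '.join(map(str, self.args))})"

    def __eq__(self, other):
        return (type(other) is App and self.fname == other.fname
                and list(self.args) == list(other.args))


class Var(Term):
    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name

    def __eq__(self, other):
        return type(other) is Var and self.name == other.name


class Const(Term):
    def __init__(self, value):
        self.value = value

    def __str__(self):
        return str(self.value)

    def __eq__(self, other):
        return type(other) is Const and self.value == other.value


# foo(V, k): function foo applied to variable V and constant k
t1 = App('foo', (Var('V'), Const('k')))
# foo(bar(k), baz(V)): a nested function application
t2 = App('foo', (App('bar', (Const('k'),)), App('baz', (Var('V'),))))

print(t1)  # foo(V, k)
print(t2)  # foo(bar(k), baz(V))
```

Structural equality matters here because the unification code compares terms with ==.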

    def unify(x, y, subst):
        """Unifies term x and y with initial subst.

        Returns a subst (map of name->term) that unifies x and y, or None if
        they can't be unified. Pass subst={} if no subst are initially known. Note
        that {} means valid (but empty) subst.
        """
        if subst is None:
            return None
        elif x == y:
            return subst
        elif isinstance(x, Var):
            return unify_variable(x, y, subst)
        elif isinstance(y, Var):
            return unify_variable(y, x, subst)
        elif isinstance(x, App) and isinstance(y, App):
            if x.fname != y.fname or len(x.args) != len(y.args):
                return None
            else:
                for i in range(len(x.args)):
                    subst = unify(x.args[i], y.args[i], subst)
                return subst
        else:
            return None

unify is the main function driving the algorithm. It looks for a substitution, which is a Python dict mapping variable names to terms. When either side is a variable, it calls unify_variable which is shown next. Otherwise, if both sides are function applications, it ensures they apply the same function (otherwise there's no match) and then unifies their arguments one by one, carefully carrying the updated substitution throughout the process.

    def unify_variable(v, x, subst):
        """Unifies variable v with term x, using subst.

        Returns updated subst or None on failure.
        """
        assert isinstance(v, Var)
        if v.name in subst:
            return unify(subst[v.name], x, subst)
        elif isinstance(x, Var) and x.name in subst:
            return unify(v, subst[x.name], subst)
        elif occurs_check(v, x, subst):
            return None
        else:
            # v is not yet in subst and can't simplify x. Extend subst.
            return {**subst, v.name: x}

The key idea here is recursive unification. If v is bound in the substitution, we try to unify its definition with x to guarantee consistency throughout the unification process (and vice versa when x is a variable). There's another function being used here - occurs_check; I'm retaining its classical name from early presentations of unification. Its goal is to guarantee that we don't have self-referential variable bindings like X=f(X) that would lead to potentially infinite unifiers.

    def occurs_check(v, term, subst):
        """Does the variable v occur anywhere inside term?

        Variables in term are looked up in subst and the check is applied
        recursively.
        """
        assert isinstance(v, Var)
        if v == term:
            return True
        elif isinstance(term, Var) and term.name in subst:
            return occurs_check(v, subst[term.name], subst)
        elif isinstance(term, App):
            return any(occurs_check(v, arg, subst) for arg in term.args)
        else:
            return False
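
To see why a binding like X=f(X) is rejected, consider what would happen if we tried to expand it. The toy below works on plain strings and uses a hypothetical helper named expand; it is only meant to show the runaway growth, and is not how the implementation above represents terms.

```python
def expand(term, var, value, steps):
    """Textually replace var by value in term, `steps` times in a row."""
    for _ in range(steps):
        term = term.replace(var, value)
    return term


# With the self-referential binding X=f(X), every expansion step
# reintroduces X, so the term never settles.
for steps in range(1, 4):
    print(expand('X', 'X', 'f(X)', steps))
```

Each step wraps the term in another f(...), which is exactly the infinite structure the occurs check prevents.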

Let's see how this code handles some of the unification examples discussed earlier in the post. Starting with the pattern matching example, where variables are just on one side:

    >>> unify(parse_term('f(a, b, bar(t))'), parse_term('f(a, V, X)'), {})
    {'V': b, 'X': bar(t)}

Now the examples from the Unification section:

    >>> unify(parse_term('f(a, V, bar(D))'), parse_term('f(D, k, bar(a))'), {})
    {'D': a, 'V': k}
    >>> unify(parse_term('f(X, Y)'), parse_term('f(Z, g(X))'), {})
    {'X': Z, 'Y': g(X)}

Finally, let's try one where unification will fail due to two conflicting definitions of variable X.

    >>> unify(parse_term('f(X, Y, X)'), parse_term('f(r, g(X), p)'), {})
    None

Lastly, it's instructive to trace through the execution of the algorithm for a non-trivial unification to see how it works. Let's unify the terms f(X,h(X),Y,g(Y)) and f(g(Z),W,Z,X):

  • unify is called, sees the root is an App of function f and loops over the arguments.
    • unify(X, g(Z)) invokes unify_variable because X is a variable, and the result is augmenting subst with X=g(Z)
    • unify(h(X), W) invokes unify_variable because W is a variable, so the subst grows to {X=g(Z), W=h(X)}
    • unify(Y, Z) invokes unify_variable; since neither Y nor Z are in subst yet, the subst grows to {X=g(Z), W=h(X), Y=Z} (note that the binding between two variables is arbitrary; Z=Y would be equivalent)
    • unify(g(Y), X) invokes unify_variable; here things get more interesting, because X is already in the subst, so now we call unify on g(Y) and g(Z) (what X is bound to)
      • The functions match for both terms (g), so there's another loop over arguments, this time only for unifying Y and Z
      • unify_variable for Y and Z leads to lookup of Y in the subst and then unify(Z, Z), which returns the unmodified subst; the result is that nothing new is added to the subst, but the unification of g(Y) and g(Z) succeeds, because it agrees with the existing bindings in subst
  • The final result is {X=g(Z), W=h(X), Y=Z}
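
Note that the bindings in this result refer to each other (W=h(X) mentions X, which is itself bound). One way to sanity-check the trace is to resolve the bindings fully in both terms and confirm they come out identical. The sketch below does that with an ad hoc tuple encoding and hypothetical helpers resolve and show, all of which are assumptions for this illustration rather than part of the implementation above.

```python
def is_var(t):
    """In this sketch, a variable is a string starting with an uppercase letter."""
    return isinstance(t, str) and t[:1].isupper()


def resolve(term, subst):
    """Substitute bindings into term until no bound variables remain."""
    if is_var(term):
        return resolve(subst[term], subst) if term in subst else term
    if isinstance(term, tuple):
        return (term[0],) + tuple(resolve(a, subst) for a in term[1:])
    return term


def show(term):
    """Render the tuple encoding back into f(...) notation."""
    if isinstance(term, tuple):
        return f"{term[0]}({', '.join(show(a) for a in term[1:])})"
    return term


# The final substitution from the trace: {X=g(Z), W=h(X), Y=Z}
subst = {'X': ('g', 'Z'), 'W': ('h', 'X'), 'Y': 'Z'}
left = ('f', 'X', ('h', 'X'), 'Y', ('g', 'Y'))    # f(X,h(X),Y,g(Y))
right = ('f', ('g', 'Z'), 'W', 'Z', 'X')          # f(g(Z),W,Z,X)

print(show(resolve(left, subst)))
print(show(resolve(right, subst)))
```

Both lines print f(g(Z), h(g(Z)), Z, g(Z)), so the substitution does unify the two terms.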

Efficiency

The algorithm presented here is not particularly efficient, and when dealing with large unification problems it's wise to consider more advanced options. It does too much copying around of subst, and also too much work is repeated because we don't try to cache terms that have already been unified.

For a good overview of the efficiency of unification algorithms, I recommend checking out two papers:

  • "An Efficient Unification Algorithm" by Martelli and Montanari
  • "Unification: A Multidisciplinary Survey" by Kevin Knight
Categories: FLOSS Project Planets

Eli Bendersky: Elegant Python code for a Markov chain text generator

Sat, 2024-05-04 22:46

While preparing the post on minimal char-based RNNs, I coded a simple Markov chain text generator to serve as a comparison for the quality of the RNN model. That code turned out to be concise and quite elegant (IMHO!), so it seemed like I should write a few words about it.

It's so short I'm just going to paste it here in its entirety, but this link should have it in a Python file with some extra debugging information for tinkering, along with a sample input file.

    from collections import defaultdict, Counter
    import random
    import sys

    # This is the length of the "state" the current character is predicted from.
    # For Markov chains with memory, this is the "order" of the chain. For n-grams,
    # n is STATE_LEN+1 since it includes the predicted character as well.
    STATE_LEN = 4

    data = sys.stdin.read()
    model = defaultdict(Counter)

    print('Learning model...')
    for i in range(len(data) - STATE_LEN):
        state = data[i:i + STATE_LEN]
        next = data[i + STATE_LEN]
        model[state][next] += 1

    print('Sampling...')
    state = random.choice(list(model))
    out = list(state)
    for i in range(400):
        out.extend(random.choices(list(model[state]), model[state].values()))
        state = state[1:] + out[-1]
    print(''.join(out))

Without going into too much detail, a Markov chain is a model describing the probabilities of events based on the current state only (without having to recall all past states). It's very easy to implement and "train".

In the code shown above, the most important part to grok is the data structure model. It's a dictionary mapping a string state to the probabilities of characters following this state. The size of that string is configurable, but let's just assume it's 4 for the rest of the discussion. This is the order of the Markov chain. For every string seen in the input, we look at the character following it and increment a counter for that character; the end result is a dictionary mapping the alphabet to integers. For example, we may find that for the state "foob", 'a' appeared 75 times right after it, 'b' appeared 25 times, 'e' 44 times and so on.

The learning process is simply sliding a "window" of 4 characters over the input, recording these appearances.
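
To get a feel for what the window records, here is the same learning loop run on a tiny hard-coded string instead of stdin. The sample text and the variable name following are assumptions for this illustration.

```python
from collections import Counter, defaultdict

STATE_LEN = 4
data = "foobar foobaz"  # tiny stand-in for the real input text

model = defaultdict(Counter)
for i in range(len(data) - STATE_LEN):
    state = data[i:i + STATE_LEN]       # the current 4-character window
    following = data[i + STATE_LEN]     # the character right after it
    model[state][following] += 1

print(model['foob'])  # Counter({'a': 2})
print(model['ooba'])  # Counter({'r': 1, 'z': 1})
```

The state "foob" was followed by 'a' both times it appeared, while "ooba" was followed once by 'r' and once by 'z'.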

The learning loop is extremely concise; this is made possible by the right choice of Python data structures. First, we use a defaultdict for the model itself; this lets us avoid existence checks or try/except blocks for states that don't appear in the model yet.

Second, the objects contained inside model are of type Counter, which is a subclass of dict with some special sauce. In its most basic usage, a counter is meant to store an integer count for its keys - exactly what we need here. So a lot of power is packed into this simple statement:

model[state][next] += 1

If you try to rewrite it with model being a dict of dicts, it will become much more complicated to keep track of the corner cases.
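
For comparison, here is roughly what the loop might look like with plain dicts. This is a sketch, not code from the original post; the function names learn_plain and learn_counter and the sample text are made up for the illustration.

```python
from collections import Counter, defaultdict


def learn_plain(data, state_len):
    """Learning loop with a plain dict of dicts: two existence checks per step."""
    model = {}
    for i in range(len(data) - state_len):
        state = data[i:i + state_len]
        following = data[i + state_len]
        if state not in model:
            model[state] = {}
        if following not in model[state]:
            model[state][following] = 0
        model[state][following] += 1
    return model


def learn_counter(data, state_len):
    """The same loop with defaultdict(Counter): missing keys are handled for us."""
    model = defaultdict(Counter)
    for i in range(len(data) - state_len):
        model[data[i:i + state_len]][data[i + state_len]] += 1
    return model


text = "abracadabra abracadabra"
plain = learn_plain(text, 4)
counted = {state: dict(counts) for state, counts in learn_counter(text, 4).items()}
print(plain == counted)
```

Both versions collect the same counts; the second one just lets the containers deal with keys that haven't been seen yet.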

With the learning loop completed, we have in model every 4-letter string encountered in the text, mapped to its Counter of occurrences for the character immediately following it. We're ready to generate text, or "sample from the model".

We start by picking a random state that was seen in the training text. Then, we loop for an arbitrary bound and at every step we randomly select the following character, and update the current state. The following character is selected using weighted random selection - precisely the right idiom here, as we already have in each counter the "weights" - the more often some char was observed after a given state, the higher the chance to select it for sampling will be.

Starting with Python 3.6, the standard library has random.choices to implement weighted random selection. Before Python 3.6 we'd have to write that function on our own (Counter has the most_common() method that would make it easier to write an efficient version).
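
For the curious, a hand-rolled version could look something like the sketch below, which uses cumulative weights and bisection. The function name weighted_choice and its rng parameter are assumptions for this illustration; on Python 3.6 and later, random.choices remains the simpler option.

```python
import bisect
import itertools
import random
from collections import Counter


def weighted_choice(counter, rng=random):
    """Pick a key from counter with probability proportional to its count."""
    items = list(counter)
    cumulative = list(itertools.accumulate(counter.values()))
    # Pick a point in [0, total) and find the first bucket extending past it.
    point = rng.random() * cumulative[-1]
    # Clamp the search to the last index, guarding against float rounding.
    return items[bisect.bisect_right(cumulative, point, 0, len(items) - 1)]


following = Counter({'a': 75, 'b': 25, 'e': 44})
samples = Counter(weighted_choice(following) for _ in range(10_000))
print(samples.most_common())
```

With the counts above, 'a' should come out roughly three times as often as 'b', though the exact numbers vary from run to run.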


Python People: Rob Ludwick - Getting the most out of PyCon, including juggling

Sat, 2024-05-04 17:57

PyCon US is just around the corner.  I've asked Rob Ludwick to come on the show to discuss how to get the most out of your PyCon experience. There's a lot to do. A lot of activities to juggle, including actual juggling, which is where we start the conversation.

Even if you never get a chance to go to PyCon, I hope this interview helps you get a feel for the welcoming aspect of the Python community.

We talk about: 
- Juggling at PyCon
- How to get the most out of PyCon
    - Watching talks
    - Hallway track
    - Open spaces
    - Lightning talks
    - Expo hall / vendor space
    - Poster sessions
    - Job fair
    - A welcoming community
    - Tutorials 
    - Sprints
    - But mostly about the people of Python and PyCon.

"Python enables smart people to work faster" - Rob Ludwick


The Complete pytest Course

  • Level up your testing skills and save time during coding and maintenance.
  • Check out courses.pythontest.com

★ Support this podcast on Patreon ★

Test and Code: 220: Getting the most out of PyCon, including juggling - Rob Ludwick

Sat, 2024-05-04 17:54

PyCon US is just around the corner.  I've asked Rob Ludwick to come on the show to discuss how to get the most out of your PyCon experience. There's a lot to do. A lot of activities to juggle, including actual juggling, which is where we start the conversation.

Even if you never get a chance to go to PyCon, I hope this interview helps you get a feel for the welcoming aspect of the Python community.

I recorded this interview as an episode for one of my other podcasts, Python People. But I think it's got some great pre-conference advice, so I'm sharing it here on Python Test as well.

We talk about: 
- Juggling at PyCon
- How to get the most out of PyCon
    - Watching talks
    - Hallway track
    - Open spaces
    - Lightning talks
    - Expo hall / vendor space
    - Poster sessions
    - Job fair
    - A welcoming community
    - Tutorials 
    - Sprints
    - But mostly about the people of Python and PyCon.

"Python enables smart people to work faster" - Rob Ludwick


Sponsored by Mailtrap.io

  • An Email Delivery Platform that developers love. 
  • An email-sending solution with industry-best analytics, SMTP, and email API, SDKs for major programming languages, and 24/7 human support. 
  • Try for Free at MAILTRAP.IO

Sponsored by PyCharm Pro

  • Use code PYTEST for 20% off PyCharm Professional at jetbrains.com/pycharm
  • Now with Full Line Code Completion
  • See how easy it is to run pytest from PyCharm at pythontest.com/pycharm

The Complete pytest Course

  • For the fastest way to learn pytest, go to courses.pythontest.com
  • Whether you're new to testing or pytest, or just want to maximize your efficiency and effectiveness when testing.

Trey Hunner: Installing a custom Python build with pyenv

Sat, 2024-05-04 00:26

I am so excited about the new Python REPL that will likely land in Python 3.13. I’ve been following this CPython pull request since I heard Pablo and Łukasz announce their work on the new Python REPL in episode 1 of their new core.py podcast.

GitHub notifications? 🤔

That pull request was quiet for many months, but in the last couple of weeks, I started seeing email notifications in my inbox about it. I’ve never fancied myself a competent C developer and I try to steer clear of understanding TTY magic, so I have no idea what most of the commits do. But seeing activity on this pull request rejuvenated my excitement about this upcoming feature!

I also remember reading that the Python 3.13 feature freeze is coming up soon, so I’ve been silently cheering for that PR to make the cut before the deadline.

In the last few days, I decided that I should try committing to use this new REPL locally as my default Python environment. When I type python on my machine, I want to live in this new shiny REPL. I figure this will make it easier to spot bugs that might not have been noticed yet… though honestly it’ll mostly just allow me to try out this fancy new REPL first-hand.

Installing a custom CPython build in pyenv

I use pyenv to manage the many Python versions I have installed on my machine. I wondered whether it was possible to install a custom build of CPython with pyenv.

Instead of going to the pyenv documentation to figure out an answer, I argued with an AI until it gave me a working answer. I tried a few AI systems at first, but Claude seemed to give me the most promising-looking answer so it was the one I argued with for 5-10 minutes until I got a working solution.

First, I created this ~/.pyenv/plugins/python-build/share/python-build/3.13.0-pyrepl file:

    prefer_openssl11
    export PYTHON_BUILD_CONFIGURE_WITH_OPENSSL=1
    install_package "pyrepl" "https://github.com/pablogsal/cpython/archive/pyrepl.tar.gz" standard verify_py39 ensurepip

Then I ran this command, which took a couple minutes:

    $ pyenv install 3.13.0-pyrepl

After that, pyenv versions showed a new 3.13.0-pyrepl version:

    $ pyenv versions
      system
    * 3.8.18 (set by /home/trey/.pyenv/version)
    * 3.9.18 (set by /home/trey/.pyenv/version)
    * 3.10.13 (set by /home/trey/.pyenv/version)
    * 3.11.6 (set by /home/trey/.pyenv/version)
    * 3.12.0 (set by /home/trey/.pyenv/version)
      3.13.0-pyrepl

I then added 3.13.0-pyrepl to the top of my ~/.pyenv/version file to make this my default Python:

    3.13.0-pyrepl
    3.12.0
    3.11.6
    3.10.13
    3.9.18
    3.8.18

And it worked! Typing python showed the new colorful prompt.

Is it a bad idea to make this not-even-beta version of CPython the default Python on my machine? I have no idea. Everything’s been fine for the last 10 hours at least… 🤷

If you ever need to try installing a custom CPython build with pyenv, maybe the above instructions will work. They’re mostly generated by a large language model that didn’t give me a working answer until the third response… so feel free to let me know if it’s all wrong (or all right?).

After this adventure, I checked my podcast feed this evening only to realize that there’s a new core.py episode all about exactly this feature! If you’d like to hear some core developers nerd out about CPython development, give core.py a listen. You don’t need to understand how CPython development works to enjoy their enthusiasm. 💖


scikit-learn: Note on Inline Authorship Information in scikit-learn

Fri, 2024-05-03 20:00
Author: Adrin Jalali

Historically, scikit-learn’s files have included authorship information similar to the following format:

    # Authors: Author1, Author2, ...
    # License: BSD 3 clause

However, after a series of discussions which you can see in detail in this issue, we could list the following caveats to the status quo:

  • Authorship information was not up to date and in most cases, but not always, reflected the original authors of the file;
  • It was unfair to all other contributors who have been contributing to the code-base;
  • One can check the real authors and the history of the authors of any part of the code-base using git blame and other git tools.

Therefore, we decided to standardize all authorship information to mention “The scikit-learn developers”, and have the license notice as:

    # Authors: The scikit-learn developers
    # License: BSD-3-Clause

The change is to happen gradually in the coming months after April 2024.


Python Engineering at Microsoft: Python in Visual Studio Code – May 2024 Release

Fri, 2024-05-03 10:44

We’re excited to announce the May 2024 release of the Python and Jupyter extensions for Visual Studio Code!

This release includes the following announcements:

  • “Implement all inherited abstract classes” code action
  • New auto indentation setting
  • Debugpy removed from the Python extension in favor of the Python Debugger extension
  • Socket disablement now possible during testing
  • Pylance performance updates

If you’re interested, you can check the full list of improvements in our changelogs for the Python, Jupyter and Pylance extensions.

“Implement all inherited abstract classes” Code Action

Abstract classes serve as “blueprints” for other classes and help build modular, reusable code by promoting clear structure and requirements for subclasses to adhere to. To define an abstract class in Python, you can create a class that inherits from the ABC class in the abc module, and annotate its methods with the @abstractmethod decorator. Then, you can create new classes that inherit from this abstract class, and define an implementation for the abstract methods. Implementing these classes is easier with the latest Pylance pre-release! When defining a new class that inherits from an abstract one, you can now use the “Implement all inherited abstract classes” Code Action to automatically implement all abstract methods and properties from the parent class.
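
As a reminder of what the pattern looks like when written by hand, here is a minimal sketch. The class names Shape and Square are made up for this example, and the code is not the output of the Code Action itself.

```python
from abc import ABC, abstractmethod


class Shape(ABC):
    """Abstract base class: subclasses must provide area() and name."""

    @abstractmethod
    def area(self):
        """Return the area of the shape."""

    @property
    @abstractmethod
    def name(self):
        """Return a human-readable name."""


class Square(Shape):
    """Concrete subclass implementing every abstract member."""

    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2

    @property
    def name(self):
        return "square"


print(Square(3).area())  # 9

# The abstract class itself can't be instantiated.
try:
    Shape()
except TypeError as exc:
    print(f"cannot instantiate: {exc}")
```

A subclass that leaves out any of the abstract members fails at instantiation time in the same way, which is the kind of omission the Code Action helps avoid.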

New auto indentation setting

Previously, Pylance’s auto indentation behavior was controlled through the editor.formatOnType setting, which was a problem if you wanted to disable auto indentation but keep format on type enabled through other supported tools. To solve this, Pylance’s latest pre-release now has its own setting to control auto indentation behavior, python.analysis.autoIndent, which is enabled by default.

Debugpy removed from the Python extension in favor of the Python Debugger extension

In our February 2024 release blog, we announced moving all debugging functionality to the Python Debugger extension, which is installed by default alongside the Python extension. In this release, we have removed duplicate debugging code from the Python extension, which helps to decrease the extension download size. As part of this change, "type": "python" and "type": "debugpy" specified in your launch.json configuration file are both interpreted as references to the Python Debugger extension path. This ensures a seamless transition without requiring any modifications to existing configuration files to run and debug effectively. Moving forward, we recommend using "type": "debugpy" as this directly corresponds to the Python Debugger extension which provides support for both legacy and modern Python versions.

Socket disablement now possible during testing

You can now run tests with socket disablement from the testing UI. This is made possible by a switch in the communication between the Python extension and the test run subprocess to now use named-pipes as opposed to numbered ports. This feature is available on the Python Testing Rewrite, which is rolled out to all users by default and will soon be fully adopted in the Python extension.

Pylance Performance

The Pylance team has been receiving feedback that Pylance’s performance has degraded over the past few releases. As a result, we have made several smaller improvements to memory consumption and indexing including:

  • Improved performance for third-party packages indexing
  • Skipped Python files from workspace .conda environments from being scanned (@pylance-release#5191)
  • Skipped index on unnecessary py.typed file checks (@pyright#7652)
  • Reduced memory consumption by refactoring tokenizer and parser output (@pyright#7602)
  • Improved memory consumption for token creation (@pyright#7434)

If you are still experiencing performance issues with Pylance, please file an issue through the Pylance: Report Issue command from the Command Palette, ideally with logs, code samples and/or the packages that are installed in the working environment.

Additionally, we have added a couple of features in the latest Pylance pre-release version to help identify potential performance issues and gather additional information about issues you are facing. There is a new notification that prompts you to file an issue in the Pylance repo when the extension detects there may be a performance issue. Moreover, Pylance now provides a profiling command Pylance: Start Profiling that generates cpuprofile for all worker threads. This file is generated after starting and stopping profiling by triggering the Pylance: Start Profiling and Pylance: Stop Profiling commands and can be provided as additional data in an issue.

With these smaller improvements and additional ways to report performance issues, we hope to continue to make improvements to performance. We greatly appreciate the feedback and collaboration as we work to address issues!

Other Changes and Enhancements

We have also added small enhancements and fixed issues requested by users that should improve your experience working with Python and Jupyter Notebooks in Visual Studio Code. Some notable changes include:

  • Test Explorer displays projects using testscenarios with unittest and parameterized tests inside nested classes correctly (@vscode-python#22870).
  • Test Explorer now handles tests in workspaces with symlinks, specifically workspace roots which are children of symlink-ed paths, which is particularly helpful in WSL scenarios (@vscode-python#22658).

We would also like to extend special thanks to this month’s contributors:

Call for Community Feedback

As we are planning and prioritizing future work, we value your feedback! Below are a few issues we would love feedback on:

Try out these new improvements by downloading the Python extension and the Jupyter extension from the Marketplace, or install them directly from the extensions view in Visual Studio Code (Ctrl + Shift + X or ⌘ + ⇧ + X). You can learn more about Python support in Visual Studio Code in the documentation. If you run into any problems or have suggestions, please file an issue on the Python VS Code GitHub page.


Real Python: The Real Python Podcast – Episode #203: Embarking on a Relaxed and Friendly Python Coding Journey

Fri, 2024-05-03 08:00

Do you get stressed while trying to learn Python? Do you prefer to build small programs or projects as you continue your coding journey? This week on the show, Real Python author Stephen Gruppetta is here to talk about his new book, "The Python Coding Book."


Python Software Foundation: The PSF's 2023 Annual Impact Report is here!

Fri, 2024-05-03 06:36

 

2023 was an exciting year of growth for the Python Software Foundation! We’ve captured some of the key numbers, details, and information in our latest Annual Impact Report. Some highlights of what you’ll find in the report include:

  • A letter from our Executive Director, Deb Nicholson
  • Notes from our PyCon US Chair, Marietta Wijaya, and PSF Board of Directors Chair, Dawn Wages
  • Updates on the achievements and activities of a couple of our Developers-in-Residence, Łukasz Langa and Seth Larson—and announcing more members of the DiR team!
  • An overview of what our PyPI Safety & Security Engineer, Mike Fiedler, has accomplished, as well as some eye-watering PyPI stats!
  • A celebration and summary of PyCon US 2023, the event’s 20th anniversary, and the theme for 2023’s report cover
  • A highlight of our Fiscal Sponsorees (we brought on 7 new organizations this year!)
  • Sponsors who generously supported our work and the Python ecosystem
  • An overview of PSF Financials, including a consolidated financial statement and grants data

We hope you check out the report, share it with your Python friends, and let us know what you think! You can comment here, find us on social media (Mastodon, X, LinkedIn), or share your thoughts on our forum.

Categories: FLOSS Project Planets

TestDriven.io: Building Reusable Components in Django

Thu, 2024-05-02 18:28
This tutorial looks at how to build server-side UI components in Django.
Categories: FLOSS Project Planets

Django Weblog: June 2024 marks 10 incredible years of Django Girls magic! 🥳✨

Thu, 2024-05-02 11:10

June 2024 marks 10 incredible years of Django Girls magic! 🥳✨

We couldn't have reached this milestone without YOU! Whether you attended a workshop, volunteered, financially supported us, or cheered us on, you've been vital. From the bottom of our hearts, thank you for being part of the Django Girls community. 💕

To celebrate, we're reflecting on our impact and want to hear from YOU! Share your stories in a short survey courtesy of JetBrains and PyCharm. Your feedback will help us improve and reach more people.

The theme for our 10th anniversary is “The Django Girls Glow Up!” ✨💃

We want to celebrate your positive transformations over the years!

In the survey, please share a photo 📸 or video and tell us how Django Girls has impacted your life. As a thank you, you could win a $100 Amazon gift card or a 1-year JetBrains All Products Pack subscription. Plus, everyone gets a three-month PyCharm Professional trial!

Ready to join the celebration? Click the link to complete the survey and let your Django Girl glow shine! ✨

Take the Survey Now: https://surveys.jetbrains.com/s3/dn-django-girls-survey-2024

When you’ve finished the survey, head over to our socials, and let’s continue celebrating there. Use the #DjangoGirlsGlowUp hashtag to share your photos and stories, and let's spread the love! 🚀💖

Find us on our socials:

Thank you for being part of our journey. Here's to another 10 years of glowing up together! 🌟💫

Categories: FLOSS Project Planets

Python Morsels: Variables are pointers in Python

Thu, 2024-05-02 11:00

Python's variables are not buckets that contain objects; they're pointers. Assignment statements don't copy: they point a variable to a value (and multiple variables can "point" to the same value).

Table of contents

  1. Changing two lists at once...?
  2. Variables are separate from objects
  3. Assignment statements don't copy
  4. Explicitly copying a list
  5. Variables are like pointers, not buckets

Changing two lists at once...?

Here we have a variable a that points to a list:

>>> a = [2, 1, 3, 4]

Let's make a new variable b and assign it to a:

>>> a = [2, 1, 3, 4]
>>> b = a

If we append a new item to b, what will its length be?

>>> b.append(7)
>>> len(b)

Initially, the b list had four items, so now it should have five items. And it does:

>>> len(b)
5

How many items do you think a has? What's your guess?

>>> len(a)

Is it five, the same as b? Or is it still four, as it was before?

The a list also has five items:

>>> len(a)
5

What's going on here?

Well, the variables a and b both point to the same list.

If we look up the unique ID for the object that each of these variables points to, we'll see that they both point to the same object:

>>> id(a)
140534104117312
>>> id(b)
140534104117312

This is possible because variables in Python are not buckets, but pointers.
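As a small illustration of the same idea, the sketch below contrasts plain assignment with an explicit copy. The variable c and the use of list.copy() are additions for illustration and are not part of the excerpt above.

```python
a = [2, 1, 3, 4]
b = a              # no copy: b now points to the same list object as a
b.append(7)        # mutates the one list that both names point to

c = a.copy()       # explicit shallow copy: c is a new, separate list
c.append(9)        # only c changes

print(a is b)      # True: two names, one object
print(a is c)      # False: c is a different object
print(len(a), len(b), len(c))  # 5 5 6
```

The is operator compares object identity, which is the same check as comparing the id() values by hand.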

Variables are separate from objects

Let's say we've made three …

Read the full article: https://www.pythonmorsels.com/variables-are-pointers/
Categories: FLOSS Project Planets

Mike Driscoll: The Python Show Podcast Ep 39 – Buttondown – A Python SaaS with Justin Duke

Thu, 2024-05-02 08:58

In this episode, we invite the founder of Buttondown, a Python-based Software as a Service (SaaS) application for creating and managing newsletters.

Mike Driscoll, the host of the show, chats with Justin about the following topics:

  • Why he created a SaaS with Python
  • Favorite Python packages or modules
  • Python web frameworks
  • Entrepreneurship
  • AI and programming
  • and more!

The post The Python Show Podcast Ep 39 – Buttondown – A Python SaaS with Justin Duke appeared first on Mouse Vs Python.

Categories: FLOSS Project Planets

Real Python: Quiz: The Python calendar Module

Thu, 2024-05-02 08:00

In this quiz, you’ll test your understanding of creating calendars in Python using the calendar module.

By working through this quiz, you’ll revisit the fundamental functions and methods provided by the calendar module.


Categories: FLOSS Project Planets

Talk Python to Me: #460: Dropbase: Build Internal Tools with Python

Thu, 2024-05-02 04:00
Do you find yourself or your team building internal apps frequently for your company? Are you familiar with the term "forms over data"? They are super empowering for your org but they can be pretty repetitive and you might find yourself spending more time than you'd like working on them rather than core products and services. I invited Jimmy Chan from Dropbase to tell us about their service whose tagline is "Build internal web apps with just Python." It's a cool service and a fun conversation.

Episode sponsors

  • Mailtrap: https://talkpython.fm/mailtrap
  • Talk Python Courses: https://talkpython.fm/training

Links from the show

  • Build internal web apps with just Python.: https://www.dropbase.io
  • Dropbase on GitHub: https://github.com/DropbaseHQ/dropbase
  • Dropbase @ LinkedIn: https://www.linkedin.com/company/dropbase
  • Dropbase on Twitter: https://twitter.com/dropbasehq
  • Jimmy Chan: https://www.linkedin.com/in/jimmyechan/
  • Jimmy on Twitter: https://twitter.com/jimmyechan
  • Dropbase Docs: https://docs.dropbase.io
  • Dropbase: https://dropbase.io
  • Watch this episode on YouTube: https://www.youtube.com/watch?v=0VULU3g8wqU
  • Episode transcripts: https://talkpython.fm/episodes/transcript/460/dropbase-build-internal-tools-with-python

Stay in touch with us

  • Subscribe to us on YouTube: https://talkpython.fm/youtube
  • Follow Talk Python on Mastodon: https://fosstodon.org/web/@talkpython
  • Follow Michael on Mastodon: https://fosstodon.org/web/@mkennedy
Categories: FLOSS Project Planets

The Python Show: 39 - Buttondown - A Python SaaS with Justin Duke

Wed, 2024-05-01 22:07

In this episode, we invite the founder of Buttondown, a Python-based Software as a Service (SaaS) application for creating and managing newsletters.

Mike Driscoll, the host of the show, chats with Justin about the following topics:

  • Why he created a SaaS with Python

  • Favorite Python packages or modules

  • Python web frameworks

  • Entrepreneurship

  • AI and programming

  • and more!

Categories: FLOSS Project Planets

Wingware: Wing Python IDE Version 10.0.4 - May 3, 2024

Wed, 2024-05-01 21:00

Wing 10.0.4 improves performance of the Python 3.12+ debugger, fixes debugging the Python Shell with Python 3.12, and makes several other improvements.

See the change log for details.

Download Wing 10 Now: Wing Pro | Wing Personal | Wing 101 | Compare Products


What's New in Wing 10

AI Assisted Development

Wing Pro 10 takes advantage of recent advances in the capabilities of generative AI to provide powerful AI assisted development, including AI code suggestion, AI driven code refactoring, description-driven development, and AI chat. You can ask Wing to use AI to (1) implement missing code at the current input position, (2) refactor, enhance, or extend existing code by describing the changes that you want to make, (3) write new code from a description of its functionality and design, or (4) chat in order to work through understanding and making changes to code.

Examples of requests you can make include:

  • "Add a docstring to this method"
  • "Create unit tests for class SearchEngine"
  • "Add a phone number field to the Person class"
  • "Clean up this code"
  • "Convert this into a Python generator"
  • "Create an RPC server that exposes all the public methods in class BuildingManager"
  • "Change this method to wait asynchronously for data and return the result with a callback"
  • "Rewrite this threaded code to instead run asynchronously"

Yes, really!

Your role changes to one of directing an intelligent assistant capable of completing a wide range of programming tasks in relatively short periods of time. Instead of typing out code by hand every step of the way, you are essentially directing someone else to work through the details of manageable steps in the software development process.

Read More

Support for Python 3.12

Wing 10 adds support for Python 3.12, including (1) faster debugging with PEP 669 low impact monitoring API, (2) PEP 695 parameterized classes, functions and methods, (3) PEP 695 type statements, and (4) PEP 701 style f-strings.

Poetry Package Management

Wing Pro 10 adds support for Poetry package management in the New Project dialog and the Packages tool in the Tools menu. Poetry is an easy-to-use cross-platform dependency and package manager for Python, similar to pipenv.

Ruff Code Warnings & Reformatting

Wing Pro 10 adds support for Ruff as an external code checker in the Code Warnings tool, accessed from the Tools menu. Ruff can also be used as a code reformatter in the Source > Reformatting menu group. Ruff is an incredibly fast Python code checker that can replace or supplement flake8, pylint, pep8, and mypy.


Try Wing 10 Now!

Wing 10 is a ground-breaking new release in Wingware's Python IDE product line. Find out how Wing 10 can turbocharge your Python development by trying it today.

Downloads: Wing Pro | Wing Personal | Wing 101 | Compare Products

See Upgrading for details on upgrading from Wing 9 and earlier, and Migrating from Older Versions for a list of compatibility notes.

Categories: FLOSS Project Planets

Seth Michael Larson: Isolating risk in the CPython release process

Wed, 2024-05-01 20:00

Published 2024-05-02 by Seth Larson

This critical role would not be possible without funding from the Alpha-Omega project. Massive thank-you to Alpha-Omega for investing in the security of the Python ecosystem!

The first stage of the CPython release process produces source and docs artifacts. In terms of "supply chain integrity", the source artifacts are the most important artifacts produced by this process. These tarballs are what propagate down into containers, pyenv, and operating system distributions, so reducing the risk that these artifacts are modified in-flight is critical.

A few weeks ago I published that CPython's release process for source and docs artifacts was moved from developers' machines onto GitHub Actions, which provides an isolated build environment.

This already reduces the risk of artifacts being accidentally or maliciously modified during the release process. Previously, the build and release process used a build script that built the software from source, built the docs, and then ran tests, all in the same isolated job. This was totally fine on a developer's machine, where no isolation is possible between the different stages.

[Figure: Before and after splitting up build stages. Diagram boxes: Source Code, Build Dependencies, Docs Dependencies, Build Source, Build Docs, Testing, Source Artifacts, Docs Artifacts.]

With GitHub Actions we can isolate each stage from the others and remove the need to install all dependencies for all jobs into the same stage. This drastically reduces the number of dependencies, each representing a small amount of risk, for the stages that are critical for supply chain security of CPython (specifically the building of source artifacts).

In the diagram above, the left side shows the previous process, which pulls all dependencies into the same job (represented as a gray box). The right side shows the new process, which splits the builds from testing and the source build from the docs build.

After doing this split, the "Build Source" task only needs ~170 dependencies instead of over 800 (mostly for documentation LaTeX and PDFs). All of those dependencies either come with the operating system, and thus can't be reduced further, or are pinned in a lock file.

The testing stage still has access to the source artifacts, but only after they've been uploaded to GitHub Actions artifacts, so it isn't able to modify them.

I plan to write a separate in-depth article about dependencies, pinning, and related topics; stay tuned for that.

SOSS Community Day 2024 recordings

The recordings for my talk and the OpenSSF tabletop session have been published to YouTube:

Embrace the Differences: Securing open source software ecosystems where they are

In the talk I discuss the technical and also social aspects of why it's sometimes difficult to adopt security changes into an open source ecosystem. Ecosystem-agnostic work (think memory safety, provenance, reproducible builds, vulnerabilities) tends to operate at a much higher level than the individual ecosystems where the work ends up being applied.

OpenSSF Tabletop Session

The tabletop session had many contributors representing all the different aspects of discovering, debugging, disclosing, fixing, and patching a zero-day vulnerability in an open source component that's affecting production systems.

Tabletop Session moderated by Dana Wang

Mentoring for Google Summer of Code

Google Summer of Code 2024 recently published its program and among the projects and contributors accepted was CPython's project for adopting the Hardened Compiler Options Guide for C/C++. I'm mentoring the contributor through the process of contributing to CPython and hopefully being successful in adopting hardened compiler options.

Other items

That's all for this week! 👋 If you're interested in more you can read last week's report.

Thanks for reading! ♡ Did you find this article helpful and want more content like it? Get notified of new posts by subscribing to the RSS feed or the email newsletter.

This work is licensed under CC BY-SA 4.0

Categories: FLOSS Project Planets

Tryton News: Tryton Release 7.2

Wed, 2024-05-01 12:00

We are proud to announce the 7.2 release of Tryton.
This release provides many bug fixes, performance improvements and some fine tuning. It also adds 5 new modules.
You can give it a try on the demo server, use the docker image or download it here.
As usual upgrading from previous series is fully supported but some manual steps are needed to update from 7.0 to 7.2.

Here is a list of the most noticeable changes:

Changes for the User Clients

You can now request to reset your password from the login dialog. Doing this sends a temporary password to your email address.

The PYSON widgets display the value using operators which are more user-friendly.

Web Client

The binary and image widgets now support drag and drop to set their value.

Desktop Client

On list and tree views, there is now a contextual menu that allows you to copy the contents of a cell or a column.

Accounting

It is now possible to modify the dates of a period even if it contains posted moves, as long as the existing moves stay inside the new period dates. This is useful to correct mistakes or even extend a period.

A warning is now raised when you validate an invoice for which some lines do not have the expected default taxes. This helps to detect mistakes.

When an invoice in another currency is paid, the currency exchange amount is now booked automatically into a configured account.

You can now enter the amount of the transaction in a second currency on statements. This makes it easier to do the reconciliation between the statement and invoices based on a second currency.

Company

Employees are now automatically deactivated once their end date has passed.

It is now possible to use some placeholders in the header and footer of company reports like the company name, phone, website etc.

Marketing

Some reports are now available on marketing scenario and activities. They calculate and display the open, click and click-through rates.

UTM parameters can be added to marketing emails so you can follow their results.

Product

You can now store the Manufacturer Part Number and brand as a product identifier.

Tryton now supports adding images to product categories.

You can now use non-square images on products. The module resizes the images to fit the requested size but keeps the aspect ratio.

Production

The production number is now only set when the order progresses to waiting. This prevents the supply module from consuming numbers for production requests that are subsequently deleted.

Purchase

It is now possible to remove ignored invoices and stock moves from purchases. This is useful when you have ignored the invoice or shipping exception by mistake and need to correct it.

Sale

It is now possible to remove the ignored invoices and stock moves from sales. This is useful when you have ignored the invoice or shipping exception by mistake and need to correct it.

The product on sale opportunity lines can be omitted; a description and a note can be used instead.

Stock

The drop shipment (like the other shipments) can now be split. This is useful to match exactly how the supplier shipped the products.

Shipment numbers are now only set when the shipment progresses to a waiting state. This prevents consuming sequence numbers for requests that are going to be deleted.

The lot trace now optionally displays the source and destination locations. This can be useful when investigating the traceability of a lot.

Web Shop

It is now possible to limit a web shop by country.

The web shop supports price lists to calculate the sale price and the non sale price.

New Modules

Stock Product Location Place

The Stock Product Location Place Module allows defining the place where each product is stored within each location.

Account SYSCOHADA

The Account SYSCOHADA Module provides templates for the chart of accounts for OHADA countries.

Account Export

The Account Export Module provides the basis to allow accounting moves to be exported to external accounting software.

Account Export WinBooks

The Account Export WinBooks Module adds support to export accounting data to WinBooks.

Web Shop Product Data Feed

The Web Shop Product Data Feed Module exposes web shop products as a data feed for Google Merchant and Meta for business.

Changes for the System Administrator

Server

It is now possible to update the database without updating the indexes or to create the indexes concurrently. These are useful options when updating a busy system.

It is possible to define a timeout for some RPC calls. This helps prevent users from overloading the system with expensive requests.

Changes for the Developer

Server

We added send_message methods to simplify sending emails using python’s Message.

A new kind of field fmany2one is now available, which is a type of many2one field but stores a different field to the id. It is used mainly in the infrastructure to create foreign keys based on a model or field name.

The read-only relational fields are no longer copied by default. This was a source of various bugs, as developers often forgot to exclude these fields from the copy.

Clients

The clients read the xxx2many fields using dotted notation. This avoids making multiple requests when displaying a form with these fields.

The XML ID of a record is now displayed in the log window.

Script

It is possible to configure the scripting client to skip any warning.

Product

It is now possible to generate barcodes for a product using a different type than the one on the identifier.

Stock

The done buttons have been renamed to do.

Location name fields have been added to stock moves. This is useful to customize the information displayed in reports about the source and destination locations.

3 posts - 2 participants

Read full topic

Categories: FLOSS Project Planets

Anarcat: Tor migrates from Gitolite/GitWeb to GitLab

Wed, 2024-05-01 10:55

Note: I've been awfully silent here for the past ... (checks notes) oh dear, 3 months! But that's not because I've been idle, quite the contrary, I've been very busy but just didn't have time to write about anything. So I've taken it upon myself to write something about my work this week, and published this post on the Tor blog which I copy here for a broader audience. Let me know if you like this or not.

Tor has finally completed a long migration from legacy Git infrastructure (Gitolite and GitWeb) to our self-hosted GitLab server.

Git repository addresses have therefore changed. Many of you probably have made the switch already, but if not, you will need to change:

https://git.torproject.org/

to:

https://gitlab.torproject.org/

in your Git configuration.

The GitWeb front page is now an archived listing of all the repositories before the migration. Inactive git repositories were archived in GitLab legacy/gitolite namespace and the gitweb.torproject.org and git.torproject.org web sites now redirect to GitLab.

Best effort was made to reproduce the original gitolite repositories faithfully and also avoid duplicating too much data in the migration. But it's possible that some data present in Gitolite has not migrated to GitLab.

User repositories are particularly at risk, because they were migrated en masse, and they were "re-forked" from their upstreams, to avoid wasting disk space. If a user had a project with a matching name it was assumed to have the right data, which might be inaccurate.

The two virtual machines responsible for the legacy service (cupani for git-rw.torproject.org and vineale for git.torproject.org and gitweb.torproject.org) have been shut down. Their disks will remain for 3 months (until the end of July 2024) and their backups for another year after that (until the end of July 2025), after which point all the data from those hosts will be destroyed, with only the GitLab archives remaining.

The rest of this article expands on how this was done and what kind of problems we faced during the migration.

Where is the code?

Normally, nothing should be lost. All repositories in gitolite have been either explicitly migrated by their owners, forcibly migrated by the sysadmin team (TPA), or explicitly destroyed at their owner's request.

An exhaustive rewrite map translates gitolite projects to GitLab projects. Some of those projects actually redirect to their parent in cases of empty repositories that were obvious forks. Destroyed repositories redirect to the GitLab front page.
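To make the redirection behaviour concrete, here is a minimal sketch of how such a rewrite map could be applied to an old URL. It is not the configuration actually deployed: the one-pair-per-line "gitolite-path gitlab-path" file format, the function names parse_rewrite_map and translate, and the fallback to the GitLab front page for any unknown path are assumptions made for illustration.

```python
from urllib.parse import urlsplit

GITLAB = "https://gitlab.torproject.org/"


def parse_rewrite_map(text):
    """Parse 'old-path new-path' pairs, one per line (assumed format)."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        old, new = line.split(None, 1)
        mapping[old] = new.strip()
    return mapping


def translate(url, mapping):
    """Translate a legacy Gitolite/GitWeb URL into a GitLab URL."""
    path = urlsplit(url).path.strip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    target = mapping.get(path)
    # Assumption: destroyed or unknown repositories land on the front page.
    return GITLAB + target if target else GITLAB


sample = """
# hypothetical sample entry
project/tor-browser/fastlane legacy/gitolite/project/tor-browser/fastlane
"""
rewrite_map = parse_rewrite_map(sample)
print(translate("https://git.torproject.org/project/tor-browser/fastlane.git", rewrite_map))
print(translate("https://git.torproject.org/some/destroyed-repo.git", rewrite_map))
```

In practice the web server performs this lookup, so a sketch like this is mostly useful for auditing a map offline.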

Because the migration happened progressively, it's technically possible that commits pushed to gitolite were lost after the migration. We took great care to avoid that scenario. First, we adopted a proposal (TPA-RFC-36) in June 2023 to announce the transition. Then, in March 2024, we locked down all repositories from any further changes. Around that time, only a handful of repositories had changes made after the adoption date, and we examined each repository carefully to make sure nothing was lost.

Still, we built a diff of all the changes in the git references that archivists can peruse to check for data loss. It's large (6MiB+) because a lot of repositories were migrated before the mass migration and then kept evolving in GitLab. Many other repositories were rebuilt in GitLab from their parent to recreate a fork relationship, which added extra references to those clones.

A note to amateur archivists out there: it's probably too late for one last crawl now. The Git repositories now all redirect to GitLab and are effectively unavailable in their original form.

That said, the GitWeb site was crawled into the Internet Archive in February 2024, so at least some copy of it is available in the Wayback Machine. At that point, however, many developers had already migrated their projects to GitLab, so the copies there were already possibly out of date compared with the repositories in GitLab.

Software Heritage has also had a copy of all repositories hosted on Gitolite since June 2023 and has continuously mirrored them; hopefully the copies will be kept there for eternity. There's an issue where the main website can't find the repositories when you search for gitweb.torproject.org; search for git.torproject.org instead.

In any case, if you believe data is missing, please do let us know by opening an issue with TPA.

Why?

This is an old project in the making. The first discussion about migrating from gitolite to GitLab started in 2020 (almost 4 years ago). But going further back, the first GitLab experiment was in 2016, almost a decade ago.

The current GitLab server dates from 2019, replacing Trac for issue tracking in 2020. It was originally supposed to host only mirrors for merge requests and issue trackers but, naturally, one thing led to another and eventually, GitLab had grown a container registry, continuous integration (CI) runners, GitLab Pages, and, of course, hosted most Git repositories.

There were hesitations at moving to GitLab for code hosting. We had discussions about the increased attack surface and ways to mitigate that, but, ultimately, it seems the issues were not that serious and the community embraced GitLab.

TPA actually migrated its most critical repositories out of shared hosting entirely, into specific servers (e.g. the Puppet Git repository is just on the Puppet server now), leveraging Git's decentralized nature and removing an entire attack surface from our infrastructure. Some of those repositories are mirrored back into GitLab, but the authoritative copy is not on GitLab.

In any case, the proposal to migrate from Gitolite to GitLab was effectively just formalizing a fait accompli.

How to migrate from Gitolite / cgit to GitLab

The progressive migration was a challenge. If you intend to migrate between hosting platforms, we strongly recommend holding a "flag day" during which you migrate all repositories at once. This ensures a smoother transition and avoids elaborate rewrite rules.

When Gitolite access was shut down, we had repositories on both GitLab and Gitolite, without a clear relationship between the two. A priori, the plan then was to import all the remaining Gitolite repositories into the legacy/gitolite namespace, but that seemed wasteful, particularly for large repositories like Tor Browser, which uses nearly a gigabyte of disk space. So we took special care to avoid duplicating repositories.

When the mass migration started, only 71 of the 538 Gitolite repositories were marked Migrated to GitLab in the gitolite.conf file. So, given that we had hundreds of repositories to migrate, we developed some automation to "save time". We already automate similar ad-hoc tasks with Fabric, so we used that framework here as well. (Our normal configuration management tool is Puppet, which is a poor fit here.)

So a relatively large amount of Python code was produced to basically do the following:

  1. check if all on-disk repositories are listed in gitolite.conf (and vice versa) and either add missing repositories or delete them from disk if garbage
  2. for each repository in gitolite.conf, if its category is marked Migrated to GitLab, skip, otherwise;
  3. find a matching GitLab project by name, prompt the user for multiple matches
  4. if a match is found, redirect if the repository is non-empty
    • we have GitLab projects that look like the real thing, but are only present to host migrated Trac issues
    • in such cases we cloned the Gitolite project locally and pushed to the existing repository instead
  5. otherwise, a new repository is created in the legacy/gitolite namespace, using the "import" mechanism in GitLab to automatically import the repository from Gitolite, creating redirections and updating gitolite.conf to document the change
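The steps above can be sketched as a small planning function. This is a simplified illustration and not the code from the fabric-tasks repository: GitoliteRepo, check_consistency, plan_migration, the choose callback and the dictionary of GitLab projects (project path mapped to whether its repository is non-empty) are all hypothetical names and data shapes, and the sketch only decides what to do without talking to Gitolite or GitLab.

```python
from dataclasses import dataclass

MIGRATED = "Migrated to GitLab"


@dataclass
class GitoliteRepo:
    path: str      # e.g. "project/tor-browser/fastlane"
    category: str  # gitweb.category value from gitolite.conf


def check_consistency(on_disk, configured):
    """Step 1: report repositories missing from the config or from disk."""
    on_disk, configured = set(on_disk), set(configured)
    return sorted(on_disk - configured), sorted(configured - on_disk)


def plan_migration(repos, gitlab_projects, choose=lambda repo, matches: matches[0]):
    """Steps 2 to 5: return (gitolite path, action, GitLab target) tuples."""
    plan = []
    for repo in repos:
        if repo.category == MIGRATED:
            continue  # step 2: already migrated, skip
        name = repo.path.rsplit("/", 1)[-1]
        # step 3: match GitLab projects by their last path component
        matches = sorted(p for p in gitlab_projects if p.rsplit("/", 1)[-1] == name)
        target = None
        if len(matches) == 1:
            target = matches[0]
        elif matches:
            target = choose(repo, matches)  # stand-in for the interactive prompt
        if target is None:
            # step 5: no match, import into the legacy namespace
            plan.append((repo.path, "import", f"legacy/gitolite/{repo.path}"))
        elif gitlab_projects[target]:
            # step 4: the matching project already has code, just redirect
            plan.append((repo.path, "redirect", target))
        else:
            # step 4, sub-case: project only hosts migrated issues, push code to it
            plan.append((repo.path, "push", target))
    return plan


repos = [
    GitoliteRepo("project/help/wiki", MIGRATED),
    GitoliteRepo("project/tor-browser/fastlane", "Attic"),
    GitoliteRepo("tor", "Core"),
]
projects = {"tpo/core/tor": True}  # hypothetical sample data
for entry in plan_migration(repos, projects):
    print(entry)
```

Separating the planning from the side effects makes it possible to review the whole plan before anything is imported, redirected or pushed.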

User repositories (those under the user/ directory in Gitolite) were handled specially. First, the existing redirection map was checked to see if a similarly named project was migrated (so that, e.g. user/dgoulet/tor is properly treated as a fork of tpo/core/tor). Then the parent project was forked in GitLab and the Gitolite project force-pushed to the fork. This allows us to show the fork relationship in GitLab and, more importantly, benefit from the "pool" feature in GitLab which deduplicates disk usage between forks.
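A rough sketch of the parent lookup described here follows. The function name find_fork_parent, the shape of the redirection map (Gitolite path mapped to GitLab path) and the sample entries are assumptions for illustration; the real tasks also create the fork in GitLab and force-push to it, which this sketch leaves out.

```python
def find_fork_parent(user_repo, rewrite_map):
    """Return the GitLab path of an already-migrated project with the same
    name as a Gitolite user repository, or None if there is no candidate."""
    parts = user_repo.split("/")
    if len(parts) < 3 or parts[0] != "user":
        return None  # not of the form user/<name>/<project>
    name = parts[-1]
    for gitolite_path, gitlab_path in rewrite_map.items():
        if gitolite_path.startswith("user/"):
            continue  # only non-user projects are considered upstream parents
        if gitolite_path.rsplit("/", 1)[-1] == name:
            return gitlab_path
    return None


# Hypothetical sample entries, for illustration only
rewrite_map = {
    "tor": "tpo/core/tor",
    "user/asn/tor": "legacy/gitolite/user/asn/tor",
}
print(find_fork_parent("user/dgoulet/tor", rewrite_map))    # tpo/core/tor
print(find_fork_parent("user/phw/obfsproxy", rewrite_map))  # None
```

When no parent is found, the text's fallback applies: the repository is imported into the legacy/gitolite namespace instead.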

Sometimes, we found no such relationships. Then we simply imported multiple repositories with similar names in the legacy/gitolite namespace, sometimes creating forks between user repositories, on a first-come-first-served basis from the gitolite.conf order.

The code used in this migration is now available publicly. We encourage other groups planning to migrate from Gitolite/GitWeb to GitLab to use (and contribute to) our fabric-tasks repository, even though it does have its fair share of hard-coded assertions.

The main entry point is the gitolite.mass-repos-migration task. A typical migration job looked like:

anarcat@angela:fabric-tasks$ fab -H cupani.torproject.org gitolite.mass-repos-migration
[...]
INFO: skipping project project/help/infra in category Migrated to GitLab
INFO: skipping project project/help/wiki in category Migrated to GitLab
INFO: skipping project project/jenkins/jobs in category Migrated to GitLab
INFO: skipping project project/jenkins/tools in category Migrated to GitLab
INFO: searching for projects matching fastlane
INFO: Successfully connected to https://gitlab.torproject.org
import gitolite project project/tor-browser/fastlane into gitlab legacy/gitolite/project/tor-browser/fastlane with desc 'Tor Browser app store and deployment configuration for Fastlane'? [Y/n]
INFO: importing gitolite project project/tor-browser/fastlane into gitlab legacy/gitolite/project/tor-browser/fastlane with desc 'Tor Browser app store and deployment configuration for Fastlane'
INFO: building a new connect to cupani
INFO: defaulting name to fastlane
INFO: importing project into GitLab
INFO: Successfully connected to https://gitlab.torproject.org
INFO: loading group legacy/gitolite/project/tor-browser
INFO: archiving project
INFO: creating repository fastlane (fastlane) in namespace legacy/gitolite/project/tor-browser from https://git.torproject.org/project/tor-browser/fastlane into https://gitlab.torproject.org/legacy/gitolite/project/tor-browser/fastlane
INFO: migrating Gitolite repository project/tor-browser/fastlane to GitLab project legacy/gitolite/project/tor-browser/fastlane
INFO: uploading 399 bytes to /srv/git.torproject.org/repositories/project/tor-browser/fastlane.git/hooks/pre-receive
INFO: making /srv/git.torproject.org/repositories/project/tor-browser/fastlane.git/hooks/pre-receive executable
INFO: adding entry to rewrite_map /home/anarcat/src/tor/tor-puppet/modules/profile/files/git/gitolite2gitlab.txt
INFO: modifying gitolite.conf to add: "config gitweb.category = Migrated to GitLab"
INFO: rewriting gitolite config /home/anarcat/src/tor/gitolite-admin/conf/gitolite.conf to change project project/tor-browser/fastlane to category Migrated to GitLab
INFO: skipping project project/bridges/bridgedb-admin in category Migrated to GitLab
[...]

In the above, you can see already-migrated repositories being skipped, then the fastlane project being archived into GitLab. Here is another example with a later version of the script, processing only user repositories and showing the interactive prompt and a force-push into a fork:

    $ fab -H cupani.torproject.org gitolite.mass-repos-migration --include 'user/.*' --exclude '.*tor-?browser.*'
    INFO: skipping project user/aagbsn/bridgedb in category Migrated to GitLab
    [...]
    INFO: skipping project user/phw/atlas in category Migrated to GitLab
    INFO: processing project user/phw/obfsproxy (Philipp's obfsproxy repository) in category Users' development repositories (Attic)
    INFO: Successfully connected to https://gitlab.torproject.org
    INFO: user repository detected, trying to find fork phw/obfsproxy
    WARNING: no existing fork found, entering user fork subroutine
    INFO: found 6 GitLab projects matching 'obfsproxy' (https://gitweb.torproject.org/user/phw/obfsproxy.git)
    0 legacy/gitolite/debian/obfsproxy
    1 legacy/gitolite/debian/obfsproxy-legacy
    2 legacy/gitolite/user/asn/obfsproxy
    3 legacy/gitolite/user/ioerror/obfsproxy
    4 tpo/anti-censorship/pluggable-transports/obfsproxy
    5 tpo/anti-censorship/pluggable-transports/obfsproxy-legacy
    select parent to fork from, or enter to abort: ^G4
    INFO: repository is not empty: in-pack: 2104, packs: 1, size-pack: 414
    fork project tpo/anti-censorship/pluggable-transports/obfsproxy into legacy/gitolite/user/phw/obfsproxy^G [Y/n]
    INFO: loading project tpo/anti-censorship/pluggable-transports/obfsproxy
    INFO: forking project user/phw/obfsproxy into namespace legacy/gitolite/user/phw
    INFO: waiting for fork to complete...
    INFO: fork status: started, sleeping...
    INFO: fork finished
    INFO: cloning and force pushing from user/phw/obfsproxy to legacy/gitolite/user/phw/obfsproxy
    INFO: deleting branch protection: <class 'gitlab.v4.objects.branches.ProjectProtectedBranch'> => {'id': 2723, 'name': 'master', 'push_access_levels': [{'id': 2864, 'access_level': 40, 'access_level_description': 'Maintainers', 'deploy_key_id': None}], 'merge_access_levels': [{'id': 2753, 'access_level': 40, 'access_level_description': 'Maintainers'}], 'allow_force_push': False}
    INFO: cloning repository git-rw.torproject.org:/srv/git.torproject.org/repositories/user/phw/obfsproxy.git in /tmp/tmp6orvjggy/user/phw/obfsproxy
    Cloning into bare repository '/tmp/tmp6orvjggy/user/phw/obfsproxy'...
    INFO: pushing to GitLab: https://gitlab.torproject.org/legacy/gitolite/user/phw/obfsproxy
    remote:
    remote: To create a merge request for bug_10887, visit:
    remote: https://gitlab.torproject.org/legacy/gitolite/user/phw/obfsproxy/-/merge_requests/new?merge_request%5Bsource_branch%5D=bug_10887
    remote:
    [...]
    To ssh://gitlab.torproject.org/legacy/gitolite/user/phw/obfsproxy
    + 2bf9d09...a8e54d5 master -> master (forced update)
    * [new branch] bug_10887 -> bug_10887
    [...]
    INFO: migrating repo
    INFO: migrating Gitolite repository https://gitweb.torproject.org/user/phw/obfsproxy.git to GitLab project https://gitlab.torproject.org/legacy/gitolite/user/phw/obfsproxy
    INFO: adding entry to rewrite_map /home/anarcat/src/tor/tor-puppet/modules/profile/files/git/gitolite2gitlab.txt
    INFO: modifying gitolite.conf to add: "config gitweb.category = Migrated to GitLab"
    INFO: rewriting gitolite config /home/anarcat/src/tor/gitolite-admin/conf/gitolite.conf to change project user/phw/obfsproxy to category Migrated to GitLab
    INFO: processing project user/phw/scramblesuit (Philipp's ScrambleSuit repository) in category Users' development repositories (Attic)
    INFO: user repository detected, trying to find fork phw/scramblesuit
    WARNING: no existing fork found, entering user fork subroutine
    WARNING: no matching gitlab project found for user/phw/scramblesuit
    INFO: user fork subroutine failed, resuming normal procedure
    INFO: searching for projects matching scramblesuit
    import gitolite project user/phw/scramblesuit into gitlab legacy/gitolite/user/phw/scramblesuit with desc 'Philipp's ScrambleSuit repository'?^G [Y/n]
    INFO: checking if remote repo https://git.torproject.org/user/phw/scramblesuit exists
    INFO: importing gitolite project user/phw/scramblesuit into gitlab legacy/gitolite/user/phw/scramblesuit with desc 'Philipp's ScrambleSuit repository'
    INFO: importing project into GitLab
    INFO: Successfully connected to https://gitlab.torproject.org
    INFO: loading group legacy/gitolite/user/phw
    INFO: creating repository scramblesuit (scramblesuit) in namespace legacy/gitolite/user/phw from https://git.torproject.org/user/phw/scramblesuit into https://gitlab.torproject.org/legacy/gitolite/user/phw/scramblesuit
    INFO: archiving project
    INFO: migrating Gitolite repository https://gitweb.torproject.org/user/phw/scramblesuit.git to GitLab project https://gitlab.torproject.org/legacy/gitolite/user/phw/scramblesuit
    INFO: adding entry to rewrite_map /home/anarcat/src/tor/tor-puppet/modules/profile/files/git/gitolite2gitlab.txt
    INFO: modifying gitolite.conf to add: "config gitweb.category = Migrated to GitLab"
    INFO: rewriting gitolite config /home/anarcat/src/tor/gitolite-admin/conf/gitolite.conf to change project user/phw/scramblesuit to category Migrated to GitLab
    [...]

Sharp-eyed readers will also notice the bell (the ^G characters) used as a notification mechanism in this transcript.
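The project selection visible in these transcripts (skipping anything already in the "Migrated to GitLab" category, then applying the --include and --exclude patterns) can be sketched roughly as follows. This is a minimal illustration and not the actual Fabric task: the select_projects function, the whole-path regex matching, and the sample entries marked hypothetical are all assumptions.

```python
import re

MIGRATED = "Migrated to GitLab"


def select_projects(categories, include=None, exclude=None):
    """Return project names that still need migrating.

    `categories` maps a gitolite project path to its gitweb category.
    `include` and `exclude` are optional regular expressions matched
    against the whole project path (an assumption; the real task may
    match differently).
    """
    selected = []
    for name, category in categories.items():
        if category == MIGRATED:
            continue  # already handled in a previous run
        if include and not re.fullmatch(include, name):
            continue
        if exclude and re.fullmatch(exclude, name):
            continue
        selected.append(name)
    return selected


attic = "Users' development repositories (Attic)"
projects = {
    "user/aagbsn/bridgedb": MIGRATED,
    "user/phw/obfsproxy": attic,
    "user/phw/scramblesuit": attic,
    "user/example/tor-browser": attic,  # hypothetical entry
    "project/tor-browser/fastlane": "Example category",  # hypothetical category
}

print(select_projects(projects, include=r"user/.*", exclude=r".*tor-?browser.*"))
```

With the patterns from the second transcript, only the two phw repositories remain in the list.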

A lot of the code is now useless for us, but some, like "commit and push" or is-repo-empty, live on in the git module and, of course, the gitlab module has grown some legs along the way. We've also found fun bugs, like a file descriptor exhaustion in bash, among other oddities. The retirement milestone and issue 41215 have a detailed log of the migration, for those curious.

This was a challenging project, but it feels nice to have this behind us. This gets rid of 2 of the 4 remaining machines running Debian "old-old-stable", which moves us a bit further ahead in our late bullseye upgrades milestone.

Full transparency: we tested GPT-3.5, GPT-4, and other large language models to see if they could answer the question "write a set of rewrite rules to redirect GitWeb to GitLab". This has become a standard LLM test for your faithful writer to figure out how good an LLM is at technical responses. None of them gave an accurate, complete, and functional response, for the record.

The actual rewrite rules, as of this writing, follow, for humans who actually like working answers provided by expert humans instead of artificial intelligences, which currently seem to be glorified, mansplaining interns.

git.torproject.org rewrite rules

Those rules are relatively simple in that they rewrite a single URL to its equivalent GitLab counterpart in a 1:1 fashion. They rely on the rewrite map mentioned above, of course.
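As a rough illustration of that lookup-and-redirect logic, here is a minimal Python sketch. It is not part of the deployment: the function names and the sample map entry are hypothetical, and it only assumes the usual text RewriteMap layout of one whitespace-separated key and value per line. The regular expression is copied from the Apache rule, unescaped dot included.

```python
import re

GITLAB = "https://gitlab.torproject.org"

# Same pattern as the RewriteRule: optional "git/" prefix, project
# path, then ".git/" and the rest of the request path.
GIT_URL = re.compile(r"^/(git/)?(.*).git/(.*)$")


def parse_rewrite_map(text):
    """Parse the content of a text RewriteMap into a dict."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) >= 2:
            mapping[fields[0]] = fields[1]
    return mapping


def redirect_git(path, mapping):
    """Return the redirect target for a git.torproject.org URL path."""
    match = GIT_URL.match(path)
    if match and match.group(2) in mapping:
        target = mapping[match.group(2)]
        return f"{GITLAB}/{target}.git/{match.group(3)}"
    # no match in the map: fall back to the GitLab front page
    return GITLAB


# hypothetical map content, modeled on the transcript above
SAMPLE_MAP = """
# gitolite path                 GitLab path
project/tor-browser/fastlane    legacy/gitolite/project/tor-browser/fastlane
"""

mapping = parse_rewrite_map(SAMPLE_MAP)
print(redirect_git("/project/tor-browser/fastlane.git/info/refs", mapping))
print(redirect_git("/git/project/tor-browser/fastlane.git/HEAD", mapping))
print(redirect_git("/unknown/project.git/HEAD", mapping))
```

The Apache configuration remains the reference; the sketch only mirrors its matching logic so the rule is easier to read.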

    RewriteEngine on
    # this RewriteMap connects the gitweb projects to their GitLab
    # equivalent
    RewriteMap gitolite2gitlab "txt:/etc/apache2/gitolite2gitlab.txt"
    # if this becomes a performance bottleneck, convert to a DBM map with:
    #
    # $ httxt2dbm -i mapfile.txt -o mapfile.map
    #
    # and:
    #
    # RewriteMap mapname "dbm:/etc/apache/mapfile.map"
    #
    # according to reports lavamind found online, we hit such a
    # performance bottleneck only around millions of entries, which is not our case

    # those two rules can go away once all the projects are
    # migrated to GitLab
    #
    # this matches the request URI so we can check the RewriteMap
    # for a match next
    #
    # WARNING: this won't match URLs without .git in them, which
    # *do* work now. one possibility would be to match the request
    # URI (without query string!) with:
    #
    # /git/(.*)(.git)?/(((branches|hooks|info|objects/).*)|git-.*|upload-pack|receive-pack|HEAD|config|description)?.
    #
    # I haven't been able to figure out the actual structure of
    # those URLs, so it's really hard to figure out the boundaries
    # of the project name here. I stopped after pouring around the
    # http-backend.c code in git
    # itself. https://www.git-scm.com/docs/http-protocol is also
    # kind of incomplete and unsatisfying.
    RewriteCond %{REQUEST_URI} ^/(git/)?(.*).git/.*$
    # this makes the RewriteRule match only if there's a match in
    # the rewrite map
    RewriteCond ${gitolite2gitlab:%2|NOT_FOUND} !NOT_FOUND
    RewriteRule ^/(git/)?(.*).git/(.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$2}.git/$3 [R=302,L]

    # Fallback everything else to GitLab
    RewriteRule (.*) https://gitlab.torproject.org [R=302,L]

gitweb.torproject.org rewrite rules

Those are the vastly more complicated GitWeb to GitLab rewrite rules.

Note that we say "GitWeb" but we were actually not running GitWeb but cgit, as the former didn't actually scale for us.

    RewriteEngine on
    # this RewriteMap connects the gitweb projects to their GitLab
    # equivalent
    RewriteMap gitolite2gitlab "txt:/etc/apache2/gitolite2gitlab.txt"

    # special rule to process targets of the old spec.tpo site and
    # bring them to the right redirect on the new spec.tpo site. that should turn, for example:
    #
    # https://gitweb.torproject.org/torspec.git/tree/address-spec.txt
    #
    # into:
    #
    # https://spec.torproject.org/address-spec
    RewriteRule ^/torspec.git/tree/(.*).txt$ https://spec.torproject.org/$1 [R=302]

    # list of endpoints taken from cgit's cmd.c

    # those two RewriteCond are necessary because we don't move
    # all repositories at once. once the migration is completed,
    # they can be removed.
    #
    # and yes, they are copied all over the place below
    #
    # create a match for the project name to check if the project
    # has been moved to GitLab
    RewriteCond %{REQUEST_URI} ^/(.*).git(/.*)?$
    # this makes the RewriteRule match only if there's a match in
    # the rewrite map
    RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
    # main project page, like summary below
    RewriteRule ^/(.*).git/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/ [R=302,L]

    # summary
    RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
    RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
    RewriteRule ^/(.*).git/summary/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/ [R=302,L]

    # about
    RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
    RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
    RewriteRule ^/(.*).git/about/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/ [R=302,L]

    # commit
    RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
    RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
    RewriteCond "%{QUERY_STRING}" "(.*(?:^|&))id=([^&]*)(&.*)?$"
    RewriteRule ^/(.*).git/commit/? https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commit/%2 [R=302,L,QSD]
    RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
    RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
    RewriteRule ^/(.*).git/commit/? https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commits/HEAD [R=302,L]

    # diff, incomplete because can diff arbitrary refs and files in cgit but not in GitLab, hard to parse
    RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
    RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
    RewriteCond %{QUERY_STRING} id=([^&]*)
    RewriteRule ^/(.*).git/diff/? https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commit/%1 [R=302,L,QSD]

    # patch
    RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
    RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
    RewriteCond %{QUERY_STRING} id=([^&]*)
    RewriteRule ^/(.*).git/patch/? https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commit/%1.patch [R=302,L,QSD]

    # rawdiff, incomplete because can show only one file diff, which GitLab cannot
    RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
    RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
    RewriteCond %{QUERY_STRING} id=([^&]*)
    RewriteRule ^/(.*).git/rawdiff/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commit/%1.diff [R=302,L,QSD]

    # log
    RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
    RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
    RewriteCond %{QUERY_STRING} h=([^&]*)
    RewriteRule ^/(.*).git/log/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commits/%1 [R=302,L,QSD]
    RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
    RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
    RewriteRule ^/(.*).git/log/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commits/HEAD [R=302,L]
    RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
    RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
    RewriteRule ^/(.*).git/log(/?.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commits/HEAD$2 [R=302,L]

    # atom
    RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
    RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
    RewriteCond %{QUERY_STRING} h=([^&]*)
    RewriteRule ^/(.*).git/atom/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commits/%1 [R=302,L,QSD]
    RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
    RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
    RewriteRule ^/(.*).git/atom/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commits/HEAD [R=302,L,QSD]

    # refs, incomplete because two pages in GitLab, defaulting to "tags"
    RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
    RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
    RewriteRule ^/(.*).git/refs/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/tags [R=302,L]
    RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
    RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
    RewriteCond %{QUERY_STRING} h=([^&]*)
    RewriteRule ^/(.*).git/tag/? https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/tags/%1 [R=302,L,QSD]

    # tree
    RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
    RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
    RewriteCond %{QUERY_STRING} id=([^&]*)
    RewriteRule ^/(.*).git/tree(/?.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/tree/%1$2 [R=302,L,QSD]
    RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
    RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
    RewriteRule ^/(.*).git/tree(/?.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/tree/HEAD$2 [R=302,L]
    # /-/tree has no good default in GitLab, revert to HEAD which is a good
    # approximation (we can't assume "master" here anymore)
    RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
    RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
    RewriteRule ^/(.*).git/tree/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/tree/HEAD [R=302,L]

    # plain
    RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
    RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
    RewriteCond %{QUERY_STRING} h=([^&]*)
    RewriteRule ^/(.*).git/plain(/?.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/raw/%1$2 [R=302,L,QSD]
    RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
    RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
    RewriteRule ^/(.*).git/plain(/?.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/raw/HEAD$2 [R=302,L]

    # blame: disabled
    #RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
    #RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
    #RewriteCond %{QUERY_STRING} h=([^&]*)
    #RewriteRule ^/(.*).git/blame(/?.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/blame/%1$2 [R=302,L,QSD]
    # same default as tree above
    #RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
    #RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
    #RewriteRule ^/(.*).git/blame(/?.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/blame/HEAD/$2 [R=302,L]

    # stats
    RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
    RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
    RewriteRule ^/(.*).git/stats/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/graphs/HEAD [R=302,L]

    # still TODO:
    # repolist: once migration is complete
    #
    # cannot be done:
    # atom: needs a feed token, user must be logged in
    # blob: no direct equivalent
    # info: not working on main cgit website?
    # ls_cache: not working, irrelevant?
    # objects: undocumented?
    # snapshot: pattern too hard to match on cgit's side

    # special case, we keep a copy of the main index on the archive
    RewriteRule ^/?$ https://archive.torproject.org/websites/gitweb.torproject.org.html [R=302,L]

    # Fallback: everything else to GitLab
    RewriteRule .* https://gitlab.torproject.org [R=302,L]

The reference copy of those is available in our (currently private) Puppet git repository.
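To make the query-string handling in the cgit rules above easier to follow, here is a small Python sketch of a few endpoints (project page, commit, tree and plain). It is a simplified illustration under stated assumptions, not a port of the Apache configuration: the redirect_cgit function, the path regex, and the sample map entry are hypothetical, and the torspec and archive special cases are left out.

```python
import re
from urllib.parse import parse_qs, urlsplit

GITLAB = "https://gitlab.torproject.org"

# /<project>.git[/<endpoint>[/<rest>]]
CGIT_PATH = re.compile(
    r"^/(?P<project>.*)\.git(?:/(?P<endpoint>[^/]*)(?P<rest>/.*)?)?$"
)


def redirect_cgit(url, mapping):
    """Return the GitLab URL for a cgit URL, or the GitLab front page."""
    parts = urlsplit(url)
    match = CGIT_PATH.match(parts.path)
    if not match or match["project"] not in mapping:
        return GITLAB  # project not migrated (or not a project URL)
    base = f"{GITLAB}/{mapping[match['project']]}"
    query = parse_qs(parts.query)
    endpoint = match["endpoint"] or ""
    rest = match["rest"] or ""
    if endpoint in ("", "summary", "about"):
        return f"{base}/"
    if endpoint == "commit":
        # ?id=<sha> points at one commit, otherwise show the history
        if "id" in query:
            return f"{base}/-/commit/{query['id'][0]}"
        return f"{base}/-/commits/HEAD"
    if endpoint == "tree":
        ref = query.get("id", ["HEAD"])[0]
        return f"{base}/-/tree/{ref}{rest}"
    if endpoint == "plain":
        ref = query.get("h", ["HEAD"])[0]
        return f"{base}/-/raw/{ref}{rest}"
    return GITLAB


# hypothetical map entry, modeled on the transcript above
mapping = {"user/phw/obfsproxy": "legacy/gitolite/user/phw/obfsproxy"}

gitweb = "https://gitweb.torproject.org"
print(redirect_cgit(f"{gitweb}/user/phw/obfsproxy.git/commit/?id=a8e54d5", mapping))
print(redirect_cgit(f"{gitweb}/user/phw/obfsproxy.git/tree/README", mapping))
print(redirect_cgit(f"{gitweb}/user/phw/unknown.git/log/", mapping))
```

Dropping the query string after extracting the ref plays the role of the QSD flag in the Apache rules.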

