Planet Python
Real Python: Python range(): Represent Numerical Ranges
In Python, the range() function generates a sequence of numbers, often used in loops for iteration. By default, it creates numbers starting from 0 up to but not including a specified stop value. You can also reverse the sequence with reversed(). If you need to count backward, then you can use a negative step, like range(start, stop, -1), which counts down from start to, but not including, stop.
The range() function is not just about iterating over numbers. It can also be used in various programming scenarios beyond simple loops. By mastering range(), you can write more efficient and readable Python code. Explore how range() can simplify your code and when alternatives might be more appropriate.
By the end of this tutorial, you’ll understand that:
- A range in Python is an object representing an interval of integers, often used for looping.
- The range() function can be used to generate sequences of numbers that can be converted to lists.
- for i in range(5) is a loop that iterates over the numbers from 0 to 4, inclusive.
- The range parameters start, stop, and step define where the sequence begins, ends, and the interval between numbers.
- Ranges can go backward in Python by using a negative step value and reversed by using reversed().
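The two backward-counting techniques in the last bullet produce the same numbers once the bounds are adjusted. Here is a minimal sketch comparing them. `countdown` is a hypothetical helper name used only for illustration:

```python
def countdown(start, stop):
    """Count down from start to stop, excluding stop (hypothetical helper)."""
    return list(range(start, stop, -1))

# A negative step counts down from start and stops before reaching stop.
print(countdown(5, 0))  # [5, 4, 3, 2, 1]

# reversed() walks an existing range backward, so the bounds shift by one:
# range(1, 6) holds 1..5, and reversing it yields 5..1.
print(list(reversed(range(1, 6))))  # [5, 4, 3, 2, 1]
```

Note that reversed() returns a lazy iterator, which is why the sketch wraps it in list() to display the values.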
A range is a Python object that represents an interval of integers. Usually, the numbers are consecutive, but you can also specify that you want to space them out. You can create ranges by calling range() with one, two, or three arguments, as the following examples show:
>>> list(range(5))
[0, 1, 2, 3, 4]
>>> list(range(1, 7))
[1, 2, 3, 4, 5, 6]
>>> list(range(1, 20, 2))
[1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
In each example, you use list() to explicitly list the individual elements of each range. You’ll study these examples in more detail later on.
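Because a range is fully determined by its start, stop, and step, you can compute how many numbers it holds without generating any of them. The count is the ceiling of (stop - start) / step, clamped at zero. Here is a sketch of that arithmetic for positive and negative steps. `range_length` is a hypothetical name, not part of Python:

```python
def range_length(start, stop, step=1):
    """Number of elements in range(start, stop, step), computed arithmetically."""
    if step == 0:
        # range() itself rejects a zero step with ValueError
        raise ValueError("step must not be zero")
    # ceil((stop - start) / step), written with integer floor division
    count = -((start - stop) // step)
    # an "empty" interval (e.g. counting up with a negative step) has length 0
    return max(0, count)

for args in [(0, 5), (1, 7), (1, 20, 2), (5, 0, -1), (0, 5, -1)]:
    assert range_length(*args) == len(range(*args)), args

print(range_length(1, 20, 2))  # 10
```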
A range can be an effective tool. However, throughout this tutorial, you’ll also explore alternatives that may work better in some situations. You can click the link below to download the code that you’ll see in this tutorial:
Get Your Code: Click here to download the free sample code that shows you how to represent numerical ranges in Python.
Construct Numerical Ranges
In Python, range() is built in. This means that you can always call range() without importing anything first. Calling range() constructs a range object that you can put to use. Later, you’ll see practical examples of how to use range objects.
You can provide range() with one, two, or three integer arguments. This corresponds to three different use cases:
- Ranges counting from zero
- Ranges of consecutive numbers
- Ranges stepping over numbers
You’ll learn how to use each of these next.
Count From Zero
When you call range() with one argument, you create a range that counts from zero and up to, but not including, the number you provided:
>>> range(5)
range(0, 5)
Here, you’ve created a range from zero up to, but not including, five. To see the individual elements in the range, you can use list() to convert the range to a list:
>>> list(range(5))
[0, 1, 2, 3, 4]
Inspecting range(5) shows that it contains the numbers zero, one, two, three, and four. Five itself is not a part of the range. One nice property of these ranges is that the argument, 5 in this case, is the same as the number of elements in the range.
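A range stores only its start, stop, and step, so it behaves like a lightweight sequence. You can take its length, index it, slice it, and test membership without building a list. Here is a short sketch. The note about instant membership checks describes CPython's behavior for integer arguments:

```python
numbers = range(5)

print(len(numbers))                # 5, the same as the argument
print(numbers[-1])                 # 4, the last element
print(numbers[1:3])                # range(1, 3), slicing returns another range
print(4 in numbers, 5 in numbers)  # True False, the stop value is excluded

# In CPython, membership tests for integers are computed arithmetically,
# so even a huge range answers without materializing its elements.
big = range(10**12)
print(10**12 - 1 in big)           # True
```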
Count From Start to Stop
You can call range() with two arguments. The first value will be the start of the range. As before, the range will count up to, but not include, the second value:
>>> range(1, 7)
range(1, 7)
The representation of a range object just shows you the arguments that you provided, so it’s not super helpful in this case. You can use list() to inspect the individual elements:
>>> list(range(1, 7))
[1, 2, 3, 4, 5, 6]
Read the full article at https://realpython.com/python-range/ »
[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]
Real Python: Efficient String Concatenation in Python
Python string concatenation is a fundamental operation that combines multiple strings into a single string. In Python, you can concatenate strings using the + operator or the += operator for appending. For more efficient concatenation of multiple strings, the .join() method is recommended, especially when working with strings in a list. Other techniques include using StringIO for large datasets or the print() function for quick screen outputs.
By the end of this tutorial, you’ll understand that:
- You can concatenate strings in Python using the + operator and the += operator.
- You can use += to append a string to an existing string.
- The .join() method is used to combine strings in a list in Python.
- You can handle a stream of strings efficiently by using StringIO as a container with a file-like interface.
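To illustrate the last point, here is a minimal sketch of StringIO as an in-memory, file-like buffer. You call .write() for each piece as it arrives and read the combined text once at the end with .getvalue(). The `chunks` list is a stand-in for whatever stream actually produces your strings:

```python
from io import StringIO

chunks = ["Hello", ", ", "Pythonista", "!"]  # stand-in for a stream of strings

buffer = StringIO()
for chunk in chunks:
    buffer.write(chunk)  # append to the in-memory buffer, like writing to a file

result = buffer.getvalue()  # the full concatenated text
print(result)  # Hello, Pythonista!
```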
To get the most out of this tutorial, you should have a basic understanding of Python, especially its built-in string data type.
Get Your Code: Click here to download the free sample code that shows you how to efficiently concatenate strings in Python.
Doing String Concatenation With Python’s Plus Operator (+)
String concatenation is a pretty common operation consisting of joining two or more strings together end to end to build a final string. Perhaps the quickest way to achieve concatenation is to take two separate strings and combine them with the plus operator (+), which is known as the concatenation operator in this context:
>>> "Hello, " + "Pythonista!"
'Hello, Pythonista!'
>>> head = "String Concatenation "
>>> tail = "is Fun in Python!"
>>> head + tail
'String Concatenation is Fun in Python!'
Using the concatenation operator to join two strings provides a quick solution for concatenating only a few strings.
For a more realistic example, say you have an output line that will print an informative message based on specific criteria. The beginning of the message might always be the same. However, the end of the message will vary depending on different criteria. In this situation, you can take advantage of the concatenation operator:
>>> def age_group(age):
...     if 0 <= age <= 9:
...         result = "a Child!"
...     elif 9 < age <= 18:
...         result = "an Adolescent!"
...     elif 18 < age <= 65:
...         result = "an Adult!"
...     else:
...         result = "in your Golden Years!"
...     print("You are " + result)
...
>>> age_group(29)
You are an Adult!
>>> age_group(14)
You are an Adolescent!
>>> age_group(68)
You are in your Golden Years!
In the above example, age_group() prints a final message constructed with a common prefix and the string resulting from the conditional statement. In this type of use case, the plus operator is your best option for quick string concatenation in Python.
The concatenation operator has an augmented version that provides a shortcut for concatenating two strings together. The augmented concatenation operator (+=) has the following syntax:
string += other_string
This expression will concatenate the content of string with the content of other_string. It’s equivalent to saying string = string + other_string.
Here’s a short example of how the augmented concatenation operator works in practice:
>>> word = "Py"
>>> word += "tho"
>>> word += "nis"
>>> word += "ta"
>>> word
'Pythonista'
In this example, every augmented assignment adds a new syllable to the final word using the += operator. This concatenation technique can be useful when you have several strings in a list or another sequence and want to concatenate them in a for loop:
>>> def concatenate(iterable, sep=" "):
...     sentence = iterable[0]
...     for word in iterable[1:]:
...         sentence += (sep + word)
...     return sentence
...
>>> concatenate(["Hello,", "World!", "I", "am", "a", "Pythonista!"])
'Hello, World! I am a Pythonista!'
Inside the loop, you use the augmented concatenation operator to append the separator and the next word to the sentence. Later you’ll learn about .join(), which is an even better way to concatenate a list of strings.
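As a preview of that approach, here is a sketch of the same concatenate() rebuilt on str.join(). This is one possible rewrite, not necessarily the tutorial's own. Unlike the loop version, which indexes and slices its argument (and raises IndexError on an empty list), it accepts any iterable of strings:

```python
def concatenate(iterable, sep=" "):
    # sep.join() sizes and builds the result once, while repeated +=
    # may create a new intermediate string on each iteration
    return sep.join(iterable)

print(concatenate(["Hello,", "World!", "I", "am", "a", "Pythonista!"]))
# Hello, World! I am a Pythonista!
print(repr(concatenate([])))  # ''
```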
Python’s concatenation operators can only concatenate string objects. If you use them with a different data type, then you get a TypeError:
>>> "The result is: " + 42
Traceback (most recent call last):
    ...
TypeError: can only concatenate str (not "int") to str
>>> "Your favorite fruits are: " + ["apple", "grape"]
Traceback (most recent call last):
    ...
TypeError: can only concatenate str (not "list") to str
The concatenation operators don’t accept operands of different types. They only concatenate strings. A work-around to this issue is to explicitly use the built-in str() function to convert the target object into its string representation before running the actual concatenation:
>>> "The result is: " + str(42)
'The result is: 42'
By calling str() with your integer number as an argument, you’re retrieving the string representation of 42, which you can then concatenate to the initial string because both are now string objects.
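str() works on any object, but its output isn't always what you want in a message. For the list case from the error example above, here is a quick comparison of str() and .join():

```python
fruits = ["apple", "grape"]

# str() gives the list's printed form, brackets and quotes included
print("Your favorite fruits are: " + str(fruits))
# Your favorite fruits are: ['apple', 'grape']

# .join() concatenates the items themselves (they must all be strings)
print("Your favorite fruits are: " + ", ".join(fruits))
# Your favorite fruits are: apple, grape
```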
Read the full article at https://realpython.com/python-string-concatenation/ »
Zero to Mastery: Python Monthly Newsletter 💻🐍
Seth Michael Larson: How do I pay the publisher of a web page?
Published 2024-11-24 by Seth Larson
Here's an unanswered question:
I have money and I have a URL, how do I send money to the publisher of that URL?
URLs tell you where to get content on the web, but they don't tell you anything about how to support the person who created the content. This story might sound similar to paying open source maintainers, where a user can almost abstract an entire project to a single download URL.
There are tons of people creating content for the web and plenty of ways to get paid (Patreon, Ko-fi, GitHub Sponsors, YouTube Paid Membership), but there's no standardized way to direct someone interested in paying for the content of a page in the right direction.
We have HTML meta headers for many things, including where to find an RSS feed or what my Fediverse handle is, but none for enumerating options to pay the creator of the content. I wish I could click a button to easily send a "tip" to someone who created something I enjoy or to browse other options for supporting them.
Existing technology
Payment Request API
There are things like the web "Payment Request API", which gives you a JavaScript API for generating a payment, but this doesn't fit my criteria.
For one: this means that every person creating content for the web needs to add JavaScript to their page. This is a much higher bar than simply linking to existing payment methods that a creator already likely uses to get paid. The harder something is, the less likely it is that large numbers of people will do the work.
I also don't see a way to automate this, because of the JavaScript requirement. Web creators likely have existing payment pages that they'd much rather link out to instead of trying to handle payments themselves individually.
Lastly, this API exists and I don't see it being used by creators today. That should say something about either its ease-of-use or return on investment from potential supporters.
Linking to payment methods in the page
Yeah, we could scrape the payment URLs we know about embedded in the page. But there's a difference between potential URLs in the page due to non-creator generated content (links in comments, etc.) and whatever the "authoritative" URLs are for paying the creator of the page. Being able to set <meta> tags in <head> is typically a higher bar than setting arbitrary URLs in the <body>.
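To make the scraping idea concrete, here is a minimal sketch using the standard library's html.parser. The set of payment hosts and the sample page are hypothetical. The sketch also shows the problem just described: it picks up a link pasted into a comment just as readily as the creator's own link.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Hypothetical set of hosts treated as "payment platforms" for this sketch.
PAYMENT_HOSTS = {"patreon.com", "ko-fi.com"}


class PaymentLinkScraper(HTMLParser):
    """Collect <a href> URLs whose host is a known payment platform."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        host = (urlsplit(href).hostname or "").removeprefix("www.")
        if host in PAYMENT_HOSTS:
            self.links.append(href)


page = """
<body>
  <p>Support me on <a href="https://www.patreon.com/creator">Patreon</a>!</p>
  <div class="comment">Try <a href="https://ko-fi.com/someone-else">this</a></div>
</body>
"""
scraper = PaymentLinkScraper()
scraper.feed(page)
print(scraper.links)  # both links are found, including the one from a comment
```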
Podcasting 2.0 RSS <podcast:funding> tag
Podcasting 2.0 supports basically the exact tag that I want to use, which encodes a URL and a human-readable name for that URL into the metadata description of a podcast publication. Really great to see some prior art here.
Thanks to DamonHD for sending me this reference.
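As an illustration of that prior art, here is how a client might read <podcast:funding> entries from a feed with the standard library. The feed is a made-up sample. The namespace URI is the one the Podcasting 2.0 namespace declares for the podcast: prefix.

```python
import xml.etree.ElementTree as ET

# Namespace URI bound to the Podcasting 2.0 "podcast:" prefix.
PODCAST_NS = "https://podcastindex.org/namespace/1.0"

feed = f"""<rss version="2.0" xmlns:podcast="{PODCAST_NS}">
  <channel>
    <title>Example Show</title>
    <podcast:funding url="https://www.example.com/donations">Support the show!</podcast:funding>
  </channel>
</rss>"""

root = ET.fromstring(feed)
# ElementTree spells namespaced tags as "{uri}localname".
funding = [
    (element.get("url"), element.text)
    for element in root.iter(f"{{{PODCAST_NS}}}funding")
]
print(funding)  # [('https://www.example.com/donations', 'Support the show!')]
```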
Flattr
Flattr was a service that tried to turn a "subscription" from users into micro-payouts based on a user's browsing history. Flattr shut down in 2023. This approach isn't one I'm interested in replicating for a few reasons:
- Access to the entire browsing history feels like a privacy nightmare. Yes, you receive a "complete" sample of which web pages a user has visited, but yeah, this doesn't seem great?
- Flattr tried to create its own payment platform for creators, rather than pushing users to send monetary support through existing stable payment methods like Patreon. They had to do this because of the micro-payments thing.
In general, this is making me think micro-payments are extremely hard to do. I think having a handful of dedicated fans for small creators might be enough to "offset" the "loss" of micro-payments? Perhaps there could be a note to users when certain creators are "niche" and therefore receive fewer payments relative to other creators, and thus would benefit more from a contribution / boost?
Thanks to Quentin for sending me this reference.
Brave
I know about Brave, and I would like to avoid crypto in my solution. Also, many of the creators I pay for don't use crypto but do have multiple payment methods. I don't think the solution should require creators AND users to adopt new technology to work.
What happens now?
I'm no stranger to standards, so maybe I do some research and write a web standard proposal? Seems like fun! I'm imagining something like:
<head>
  <!-- ... -->
  <meta property="financial-support" content="https://patreon.com/c/MatthewCarlson">
</head>
Because this is primarily for money, no doubt it will be abused to hell. First-party browsers probably wouldn't do anything with this information for the fear of legitimizing scammers' fake profiles.
The existence of the "Web Payments API" makes me think maybe it's not a huge deal and that whenever money gets involved, people's spidey-senses start going off about whether a page is legitimate? Not sure.
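A consumer of the imagined tag could be quite small. Here is a sketch with the standard library's html.parser that only trusts <meta property="financial-support"> elements inside <head>. The property name comes from the proposal sketched above and is not an existing standard. The sample page is made up.

```python
from html.parser import HTMLParser


class FinancialSupportParser(HTMLParser):
    """Collect content URLs from <meta property="financial-support"> in <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "meta" and self.in_head:
            attributes = dict(attrs)
            if attributes.get("property") == "financial-support" and attributes.get("content"):
                self.urls.append(attributes["content"])

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


page = """<html>
<head>
  <meta property="financial-support" content="https://patreon.com/c/MatthewCarlson">
</head>
<body>
  <meta property="financial-support" content="https://example.com/not-trusted">
</body>
</html>"""

parser = FinancialSupportParser()
parser.feed(page)
print(parser.urls)  # ['https://patreon.com/c/MatthewCarlson']
```

Restricting the lookup to <head> reflects the point made earlier that setting tags there is typically a higher bar than injecting links into the page body.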
Let me know what you think!
Have thoughts or questions? Let's chat over email or social:
sethmichaellarson@gmail.com
@sethmlarson@fosstodon.org
Want more articles like this one? Get notified of new posts by subscribing to the RSS feed or the email newsletter. I won't share your email or send spam, only whatever this is!
Want more content now? This blog's archive has ready-to-read articles. I also curate a list of cool URLs I find on the internet.
Find a typo? This blog is open source, pull requests are appreciated.
Thanks for reading! ♡ This work is licensed under CC BY-SA 4.0
Real Python: How to Iterate Through a Dictionary in Python
Python offers several ways to iterate through a dictionary, such as using .items() to access key-value pairs directly and .values() to retrieve values only.
By understanding these techniques, you’ll be able to efficiently access and manipulate dictionary data. Whether you’re updating the contents of a dictionary or filtering data, this guide will equip you with the tools you need.
By the end of this tutorial, you’ll understand that:
- You can directly iterate over the keys of a Python dictionary using a for loop and access values with dict_object[key].
- You can iterate through a Python dictionary in different ways using the dictionary methods .keys(), .values(), and .items().
- You should use .items() to access key-value pairs when iterating through a Python dictionary.
- The fastest way to access both keys and values when you iterate over a dictionary in Python is to use .items() with tuple unpacking.
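The four patterns in the list above map onto a handful of loops. Here is a quick sketch using a sample `likes` dictionary:

```python
likes = {"color": "blue", "fruit": "apple", "pet": "dog"}

# Iterating over the dictionary itself yields its keys.
for key in likes:
    print(key, "->", likes[key])

# .keys() and .values() give views over the keys or values only.
print(list(likes.keys()))    # ['color', 'fruit', 'pet']
print(list(likes.values()))  # ['blue', 'apple', 'dog']

# .items() yields (key, value) tuples; unpack them in the loop header.
pairs = []
for key, value in likes.items():
    pairs.append(f"{key}={value}")
print(", ".join(pairs))      # color=blue, fruit=apple, pet=dog
```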
To get the most out of this tutorial, you should have a basic understanding of Python dictionaries, know how to use Python for loops, and be familiar with comprehensions. Knowing other tools like the built-in map() and filter() functions, as well as the itertools and collections modules, is also a plus.
Get Your Code: Click here to download the sample code that shows you how to iterate through a dictionary with Python.
Getting Started With Python Dictionaries
Dictionaries are a cornerstone of Python. Many aspects of the language are built around dictionaries. Modules, classes, objects, globals(), and locals() are all examples of how dictionaries are deeply wired into Python’s implementation.
Here’s how the Python official documentation defines a dictionary:
An associative array, where arbitrary keys are mapped to values. The keys can be any object with __hash__() and __eq__() methods. (Source)
There are a couple of points to notice in this definition:
- Dictionaries map keys to values and store them in an array or collection. The key-value pairs are commonly known as items.
- Dictionary keys must be of a hashable type, which means that they must have a hash value that never changes during the key’s lifetime.
Unlike sequences, which are iterables that support element access using integer indices, dictionaries are indexed by keys. This means that you can access the values stored in a dictionary using the associated key rather than an integer index.
The keys in a dictionary are much like a set, which is a collection of hashable and unique objects. Because the keys need to be hashable, you can’t use mutable built-in containers such as lists, sets, or dictionaries as dictionary keys.
On the other hand, dictionary values can be of any Python type, whether they’re hashable or not. There are literally no restrictions for values. You can use anything as a value in a Python dictionary.
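Here is the hashability rule in action. A tuple of hashable items works as a key, while a list raises TypeError. Values, by contrast, can be anything, including a list. The exact error message differs between Python versions, so the sketch only prints it:

```python
# Tuples of hashable items are hashable, so they can be keys.
coordinates = {(0, 0): "origin", (1, 2): "point"}
print(coordinates[(1, 2)])  # point

# Lists are mutable and unhashable, so they can't be keys...
try:
    bad = {["x", "y"]: "list key"}
    message = ""
except TypeError as error:
    message = str(error)  # wording varies across Python versions
print("TypeError:", message)

# ...but any object, a list included, can be a value.
tags = {"python": ["dict", "set", "list"]}
print(tags["python"])  # ['dict', 'set', 'list']
```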
Note: The concepts and topics that you’ll learn about in this section and throughout this tutorial refer to the CPython implementation of Python. Other implementations, such as PyPy, IronPython, and Jython, could exhibit different dictionary behaviors and features that are beyond the scope of this tutorial.
Before Python 3.6, dictionaries were unordered data structures. This means that the order of items typically wouldn’t match the insertion order:
>>> # Python 3.5
>>> likes = {"color": "blue", "fruit": "apple", "pet": "dog"}
>>> likes
{'color': 'blue', 'pet': 'dog', 'fruit': 'apple'}
Note how the order of items in the resulting dictionary doesn’t match the order in which you originally inserted the items.
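In Python 3.6 and greater, the keys and values of a dictionary retain the same order in which you insert them into the underlying dictionary. From 3.6 onward, dictionaries are compact ordered data structures. Strictly speaking, this ordering was a CPython implementation detail in Python 3.6 and became an official language guarantee in Python 3.7: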
>>> # Python 3.6
>>> likes = {"color": "blue", "fruit": "apple", "pet": "dog"}
>>> likes
{'color': 'blue', 'fruit': 'apple', 'pet': 'dog'}
Keeping the items in order is a pretty useful feature. However, if you work with code that supports older Python versions, then you must not rely on this feature, because doing so can lead to buggy behavior. With newer versions, it’s completely safe to rely on the feature.
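Here is a small sketch of what insertion order means in practice on Python 3.7 and later. Updating an existing key keeps its position, while deleting a key and inserting it again moves it to the end:

```python
likes = {"color": "blue", "fruit": "apple", "pet": "dog"}

likes["fruit"] = "banana"   # update an existing key: position unchanged
print(list(likes))          # ['color', 'fruit', 'pet']

del likes["color"]          # remove the key...
likes["color"] = "red"      # ...and insert it again: now it comes last
print(list(likes))          # ['fruit', 'pet', 'color']
```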
Another important feature of dictionaries is that they’re mutable data types. This means that you can add, delete, and update their items in place as needed. It’s worth noting that this mutability also means that you can’t use a dictionary as a key in another dictionary.
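Here is a short sketch of what "in place" means for adding, updating, and deleting items. Every change happens to the same dictionary object, so a second name bound to it sees the changes too. The `inventory` data is made up for illustration:

```python
inventory = {"apples": 3}
same_inventory = inventory         # a second name for the same dict object

inventory["pears"] = 5             # add a new item
inventory["apples"] = 4            # update an existing item
removed = inventory.pop("pears")   # delete an item, returning its value

# The object was mutated in place, so both names see the same contents.
print(same_inventory, removed)     # {'apples': 4} 5
```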
Understanding How to Iterate Through a Dictionary in Python
Read the full article at https://realpython.com/iterate-through-dictionary-python/ »
Eli Bendersky: GoMLX: ML in Go without Python
In the previous post I talked about running ML inference in Go through a Python sidecar process. In this post, let's see how we can accomplish the same tasks without using Python at all.
How ML models are implemented
Let's start with a brief overview of how ML models are implemented under the hood [1]. The model is typically written in Python, using one of the ML frameworks like TensorFlow, JAX or PyTorch. The framework takes care of at least 2 high-level concerns for developers:
- Expressive way to describe the model architecture, including auto-differentiation for training.
- Efficient implementation of computational primitives on common HW: CPUs, GPUs and TPUs.
In between these two concerns there exists a standardized model definition format (or several) that helps multiple tools interoperate. While it's by no means the only solution [2], let's look at the OpenXLA stack as a way to run models on diverse hardware:
- The top layer consists of the frameworks that provide high-level primitives to define ML models, and translate them to a common interchange format called StableHLO (where "HLO" stands for High-Level Operations). I've added the gopher on the very right - it will soon become clear why.
- The bottom layer is the HW that executes these models efficiently.
- In the middle is the OpenXLA system, which includes two major components: the XLA compiler translating HLO to HW machine code, and PJRT - the runtime component responsible for managing HW devices, moving data (tensors) between the host CPU and these devices, executing tasks, sharding and so on.
There's a huge amount of complexity hidden by the bottom layers of this diagram: efficient compilation and code generation for diverse HW (including using fixed blocks and libraries like cuDNN), runtime management, etc. All of this is really something one shouldn't try to re-implement unless there's a really, really good reason to do so. And the best part? There's no Python there - this is C and C++; Python only exists on the upper layer - in the high-level ML frameworks.
GoMLX
GoMLX is a relatively new Go package for ML that deserves some attention. GoMLX slots in as one of the frameworks, exactly where the Gopher is in the diagram above [3]. This is absolutely the right approach to the problem. There's no point in re-implementing the low-level primitives - whatever works for TF and JAX will work for Go as well! Google, NVIDIA, Intel and several other companies invest huge resources into these systems, and it's a good idea to benefit from these efforts.
In this post I will showcase re-implementations of some of the samples from the previous post, but with no Python in sight. But first, a few words about what GoMLX does.
GoMLX should be familiar if you've used one of the popular Python ML frameworks. You build a computational graph representing your model - the usual operations are supported and sufficient to implement anything from linear regression to cutting-edge transformers. Since GoMLX wraps XLA, it has access to all the same building blocks TF and JAX use (and it adds its own higher-level primitives, similarly to the Python frameworks).
GoMLX supports automatic differentiation to create the backward propagation operations required to update weights in training. It also provides many helpers for training and keeping track of progress, as well as Jupyter notebook support.
An image model for the CIFAR-10 dataset with GoMLX
In the previous post we built a CNN (convolutional neural network) model using TF+Keras in Python, and ran its inference in a sidecar process we could control from Go.
Here, let's build a similar model in Go, without using Python at all; we'll be training it on the same CIFAR-10 dataset we've used before.
The full code for this sample is here; it is heavily based on GoMLX's own example, with some modifications for simplicity and clarity. Here's the code defining the model graph:
func C10ConvModel(mlxctx *mlxcontext.Context, spec any, inputs []*graph.Node) []*graph.Node {
	batchedImages := inputs[0]
	g := batchedImages.Graph()
	dtype := batchedImages.DType()
	batchSize := batchedImages.Shape().Dimensions[0]
	logits := batchedImages
	layerIdx := 0
	nextCtx := func(name string) *mlxcontext.Context {
		newCtx := mlxctx.Inf("%03d_%s", layerIdx, name)
		layerIdx++
		return newCtx
	}
	// Convolution / activation layers
	logits = layers.Convolution(nextCtx("conv"), logits).Filters(32).KernelSize(3).PadSame().Done()
	logits.AssertDims(batchSize, 32, 32, 32)
	logits = activations.Relu(logits)
	logits = layers.Convolution(nextCtx("conv"), logits).Filters(32).KernelSize(3).PadSame().Done()
	logits = activations.Relu(logits)
	logits = graph.MaxPool(logits).Window(2).Done()
	logits = layers.DropoutNormalize(nextCtx("dropout"), logits, graph.Scalar(g, dtype, 0.3), true)
	logits.AssertDims(batchSize, 16, 16, 32)
	logits = layers.Convolution(nextCtx("conv"), logits).Filters(64).KernelSize(3).PadSame().Done()
	logits.AssertDims(batchSize, 16, 16, 64)
	logits = activations.Relu(logits)
	logits = layers.Convolution(nextCtx("conv"), logits).Filters(64).KernelSize(3).PadSame().Done()
	logits.AssertDims(batchSize, 16, 16, 64)
	logits = activations.Relu(logits)
	logits = graph.MaxPool(logits).Window(2).Done()
	logits = layers.DropoutNormalize(nextCtx("dropout"), logits, graph.Scalar(g, dtype, 0.5), true)
	logits.AssertDims(batchSize, 8, 8, 64)
	logits = layers.Convolution(nextCtx("conv"), logits).Filters(128).KernelSize(3).PadSame().Done()
	logits.AssertDims(batchSize, 8, 8, 128)
	logits = activations.Relu(logits)
	logits = layers.Convolution(nextCtx("conv"), logits).Filters(128).KernelSize(3).PadSame().Done()
	logits.AssertDims(batchSize, 8, 8, 128)
	logits = activations.Relu(logits)
	logits = graph.MaxPool(logits).Window(2).Done()
	logits = layers.DropoutNormalize(nextCtx("dropout"), logits, graph.Scalar(g, dtype, 0.5), true)
	logits.AssertDims(batchSize, 4, 4, 128)
	// Flatten logits, and apply dense layer
	logits = graph.Reshape(logits, batchSize, -1)
	logits = layers.Dense(nextCtx("dense"), logits, true, 128)
	logits = activations.Relu(logits)
	logits = layers.DropoutNormalize(nextCtx("dropout"), logits, graph.Scalar(g, dtype, 0.5), true)
	numClasses := 10
	logits = layers.Dense(nextCtx("dense"), logits, true, numClasses)
	return []*graph.Node{logits}
}
As you might expect, the Go code is longer and more explicit (nodes are threaded explicitly between builder calls, instead of being magically accumulated). It's not hard to envision a Keras-like high-level library on top of this.
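The AssertDims calls pin down how tensor shapes evolve through the network. Here they are written out as a worked derivation. This assumes the standard CIFAR-10 input of 32×32 RGB images (3 channels, which the snippet itself does not assert), laid out as (batch, height, width, channels) with batch size B. Each PadSame() convolution keeps the spatial size, each MaxPool with window 2 halves it, and the final Reshape flattens what remains:

$$
\begin{aligned}
(B, 32, 32, 3) &\xrightarrow{\text{conv}_{32}\,\times 2} (B, 32, 32, 32) \xrightarrow{\text{pool}_2} (B, 16, 16, 32) \\
&\xrightarrow{\text{conv}_{64}\,\times 2} (B, 16, 16, 64) \xrightarrow{\text{pool}_2} (B, 8, 8, 64) \\
&\xrightarrow{\text{conv}_{128}\,\times 2} (B, 8, 8, 128) \xrightarrow{\text{pool}_2} (B, 4, 4, 128) \\
&\xrightarrow{\text{reshape}} (B, 4 \cdot 4 \cdot 128) = (B, 2048) \xrightarrow{\text{dense}} (B, 128) \xrightarrow{\text{dense}} (B, 10)
\end{aligned}
$$

ReLU and dropout layers don't change shapes, so they are omitted from the chain.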
Here's a snippet from the classifier (inference):
func main() {
	flagCheckpoint := flag.String("checkpoint", "", "Directory to load checkpoint from")
	flag.Parse()
	mlxctx := mlxcontext.New()
	backend := backends.New()
	_, err := checkpoints.Load(mlxctx).Dir(*flagCheckpoint).Done()
	if err != nil {
		panic(err)
	}
	mlxctx = mlxctx.Reuse() // helps sanity check the loaded context
	exec := mlxcontext.NewExec(backend, mlxctx.In("model"), func(mlxctx *mlxcontext.Context, image *graph.Node) *graph.Node {
		// Convert our image to a tensor with batch dimension of size 1, and pass
		// it to the C10ConvModel graph.
		image = graph.ExpandAxes(image, 0) // Create a batch dimension of size 1.
		logits := cnnmodel.C10ConvModel(mlxctx, nil, []*graph.Node{image})[0]
		// Take the class with highest logit value, then remove the batch dimension.
		choice := graph.ArgMax(logits, -1, dtypes.Int32)
		return graph.Reshape(choice)
	})
	// classify takes a 32x32 image and returns a CIFAR-10 classification according
	// to the model. Use C10Labels to convert the returned class to a string
	// name. The returned class is from 0 to 9.
	classify := func(img image.Image) int32 {
		input := images.ToTensor(dtypes.Float32).Single(img)
		outputs := exec.Call(input)
		classID := tensors.ToScalar[int32](outputs[0])
		return classID
	}
	// ...
Now classify is a function that takes an image.Image and runs it through the network, returning the index of the most likely label out of the list of CIFAR-10 labels.
The README file in the sample explains how to run it locally on a GPU; the model trains and runs successfully, with similar results to the TF+Keras model we trained in Python earlier.
Gemma2 with GoMLX
For a (much) more involved example, GoMLX has a full implementation of Gemma2 inference. The model implementation itself is in the transformers package. It should look fairly familiar if you've seen a transformer implementation in another language.
The official example in that repository shows how to run it with weights downloaded from HuggingFace; since I've already downloaded the Gemma2 weights from Kaggle for the previous post, here's a simple adaptation:
var (
	flagDataDir   = flag.String("data", "", "dir with converted weights")
	flagVocabFile = flag.String("vocab", "", "tokenizer vocabulary file")
)
func main() {
	flag.Parse()
	ctx := context.New()
	// Load model weights from the checkpoint downloaded from Kaggle.
	err := kaggle.ReadConvertedWeights(ctx, *flagDataDir)
	if err != nil {
		log.Fatal(err)
	}
	// Load tokenizer vocabulary.
	vocab, err := sentencepiece.NewFromPath(*flagVocabFile)
	if err != nil {
		log.Fatal(err)
	}
	// Create a Gemma sampler and start sampling tokens.
	sampler, err := samplers.New(backends.New(), ctx, vocab, 256)
	if err != nil {
		log.Fatalf("%+v", err)
	}
	start := time.Now()
	output, err := sampler.Sample([]string{
		"Are bees and wasps similar?",
	})
	if err != nil {
		log.Fatalf("%+v", err)
	}
	fmt.Printf("\tElapsed time: %s\n", time.Since(start))
	fmt.Printf("Generated text:\n%s\n", strings.Join(output, "\n\n"))
}
The complete code together with installation and setup instructions is here.
gomlx/gemma demonstrates that GoMLX has sufficiently advanced capabilities to run a real production-grade open LLM, without Python in the loop.
Summary
The previous post discussed some options for incorporating ML inference into a Go project via a minimal Python sidecar process. Here, we take it a step further and implement ML inference in Go without using Python. We do so by leveraging GoMLX, which itself relies on XLA and PJRT to do the heavy lifting.
If we strip down a framework like TensorFlow to its layers, GoMLX reuses the bottom layers (which is where most of the magic lies), and replaces the model builder library with a Go variant.
Since GoMLX is still a relatively new project, it may be a little risky for production uses at this point. That said, I find this direction very promising and will be following the project's development with interest.
Code
The full code for the samples in this post is on GitHub.
[1] This assumes you know the basics of neural network graphs, their training, etc. If not, check out this post and some of my other posts in the Machine Learning category.
[2] It's likely the most common production solution, and pretty much the only way to access Google's TPUs.
[3] It does so by including Go bindings for both XLA and PJRT; these are wrapped in higher-level APIs for users.
EuroPython Society: 2024 General Assembly Announcement
We’re excited to invite you to this year’s General Assembly meeting! We’ll gather on Sunday, December 1st, 2024, from 20:00 to 21:00 CET. Just like in recent years, we’ll use Zoom, and additional joining instructions will be shared closer to the date.
The General Assembly is the highest decision-making body of the society, and EPS membership is required to participate. Membership is open to individuals who wish to actively engage in implementing the EPS mission. If you want to become a member of the EuroPython Society, you can sign up here: https://www.europython-society.org/application/
You can find more details about the agenda of the meeting, as defined in our bylaws (Article 8), here: https://www.europython-society.org/bylaws/
One of the items on the Agenda is electing the new Board.
What does the Board do?
The Board consists of a chairperson, a vice chairperson and 2-7 Board members. The duties and responsibilities of the Board are substantial: the board collectively takes up the fiscal and legal responsibility of the Society.
A major topic is the annual EuroPython conference. While we would like to transition to a model with an independent organising team, we are not there yet. Therefore, the Board still needs to be involved in the conference organisation.
Beyond the conference, the Board also manages several critical areas, including:
- Managing EPS membership
- Overseeing finances and budgets
- Running the grant programme
- Maintaining infrastructure and resources
Furthermore, specifically for 2025, and following the recommendation from the previous Board, we would like to focus on four key topics that are important for the Society's future and sustainability:
- Hiring an Event Manager/Coordinator
- Selecting a location for 2026 and possibly 2027
- Strengthening community outreach
- Improving the fiscal and legal framework
The Society is entirely volunteer-driven, and serving on the Board requires a significant time commitment. Everyone has a different schedule, so most of the work is usually done asynchronously. However, all Board members attend the 1.5-hour Board call held every two weeks in the evening, CE(S)T timezone. Everyone's time is valuable, so please consider that the less time or effort you can dedicate, the more the workload may shift to other Board members.
All things considered, you will need a few hours every week.
Who should apply?
Do you want to invest your time and knowledge into building a better structure for the EuroPython Society? Or do you want to work on building connections between different Python-based communities? Then this might be for you! Please keep in mind the time commitments mentioned above.
You are not expected to be perfect in any of the skills needed and you will be supported in learning how things work. That being said, having experience in a non-profit organisation, whether within the Python world (such as EPS, PSF, DSF, local Python communities etc.) or any other similar organisation, would be beneficial for onboarding and understanding the organisational structure, culture and dynamics.
In the past, having (or being willing to learn) the following skills has helped with organising the conference:
- Good communication skills
- Organisation skills
- Experience organising events with more than 1000 people
- Working with volunteer-based communities
- Working in big teams
You get the chance to shape and influence the future of EuroPython
You gain skills useful to run non-profits in different European countries - including cross border challenges
You can help grow and empower local communities
You can build relationships and connections with fellow community members
You can build a more diverse and inclusive Python community by serving the mission of EuroPython Society
I am interested, what should I do?
If you’re considering running for the Board or nominating another EPS member, we’d love to hear from you! Although the formal deadline is during the General Assembly, we kindly ask that you send your nomination to board@europython.eu as early as possible. We will publish the initial list of candidates on Tuesday, 26th of November 2024. If you’re not sure whether this is a good idea, please email anyway and we will help you figure it out! 🙂
If you're on our EPS Organisers' Discord, there's a dedicated channel for interested candidates. Please ask in the general channel, and we’ll be happy to add you.
You can find examples of previous nominations here: https://www.europython-society.org/list-of-eps-board-candidates-for-2023-2024/.
Your nomination should highlight why you want to run for the Board: what is your vision for EPS, and which projects do you want to be involved in? During the General Assembly, you will have the opportunity to introduce yourself and share with our members why you believe they should vote for you. Each candidate will typically be given one minute to present themselves before members cast their votes.
It sounds like a lot. I want to help, but I can’t commit to that
That’s completely understandable! Serving on the Board comes with significant responsibilities, time commitments, and administrative tasks. If that’s not the right fit for you, but you’re still interested in supporting us, we’d love your help! There are many other ways to get involved. We have several teams (see the 2024 Teams Description document, as an example) that work on conference preparations during the months leading up to the event, and we also need volunteers to assist onsite during the conference.
Your help does not need to be limited to the conference. Infrastructure and connections, for example, need to be maintained all year round. Your time and support would make a big difference! Stay tuned to our social platforms for announcements about these opportunities.
Real Python: The Real Python Podcast – Episode #229: The Joy of Tinkering & Python Free-Threading Performance
What keeps your spark alive for developing software and learning Python? Do you like to try new frameworks, build toy projects, or collaborate with other developers? Christopher Trudeau is back on the show this week, bringing another batch of PyCoder's Weekly articles and projects.
[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]
Talk Python to Me: #486: CSnakes: Embed Python code in .NET
Matt Layman: Huey Background Worker - Building SaaS #207
Seth Michael Larson: Visualizing the Python package SBOM data flow
Published 2024-11-22 by Seth Larson
Reading time: minutes
TLDR: Skip intro, take me to the visualization!
I'm working on improving measurability of Python packages by allowing Software Bill-of-Materials documents (SBOM) to be included in Python packages so that projects and build tools can record information about a package for downstream use.
This is a cross-functional project where I need input from Python projects and Python packaging tools (build backends, build tools, and installers), but also from folks completely outside the Python community, like SBOM tooling maintainers. With projects like this, it can be difficult to "see the forest for the trees". When you're reviewing the packaging PEP, it can be difficult to imagine how, or by whom, the new standard will be used. This article aims to help visualize the end-to-end data flow.
How SBOM data will be included in Python packages
In short, the proposal is:
- Allow Python projects to manually specify SBOM documents in pyproject.toml with [project].sbom-files = ["..."]
- Allow Python package archives to include self-describing SBOM documents and reference them in metadata via Sbom-File field.
- Zero-or-more SBOM documents per Python package archive. Each tool adding SBOM data creates a new SBOM inside the archive to avoid conflicts. End-user SBOM tools need to handle multiple SBOMs to "stitch" them together.
There are two Python packages being shown, Package A on the left and Package B on the right. Package A depends on Package B. Package A is a pure-Python package with no bundled dependencies. Package B uses binary extensions and uses auditwheel to bundle shared libraries.
[Diagram: Package A and Package B flowing from Source Forge through Build Backend, Auditwheel, the Python Package Index, and the Python Environment to an SBOM Generator that produces an Operational SBOM (OBOM); stages numbered 1 to 6]
How SBOM data flows from Python package source code, build, to an SBOM generation tool
Stage 1: If the Python project bundles third-party software in its own source code, then the project may specify one or more SBOM documents through [project].sbom-files in pyproject.toml. Build backends copy these documents into source distributions and wheels.
Stage 2: If the Python build backend pulls dependencies (like Maturin and Cargo) while building a wheel, those dependencies can be recorded in another SBOM document in the wheel.
Stage 3: If a tool that modifies wheels by adding dependencies is used (like auditwheel), then that tool can record its modifications in an SBOM document. At this point, there are three separate SBOM documents included in the Package B archive.
Stage 4: Archives are uploaded to an index like PyPI. The index can do some validation of included SBOM documents, if any.
Stage 5: Installers download and install the Python package archives. The SBOM files are placed into the .dist-info/sboms/ directory in the Python environment and referenced in package metadata.
Stage 6: SBOM generation tools scan the Python environment and, using the existing Python package metadata and the new per-package SBOM documents, stitch together an Operational SBOM (OBOM) detailing the Python environment.
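The scanning half of Stage 6 can be sketched with importlib.metadata alone. This is a minimal sketch, assuming the proposed .dist-info/sboms/ layout; the helpers is_sbom_path() and collect_environment_sboms() are hypothetical names. It deliberately stops before the stitching step, which requires parsing each SBOM standard and is the hard problem the proposal leaves to dedicated SBOM tools.

```python
from importlib.metadata import distributions
from pathlib import PurePosixPath


def is_sbom_path(path: PurePosixPath) -> bool:
    """True for files laid out as <name>-<version>.dist-info/sboms/<file>."""
    parts = path.parts
    return len(parts) >= 3 and parts[-3].endswith(".dist-info") and parts[-2] == "sboms"


def collect_environment_sboms() -> dict[str, list[str]]:
    """Map each installed distribution name to the locations of its SBOM files."""
    found: dict[str, list[str]] = {}
    for dist in distributions():
        # dist.files is None when the distribution has no RECORD file.
        sbom_paths = [f for f in dist.files or [] if is_sbom_path(f)]
        if sbom_paths:
            found[dist.metadata["Name"]] = [str(dist.locate_file(p)) for p in sbom_paths]
    return found


# In today's environments this usually prints nothing, since no installer
# writes the proposed sboms/ directory yet.
for name, paths in collect_environment_sboms().items():
    print(name, paths)
```

Relying on RECORD rather than globbing the file system keeps the scan tied to what the installer actually recorded for each package.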
Who does what?
The plan is to allow each "actor" in the system adding SBOM data to a Python package to create their own SBOM document inside the Python package.
This means they can choose any SBOM standard (although we'll recommend sticking to a well-known one like CycloneDX or SPDX) and that intermediate tools won't need to "merge" SBOM data together. Avoiding this merging is extremely important, because cross-standard SBOM data merges are a very hard problem. This problem is deferred to SBOM generation tools, which already need to support multiple SBOM standards.
- Pure-Python projects that don't vendor software are easy; there's nothing to do here.
- Python projects that vendor software can annotate that software using an SBOM and specify the SBOM in pyproject.toml. Keeping this up-to-date is a non-zero amount of work, but I am hoping that this PEP will enable these types of contributions. I'm also hoping to provide a lightweight pre-commit hook to help keep these SBOM documents up-to-date, similar to what CPython already uses.
- Python projects that use a build backend that pulls dependencies should be able to annotate what those dependencies are at build time. There will be exceptions; I'm looking into tools like Meson and multibuild to see what can be done.
- Python bundling tools like auditwheel, delocate, etc. can annotate shared libraries and DLLs that are pulled into wheels.
My hope is that the most difficult part of this work (manually annotating a package when automatic tools can't) will enable a new type of contribution from users of Python packages to provide SBOM data. Previously, there was no standardized method for SBOM data to propagate through Python packages, which discouraged this type of contribution.
If you're interested in having your use-case covered or you have concerns about the approach, please open a GitHub issue on the project tracker.
That's all for this post! 👋 If you're interested in more you can read the last report.
Have thoughts or questions? Let's chat over email or social:
sethmichaellarson@gmail.com
@sethmlarson@fosstodon.org
Want more articles like this one? Get notified of new posts by subscribing to the RSS feed or the email newsletter. I won't share your email or send spam, only whatever this is!
Want more content now? This blog's archive has ready-to-read articles. I also curate a list of cool URLs I find on the internet.
Find a typo? This blog is open source, pull requests are appreciated.
Thanks for reading! ♡ This work is licensed under CC BY-SA 4.0
Django Weblog: 2024 Django Developers Survey
The DSF is once again partnering with JetBrains to run the 2024 Django Developers Survey 🌈
Please take a moment to fill it out! It should only take about 10 minutes to complete. It’s an important metric of Django usage, and is immensely helpful to guide future technical and community decisions.
The survey will be open until December 21st, 2024. After the survey is over, we will publish the aggregated results. JetBrains will also randomly choose 10 winners (from those who complete the survey in its entirety with meaningful answers), who will each receive a $100 Amazon Gift Card or a local equivalent.
How you can help
Take a moment to re-share the survey on socials and with your respective communities. The more diverse the answers, the better the results for all of us.
Thank you for taking the time to contribute to this community effort, and thank you to JetBrains for their consistent support over the years!
Real Python: Quiz: Expression vs Statement in Python: What's the Difference?
In this quiz, you’ll test your understanding of Expression vs Statement in Python: What’s the Difference?
By working through this quiz, you’ll revisit the key differences between expressions and statements in Python, and how to use them effectively in your code.
Real Python: Quiz: Interacting With Python
In this quiz, you’ll test your understanding of the different ways you can interact with Python.
By working through this quiz, you’ll revisit key concepts related to Python interaction in interactive mode using the Read-Eval-Print Loop (REPL), through Python script files, and within Integrated Development Environments (IDEs) and code editors.
You’ll also test your knowledge of some other options that may be useful, such as Jupyter Notebooks.
Django Weblog: Announcing the 6.x Steering Council elections 🚀
Today, we’re announcing early elections for the Django Software Foundation Steering Council over the 6.x Django release cycle. Elected members will be on the Steering Council for two years, from the end of those elections in December, until April 2027 with the scheduled start of the Django 7.x release cycle.
Why we have early elections
The DSF Board of Directors previously shared Django’s technical governance challenges and opportunities. Now that the Board elections are completed, we’re ready to proceed with this other, separate election, following existing processes. We want a Steering Council that strives to meet the group’s intended goals:
- To safeguard big decisions that affect Django projects at a fundamental level.
- To help shepherd the project’s future direction.
We expect the new Steering Council will take on those known challenges, resolve those questions of technical leadership, and update Django’s technical governance. They will have the full support of the Board of Directors to address this threat to Django’s future. And the Board will also be more decisive in intervening, should similar issues keep arising.
Elections timeline
Here are the important dates of the Steering Council elections, subject to change:
- 2024-11-21: announcement & opening of voter registration
- 2024-11-26 23:59 AoE (Anywhere on Earth): voter registration closes
- 2024-11-27: opening of Steering Council candidates registration
- 2024-12-04 23:59 AoE: candidates registration closes
- (one week gap per defined processes)
- 2024-12-10: voting starts
- 2024-12-17 23:59 AoE: voting ends
- 2024-12-18: results ratification by DSF Board of Directors
- 2024-12-19: results announcement
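Since every deadline above is given in AoE, a small helper can convert it to UTC for your own calendar. Anywhere on Earth is the fixed offset UTC-12, so a 23:59 AoE deadline has passed everywhere once it passes there. A minimal sketch using the standard-library datetime module; the function name aoe_to_utc() is hypothetical:

```python
from datetime import datetime, timedelta, timezone

# Anywhere on Earth (AoE) is the fixed offset UTC-12.
AOE = timezone(timedelta(hours=-12), "AoE")


def aoe_to_utc(year: int, month: int, day: int, hour: int = 23, minute: int = 59) -> datetime:
    """Convert an AoE wall-clock deadline to the equivalent UTC instant."""
    return datetime(year, month, day, hour, minute, tzinfo=AOE).astimezone(timezone.utc)


print(aoe_to_utc(2024, 11, 26))  # 2024-11-27 11:59:00+00:00  (voter registration closes)
print(aoe_to_utc(2024, 12, 17))  # 2024-12-18 11:59:00+00:00  (voting ends)
```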
If you’re an Individual Member of the Django Software Foundation, you’re already registered to vote. There’s nothing further for you to do. If you aren’t, consider nominating yourself for individual membership. Once approved, you will be registered to vote for this election.
Alternatively, members of our community who want to vote in this election but don’t want to become Individual Members can register to vote from now until 2024-11-26 23:59 Anywhere on Earth, using our form: Django 6.x Steering Council Voter Registration.
Candidate registration
If you’re interested, don’t wait until formal candidate registration. You can already fill in our 6.x Steering Council expression of interest form. At the end of the form, select “I would like what my submissions to this form to be used as part of my candidate registration for the elections”.
Django 6.x Steering Council elections - Expression of interest
Voting
Once voting opens, those eligible to vote in this election will receive information on how to vote via email. Please check for an email with the subject line “6.x Steering Council elections voting”. Voting will be open until 23:59 on December 17, 2024 Anywhere on Earth.
—
Any questions? Ask on our dedicated forum discussion thread, or reach out via email to foundation@djangoproject.com.
PyPodcats: Trailer: Episode 7 With Anna Makarudze
Sneak Peek of our chat with Anna Makarudze, hosted by Mariatta Wijaya and Cheuk Ting Ho.
Since discovering Python and Django in 2015, Anna has been actively involved in the Django community. She helped organize PyCon Zimbabwe, and she has coached at Django Girls in Harare and Windhoek.
She served on the Board of Directors at Django Software Foundation for five years, and she is currently a Django Girls Foundation Trustee & Fundraising Coordinator.
Anna became aware of the lack of representation of women in the tech industry, something that became more evident when she attended Django Under the Hood in 2016, where most of the attendees were white men and only a few were women. That’s when she realized the importance of communities like Django Girls in supporting more women in the Django community.
In this chat, Anna shared ways you can contribute to and support the Django Girls+ Foundation.
Full episode is coming on November 27, 2024! Subscribe to our podcast now!
Trey Hunner: Python Black Friday & Cyber Monday sales (2024)
Ready for some Python skill-building sales?
This is my seventh annual compilation of Python learning deals.
I’m publishing this post extra early this year, so bookmark this page and set a calendar event for yourself to check back on Friday November 29.
Currently live sales
Here are Python-related sales that are live right now:
- Python Jumpstart with Python Morsels: 50% off my brand new Python course, an introduction to Python that’s very hands-on ($99 instead of $199)
- Rodrigo: 50% off Rodrigo’s all-books bundle with code BF24
- The Python Coding Place: 40% off The Python Coding Book and 40% off a lifetime membership to The Python Coding Place with code black2024
- Sundeep Agarwal: ~50% off Sundeep’s all book and Python bundles with code FestiveOffer
- O'Reilly Media: 40% off the first year with code CYBERWEEK24 ($299 instead of $499)
Here are sales that will be live soon:
- Data School: 40% off all of Kevin’s courses, or get a bundle with all 5 of his courses
- Mike Driscoll: 35% off Mike’s Python books and courses with code BF24
Here are some sales I expect to see, but which haven’t been announced yet:
- Talk Python: usually holds a sale on a variety of courses
- Brian Okken: often holds a sale on his pytest course
- Reuven Lerner: usually holds a sale
- Pragmatic Bookshelf: I’m guessing they’ll hold a 40% off sale with code turkeycode2024
Also see Adam Johnson’s Django-related Deals for Black Friday 2024 for sales on Adam’s books, courses from the folks at Test Driven, Django templates, and various other Django-related deals.
And for non-Python/Django deals, see the Awesome Black Friday / Cyber Monday deals GitHub repository and the BlackFridayDeals.dev website.
If you know of another sale (or a likely sale) please comment below or email me.
Real Python: NumPy Practical Examples: Useful Techniques
The NumPy library is a Python library used for scientific computing. It provides you with a multidimensional array object for storing and analyzing data in a wide variety of ways. In this tutorial, you’ll see examples of some features NumPy provides that aren’t always highlighted in other tutorials. You’ll also get the chance to practice your new skills with various exercises.
In this tutorial, you’ll learn how to:
- Create multidimensional arrays from data stored in files
- Identify and remove duplicate data from a NumPy array
- Use structured NumPy arrays to reconcile the differences between datasets
- Analyze and chart specific parts of hierarchical data
- Create vectorized versions of your own functions
If you’re new to NumPy, it’s a good idea to familiarize yourself with the basics of data science in Python before you start. Also, you’ll be using Matplotlib in this tutorial to create charts. While it’s not essential, getting acquainted with Matplotlib beforehand might be beneficial.
Get Your Code: Click here to download the free sample code that you’ll use to work through NumPy practical examples.
Take the Quiz: Test your knowledge with our interactive “NumPy Practical Examples: Useful Techniques” quiz. You’ll receive a score upon completion to help you track your learning progress:
Interactive Quiz
NumPy Practical Examples: Useful Techniques
This quiz will challenge your knowledge of working with NumPy arrays. You won't find all the answers in the tutorial, so you'll need to do some extra investigating. By finding all the answers, you're sure to learn some interesting things along the way.
Setting Up Your Working Environment
Before you can get started with this tutorial, you’ll need to do some initial setup. In addition to NumPy, you’ll need to install the Matplotlib library, which you’ll use to chart your data. You’ll also be using Python’s pathlib library to access your computer’s file system, but there’s no need to install pathlib because it’s part of Python’s standard library.
You might consider using a virtual environment to make sure your tutorial’s setup doesn’t interfere with anything in your existing Python environment.
Using a Jupyter Notebook within JupyterLab to run your code instead of a Python REPL is another useful option. It allows you to experiment and document your findings, as well as quickly view and edit files. The downloadable version of the code and exercise solutions are presented in Jupyter Notebook format.
The commands for setting things up on the common platforms are shown below:
On Windows, fire up a Windows PowerShell (Admin) or Terminal (Admin) prompt, depending on the version of Windows that you’re using. Now type in the following commands:
    PS> python -m venv venv\
    PS> venv\Scripts\activate
    (venv) PS> python -m pip install numpy matplotlib jupyterlab
    (venv) PS> jupyter lab
Here you create a virtual environment named venv\, which you then activate. If the activation is successful, then the virtual environment’s name will precede your PowerShell prompt. Next, you install numpy and matplotlib into this virtual environment, followed by the optional jupyterlab. Finally, you start JupyterLab.
Note: When you activate your virtual environment, you may receive an error stating that your system can’t run the script. Modern versions of Windows don’t allow you to run scripts downloaded from the Internet as a security feature.
To fix this, you need to type the command Set-ExecutionPolicy RemoteSigned, then answer Y to the question. Your computer will now run scripts created locally, as well as downloaded scripts that are signed by a trusted publisher. Once you’ve done this, the venv\Scripts\activate command should work.
On Linux or macOS, fire up a terminal and type in the following commands:
    $ python -m venv venv/
    $ source venv/bin/activate
    (venv) $ python -m pip install numpy matplotlib jupyterlab
    (venv) $ jupyter lab
Here you create a virtual environment named venv/, which you then activate. If the activation is successful, then the virtual environment’s name will precede your command prompt. Next, you install numpy and matplotlib into this virtual environment, followed by the optional jupyterlab. Finally, you start JupyterLab.
You’ll notice that your prompt is preceded by (venv). This means that anything you do from this point forward will stay in this environment and remain separate from other Python work you have elsewhere.
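If you'd rather confirm this from inside Python than rely on the prompt, the venv documentation describes comparing sys.prefix with sys.base_prefix. A minimal sketch, where the helper name in_virtual_environment() is chosen for illustration:

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment's directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print(f"Virtual environment active: {in_virtual_environment()}")
print(f"Interpreter prefix: {sys.prefix}")
```

Running this in a JupyterLab cell started from the activated environment should report True and a prefix ending in venv.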
Now that you have everything set up, it’s time to begin the main part of your learning journey.
NumPy Example 1: Creating Multidimensional Arrays From Files
When you create a NumPy array, you create a highly optimized data structure. One of the reasons for this is that a NumPy array stores all of its elements in a contiguous area of memory. This memory management technique means that the data is stored in the same memory region, making access times fast. This is, of course, highly desirable, but an issue occurs when you need to expand your array.
Suppose you need to import multiple files into a multidimensional array. You could read them into separate arrays and then combine them using np.concatenate(). However, this would create a copy of your original array before expanding the copy with the additional data. The copying is necessary to ensure the updated array will still exist contiguously in memory since the original array may have had non-related content adjacent to it.
Constantly copying arrays each time you add new data from a file can make processing slow and is wasteful of your system’s memory. The problem becomes worse the more data you add to your array. Although this copying process is built into NumPy, you can minimize its effects with these two steps:
- When setting up your initial array, determine how large it needs to be before populating it. You may even consider over-estimating its size to support any future data additions. Once you know these sizes, you can create your array upfront.
- The second step is to populate it with the source data. This data will be slotted into your existing array without any need for it to be expanded.
Next, you’ll explore how to populate a three-dimensional NumPy array.
Populating Arrays With File Data
In this first example, you’ll use the data from three files to populate a three-dimensional array. The content of each file is shown below, and you’ll also find these files in the downloadable materials:
The first file has two rows and three columns with the following content:
file1.csv
    1.1, 1.2, 1.3
    1.4, 1.5, 1.6
Read the full article at https://realpython.com/numpy-example/ »
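As a rough sketch of the populating step for the file1.csv content shown above, assuming a (3, 2, 3) capacity with one 2x3 layer per file: an io.StringIO stands in for the real file, and only file1 is loaded because the other files' contents aren't reproduced here. np.loadtxt() with delimiter="," parses the rows, and the result is written straight into the first layer of the preallocated array.

```python
import io

import numpy as np

# Content of file1.csv as shown above; StringIO stands in for the real file.
FILE1_CSV = "1.1, 1.2, 1.3\n1.4, 1.5, 1.6\n"

# Assumed capacity: three 2x3 layers, one per input file.
array = np.zeros((3, 2, 3))

# loadtxt() parses the comma-separated rows into a 2x3 float array,
# which is then copied into layer 0 without resizing `array`.
array[0] = np.loadtxt(io.StringIO(FILE1_CSV), delimiter=",")
print(array[0])
```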
Real Python: Quiz: NumPy Practical Examples: Useful Techniques
In this quiz, you’ll test your understanding of the techniques covered in the tutorial NumPy Practical Examples: Useful Techniques.
By working through the questions, you’ll review your understanding of NumPy arrays and also expand on what you learned in the tutorial.
You’ll need to do some research outside of the tutorial to answer all the questions. Embrace this challenge and let it take you on a learning journey.
Julien Tayon: The advantages of HTML as a data model over basic declarative ORM approach
For this, we use one trick: derive the HTML widgets for presentation, the database access, and the REST endpoints from ONE SOURCE of truth, which we call the MODEL.
A tradition, and I insist it's a conservative tradition, is to use a declarative model where Python classes are the source of truth for the model.
By declaring a class, we implicitly declare its SQL structure, the HTML input form for human-readable interaction, and the REST endpoint to access a graph of objects, all mapped onto the database.
Since the arrival of pydantic, this makes all the more sense for enabling a strongly typed approach in Python.
But is it the only worthy approach?
I speak here as a veteran of the trenches, whose job is to read a list of customer entries in an xls file from a project manager and change the faulty values, by reverse-engineering an HTML form, into whatever the freak the right value is supposed to be.
In this case, your job is in fact to short-circuit the web framework, which you have no access to, and change values directly in the database.
More often than not in these real-life cases, you don't have access to the team who built the framework (too much bureaucracy to even get a question answered before the situation gets critical)... So you look at the form.
And you guess the name of the impacted table by looking at the « network tab » in the browser's developer tools when you hit the submit button.
And you guess the names of the impacted fields to infer the names of the columns.
And then you use your only magic tool, write access to the database, to reflect the expected object with an automapper and change the values.
You could do it in raw SQL, I agree, but sometimes you need to make a web query in the middle to change the value, because you have to ask a REST service for the client's new ID.
And you see, the more I have to tweak real-life frameworks that surprise users because of the limitations of their source of truth, the more I want the HTML to be the source of truth.
The most stoic approach to a full-stack framework: derive everything from an HTML page.
The views, the controllers, the routes, the model, all in such a faithful way that if you modify the HTML, you modify the database model, the routes, and the displayed form in real time.
What are the advantages of HTML as a declarative language ?
Here, one tradition is to prefer human-readable languages such as YAML and JSON, or machine-readable ones such as XML, over HTML.
However, JSON and YAML are more limited than HTML in the data structures they can express (can you have a dict as a key of a dict in JSON? I can.)
And on the other hand, XML is quite a pain to read and write without mistakes.
HTML is just XML
HTML is a lax and lenient, grammarless XML. No parser will raise an exception because you wrote "<br>" instead of "<br/>" (or the opposite). You can add non-existent attributes to tags and the parser will understand them easily, without you having to define a full-fledged grammar.
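A quick way to see that leniency with the standard library: Python's html.parser accepts both br spellings, an unclosed input, and a made-up nullable attribute without complaint, while the strict XML parser in xml.etree.ElementTree rejects the same snippet. This is a minimal sketch; the class name TagCollector and the snippet are illustrative.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record every start tag and its attributes as the parser sees them."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, dict(attrs)))


SNIPPET = "<p>one<br>two<br/><input name=id nullable=false></p>"

collector = TagCollector()
collector.feed(SNIPPET)
print(collector.tags)
# [('p', {}), ('br', {}), ('br', {}), ('input', {'name': 'id', 'nullable': 'false'})]

# A strict XML parser refuses the same text (unquoted attributes, unclosed tags).
try:
    ET.fromstring(SNIPPET)
except ET.ParseError as error:
    print("XML parser:", error)
```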
HTML is an XML YOU CAN SEE.
Some tags map to a grammar of visual widgets that non-technical people are already familiar with.
If you use a FORM as a mapping to a database table, and every input inside it carries a column name, you already have the inputs drawn on your screen.
Modern « remote procedure calls » are web-based
Call it RPC, call it SOAP, call it REST: nowadays, web technologies handle 99% of how computer systems exchange data with each other.
When you buy something on the internet, in the end you interact with a web form or a web call. Hence, we can assert with strong conviction that 100% of web technologies can serve web pages. Thus, if you use your HTML as a model and present it, you can deduce the data model from the form without needing a new pivot language.
Proof of concept
For the sake of « fun », we are going to imagine a backend for « agile by micro-blogging » (à la former Twitter).
We will assume the platform structures micro-blogging around where agile shines the most: not when things are done, but when moving things along.
Things that are done will be called statements, like: « software is delivered. Here is a factoid (a git URL for instance) ». We will call them the nodes of a graph, and they are supposed to be immutable states that can't be contested.
Each statement answers another statement's factoid, the way a delivery statement tends to follow a story point (or at least should be reached from it by means of a transition).
Hence, in this application we will micro-blog about the transitions, like on a social network, with the members of the concerned group.
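The structure just described is a directed graph: statements are nodes, transitions are labelled edges. Here is a minimal sketch of that idea, using a small hand-picked excerpt of the seed data that appears in the test script later in this post, rendered as Graphviz DOT text. The to_dot and successors helpers are hypothetical; this is not the generate_state_diagram.py script the author runs, whose source isn't shown.

```python
# Statements are the nodes: id -> message.
statements = {
    1: "usable agile workflow",
    2: "How do we code?",
    6: "How do we test?",
    10: "PoC delivered",
}
# Transitions are the edges: (previous_statement_id, next_statement_id, message).
transitions = [
    (1, 2, "something bugs me"),
    (2, 6, "change accepted"),
    (2, 10, "situation unblocked"),
]


def to_dot(statements, transitions):
    """Render the statement graph in Graphviz DOT syntax.

    Labels are inserted as-is, so this sketch assumes messages contain
    no double quotes.
    """
    lines = ["digraph agile {"]
    for node_id, message in statements.items():
        lines.append(f'    s{node_id} [label="{message}"];')
    for previous_id, next_id, message in transitions:
        lines.append(f'    s{previous_id} -> s{next_id} [label="{message}"];')
    lines.append("}")
    return "\n".join(lines)


def successors(statement_id, transitions):
    """Statements reachable from statement_id in exactly one transition."""
    return sorted(n for p, n, _ in transitions if p == statement_id)


print(to_dot(statements, transitions))
print(successors(2, transitions))  # [6, 10]
```

Piping that output to `dot -Tpng` would draw the state machine.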
The idea of the application is to replace Scrum meetings with micro-blogging.
« Are you blocked? Do you need anything? » can be answered on the micro-blogging platform, and every thread is archived and used for machine learning (about what you want to hear as good news) in a data form that is convenient for large language models.
As such, we want to harvest texts long enough to express emotions, yet constrained to a laughably small number of characters so that nuance and ambiguity are hard to convey. That's the heart of the application: harvesting comments tagged with their associated emotions to ease the work of labeling data for artificial intelligence.
Mind you, this is just a stupid idea of mine to illustrate a graph-like structure described with HTML, not a real-life project. I just love representing state machine diagrams with everything that falls into my hands.
Here is the entity-relationship diagram I have in mind:
Let's see what a table declaration might look like in HTML, let's say for transition:

<form action=/transition >
    <input type=number name=id />
    <input type=number name=user_group_id nullable=false reference=user_group.id />
    <textarea name=message rows=10 cols=50 nullable=false ></textarea>
    <input type=url name=factoid />
    <select name="emotion_for_group_triggered" value=neutral >
        <option value="">please select a value</option>
        <option value=positive >Positive</option>
        <option value=neutral >Neutral</option>
        <option value=negative >Negative</option>
    </select>
    <input type=number name=expected_fun_for_group />
    <input type=number name=previous_statement_id reference=statement.id nullable=false />
    <input type=number name=next_statement_id reference=statement.id />
    <unique_constraint col=next_statement_id,previous_statement_id name=unique_transition ></unique_constraint>
    <input type=checkbox name=is_exception />
</form>

Through additional HTML tags and attributes, we can convey a lot of information usable for building and querying the database that stays silent in the presentation (like unique_constraint). And with a little bit of JavaScript and CSS, this HTML generates the following rendering (showing the web service endpoint as an input type=submit):
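To show how these silent attributes can turn into actual database rules, here is a minimal sketch with the standard sqlite3 module. It assumes the conventions of the form above (nullable=false means NOT NULL, reference=table.column means a foreign key, unique_constraint lists the columns of a UNIQUE constraint). The column_ddl and unique_ddl helpers are hypothetical, and the field dictionaries are written by hand rather than parsed, to keep the example short.

```python
import sqlite3


def column_ddl(attrs):
    """Translate the attributes of one form field into a SQL column clause."""
    sql_type = "INTEGER" if attrs.get("type") == "number" else "TEXT"
    clause = f"{attrs['name']} {sql_type}"
    if attrs["name"] == "id":
        clause += " PRIMARY KEY"
    if attrs.get("nullable") == "false":
        clause += " NOT NULL"
    if "reference" in attrs:
        table, column = attrs["reference"].split(".")
        clause += f" REFERENCES {table}({column})"
    return clause


def unique_ddl(attrs):
    """Translate <unique_constraint col=a,b name=...> into a table constraint."""
    columns = ", ".join(attrs["col"].split(","))
    return f"CONSTRAINT {attrs['name']} UNIQUE ({columns})"


# A subset of the transition form, as attribute dictionaries.
fields = [
    {"type": "number", "name": "id"},
    {"type": "number", "name": "previous_statement_id", "reference": "statement.id", "nullable": "false"},
    {"type": "number", "name": "next_statement_id", "reference": "statement.id"},
    {"name": "message", "nullable": "false"},  # the textarea
]
unique = {"col": "next_statement_id,previous_statement_id", "name": "unique_transition"}

ddl = "CREATE TABLE transition (\n    " + ",\n    ".join(
    [column_ddl(f) for f in fields] + [unique_ddl(unique)]
) + "\n)"
print(ddl)

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE statement (id INTEGER PRIMARY KEY, message TEXT)")
db.execute(ddl)
db.executemany("INSERT INTO statement VALUES (?, ?)", [(1, "story"), (2, "story item")])
insert = "INSERT INTO transition (previous_statement_id, next_statement_id, message) VALUES (?, ?, ?)"
db.execute(insert, (1, 2, "standup"))

rejected = False
try:
    # The same (previous, next) pair again: the unique constraint rejects it.
    db.execute(insert, (1, 2, "again"))
except sqlite3.IntegrityError as error:
    rejected = True
    print("rejected:", error)
```

Nothing of this shows up when the form is rendered, yet the database enforces it.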
This means you can now serve a landing page that serves human interaction, documents a « curl way » of automating interactions, and describes the full model of your database.
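As an illustration of that « curl way », here is a minimal sketch that turns a form submission into the equivalent curl command line. It mirrors how the test script later in this post talks to pdca.py (requests.post with params=, so the fields travel in the query string); curl_command is a hypothetical helper, and whether a given request is accepted depends on pdca.py, whose source isn't shown.

```python
import shlex
from urllib.parse import urlencode, urljoin


def curl_command(endpoint, action, fields, method="POST"):
    """Build the curl command equivalent to submitting a form.

    `action` is the form's action attribute; the fields are URL-encoded
    into the query string, and shlex.join quotes the URL for the shell.
    """
    url = urljoin(endpoint, action) + "?" + urlencode(fields)
    return shlex.join(["curl", "-X", method, url])


cmd = curl_command("http://127.0.0.1:5000", "/group", {"_action": "create", "id": 3, "name": "serious"})
print(cmd)
```

Reading the form's action and input names is enough to generate such commands, so the landing page doubles as API documentation.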
Most startups think the data model should be obfuscated to prevent it from being copied; most free software projects think that sharing the non-valuable assets helps the technology get adopted.
And thanks to this, I can now write my own test suite that uses the HTML forms to build a doppelganger of the real database, by parsing the HTML served by the application service (pdca.py), and then launch a perfectly functioning service on top of it:

import os
from html.parser import HTMLParser
from urllib.parse import urlparse

import requests
from dateutil import parser
from passlib.hash import scrypt as crypto_hash  # we can change the hash easily
from requests import get, post
# heavyweight
from sqlalchemy import (
    Boolean, Column, Date, DateTime, Float, ForeignKey, Integer, MetaData,
    String, Table, Text, Time, UnicodeText, UniqueConstraint, create_engine,
)
from sqlalchemy.ext.automap import automap_base
from sqlalchemy.orm import Session

DB = os.environ.get("DB", "test.db")
DB_DRIVER = os.environ.get("DB_DRIVER", "sqlite")
# sqlite URLs need a third slash before the path:
# sqlite:///test.db (relative) or sqlite:////abs/path.db (absolute)
DSN = f"{DB_DRIVER}://{'/' if DB_DRIVER == 'sqlite' else ''}{DB}"
ENDPOINT = "http://127.0.0.1:5000"

os.chdir("..")
os.system(f"rm -f {DB}")
os.system(f"DB={DB} DB_DRIVER={DB_DRIVER} python pdca.py & sleep 2")
url = lambda table: ENDPOINT + "/" + table
os.system(f"curl {url('group')}?_action=search")


def form_to_db(attrs):
    """Convert submitted form values into Python values for the database."""
    converted = {}
    for k, v in attrs.items():
        # skip empty values and control fields such as _action or _redirect
        if not v or k.startswith("_"):
            continue
        if ("date" in k or "time" in k) and isinstance(v, str):
            # inputs having date/time in their name are parsed
            converted[k] = parser.parse(v)
        elif k.startswith("is_"):
            # boolean mapping: checkboxes named is_* send "on" when ticked
            converted[k] = v == "on"
        elif "password" in k:
            converted[k] = crypto_hash.hash(v)
        else:
            converted[k] = v
    return converted


with open("./assets/diag.png", "rb") as pic:
    print(post(url("user"),
               params=dict(id=1, secret_password="toto", name="jul2", email="j@j.com", _action="create"),
               files=dict(pic_file=pic.read())).status_code)
#os.system(f"curl {ENDPOINT}/user?_action=search")
#os.system(f"sqlite3 {DB} .dump")

engine = create_engine(DSN)
metadata = MetaData()

# form attributes -> SQLAlchemy Column keyword arguments
transtype_true = lambda p: (p[0], p[1] == "true")


def dispatch(p):
    return dict(
        nullable=transtype_true,
        unique=transtype_true,
        # default="..." is evaluated as a Python literal
        default=lambda p: ("server_default", eval(p[1])),
    ).get(p[0], lambda *a: None)(p)


transtype_input = lambda attrs: dict(filter(None, map(dispatch, attrs.items())))


class HTMLtoData(HTMLParser):
    """Build SQLAlchemy tables out of the HTML forms served by pdca.py."""

    # HTML input types that map directly to a column type
    simple_mapping = {
        "email": UnicodeText,
        "url": UnicodeText,
        "phone": UnicodeText,
        "text": UnicodeText,
        "checkbox": Boolean,
        "date": Date,
        "time": Time,
        "datetime-local": DateTime,
        "file": Text,
        "password": Text,
        "uuid": Text,  # UUID is postgres specific
    }

    def __init__(self):
        super().__init__()
        self.cols = []
        self.table = ""
        self.tables = []
        self.enum = []
        self.engine = engine
        self.meta = metadata

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in {"select", "textarea"}:
            # the column is created in handle_endtag
            self.enum = []
            self.current_col = attrs["name"]
            self.attrs = attrs
        if tag == "option":
            self.enum.append(attrs["value"])
        if tag == "unique_constraint":
            self.cols.append(
                UniqueConstraint(*attrs["col"].split(","), name=attrs["name"])
            )
        if tag == "input":
            name = attrs.get("name", "")
            if name == "id":
                self.cols.append(
                    Column("id", Integer, **(dict(primary_key=True) | transtype_input(attrs)))
                )
                return
            if name.endswith("_id") and "reference" in attrs:
                # foreign key: reference=table.column
                self.cols.append(Column(name, Integer, ForeignKey(attrs["reference"])))
                return
            if attrs.get("type") in self.simple_mapping:
                self.cols.append(
                    Column(name, self.simple_mapping[attrs["type"]], **transtype_input(attrs))
                )
            if attrs.get("type") == "number":
                # step=any means a float, otherwise an integer
                if attrs.get("step", "") == "any":
                    self.cols.append(Column(name, Float))
                else:
                    self.cols.append(Column(name, Integer))
        if tag == "form":
            # the table name is the path of the form action, e.g. /transition
            self.table = urlparse(attrs["action"]).path[1:]

    def handle_endtag(self, tag):
        if tag == "select":
            # alternative: Column(self.current_col, Enum(*self.enum), ...)
            self.cols.append(Column(self.current_col, Text, **transtype_input(self.attrs)))
        if tag == "textarea":
            # the maximum length of the text is rows * cols characters
            self.cols.append(
                Column(
                    self.current_col,
                    String(int(self.attrs["cols"]) * int(self.attrs["rows"])),
                    **transtype_input(self.attrs),
                )
            )
        if tag == "form":
            self.tables.append(Table(self.table, self.meta, *self.cols))
            self.cols = []
            self.meta.create_all(self.engine)


HTMLtoData().feed(get(ENDPOINT + "/").text)
os.system("pkill -f pdca.py")

#metadata.reflect(bind=engine)
Base = automap_base(metadata=metadata)
Base.prepare()

with Session(engine) as session:
    for table, values in (
        ("user", form_to_db(dict(name="him", email="j2@j.com", secret_password="toto"))),
        ("group", dict(id=1, name="trolol")),
        ("group", dict(id=2, name="serious")),
        ("user_group", dict(id=1, user_id=1, group_id=1, secret_token="secret")),
        ("user_group", dict(id=2, user_id=1, group_id=2, secret_token="")),
        ("user_group", dict(id=3, user_id=2, group_id=1, secret_token="")),
        ("statement", dict(id=1, user_group_id=1, message="usable agile workflow", category="story")),
        ("statement", dict(id=2, user_group_id=1, message="How do we code?", category="story_item")),
        ("statement", dict(id=3, user_group_id=1, message="which database?", category="question")),
        ("statement", dict(id=4, user_group_id=1, message="which web framework?", category="question")),
        ("statement", dict(id=5, user_group_id=1, message="preferably less", category="answer")),
        ("statement", dict(id=6, user_group_id=1, message="How do we test?", category="story_item")),
        ("statement", dict(id=7, user_group_id=1, message="QA framework here", category="delivery")),
        ("statement", dict(id=8, user_group_id=1, message="test plan", category="test")),
        ("statement", dict(id=9, user_group_id=1, message="OK", category="finish")),
        ("statement", dict(id=10, user_group_id=1, message="PoC delivered", category="delivery")),
        ("transition", dict(user_group_id=1, previous_statement_id=1, next_statement_id=2, message="something bugs me", is_exception=True)),
        ("transition", dict(user_group_id=1, previous_statement_id=2, next_statement_id=4, message="standup meeting feedback", is_exception=True)),
        ("transition", dict(user_group_id=1, previous_statement_id=2, next_statement_id=3, message="standup meeting feedback", is_exception=True)),
        ("transition", dict(user_group_id=1, previous_statement_id=2, next_statement_id=6, message="change accepted", is_exception=True)),
        ("transition", dict(user_group_id=1, previous_statement_id=4, next_statement_id=5, message="arbitration", is_exception=True)),
        ("transition", dict(user_group_id=1, previous_statement_id=3, next_statement_id=5, message="arbitration", is_exception=True)),
        ("transition", dict(user_group_id=1, previous_statement_id=6, next_statement_id=7, message="R&D")),
        ("transition", dict(user_group_id=1, previous_statement_id=7, next_statement_id=8, message="Q&A")),
        ("transition", dict(user_group_id=1, previous_statement_id=8, next_statement_id=9, message="CI action")),
        ("transition", dict(user_group_id=1, previous_statement_id=2, next_statement_id=10, message="situation unblocked")),
        ("transition", dict(user_group_id=1, previous_statement_id=9, next_statement_id=10, message="situation unblocked")),
    ):
        session.add(getattr(Base.classes, table)(**values))
    session.commit()

os.system("python ./generate_state_diagram.py sqlite:///test.db > out.dot; dot -Tpng out.dot > diag2.png; xdot out.dot")

s = requests.Session()
os.system(f"DB={DB} DB_DRIVER={DB_DRIVER} python pdca.py & sleep 1")
print(s.post(url("group"), params=dict(_action="delete", id=3, name=1)).status_code)
print(s.post(url("grant"), params=dict(secret_password="toto", email="j@j.com", group_id=1)).status_code)
print(s.post(url("grant"), params=dict(_redirect="/group", secret_password="toto", email="j@j.com", group_id=2)).status_code)
print(s.cookies["Token"])
print(s.post(url("user_group"), params=dict(_action="search", user_id=1)).text)
print(s.post(url("group"), params=dict(_action="create", id=3, name=2)).text)
print(s.post(url("group"), params=dict(_action="delete", id=3)).status_code)
print(s.post(url("group"), params=dict(_action="search")).text)
os.system("pkill -f pdca.py")

This gives me a nice set of data to play with while I experiment with how to handle the business logic, where the core of the value lies.