Planet Python
Zato Blog: Salesforce API integrations and connected apps
This instalment in a series of articles about API integrations with Salesforce covers connected apps - how to create them and how to obtain their credentials needed to exchange REST messages with Salesforce.
In Salesforce's terminology, a connected app is, essentially, an API client. It has credentials, a set of permissions, and it works on behalf of a user in an automated manner.
In particular, the kind of a connected app that I am going to create below is one that can be used in backend, server-side integrations that operate without any direct input from end users or administrators, i.e. the app is created once, its permissions and credentials are set once, and then it is able to work uninterrupted in the background, on server side.
Server-side systems are quite unlike other kinds of apps, such as mobile ones, that assume there is a human operator involved - they have their own work characteristics, related yet different, and I am not going to cover them here.
Note that permission types and their scopes are a separate, broad subject and they will described in a separate how-to article.
Finally, I assume that you are either an administrator in a Salesforce organization or that you are preparing information for another person with similar grants in Salesforce.
Conceptually, there is nothing particularly unusual about Salesforce connected apps, it is just its own mini-world of jargon and, at the end of the day, it simply enables you to invoke APIs that Salesforce is built on. It is just that knowing where to click, what to choose and how to navigate the user interface can be a daunting challenge that this article hopes to make easier to overcome.
The stepsFor an automated, server-side connected app to make use of Salesforce APIs, the requirements are:
- Having access to username/password credentials
- Creating a connected app
- Granting permissions to the app (not covered in this article)
- Obtaining a customer key and customer secret for the app
You will note that there are four credentials in total:
- Username
- Password
- Customer key
- Customer secret
Also, depending on what chapter of the Salesforce documentation you are reading, you will note that the customer key can be also known as "client_id" whereas another name for the customer secret is "client_secret". These two pairs mean the same.
Access to username/password credentialsFor starters, you need to have an account in Salesforce, a combination of username + password that you can log in with and on whose behalf the connected app will be created:
Creating a connected appOnce you are logged in, go to Setup in the top right-hand corner:
In the search box, look up "app manager":
Next, click the "New Connected App" button to the right:
Fill out the basic details such as "Connect App Name" and make sure that you select "Enable OAuth Settings". Then, given that in this document we are not dealing with the subject of permissions at all, grant full access to the connected app and finally click "Save" at the bottom of the page.
Obtaining a customer key and customer secretWe have a connected app but we still do not know what its customer key and secret are. To reveal it, go to the "App Manager" once more, either via the search box or using the menu on the left hand side.
Find your app in the list and click "View" in the list of actions. Observe that it is "View", not "Edit" or "Manage", where you can check what the credentials are:
The customer key and secret van be now revealed in the "API (Enable OAuth Settings)" section:
This concludes the process - you have a connected app and all the credentials needed now.
TestingSeeing as this document is part of a series of how-tos in the context of Zato, if you would like to integrate with Salesforce in Python, at this point you will be able to follow the steps in another where everything is detailed separately.
Just as a quick teaser, it would look akin to the below.
... # Salesforce REST API endpoint to invoke path = '/sobjects/Campaign/' # Build the request to Salesforce based on what we received request = { 'Name': input.name, 'Segment__c': input.segment, } # Create a reference to our connection definition .. salesforce = self.cloud.salesforce['My Salesforce Connection'] # .. obtain a client to Salesforce .. with salesforce.conn.client() as client: # type: SalesforceClient # .. create the campaign now. response = client.post(path, request) ...On a much lower level, however, if you would just like to quickly test out whether you configured the connected app correctly, you can invoke from command line a Salesforce REST endpoint that will return an OAuth token, as below.
Note that, as I mentioned it previously, client_id is the same as customer key and client_secret is the same as customer secret.
curl https://example.my.salesforce.com/services/oauth2/token \ -H "X-PrettyPrint: 1" \ --header 'Content-Type: application/x-www-form-urlencoded' \ --data-urlencode 'grant_type=password' \ --data-urlencode 'username=hello@example.com' \ --data-urlencode 'password=my.password' \ --data-urlencode 'client_id=my.customer.key' \ --data-urlencode 'client_secret=my.client.secret'The result will be, for instance:
{ "access_token" : "008e0000000PTzLPb!4Vzm91PeIWJo.IbPzoEZf2ygEM.6cavCt0YwAGSM", "instance_url" : "https://example.my.salesforce.com", "id" : "https://login.salesforce.com/id/008e0000000PTzLPb/0081fSUkuxPDrir000j1", "token_type" : "Bearer", "issued_at" : "1649064143961", "signature" : "dwb6rwNIzl76kZq8lQswsTyjW2uwvTnh=" }Above, we have an OAuth bearer token on output - this can be used in subsequent, business REST calls to Salesforce but how to do it exactly in practice is left for another article.
Next steps:➤ Read about how to use Python to build and integrate enterprise APIs that your tests will cover
➤ Python API integration tutorial
➤ Python Integration platform as a Service (iPaaS)
➤ What is an Enterprise Service Bus (ESB)? What is SOA?
Real Python: Quiz: How to Reset a pandas DataFrame Index
In this quiz, you’ll test your understanding of how to reset a pandas DataFrame index.
By working through the questions, you’ll review your knowledge of indexing and also expand on what you learned in the tutorial.
You’ll need to do some research outside of the tutorial to answer all the questions. Embrace this challenge and let it take you on a learning journey.
[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]
Real Python: The Real Python Podcast – Episode #225: Python Getting Faster and Leaner & Ideas for Django Projects
What changes are happening under the hood in the latest versions of Python? How are these updates laying the groundwork for a faster Python in the coming years? Christopher Trudeau is back on the show this week, bringing another batch of PyCoder's Weekly articles and projects.
[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]
Real Python: Quiz: The Python Standard REPL: Try Out Code and Ideas Quickly
In this quiz, you’ll test your understanding of The Python Standard REPL: Try Out Code and Ideas Quickly.
The Python REPL allows you to run Python code interactively, which is useful for testing new ideas, exploring libraries, refactoring and debugging code, and trying out examples.
[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]
Python Software Foundation: Announcing Python Software Foundation Fellow Members for Q2 2024! 🎉
The PSF is pleased to announce its second batch of PSF Fellows for 2024! Let us welcome the new PSF Fellows for Q2! The following people continue to do amazing things for the Python community:
Leonard Richardson
Winnie Ke
Thank you for your continued contributions. We have added you to our Fellow roster.
The above members help support the Python ecosystem by being phenomenal leaders, sustaining the growth of the Python scientific community, maintaining virtual Python communities, maintaining Python libraries, creating educational material, organizing Python events and conferences, starting Python communities in local regions, and overall being great mentors in our community. Each of them continues to help make Python more accessible around the world. To learn more about the new Fellow members, check out their links above.
Let's continue recognizing Pythonistas all over the world for their impact on our community. The criteria for Fellow members is available online: https://www.python.org/psf/fellows/. If you would like to nominate someone to be a PSF Fellow, please send a description of their Python accomplishments and their email address to psf-fellow at python.org. Quarter 3 nominations are currently in review. We are accepting nominations for Quarter 4 through November 20th, 2024.
Are you a PSF Fellow and want to help the Work Group review nominations? Contact us at psf-fellow at python.org.
Talk Python to Me: #482: Pre-commit Hooks for Python Devs
PyPy: A DSL for Peephole Transformation Rules of Integer Operations in the PyPy JIT
As is probably apparent from the sequence of blog posts about the topic in the last year, I have been thinking about and working on integer optimizations in the JIT compiler a lot. This work was mainly motivated by Pydrofoil, where integer operations matter a lot more than for your typical Python program.
In this post I'll describe my most recent change, which is a new small domain specific language that I implemented to specify peephole optimizations on integer operations in the JIT. It uses pattern matching to specify how (sequences of) integer operations should be simplified and optimized. The rules are then compiled to RPython code that then becomes part of the JIT's optimization passes.
To make it less likely to introduce incorrect optimizations into the JIT, the rules are automatically proven correct with Z3 as part of the build process (for a more hands-on intro to how that works you can look at the knownbits post). In this blog post I want to motivate why I introduced the DSL and give an introduction to how it works.
MotivationThis summer, after I wrote my scripts to mine JIT traces for missed optimization opportunities, I started implementing a few of the integer peephole rewrite that the script identified. Unfortunately, doing so led to the problem that the way we express these rewrites up to now is very imperative and verbose. Here's a snippet of RPython code that shows some rewrites for integer multiplication (look at the comments to see what the different parts actually do). You don't need to understand the code in detail, but basically it's in very imperative style and there's quite a lot of boilerplate.
def optimize_INT_MUL(self, op): arg0 = get_box_replacement(op.getarg(0)) b0 = self.getintbound(arg0) arg1 = get_box_replacement(op.getarg(1)) b1 = self.getintbound(arg1) if b0.known_eq_const(1): # 1 * x == x self.make_equal_to(op, arg1) elif b1.known_eq_const(1): # x * 1 == x self.make_equal_to(op, arg0) elif b0.known_eq_const(0) or b1.known_eq_const(0): # 0 * x == x * 0 == 0 self.make_constant_int(op, 0) else: for lhs, rhs in [(arg0, arg1), (arg1, arg0)]: lh_info = self.getintbound(lhs) if lh_info.is_constant(): x = lh_info.get_constant_int() if x & (x - 1) == 0: # x * (2 ** c) == x << c new_rhs = ConstInt(highest_bit(lh_info.get_constant_int())) op = self.replace_op_with(op, rop.INT_LSHIFT, args=[rhs, new_rhs]) self.optimizer.send_extra_operation(op) return elif x == -1: # x * -1 == -x op = self.replace_op_with(op, rop.INT_NEG, args=[rhs]) self.optimizer.send_extra_operation(op) return else: # x * (1 << y) == x << y shiftop = self.optimizer.as_operation(get_box_replacement(lhs), rop.INT_LSHIFT) if shiftop is None: continue if not shiftop.getarg(0).is_constant() or shiftop.getarg(0).getint() != 1: continue shiftvar = get_box_replacement(shiftop.getarg(1)) shiftbound = self.getintbound(shiftvar) if shiftbound.known_nonnegative() and shiftbound.known_lt_const(LONG_BIT): op = self.replace_op_with( op, rop.INT_LSHIFT, args=[rhs, shiftvar]) self.optimizer.send_extra_operation(op) return return self.emit(op)Adding more rules to these functions is very tedious and gets super confusing when the functions get bigger. In addition I am always worried about making mistakes when writing this kind of code, and there is no feedback at all about which of these rules are actually applied a lot in real programs.
Therefore I decided to write a small domain specific language with the goal of expressing these rules in a more declarative way. In the rest of the post I'll describe the DSL (most of that description is adapted from the documentation about it that I wrote).
The Peephole Rule DSL Simple transformation rulesThe rules in the DSL specify how integer operation can be transformed into cheaper other integer operations. A rule always consists of a name, a pattern, and a target. Here's a simple rule:
add_zero: int_add(x, 0) => xThe name of the rule is add_zero. It matches operations in the trace of the form int_add(x, 0), where x will match anything and 0 will match only the constant zero. After the => arrow is the target of the rewrite, i.e. what the operation is rewritten to, in this case x.
The rule language has a list of which of the operations are commutative, so add_zero will also optimize int_add(0, x) to x.
Variables in the pattern can repeat:
sub_x_x: int_sub(x, x) => 0This rule matches against int_sub operations where the two arguments are the same (either the same box, or the same constant).
Here's a rule with a more complicated pattern:
sub_add: int_sub(int_add(x, y), y) => xThis pattern matches int_sub operations, where the first argument was produced by an int_add operation. In addition, one of the arguments of the addition has to be the same as the second argument of the subtraction.
The constants MININT, MAXINT and LONG_BIT (which is either 32 or 64, depending on which platform the JIT is built for) can be used in rules, they behave like writing numbers but allow bit-width-independent formulations:
is_true_and_minint: int_is_true(int_and(x, MININT)) => int_lt(x, 0)It is also possible to have a pattern where some arguments needs to be a constant, without specifying which constant. Those patterns look like this:
sub_add_consts: int_sub(int_add(x, C1), C2) # incomplete # more goes here => int_sub(x, C)Variables in the pattern that start with a C match against constants only. However, in this current form the rule is incomplete, because the variable C that is being used in the target operation is not defined anywhere. We will see how to compute it in the next section.
Computing constants and other intermediate resultsSometimes it is necessary to compute intermediate results that are used in the target operation. To do that, there can be extra assignments between the rule head and the rule target.:
sub_add_consts: int_sub(int_add(x, C1), C2) # incomplete C = C1 + C1 => int_sub(x, C)The right hand side of such an assignment is a subset of Python syntax, supporting arithmetic using +, -, *, and certain helper functions. However, the syntax allows you to be explicit about unsignedness for some operations. E.g. >>u exists for unsigned right shifts (and I plan to add >u, >=u, <u, <=u for comparisons).
Here's an example of a rule that uses >>u:
urshift_lshift_x_c_c: uint_rshift(int_lshift(x, C), C) mask = (-1 << C) >>u C => int_and(x, mask) ChecksSome rewrites are only true under certain conditions. For example, int_eq(x, 1) can be rewritten to x, if x is known to store a boolean value. This can be expressed with checks:
eq_one: int_eq(x, 1) check x.is_bool() => xA check is followed by a boolean expression. The variables from the pattern can be used as IntBound instances in checks (and also in assignments) to find out what the abstract interpretation of the JIT knows about the value of a trace variable (IntBound is the name of the abstract domain that the JIT uses for integers, despite the fact that it also stores knownbits information nowadays).
Here's another example:
mul_lshift: int_mul(x, int_lshift(1, y)) check y.known_ge_const(0) and y.known_le_const(LONG_BIT) => int_lshift(x, y)It expresses that x * (1 << y) can be rewritten to x << y but checks that y is known to be between 0 and LONG_BIT.
Checks and assignments can be repeated and combined with each other:
mul_pow2_const: int_mul(x, C) check C > 0 and C & (C - 1) == 0 shift = highest_bit(C) => int_lshift(x, shift)In addition to calling methods on IntBound instances, it's also possible to access their attributes, like in this rule:
and_x_c_in_range: int_and(x, C) check x.lower >= 0 and x.upper <= C & ~(C + 1) => x Rule Ordering and LivenessThe generated optimizer code will give preference to applying rules that produce a constant or a variable as a rewrite result. Only if none of those match do rules that produce new result operations get applied. For example, the rules sub_x_x and sub_add are tried before trying sub_add_consts, because the former two rules optimize to a constant and a variable respectively, while the latter produces a new operation as the result.
The rule sub_add_consts has a possible problem, which is that if the intermediate result of the int_add operation in the rule head is used by some other operations, then the sub_add_consts rule does not actually reduce the number of operations (and might actually make things slightly worse due to increased register pressure). However, currently it would be extremely hard to take that kind of information into account in the optimization pass of the JIT, so we optimistically apply the rules anyway.
Checking rule coverageEvery rewrite rule should have at least one unit test where it triggers. To ensure this, the unit test file that mainly checks integer optimizations in the JIT has an assert at the end of a test run, that every rule fired at least once.
Printing rule statisticsThe JIT can print statistics about which rule fired how often in the jit-intbounds-stats logging category, using the PYPYLOG mechanism. For example, to print the category to stdout at the end of program execution, run PyPy like this:
PYPYLOG=jit-intbounds-stats:- pypy ...The output of that will look something like this:
int_add add_reassoc_consts 2514 add_zero 107008 int_sub sub_zero 31519 sub_from_zero 523 sub_x_x 3153 sub_add_consts 159 sub_add 55 sub_sub_x_c_c 1752 sub_sub_c_x_c 0 sub_xor_x_y_y 0 sub_or_x_y_y 0 int_mul mul_zero 0 mul_one 110 mul_minus_one 0 mul_pow2_const 1456 mul_lshift 0 ... Termination and ConfluenceRight now there are unfortunately no checks that the rules actually rewrite operations towards "simpler" forms. There is no cost model, and also nothing that prevents you from writing a rule like this:
neg_complication: int_neg(x) # leads to infinite rewrites => int_mul(-1, x)Doing this would lead to endless rewrites if there is also another rule that turns multiplication with -1 into negation.
There is also no checking for confluence (yet?), i.e. the property that all rewrites starting from the same input trace always lead to the same output trace, no matter in which order the rules are applied.
ProofsIt is very easy to write a peephole rule that is not correct in all corner cases. Therefore all the rules are proven correct with Z3 before compiled into actual JIT code, by default. When the proof fails, a (hopefully minimal) counterexample is printed. The counterexample consists of values for all the inputs that fulfil the checks, values for the intermediate expressions, and then two different values for the source and the target operations.
E.g. if we try to add the incorrect rule:
mul_is_add: int_mul(a, b) => int_add(a, b)We get the following counterexample as output:
Could not prove correctness of rule 'mul_is_add' in line 1 counterexample given by Z3: counterexample values: a: 0 b: 1 operation int_mul(a, b) with Z3 formula a*b has counterexample result vale: 0 BUT target expression: int_add(a, b) with Z3 formula a + b has counterexample value: 1If we add conditions, they are taken into account and the counterexample will fulfil the conditions:
mul_is_add: int_mul(a, b) check a.known_gt_const(1) and b.known_gt_const(2) => int_add(a, b)This leads to the following counterexample:
Could not prove correctness of rule 'mul_is_add' in line 46 counterexample given by Z3: counterexample values: a: 2 b: 3 operation int_mul(a, b) with Z3 formula a*b has counterexample result vale: 6 BUT target expression: int_add(a, b) with Z3 formula a + b has counterexample value: 5Some IntBound methods cannot be used in Z3 proofs because their control flow is too complex. If that is the case, they can have Z3-equivalent formulations defined (in every case this is done, it's a potential proof hole if the Z3 friendly reformulation and the real implementation differ from each other, therefore extra care is required to make very sure they are equivalent).
It's possible to skip the proof of individual rules entirely by adding SORRY_Z3 to its body (but we should try not to do that too often):
eq_different_knownbits: int_eq(x, y) SORRY_Z3 check x.known_ne(y) => 0 Checking for satisfiabilityIn addition to checking whether the rule yields a correct optimization, we also check whether the rule can ever apply. This ensures that there are some runtime values that would fulfil all the checks in a rule. Here's an example of a rule violating this:
never_applies: int_is_true(x) check x.known_lt_const(0) and x.known_gt_const(0) # impossible condition, always False => xRight now the error messages if this goes wrong are not completely easy to understand. I hope to be able to improve this later:
Rule 'never_applies' cannot ever apply in line 1 Z3 did not manage to find values for variables x such that the following condition becomes True: And(x <= x_upper, x_lower <= x, If(x_upper < 0, x_lower > 0, x_upper < 0)) Implementation NotesThe implementation of the DSL is done in a relatively ad-hoc manner. It is parsed using rply, there's a small type checker that tries to find common problems in how the rules are written. Z3 is used via the Python API, like in the previous blog posts that are using it. The pattern matching RPython code is generated using an approach inspired by Luc Maranget's paper Compiling Pattern Matching to Good Decision Trees. See this blog post for an approachable introduction.
ConclusionNow that I've described the DSL, here are the rules that are equivalent to the imperative code in the motivation section:
mul_zero: int_mul(x, 0) => 0 mul_one: int_mul(x, 1) => x mul_minus_one: int_mul(x, -1) => int_neg(x) mul_pow2_const: int_mul(x, C) check C > 0 and C & (C - 1) == 0 shift = highest_bit(C) => int_lshift(x, shift) mul_lshift: int_mul(x, int_lshift(1, y)) check y.known_ge_const(0) and y.known_le_const(LONG_BIT) => int_lshift(x, y)The current status of the DSL is that it got merged to PyPy's main branch. I rewrote a part of the integer rewrites into the DSL, but some are still in the old imperative style (mostly for complicated reasons, the easily ported ones are all done). Since I've only been porting optimizations that had existed prior to the existence of the DSL, performance numbers of benchmarks didn't change.
There are a number of features that are still missing and some possible extensions that I plan to work on in the future:
All the integer operations that the DSL handles so far are the variants that do not check for overflow (or where overflow was proven to be impossible to happen). In regular Python code the overflow-checking variants int_add_ovf etc are much more common, but the DSL doesn't support them yet. I plan to fix this, but don't completely understand how the correctness proofs for them should be done correctly.
A related problem is that I don't understand what it means for a rewrite to be correct if some of the operations are only defined for a subset of the input values. E.g. division isn't defined if the divisor is zero. In theory, a division operation in the trace should always be preceded by a check that the divisor isn't zero. But sometimes other optimization move the check around and the connection to the division gets lost or muddled. What optimizations can we still safely perform on the division? There's lots of prior work on this question, but I still don't understand what the correct approach in our context would be.
Ordering comparisons like int_lt, int_le and their unsigned variants are not ported to the DSL yet. Comparisons are an area where the JIT is not super good yet at optimizing away operations. This is a pretty big topic and I've started a project with Nico Rittinghaus to try to improve the situation a bit more generally.
A more advanced direction of work would be to implement a simplified form of e-graphs (or ae-graphs). The JIT has like half of an e-graph data structure already, and we probably can't afford a full one in terms of compile time costs, but maybe we can have two thirds or something?
Thank you to Max Bernstein and Martin Berger for super helpful feedback on drafts of the post!
The Python Show: 48 - Writing About Python with David Mertz
In this episode of the Python Show Podcast, David Mertz is our guest. David is a prolific writer about the Python programming language. From his extremely popular IPM Developerworks articles to his multiple books on the Python language, David has been a part of the Python community for decades.
We ended up chatting about:
The history of Python
Book writing
Conference speaking
The PSF
and more!
David Mertz’s website
PyDev of the Week: David Mertz on Mouse vs Python
David Mertz on InformIT / Pearson
Real Python: Python Thread Safety: Using a Lock and Other Techniques
Python threading allows you to run parts of your code concurrently, making the code more efficient. However, when you introduce threading to your code without knowing about thread safety, you may run into issues such as race conditions. You solve these with tools like locks, semaphores, events, conditions, and barriers.
By the end of this tutorial, you’ll be able to identify safety issues and prevent them by using the synchronization primitives in Python’s threading module to make your code thread-safe.
In this tutorial, you’ll learn:
- What thread safety is
- What race conditions are and how to avoid them
- How to identify thread safety issues in your code
- What different synchronization primitives exist in the threading module
- How to use synchronization primitives to make your code thread-safe
To get the most out of this tutorial, you’ll need to have basic experience working with multithreaded code using Python’s threading module and ThreadPoolExecutor.
Get Your Code: Click here to download the free sample code that you’ll use to learn about thread safety techniques in Python.
Take the Quiz: Test your knowledge with our interactive “Python Thread Safety: Using a Lock and Other Techniques” quiz. You’ll receive a score upon completion to help you track your learning progress:
Interactive Quiz
Python Thread Safety: Using a Lock and Other TechniquesIn this quiz, you'll test your understanding of Python thread safety. You'll revisit the concepts of race conditions, locks, and other synchronization primitives in the threading module. By working through this quiz, you'll reinforce your knowledge about how to make your Python code thread-safe.
Threading in PythonIn this section, you’ll get a general overview of how Python handles threading. Before discussing threading in Python, it’s important to revisit two related terms that you may have heard about in this context:
- Concurrency: The ability of a system to handle multiple tasks by allowing their execution to overlap in time but not necessarily happen simultaneously.
- Parallelism: The simultaneous execution of multiple tasks that run at the same time to leverage multiple processing units, typically multiple CPU cores.
Python’s threading is a concurrency framework that allows you to spin up multiple threads that run concurrently, each executing pieces of code. This improves the efficiency and responsiveness of your application. When running multiple threads, the Python interpreter switches between them, handing the control of execution over to each thread.
By running the script below, you can observe the creation of four threads:
Python threading_example.py import threading import time from concurrent.futures import ThreadPoolExecutor def threaded_function(): for number in range(3): print(f"Printing from {threading.current_thread().name}. {number=}") time.sleep(0.1) with ThreadPoolExecutor(max_workers=4, thread_name_prefix="Worker") as executor: for _ in range(4): executor.submit(threaded_function) Copied!In this example, threaded_function prints the values zero to two that your for loop assigns to the loop variable number. Using a ThreadPoolExecutor, four threads are created to execute the threaded function. ThreadPoolExecutor is configured to run a maximum of four threads concurrently with max_workers=4, and each worker thread is named with a “Worker” prefix, as in thread_name_prefix="Worker".
In print(), the .name attribute on threading.current_thread() is used to get the name of the current thread. This will help you identify which thread is executed each time. A call to sleep() is added inside the threaded function to increase the likelihood of a context switch.
You’ll learn what a context switch is in just a moment. First, run the script and take a look at the output:
Shell $ python threading_example.py Printing from Worker_0. number=0 Printing from Worker_1. number=0 Printing from Worker_2. number=0 Printing from Worker_3. number=0 Printing from Worker_0. number=1 Printing from Worker_2. number=1 Printing from Worker_1. number=1 Printing from Worker_3. number=1 Printing from Worker_0. number=2 Printing from Worker_2. number=2 Printing from Worker_1. number=2 Printing from Worker_3. number=2 Copied!Each line in the output represents a print() call from a worker thread, identified by Worker_0, Worker_1, Worker_2, and Worker_3. The number that follows the worker thread name shows the current iteration of the loop each thread is executing. Each thread takes turns executing the threaded_function, and the execution happens in a concurrent rather than sequential manner.
For example, after Worker_0 prints number=0, it’s not immediately followed by Worker_0 printing number=1. Instead, you see outputs from Worker_1, Worker_2, and Worker_3 printing number=0 before Worker_0 proceeds to number=1. You’ll notice from these interleaved outputs that multiple threads are running at the same time, taking turns to execute their part of the code.
This happens because the Python interpreter performs a context switch. This means that Python pauses the execution state of the current thread and passes control to another thread. When the context switches, Python saves the current execution state so that it can resume later. By switching the control of execution at specific intervals, multiple threads can execute code concurrently.
You can check the context switch interval of your Python interpreter by typing the following in the REPL:
Python >>> import sys >>> sys.getswitchinterval() 0.005 Copied!The output of calling the getswitchinterval() is a number in seconds that represents the context switch interval of your Python interpreter. In this case, it’s 0.005 seconds or five milliseconds. You can think of the switch interval as how often the Python interpreter checks if it should switch to another thread.
An interval of five milliseconds doesn’t mean that threads switch exactly every five milliseconds, but rather that the interpreter considers switching to another thread at these intervals.
The switch interval is defined in the Python docs as follows:
Read the full article at https://realpython.com/python-thread-lock/ »[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]
Real Python: Quiz: Python Class Constructors: Control Your Object Instantiation
In this quiz, you’ll test your understanding of Python Class Constructors.
By working through this quiz, you’ll revisit the internal instantiation process, object initialization using .__init__(), and fine-tuning object creation by overriding .__new__().
[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]
PyCoder’s Weekly: Issue #652 (Oct. 22, 2024)
#652 – OCTOBER 22, 2024
View in Browser »
In this tutorial, you’ll learn how to harness the power of structural pattern matching in Python. You’ll explore the new syntax, delve into various pattern types, and find appropriate applications for pattern matching, all while identifying common pitfalls.
REAL PYTHON
The itertools module offers four combinatoric iterators that generate different combined outputs from one or more iterable. This post covers all of them: product, permutations, combinations, and combinations_with_replacement.
JUHA-MATTI SANTALA
Extract all the data you need from any website without getting blocked with ZenRows’ Scraper API – a complete toolkit with premium proxies, anti-CAPTCHA, cloud-based scalable browsers, and more. Start your free trial now →
ZENROWS sponsor
Unlock the inner workings of the Python language, compile the Python interpreter from source code, and participate in the development of CPython. Guido van Rossum, the creator of Python, says: “I can recommend CPython Internals to anyone who wants to get going with hacking on CPython” →
ANTHONY SHAW sponsor
Reading and writing files is a basic task that most software applications need to do, but what if you need to do that on remote machines? This tutorial introduces you to Fabric and how to connect over SSH in Python.
MIKE DRISCOLL
Most devices record a variety of metadata when generating images. While some of that information may be innocuous, you could end up exposing the GPS coordinates to your home if you aren’t careful. In this article, Stefanie provides a brief introduction to image metadata, and then shows you how to remove it with exif-stripper.
STEFANIEMOLIN.COM • Shared by Stefanie Molin
Python vs. JavaScript: Which open-source community is leading the way? This analysis of 36,000 GitHub repositories explores the evolution of Python and JavaScript ecosystems, highlighting key trends and popular topics. Discover how open-source communities of Python and JavaScript have shaped the tech landscape.
PYCHALLENGER.COM • Shared by Erik Nogueira Kückelheim
Experience the power of Edge AI—delivering lightning-fast, real-time processing where it matters. Optimize your applications to push performance and accuracy beyond limits with Intel’s OpenVINO toolkit.
INTEL CORPORATION sponsor
In this video course, you’ll learn how to define multiple return types using type hints in Python. This course covers working with single or multiple pieces of data, defining type aliases, and performing type checking using a third-party static type checker tool.
REAL PYTHON course
How does a Python tool support all types of DataFrames and their various features? Could a lightweight library be used to add compatibility for newer formats like Polars or PyArrow? This week on the show, we speak with Marco Gorelli about his project, Narwhals.
REAL PYTHON podcast
In this tutorial, you’ll learn what syntactic sugar is and how Python uses it to help you create more readable, descriptive, clean, and Pythonic code. You’ll also learn how to replace a given piece of syntactic sugar with another syntax construct.
REAL PYTHON
Julia asked some folks on Mastodon what they found confusing about working in a terminal. It turns out that entering text in the terminal is complicated. This post talks about why that is and how to understand it better.
JULIA EVANS
Software engineering provides a lot of leverage and small teams can do a large amount of work. This post talks about several common examples in the industry where a small group created a big product.
LEONARDO CREED
Mariatta has been a Python Core Developer since 2017. If you want to know just what that means, this post talks about all the things she gets to do.
MARIATTA
Pydantic lets you create custom types. This post talks about how to create a custom dictionary type using root models and Enums.
BRYAN ANTHONIO
This article looks at some examples and best practices when using Lambda functions in Python.
FEDERICO TROTTA • Shared by AppSignal
October 23, 2024
REALPYTHON.COM
October 24, 2024
MEETUP.COM
October 25 to October 27, 2024
PYCON.ID
October 25 to October 28, 2024
PYCON.KR
October 26 to October 28, 2024
PYTHONHO.COM
October 26, 2024
PYTHON.ORG.BR
October 27, 2024
DJANGOGIRLS.ORG
October 31 to November 3, 2024
PYCON.FR
October 31 to November 3, 2024
PYCON.ORG
Happy Pythoning!
This was PyCoder’s Weekly Issue #652.
View in Browser »
[ Subscribe to 🐍 PyCoder’s Weekly 💌 – Get the best Python news, articles, and tutorials delivered to your inbox once a week >> Click here to learn more ]
Real Python: Understanding Python's Global Interpreter Lock (GIL)
The Python Global Interpreter Lock or GIL, in simple words, is a mutex (or a lock) that allows only one thread to hold the control of the Python interpreter.
This means that only one thread can be in a state of execution at any point in time. The impact of the GIL isn’t visible to developers who execute single-threaded programs, but it can be a performance bottleneck in CPU-bound and multi-threaded code.
Since the GIL allows only one thread to execute at a time even in a multi-threaded architecture with more than one CPU core, the GIL has gained a reputation as an “infamous” feature of Python.
In this video course you’ll learn how the GIL affects the performance of your Python programs, and how you can mitigate the impact it might have on your code.
[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]
Real Python: Quiz: Defining Your Own Python Function
In this quiz, you’ll test your understanding of how to define your own Python function.
You’ll revisit theoretical knowledge about passing values to functions, when to divide your program into separate user-defined functions, and all the tools you’ll need to define complex and powerful functions in Python.
[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]
Python Anywhere: Improving PythonAnywhere's File Storage System
PythonAnywhere has been around for over 10 years, and as our platform continues to grow with thousands of users, we’re committed to keeping it in top shape. Part of this involves upgrading some of the older parts of our infrastructure, with a special focus on our file storage servers—some of the oldest systems we have.
Julien Tayon: Tune your guitar with python
Long story short, I suck at tuning my instrument and just lost my tuner...
This will require the python module soundevice and matplotlib.
So in order to tune my guitar I indeed need a spectrosonogram that displays the frequencies captured in real time by an audio device with an output readable enough I can actually know if I am nearing a legit frequency called a Note.
The frequencies for the notes are pretty arbitrary and I chose to only show the frequency for E, A , D, G, B since I have a 5 strings bass.
I chose the frequency between 100 and 2000 knowing that anyway any frequency below will trigger harmonics and above will trigger reasonance in the right frequency frame.
Plotting a spectrogram is done by tweaking the eponym matplotlib grapher with values chosen to fit my need and show me a laser thin beam around the right frequency. #!/usr/bin/env python3 """Show a text-mode spectrogram using live microphone data.""" import argparse import math import shutil import matplotlib.pyplot as plt from multiprocessing import Process, Queue import matplotlib.animation as animation import numpy as np import sounddevice as sd usage_line = ' press enter to quit,' def int_or_str(text): """Helper function for argument parsing.""" try: return int(text) except ValueError: return text try: columns, _ = shutil.get_terminal_size() except AttributeError: columns = 80 parser = argparse.ArgumentParser(add_help=False) parser.add_argument( '-l', '--list-devices', action='store_true', help='show list of audio devices and exit') args, remaining = parser.parse_known_args() if args.list_devices: print(sd.query_devices()) parser.exit(0) parser = argparse.ArgumentParser( description=__doc__ + '\n\nSupported keys:' + usage_line, formatter_class=argparse.RawDescriptionHelpFormatter, parents=[parser]) parser.add_argument( '-b', '--block-duration', type=float, metavar='DURATION', default=50, help='block size (default %(default)s milliseconds)') parser.add_argument( '-d', '--device', type=int_or_str, help='input device (numeric ID or substring)') parser.add_argument( '-g', '--gain', type=float, default=10, help='initial gain factor (default %(default)s)') parser.add_argument( '-r', '--range', type=float, nargs=2, metavar=('LOW', 'HIGH'), default=[50, 4000], help='frequency range (default %(default)s Hz)') args = parser.parse_args(remaining) low, high = args.range if high <= low: parser.error('HIGH must be greater than LOW') q = Queue() try: samplerate = sd.query_devices(args.device, 'input')['default_samplerate'] def plot(q): global samplerate fig, ( ax,axs) = plt.subplots(nrows=2) plt.ioff() def animate(i,q): data = q.get() ax.clear() axs.clear() axs.plot(data) ax.set_yticks([ 41.20, 82.41, 164.8, 329.6, 659.3, # E 55.00, 110.0, 220.0, 440.0, 880.0, # A 73.42, 146.8, 293.7, 587.3, # D 49.00, 98.00, 196.0, 392.0, 784.0, #G 61.74, 123.5, 246.9, 493.9, 987.8 ])#B ax.specgram(data[:,-1],mode="magnitude", Fs=samplerate*2, scale="linear",NFFT=9002) ax.set_ylim(150,1000) ani = animation.FuncAnimation(fig, animate,fargs=(q,), interval=500) plt.show() plotrt = Process(target=plot, args=(q,)) plotrt.start() def callback(indata, frames, time, status): if any(indata): q.put(indata) else: print('no input') with sd.InputStream(device=args.device, channels=1, callback=callback, blocksize=int(samplerate * args.block_duration /50 ), samplerate=samplerate) as sound: while True: response = input() if response in ('', 'q', 'Q'): break for ch in response: if ch == '+': args.gain *= 2 elif ch == '-': args.gain /= 2 else: print('\x1b[31;40m', usage_line.center(args.columns, '#'), '\x1b[0m', sep='') break except KeyboardInterrupt: parser.exit('Interrupted by user') except Exception as e: parser.exit(type(e).__name__ + ': ' + str(e))
Real Python: Python's property(): Add Managed Attributes to Your Classes
With Python’s property(), you can create managed attributes in your classes. You can use managed attributes when you need to modify an attribute’s internal implementation and don’t want to change the class’s public API. Providing stable APIs will prevent you from breaking your users’ code when they rely on your code.
Properties are arguably the most popular way to create managed attributes quickly and in the purest Pythonic style.
In this tutorial, you’ll learn how to:
- Create managed attributes or properties in your classes
- Perform lazy attribute evaluation and provide computed attributes
- Make your classes Pythonic using properties instead of setter and getter methods
- Create read-only and read-write properties
- Create consistent and backward-compatible APIs for your classes
You’ll also write practical examples that use property() for validating input data, computing attribute values dynamically, logging your code, and more. To get the most out of this tutorial, you should know the basics of object-oriented programming, classes, and decorators in Python.
Get Your Code: Click here to download the free sample code that shows you how to use Python’s property() to add managed attributes to your classes.
Take the Quiz: Test your knowledge with our interactive “Python's property(): Add Managed Attributes to Your Classes” quiz. You’ll receive a score upon completion to help you track your learning progress:
Interactive Quiz
Python's property(): Add Managed Attributes to Your ClassesIn this quiz, you'll test your understanding of Python's property(). With this knowledge, you'll be able to create managed attributes in your classes, perform lazy attribute evaluation, provide computed attributes, and more.
Managing Attributes in Your ClassesWhen you define a class in an object-oriented programming language, you’ll probably end up with some instance and class attributes. In other words, you’ll end up with variables that are accessible through the instance, class, or even both, depending on the language. Attributes represent and hold the internal state of a given object, which you’ll often need to access and mutate.
Typically, you have at least two ways to access and mutate an attribute. Either you can access and mutate the attribute directly or you can use methods. Methods are functions attached to a given class. They provide the behaviors and actions that an object can perform with its internal data and attributes.
If you expose attributes to the user, then they become part of the class’s public API. This means that your users will access and mutate them directly in their code. The problem comes when you need to change the internal implementation of a given attribute.
Say you’re working on a Circle class and add an attribute called .radius, making it public. You finish coding the class and ship it to your end users. They start using Circle in their code to create a lot of awesome projects and applications. Good job!
Now suppose that you have an important user that comes to you with a new requirement. They don’t want Circle to store the radius any longer. Instead, they want a public .diameter attribute.
At this point, removing .radius to start using .diameter could break the code of some of your other users. You need to manage this situation in a way other than removing .radius.
Programming languages such as Java and C++ encourage you to never expose your attributes to avoid this kind of problem. Instead, you should provide getter and setter methods, also known as accessors and mutators, respectively. These methods offer a way to change the internal implementation of your attributes without changing your public API.
Note: Getter and setter methods are often considered an anti-pattern and a signal of poor object-oriented design. The main argument behind this proposition is that these methods break encapsulation. They allow you to access and mutate the components of your objects from the outside.
These programming languages need getter and setter methods because they don’t have a suitable way to change an attribute’s internal implementation when a given requirement changes. Changing the internal implementation would require an API modification, which can break your end users’ code.
The Getter and Setter Approach in PythonTechnically, there’s nothing that stops you from using getter and setter methods in Python. Here’s a quick example that shows how this approach would look:
Python point_v1.py class Point: def __init__(self, x, y): self._x = x self._y = y def get_x(self): return self._x def set_x(self, value): self._x = value def get_y(self): return self._y def set_y(self, value): self._y = value Copied!In this example, you create a Point class with two non-public attributes ._x and ._y to hold the Cartesian coordinates of the point at hand.
Note: Python doesn’t have the notion of access modifiers, such as private, protected, and public, to restrict access to attributes and methods. In Python, the distinction is between public and non-public class members.
If you want to signal that a given attribute or method is non-public, then you have to use the well-known Python convention of prefixing the name with an underscore (_). That’s the reason behind the naming of the attributes ._x and ._y.
Note that this is just a convention. It doesn’t stop you and other programmers from accessing the attributes using dot notation, as in obj._attr. However, it’s bad practice to violate this convention.
To access and mutate the value of either ._x or ._y, you can use the corresponding getter and setter methods. Go ahead and save the above definition of Point in a Python module and import the class into an interactive session. Then run the following code:
Python >>> from point_v1 import Point >>> point = Point(12, 5) >>> point.get_x() 12 >>> point.get_y() 5 >>> point.set_x(42) >>> point.get_x() 42 >>> # Non-public attributes are still accessible >>> point._x 42 >>> point._y 5 Copied!With .get_x() and .get_y(), you can access the current values of ._x and ._y. You can use the setter method to store a new value in the corresponding managed attribute. From the two final examples, you can confirm that Python doesn’t restrict access to non-public attributes. Whether or not you access them directly is up to you.
Read the full article at https://realpython.com/python-property/ »[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]
Python Bytes: #406 What's on Django TV tonight?
Zato Blog: HL7 FHIR Security with Basic Auth, OAuth and SSL/TLS
Preliminary reading: HL7 FHIR Integrations in Python
FHIR servers offer their APIs using REST, which in turn means that they are HTTP servers under the hood. As a result, a few common security mechanisms can be employed to secure access to the servers when you invoke them.
- Basic Auth
- OAuth
- SSL/TLS encryption
When you integrate with a FHIR server, consult with its maintainers what of these, if any, should be used.
Note that Basic Auth and OAuth, when used over HTTP, are mutually exclusive. The HTTP protocol reserves only one header, called Authorization, for the authorization credentials and there is no standard way in the protocol to use more than one authorization mechanism.
On the other hand, the SSL/TLS encryption is independent of the credentials used and can be employed with Basic Auth, OAuth, or in scenarios when no authorization header is sent at all.
This is why, when you create or update a FHIR connection, you can set Basic Auth or OAuth and SSL/TLS options independently, as in the screenshot below:
Basic AuthBasic Auth is a combination of username and password credentials. They are allocated in advance by the maintainers of a FHIR server.
To use them in your integrations, create a new Basic Auth definition in the Dashboard, set its password, and assign it to an existing or new outgoing FHIR connection. No restarts nor reloads are required.
OAuthAuthentication with OAuth is built around a notion of short-lived tokens, which are simple strings. Your Zato server has credentials, much like a username and password although they are called a client ID and secret, based on which an authentication server issues tokens that let Zato prove that in fact it does have the correct credentials and that it is allowed to invoke a particular API as it is defined through scopes.
Scopes can be likened to permissions to access a subset of resources from a FHIR server. When Zato receives a token from an authentication server the token is inherently linked to specific scopes within which it can interact with the FHIR server.
That tokens are short-lived means that they need to be refreshed periodically, e.g. at least once per hour Zato needs to ask the authentication server for a new token based on its credentials. This process takes place under the hood and requires no configuration on your part.
To use OAuth in your integrations, create a new OAuth definition in the Dashboard, set its secret, and assign it to an existing or new outgoing FHIR connection. No restarts nor reloads are required.
SSL/TLS encryptionIf the FHIR server that a connection points to uses SSL/TLS then a question arises of how to validate the certificate of the Certificate Authority (CA) that signed the certificate that the FHIR server uses. There are several options:
- Skip validation - the certificate will not be validated at all and it will be always accepted.
- Default bundle - the certificate will have to belong to one of the publicly recognized CAs. The bundle contains the same certificates that common web browsers do so this option is good if the FHIR server is available in the public Internet, e.g. https://fhir.example.com
- Custom bundle - if the FHIR server's certificate was signed by an internal, non-public CA then the CA's certificate bundle, in the PEM format, can be uploaded through the Dashboard to make it possible to choose it when the FHIR connection is being created or updated.
➤ Read about how to use Python to build and integrate enterprise APIs that your tests will cover
➤ Python API integration tutorial
➤ Python Integration platform as a Service (iPaaS)
➤ What is an Enterprise Service Bus (ESB)? What is SOA?
Julien Tayon: Hello world part II : actually recoding print
Here, we are gonna deep inside one of the most overlooked object oriented abastraction : a file and actually print what we can of hello world in 100 lines of code.
The file handler and the file descriptor
These two abstractions are the low level and high level abstractions of the same thing : a view on something more complex which access has been encapsulated in generic methods. Actually when you code a framebuffer driver you provide function pointers that are specialized to your device and you may omit those common to the class. This is done with a double lookup on the major node, minor node number. Of those « generic » methods you have : seek, write, tell, truncate, open, read, close ...
The file handler in python also handles extra bytes (pun) of facilities : like character encoding, stats, and buffering.
Here, we work with the low level abstraction : the file which we access with fileno through its file descriptor. And thanks to this abstraction, you don't care if the underlying implementation fragments the file itself (ex: on a hard drive), you can magically always ask without caring for the gory details to read any arbitrary block or chararacters at any given position.
Two of the most used methods on files here are seek and write.
The file descriptor method write is sensitive to the positionning set by seek. Hence, we can write a line, and position ourselves one line below to write the next line in a character.
matrices as a view of a row based array
When I speak of rows and columns I evocate abstractions that are not related to the framebuffer.
The coordinates are an abstraction we build for convenience to say I want to write from this line at this column.
And since human beings bug after 2 dimensions we split the last dimnension in a vector of dimension 4 called a pixel.
get_at function illustrates our use of this trick to position the (invisible) cursor at any given position on the screen expressed for clarity in size of glyphes.
We could actually code all this exercice through a 3D view of the framebuffer. I just wouldn't be able to pack the code in less than 100 lines of code and would introduce useless abstractions.
But if you have doubt on the numerous seek I do and why I mutiply lines and columns value the way I do check the preceding link for an understanding of raw based array nth matricial dimensionnal views.
fonts, chars glyphs...
Here we are gonna take matrices defining the glyphes (what you actually see on screen) by 8x8 = 64 1D array and map them on the screen with put_char. Put char does a little bit of magic by relying on python to do the chararcter to glyph conversion through the dict lookup that expecting strings does a de factor codepoint to glyph conversion without having to pass the codepoint conversion.
The set of characters to glyphs conversion with their common property is a font.
The hidden console
The console is an abstraction that keeps track of the global states such as : what is the current line we print at. Thus, here, being lazy I use the global variables instead of a singleton named « console » or « term » to keep track of them. But first and foremost, these « abstractions » are just expectations we share in a common mental mode. Like we expect « print » to add a newline at the end of the string and begin the printing at the next line.
The to be finished example
I limited the code to 100 lines so that it's fairly readable. I let as an exercise the following points :
- encoding the missing glyphes in the font to actually be able to write "hello world!",
- handling the edge case of reaching the bottom of the screen,
This example is a part of a project to write « hello world » on the framebuffer in numerous languages, bash included.
Annexe : the code
#!/usr/bin/env python3 from struct import pack from os import SEEK_CUR, lseek as seek, write w,h =map(int, open("/sys/class/graphics/fb0/virtual_size").read().split(",")) so_pixel = 4 stride = w * so_pixel encode = lambda b,g,r,a : pack("4B",b,g,r,a) font = { "height":8, "width":8, 'void' : [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, ], "l":[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, ], "o": [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0], "h": [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0], } def go_at(fh, x, y): global stride seek(fh.fileno(),x*so_pixel + y *stride, 0) def next_line(fh, reminder): seek(fh.fileno(), stride - reminder, SEEK_CUR) def put_char(fh, x,y, letter): go_at(fh, x, y) black = encode(0,0,0,255) white = encode(255,255,255,255) char = font.get(letter, None) or font["void"] line = "" for col,pixel in enumerate(char): write(fh.fileno(), white if pixel else black) if (col%font["width"]==font["width"]-1): next_line(fh, so_pixel * font["width"]) COL=0 LIN=0 OUT = open("/dev/fb0", "bw") FD = OUT.fileno() def newline(): global OUT,LIN,COL LIN+=1 go_at(OUT, 0, LIN * font["height"]) def print_(line): global OUT, COL, LIN COL=0 for c in line: if c == "\n": newline() put_char(OUT,COL * font["width"] , LIN * font['height'], c) COL+=1 newline() for i in range(30): print_("hello lol")
Seth Michael Larson: Python and Sigstore
Published 2024-10-21 by Seth Larson
Reading time: minutes
I was a guest on the Open Source Security podcast this week talking about Sigstore and Python among other things I'm working on at the Python Software Foundation.
Sigstore is a digital signing method that has been used by CPython since 3.11.0 which focuses on ergonomics and uses short-lived keys with strongly-bound human-readable identities via OpenID Connect.
CPython also provides digital signatures using PGP and has been doing so for much longer. Below is a diagram showing the current state of affairs:
Distro (Offline?) Distro (Offline?) python.orgpython.orgGitHubGitHubSource Code (tar.gz)Source Code (tar.gz)PGP SignaturePGP SignatureSigstore Bundle...@python.orgSigstore Bundle...CPython
ReleaseInfraCPython...Distro
Package
(deb, rpm)Distro...Distro
Build
InfraDistro...PGP KeysPGP KeysAccounts (GitHub, Gmail)Accounts (Git...
ReleaseManagerReleaseMan...Rebuild from sourceRebuild from...Not used for verification
Not used for...Text is not SVG - cannot display
CPython offers two verification methods: PGP and Sigstore. Verifiers choose which method to use.
From this diagram you can see the two "sources" of identity provided by PGP and Sigstore both link back to release managers. PGP relies on private keys which are maintained and protected by individual release managers to create signatures of artifacts. Sigstore uses third-parties which support OpenID Connect such as GitHub and Google to bind a human-readable identity like "thomas@python.org" to a signing certificate and signature that can be later verified.
Why Sigstore?The problem is that securely maintaining PGP private keys is not an easy task and the burden rests on volunteers for a minimum of 7 years (1 year of pre-releases then 5 years of bug and security fixes across at least two consecutive releases). Contrast that experience to Sigstore where release managers only need to click a button to OAuth sign-in during the release process.
Sigstore externalizes the operational burden of maintaining long-lived signing keys from individual volunteers to teams of people who run an identity platform professionally and the public-good code-signing certificate authority Fulcio. This seems like a pretty good trade-off, considering we already trust these platforms' identity management teams for things like GitHub accounts.
For this reason, I've authored PEP 761 providing a deprecation and discontinuance plan of PGP signatures for CPython artifacts. You can join the discussion of this PEP on discuss.python.org. Big thank-you to Hugo van Kemenade, the release manager for Python 3.14 for sponsoring my first PEP and helping me with the process and thanks to William Woodruff for reviewing the PEP draft and explaining the nitty-gritty details of Sigstore.
This PEP deprecates the expectation that future CPython releases will provide PGP signatures and sets a timeline for discontinuance (Python 3.14) and a mechanism for that timeline to be extended by a vote of the Steering Council, if necessary.
What do verifiers need?Signatures are only useful if they are verified, so we must weigh the needs of verifiers!
CPython's expected downstream verifiers are primarily "distributions" of CPython, such as through a Linux distro (Debian, Fedora, Gentoo, etc) or in a container image. There are other distributions of Python such as pyenv and python-build-standalone.
These users of CPython's source code are great places to verify signatures, as they're likely to be high value targets themselves and can provide a consistent stream of verifications. If any one signature verification were to fail, it would signal that something is wrong with upstream CPython artifacts or python.org and would likely be investigated and remediated quickly.
This further constrains attackers looking to affect CPython downstream users, as compromising python.org would no longer be enough. Instead, attackers would need to compromise the build infrastructure or CPython source code.
From discussions, the requirements that I've gathered from verifiers are:
- Need a tool for verification that can be packaged by distros. The recommended tool for verifying with a CLI is either Cosign or sigstore-python, both of which have challenges for Linux distro packagers.
- OS packages would require Cosign to be packaged in those OS package managers. This isn't trivial as it requires the Go toolchain to build.
- Docker and other container images want verification tools to be available at the OS level. Needing to pull these externally (from Cosign's GitHub releases) would require multi-stage builds which they want to avoid if possible. Today Cosign is available in Alpine, but not yet in Debian (but there is the beginnings of support), Gentoo, or Fedora.
- Offline verification is important. Many package ecosystems ship the signatures inside the packages to enable "build from source" (such as Gentoo), and there's no guarantee that the package is being built online. It's okay that revocation or root of trust updates can't happen when offline. Cosign and other Sigstore verification tools support offline verification if the root of trust is "bundled" as a file locally.
- Online periodic updates to revocations and root of trust. Cosign and other Sigstore verifiers supports this use-case as the default behavior.
Overall, the availability of Cosign in OS package managers appears to be the biggest blocker to see adoption for verifying CPython's Sigstore signatures.
Why adopt a new signature method?So why have CPython's Sigstore signatures not seen adoption despite being available for multiple years? Here's a short list of self-reinforcing reasons:
- Verifiers don't need to adopt the new signature method (Sigstore), because the existing one (PGP) works and there's no expectation for discontinuance.
- Signers can't migrate away from the old signature method because there's apparent demand from verifiers for the old signature method.
- Verifiers don't try or test the new signature method, so the maintainers of signature tooling can't learn about or improve verifier use-cases.
- Concurrently supporting multiple signature methods is more work for both signers and verifiers.
- There are fewer available signatures using the new signing method, so the value of adopting the method as a verifier is less (but maybe this will change soon?).
Keep in mind that almost everyone involved in the above scenarios are volunteers. Doing work to adopt a new process when existing processes are working can feel like "busy-work", but I don't think that's the case for Sigstore.
Sigstore's benefits for ergonomics paired with its ability to use workload identity are two stand-out features compared to PGP. Workload identity being extra important now that many projects are moving to hosted build infrastructure for releases.
Workload IdentitySigstore supporting workload identity means that release manager accounts can no longer be hijacked to produce bad signatures. Artifacts get signed by the build infrastructure provider directly:
Distro (Offline?)Distro (Offline?)python.orgpython.orgGitHubGitHubSource Code (tar.gz)Source Code (tar.gz)Sigstore Bundlepython/releaseSigstore Bundle...CPython
ReleaseInfraCPython...Distro
Package
(deb, rpm)Distro...Distro
Build
InfraDistro...Accounts (GitHub, Gmail)Accounts (Git...
ReleaseManagerReleaseMan...Rebuild from sourceRebuild from...
Workload identity: verify artifacts came from CPython release infra
Switching to workload identity also means downstream verifiers no longer need to make changes when new release managers join the project, the expected identity would always be gh/python/release-tools/....
We still have a ways to go to adopt workload identity for CPython because our macOS and Windows release processes don't use hosted build platforms that support OpenID Connect and Sigstore. That means that for now we'll keep using release manager identities.
But this future may not be far off for Python packages hosted on PyPI...
Many more signatures are coming!William Woodruff and the team at Trail of Bits have authored PEP 740 which is provisionally accepted. The PEP specifies how attestations that can be verified by PyPI (like Sigstore) using workload identities specified with the secure publishing feature "Trusted Publishers" and then served alongside artifacts on PyPI.
There's a lot more to this story (but it's not for me to tell). Given Trusted Publishers' success, there clearly are exciting times are ahead. Subscribe to the PyPI blog to learn more once the project is complete.
That's all for this post! 👋 If you're interested in more you can read the last report.
Have thoughts or questions? Let's chat over email or social:
Want more articles like this one? Get notified of new posts by subscribing to the RSS feed or the email newsletter. I won't share your email or send spam, only whatever this is!
Want more content now? This blog's archive has ready-to-read articles. I also curate a list of cool URLs I find on the internet.
Find a typo? This blog is open source, pull requests are appreciated.
Thanks for reading! ♡ This work is licensed under CC BY-SA 4.0
︎